queue-array-representation broken

Hi,
The file representation of the queue for one of my agents (named “.tmp/[hash].queue”) was an empty file. As a result, every time a test was run for that agent, runtest.php returned the error message “Sorry, that test location already has too many tests pending. Pleasy try again later.”
I deleted the hash file representing the queue and everything works again (the file was recreated).

Any experience with that? Any idea which process could have emptied the queue file?

Maybe the check if(is_file($queueFile)) at line 1586 in function LoadJobQueue (common.inc) should also test whether the file is empty:

if(is_file($queueFile) && !(file_get_contents($queueFile) == ''))
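
For spotting this situation from the console, here is a minimal shell sketch. It lists zero-length `*.queue` files in a given directory so they can be inspected or removed. The function name `find_empty_queues` and the directory argument are assumptions for illustration; pass whatever directory holds the queue files on your installation.

```shell
# Hypothetical helper: list zero-length *.queue files under a directory.
# $1 = directory holding the queue files (an assumption; adjust to your install).
find_empty_queues() {
    # -size 0 matches only files whose size rounds up to zero 512-byte blocks,
    # which means truly empty files
    find "$1" -type f -name '*.queue' -size 0
}
```

After reviewing the output, the listed files can be deleted by hand, which is what resolved the problem described above.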

Regards Nils

The server’s hard disk ran out of inodes last week and wpt-server/-monitor stalled. So maybe an attempt to run a test on the respective agent didn’t finish successfully.

BTW:
If somebody is planning to monitor web performance with wpt and wants to keep results available for a longer period: you should think about archiving older results and prepare your filesystem appropriately.
With the JPGs taken for videos and screenshots, a host that controls three busy wpt-agents and stores their results accumulates more than 40,000 files per day.
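
To get a feel for how quickly results eat into the inode budget, a rough sketch along these lines may help. It counts regular files under a directory tree; `count_files` is a hypothetical helper name, and the path you pass is whatever your results directory happens to be. On Linux, `df -i` reports the overall inode usage per filesystem.

```shell
# Hypothetical helper: count regular files under a directory tree.
# Each regular file normally uses one inode, so this approximates consumption
# (directories use inodes too and are not counted here).
# Note: a file name containing a newline would be counted more than once.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}
```

Running it once a day against the results directory and comparing the numbers gives a simple growth estimate.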

fwiw, there is a php script in the SVN repository that can archive tests (and they will be automatically restored when accessed): http://webpagetest.googlecode.com/svn/trunk/www/webpagetest/cli/archive.php

By default it archives anything that hasn’t been touched in the last 3 days. Each test will be saved to a single zip file, but it needs to be written somewhere reachable from the filesystem (this can be a remote mount point though). If you set it up in cron it can mostly take care of itself (WebPagetest keeps recent tests on SSD but archives them off to ^TB worth of magnetic disks after 3 days, and it works quite well).

Since each test gets archived as a single zip it also reduces the inode overhead significantly (it also only keeps the video for the median run instead of all of the runs).

Hi Patrick - first post here, so thank you for such an excellent tool.

Server is getting pretty full, so very interested in getting the archive script running. Any chance that you’d be able to offer a brief run down of how to run this?

There are 2 basic ways you can archive - you either need a mountpoint on the filesystem that points to your archive storage or you can archive to S3 or Google Storage.

Assuming you are archiving to a mount point you add something like this to your settings.ini on the server:

archive_dir=/data/archive/

Then you can just manually run an archive from a console:
cd [wptdir]/cli
php archive.php

It will zip each test directory to a single file and store it in the archive directory. If that was successful and the test hasn’t been accessed by a user in at least 2 days, it will delete the test directory. When users request tests that have been archived, they will be restored automatically.

Once you get comfortable with it you can schedule it in cron and just make sure to keep an eye on the disk utilization for your archive.
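
As a minimal sketch of the scheduling step, assuming `php` is on the PATH: the wrapper below simply repeats the two manual commands, and the commented crontab line shows one way to run them nightly. The name `run_archive`, the path, and the 03:00 schedule are assumptions for illustration, not part of WebPagetest.

```shell
# Hypothetical wrapper: run the archive script from its cli directory.
# $1 = the WebPagetest directory referred to as [wptdir] above.
run_archive() {
    # Subshell so the caller's working directory is left unchanged
    ( cd "$1/cli" && php archive.php )
}

# Equivalent crontab entry (assumed path and schedule), nightly at 03:00:
# 0 3 * * * cd /path/to/wptdir/cli && php archive.php
```

Redirecting the output to a log file in the crontab entry makes it easier to check later that the runs succeeded.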

If you want to archive to S3 or Google Storage, the general process is the same, but there are more settings that you will need to specify in settings.ini. I haven’t tested it recently with S3, but we use the Google Storage option for another system and it works great (and it uses the same code as the S3 archive, so it should work OK).

Cheers Patrick, that worked like a dream!

I’m now migrating the archive to Amazon S3 (it seems far more future-proof), and I’m having a few issues… Hopefully someone else has experienced this and will be able to get me back on track!

The archive process appears to be running fine and is saving files to S3. However, the files are not being split into directories as they were when saved to a mount point; they are simply added to the top level of the bucket. For example:

Instead of the tests being saved as
/12/08/22/ZZ/3fc07abb98122344b87243ce7abf0bad.zip

The tests are saved as
/120822_ZZ_3fc07abb98122344b87243ce7abf0bad.zip.zip

I’ve checked the S3 access logs, and the following requests show as a 301 redirect:
GET /mywptbucket/mywptbucket/120822_YW_37255cb2375ef390621327c6901ef8ce.zip.zip HTTP/1.0

My settings.ini file looks like this:

;directory to archive test files (must include trailing slash)
;archive_dir=/data/archive/

; archiving to s3 (using the s3 protocol, not necessarily just s3)
archive_s3_server=s3.amazonaws.com
archive_s3_key=*****************
archive_s3_secret=************************
archive_s3_bucket=mywptbucket
archive_s3_url=http://s3.amazonaws.com/mywptbucket/

=======

Update:

I’ve had some success. I am now able to get to the archived tests stored since switching to S3 (but not the tests I have migrated from the mount point archive.)

To achieve this I did the following steps.

Changing line 173 in archive.inc from:
$url = trim($settings['archive_s3_url']) . "$bucket/$file.zip";
to:
$url = trim($settings['archive_s3_url']) . "$file.zip";

Changing the archive_s3_url in settings.ini to:
archive_s3_url=http://mywptbucket.s3.amazonaws.com/
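
To illustrate why the original settings produced the doubled bucket name seen in the access log, here is a small shell re-creation of the two string concatenations. The function names are hypothetical, and it assumes that the file name passed in already ends in `.zip` (inferred from the `.zip.zip` suffix in the log, not confirmed from the code).

```shell
# Hypothetical re-creation of the URL concatenation in archive.inc line 173.
# $1 = archive_s3_url, $2 = bucket name, $3 = file name as passed in

# Original form: base URL + bucket + "/" + file + ".zip"
url_with_bucket() {
    printf '%s%s/%s.zip\n' "$1" "$2" "$3"
}

# Modified form: base URL + file + ".zip"
url_without_bucket() {
    printf '%s%s.zip\n' "$1" "$3"
}

url_with_bucket 'http://s3.amazonaws.com/mywptbucket/' 'mywptbucket' \
    '120822_YW_37255cb2375ef390621327c6901ef8ce.zip'
url_without_bucket 'http://mywptbucket.s3.amazonaws.com/' 'mywptbucket' \
    '120822_YW_37255cb2375ef390621327c6901ef8ce.zip'
```

The first call prints a URL whose path is `/mywptbucket/mywptbucket/120822_YW_37255cb2375ef390621327c6901ef8ce.zip.zip`, matching the redirected request in the log; the second prints the bucket-hostname form with the bucket appearing only once.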

=======

Update 2:

It doesn’t look as though the files are deleted from the archive when accessed, or deleted from the server when archived…

S3 is a flat structure which is why all of the files are right at the root of the bucket. It is designed that way and should be able to handle billions of objects in a bucket so don’t worry about that part.

How did you migrate the tests from the mount point archive? Things that come to mind:

1 - Make sure the file name structure is the same as it is in S3
2 - Make sure the assets in S3 are marked as public (this may be the issue)

I don’t use the API keys when downloading from S3 to restore tests so the files need to be ACL’d for public access.

Thanks Patrick.

I suppose the bit that was confusing me most was the difference in how the files are stored at the mount point compared to S3, ie:

Mount point, date is split into folders:
12/05/18/0Z/7048f08c236e38141dab9be478d3d366.zip

S3, date is part of file name :
120518_0Z_7048f08c236e38141dab9be478d3d366.zip.zip

I have been wondering how to take the old archive folder structure and concatenate the folder names into each file name - however, I think that I may take the less elegant approach of manually unzipping the files and restoring to WPT, then rerunning the archive process… not pretty, but it should work!

The folders in the mount point are [YY]/[MM]/[DD]/[XX]/[hash].zip

On S3 it is collapsed to [test ID].zip

If you look at archive.php, the same logic that walks through the results/ directory and builds up the test IDs should be applicable to the archive directory, since the structure is the same.

Thanks again Patrick, was the bit I was struggling to articulate clearly!

I’m not the best coder in the world - so a fix that avoids tinkering with any of the core files was certainly my preferred choice… a quick hunt online has revealed this:

Looks like an amazingly useful bit of software for some very niche tasks (such as this!)

Cool. Let me know if it doesn’t work and I can whip up a quick PHP script that you could use from the CLI to rename and move the files to a single directory (suitable for uploading to S3). It’s just a tweak to the existing code, so it shouldn’t take too long to throw together. You’ll want to make sure you have a filesystem that can handle more than 32k files in a single directory though (depending on how many tests you have). ext3 throws up but ext4 can handle it OK.
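
For reference, the rename-and-move step could be sketched in shell roughly as follows. This is not the script offered above, just an illustration under assumptions: the helper names `flat_name` and `flatten_archive` are made up, files are copied rather than moved, and the target name uses a single `.zip` suffix. The bucket listing earlier in the thread shows a doubled `.zip.zip`, so compare against objects already in your bucket before uploading.

```shell
# Hypothetical helper: map "YY/MM/DD/XX/hash.zip" to "YYMMDD_XX_hash.zip".
# Paths that do not match the expected layout are printed unchanged.
flat_name() {
    printf '%s\n' "$1" |
        sed -E 's#^([0-9]{2})/([0-9]{2})/([0-9]{2})/([^/]+)/([^/]+)$#\1\2\3_\4_\5#'
}

# Hypothetical helper: copy every archived zip from the nested tree in $1
# into the single flat directory $2, using the collapsed file name.
flatten_archive() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    ( cd "$src" && find . -type f -name '*.zip' ) | while IFS= read -r path; do
        rel=${path#./}
        name=$(flat_name "$rel")
        # A remaining slash means the path did not match the layout; skip it
        case $name in */*) continue ;; esac
        cp "$src/$rel" "$dest/$name"
    done
}
```

For example, `flatten_archive /data/archive /data/archive-flat` would copy `12/05/18/0Z/7048f08c236e38141dab9be478d3d366.zip` to `120518_0Z_7048f08c236e38141dab9be478d3d366.zip` in the flat directory, leaving the original tree untouched.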