Dumb question, but on the server AWS AMI is there a cron job set up for regular archiving? My trawl through the user crontabs and cron.php plus its dependencies suggests not… I just want to check (having no confidence in my Linux abilities!).
Thanks
No. There is a crontab that kicks off the auto-scaling cron scheduling but nothing for the archiving. I’ve been planning on tying it into the existing cron processing so it will just start working automatically.
Are you just using the normal cron.php with the archive settings defined in settings.ini? If so I can go ahead and push an update today that kicks it off automatically.
Right now, I’ve been kicking it off manually via cli/archive.php.
Before an update to include an archive cron job in the AMI, any chance you could look at these two issues:
https://github.com/WPO-Foundation/webpagetest/issues/395
https://github.com/WPO-Foundation/webpagetest/issues/394
Without those being fixed, an automated archive job may result in results being lost.
Good timing - I just fixed both of those. You mind giving it a try and seeing if it looks ok now before I tie it in to cron?
Something funky is going on - tests are being archived to S3 automatically. As soon as a test completes, the ZIP file appears in S3, but the test files remain on the WPT server. No entries appear in cli/archive.log.
When I run cli/archive.php manually, these tests are ignored (i.e. they don’t appear in the log at all).
Not yet sure if this is down to my local set-up or the archiving changes.
archive_days is set to 30.
I have the same behaviour as Kevin.
There’s the cron job that calls getwork, and I’m pretty sure that somewhere in this the archive process is being triggered - though I haven’t worked out quite which one of the possibilities it is.
I wonder if it’s in the validating of the archived test. It won’t delete the test if it can’t validate that the archived version is valid and something funky may be going on there.
work/workdone.php calls work/postprocess.php, which in turn calls ArchiveTest(id, false) in archive.inc. That second parameter is crucial - it means the test info isn’t saved into testinfo.json. This, I presume, is why these tests are ignored by cli/archive.php - it must use the testinfo.json objects for identifying tests.
So, there are two outcomes of this:
The consequence is that eventually the WPT server will run out of disk space, even with archiving switched on.
I’m wondering whether postprocess.php should call ArchiveTest(id, true), or indeed not call ArchiveTest() at all and leave it up to an archive cron job?
My plan is to have the regular internal cron processing (triggered every 5, 10 and 60 minutes by agents polling getwork.php and by a cron job on the AMI) trigger cli/archive.php, or the equivalent code factored out into a common location, once we’re confident that it actually works correctly.
btw, archive.php not deleting the tests is the thing I’m concerned about and need to look into. It will only delete the test if it verifies that the archive is valid: webpagetest/archive.php at master · WPO-Foundation/webpagetest · GitHub
I’m concerned that the validation check isn’t working correctly for S3 private buckets. The code looks reasonable but I want to do some testing and maybe move the archive code to the new S3 libraries before I’m sure: webpagetest/archive.inc at master · WPO-Foundation/webpagetest · GitHub
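To make the "only delete once the archive is verified" rule concrete, here is a minimal Python sketch of the general pattern. It is not the WebPageTest code (that lives in PHP in archive.php and archive.inc); the function names and the local ZIP check are hypothetical stand-ins, and a real check against a private S3 bucket would have to download or otherwise inspect the remote object first.

```python
import shutil
import zipfile
from pathlib import Path


def archive_is_valid(zip_path: Path, test_dir: Path) -> bool:
    """Return True only if the ZIP opens, passes its CRC check and
    contains every file found in the local test directory."""
    try:
        with zipfile.ZipFile(zip_path) as zf:
            # testzip() returns the name of the first corrupt member, or None.
            if zf.testzip() is not None:
                return False
            archived = set(zf.namelist())
    except (OSError, zipfile.BadZipFile):
        # Missing, unreadable or malformed archive.
        return False
    local = {
        p.relative_to(test_dir).as_posix()
        for p in test_dir.rglob("*")
        if p.is_file()
    }
    return local <= archived


def delete_if_archived(test_dir: Path, zip_path: Path) -> bool:
    """Delete the local test only after the archive has been verified."""
    if not archive_is_valid(zip_path, test_dir):
        return False  # keep the local copy; the archive cannot be trusted
    shutil.rmtree(test_dir)
    return True
```

The point of the pattern is that a failing validation step leaves the local files in place, which matches the symptom described above: the ZIP is in S3 but the test is never removed from the server.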
Does the cli/archive.php job run in an hourly cron job,
or do we need to trigger it whenever we want to push the results to S3?
Is there a way to push the test results to S3 as soon as a test completes?
The results should get pushed to S3 as soon as a test completes but you need to schedule archive.php to run periodically as well to clean up local tests and to push any tests that did not complete successfully.
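If you schedule the periodic run yourself, one thing worth guarding against is a new run starting while the previous one is still working through a large backlog. Below is a rough Python wrapper sketch for that, assuming a Linux host; the php command line and the lock file path in the comment are assumptions to adjust for your install, not something shipped with WebPageTest.

```python
import fcntl
import subprocess
import sys


def run_exclusive(cmd, lock_path):
    """Run cmd unless another run holds the lock.

    Returns the command's exit code, or None if the run was skipped
    because a previous run is still in progress.
    """
    with open(lock_path, "w") as lock:
        try:
            # Non-blocking exclusive lock; released when the file is closed.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        return subprocess.run(cmd).returncode


# Hypothetical usage from a scheduled job (paths are assumptions):
#   code = run_exclusive(["php", "cli/archive.php"], "/tmp/wpt-archive.lock")
#   sys.exit(0 if code is None else code)
```

How often to schedule it is up to you; the wrapper only ensures that overlapping runs are skipped rather than stacked.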
Thanks for your input,
We have added the following S3 settings to the settings.ini file:
archive_s3_server=s3.amazonaws.com
archive_s3_key=AWS_key
archive_s3_secret=AWS_secret
archive_s3_bucket=
archive_s3_url=https://.s3.amazonaws.com/
archive_days=0
For the past 3 days we have had tests that completed successfully but were not pushed to S3.
If we run cli/archive.php, our results do get pushed to S3.
Is it because of archive_days=0?
Is there anything that we might have missed?
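One quick sanity check on settings like the ones posted above is to confirm that none of the S3 values is blank. The Python sketch below does only that; it assumes settings.ini uses plain key=value lines, and the list of required keys is simply the archive_s3_* names from this post. It does not tell you whether archive_days=0 is the cause, and if the bucket name was just redacted for the forum it will not apply.

```python
REQUIRED_KEYS = [
    "archive_s3_server",
    "archive_s3_key",
    "archive_s3_secret",
    "archive_s3_bucket",
    "archive_s3_url",
]


def parse_ini_lines(text):
    """Collect key=value pairs, skipping blanks, comments and section headers."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith((";", "#", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_archive_settings(text):
    """Return the required archive keys that are absent or empty."""
    settings = parse_ini_lines(text)
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


posted = """
archive_s3_server=s3.amazonaws.com
archive_s3_key=AWS_key
archive_s3_secret=AWS_secret
archive_s3_bucket=
archive_s3_url=https://.s3.amazonaws.com/
archive_days=0
"""
print(missing_archive_settings(posted))  # ['archive_s3_bucket']
```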