Restoring archives from S3

I have built up a private instance using CloudFormation, with archiving to S3.

This is working well, but I want to be able to blow away the EC2 server instance, then recreate it and restore from the archive.

Is there functionality built into WebPageTest to do this? I can’t find much about it, and the zips simply look like the contents of individual test directories from /results/ (without the nested directory structure).

Any help greatly appreciated; at the moment I have hacked around it by using an EBS volume as a www mountpoint.

Thanks!

If the tests are archived to S3, you should be able to bring up a clean new instance (with nothing in results/) that has the same S3 archive config, and it will automatically restore any tests as they are accessed.

Hrm. Sounds straightforward, yet how are they ‘accessed’? I browse to the results history and there is nothing in the list after recreating the instance with the same S3 settings. New .zips appear in the bucket when I run a test.

I was wondering if it had anything to do with trying to do a directory-style listing over HTTPS. Testing with links, I can retrieve a zip but get Access Denied when trying to load the base URL. I think, however, that this is just how S3 works, and WebPageTest would simply be making API calls to S3, yeah?

My settings.ini looks like:

[Settings]
product=WebPagetest
contact=<my_email>
optLinks=0
maxruns=10
countTests=1
allowPrivate=0
enableVideo=0
archive_s3_server=s3-ap-southeast-2.amazonaws.com
archive_s3_key=<access_key>
archive_s3_secret=<sec_key>
archive_s3_bucket=<bucket_name>
archive_s3_url=https://s3-ap-southeast-2.amazonaws.com/

In the meantime I will go off and browse the source to try to work this out. Thanks :wink:
[hr]
Additionally:

I have also tried browsing to http://webpagetest:8888/result/130812_PT_1/ where 130812_PT_1.zip exists in the S3 bucket and can be retrieved using wget, but this brings me to a ‘test not found’ page.

I am running version 2.12.
[hr]
OK, I just looked at the source and found that archive_s3_url should be the base URL only and must not include the bucket name. I corrected this, and results are now restored and displayed when I request them by URL.
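For anyone else hitting this, here is a sketch of the archive section of settings.ini that worked for me, assuming the same ap-southeast-2 region (key and bucket values are placeholders):

```ini
archive_s3_server=s3-ap-southeast-2.amazonaws.com
archive_s3_key=<access_key>
archive_s3_secret=<sec_key>
archive_s3_bucket=<bucket_name>
; Base URL only - do NOT append the bucket name here
archive_s3_url=https://s3-ap-southeast-2.amazonaws.com/
```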

That still leaves the desire to have these listed in the results history. My workaround at the moment is to mount logs/ on an EBS volume. Any suggestions for a nicer way would be much appreciated.

Sorry for the delay. You are correct, the logs don’t get archived right now. I’ve generally considered the logs to be lossy but I’ll see if I can think about any way to get them archiving cleanly as well.

To archive the logs to S3, I ended up basically copying some of your code and hacking it into archive.inc at line 94:

// Build today's log file path (GMT) and push it to the archive bucket.
// $bucket, $metaHeaders and $requestHeaders come from the surrounding archive.inc code.
$targetDate = new DateTime('now', new DateTimeZone('GMT'));
$logfileName = 'logs/' . $targetDate->format('Ymd') . '.log';
$s3->putObject($s3->inputFile($logfileName, false), $bucket, $logfileName, S3::ACL_PUBLIC_READ, $metaHeaders, $requestHeaders);

I am actually using Python’s boto to drive the whole thing, which takes care of restoring the logs as part of the CloudFormation setup. I’ll hopefully have this up on GitHub in the coming days.
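For anyone who doesn’t want to pull in boto, a stdlib-only sketch of the restore side: since the snippet above uploads the logs with a public-read ACL, a fresh instance can fetch them back with a plain HTTPS GET. The helper below is hypothetical (not from WebPageTest itself) and just builds the path-style object URL for a given day, which you can then pass to urllib.request.urlretrieve:

```python
from datetime import datetime, timezone
from urllib.parse import urljoin

def log_object_url(base_url, bucket, day):
    """Build the public URL of an archived WebPageTest log for a given day.

    Assumes the path-style layout used by the upload snippet above:
    <base_url><bucket>/logs/YYYYMMDD.log
    """
    return urljoin(base_url, "%s/logs/%s.log" % (bucket, day.strftime("%Y%m%d")))

# Example (bucket name is a placeholder); fetch with
# urllib.request.urlretrieve(url, "logs/20130812.log")
url = log_object_url("https://s3-ap-southeast-2.amazonaws.com/",
                     "<bucket_name>",
                     datetime(2013, 8, 12, tzinfo=timezone.utc))
```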

Did you have time to add something for archiving the logs to S3? Would love that functionality :slight_smile: