New Windows AMIs Changing Performance

Hey Pat -

We noticed a pretty significant performance change (some URLs 2x slower, a few URLs a little faster) when we updated to the latest version, and I was able to trace it back to using the “New Windows AMIs”, from this commit: Switched the Windows AMI's to the new agent · WPO-Foundation/webpagetest@26e3cf0 · GitHub

Can you elaborate a little more on how these new AMIs are different? I know the previous AMIs were over 2 years old, but they also updated themselves upon startup (using autoscaling), so I didn’t anticipate a big difference.

If it helps, here’s a result on old AMI, and here’s one on new AMI - it appears the CPU is working much harder with the latest stuff (using c4.large in both cases)

(I did see the thread about Chrome changing in mid-2018, and I’m assuming this is what’s going on here? Just wanted to double check that a significant performance change is expected as well.)

Thanks so much for your time.

We’ve seen a similar performance hit. In my experience I’ve seen it more heavily affect Firefox, but Chrome, IE 11 and Edge all show significant slowdowns compared to the older instances.

I’ve even tested with a manual install of wptagent on Windows 10 without the secondary RDP session, but it didn’t seem to make any difference even on larger servers (c5.xlarge). At this point, we’re going to have to fall back to the old wptdriver until there is better performance parity between wptdriver and wptagent.

So glad I’m not the only one, thanks for commenting. And thanks for testing it out on larger instances - I was about to try that next.

I may be reverting as well - although I know that’s not a long term solution due to Chrome’s impending security updates. Hopefully one of the developers behind the new agent can weigh in soon.

One other piece of evidence is that our AWS hourly usage for agents is up about 60%, showing that all of our tests are taking longer to run in general.

Here’s a graph showing the trend. Here it doesn’t look like it’s that much higher, but this graph includes some other always-on servers too.

I don’t have a good answer for why the CPU usage is higher, but it is probably a combination of running over RDP to localhost and the switch to ffmpeg for video capture, neither of which can really be avoided. wptdriver isn’t a viable fallback because it already doesn’t get visibility into HTTPS and soon it won’t work at all with Chrome.

If at all possible, I HIGHLY recommend using the Linux AMIs for Chrome and Firefox and only using Windows if you REALLY need IE testing. The performance is much better on c4.large but, even more importantly, the spot pricing is around 1/5 the cost of Windows instances.
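To put the "around 1/5 the cost" figure in concrete terms, here is a rough sketch of the arithmetic. The hourly price and the number of instance hours are made-up placeholders, not real AWS prices; only the 1/5 ratio comes from the post above.

```python
def estimated_cost(instance_hours, hourly_price):
    """Total cost for a number of instance hours at a flat hourly price."""
    return instance_hours * hourly_price


# Hypothetical values for illustration only; check current spot pricing.
WINDOWS_SPOT_PRICE = 0.10                   # placeholder USD per hour
LINUX_SPOT_PRICE = WINDOWS_SPOT_PRICE / 5   # "around 1/5" per the post above
MONTHLY_AGENT_HOURS = 1000                  # placeholder usage

windows_cost = estimated_cost(MONTHLY_AGENT_HOURS, WINDOWS_SPOT_PRICE)
linux_cost = estimated_cost(MONTHLY_AGENT_HOURS, LINUX_SPOT_PRICE)
savings_fraction = 1 - linux_cost / windows_cost

print(f"Windows: ${windows_cost:.2f}  Linux: ${linux_cost:.2f}  "
      f"savings: {savings_fraction:.0%}")
```

Plug in your own instance hours and the current spot prices for your region to get a realistic number.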

Thanks for the response, Pat.

I have purposely used Windows in the past because I wanted the performance to be representative of what most people browse on, but if Linux will show more stable performance then I’ll give them a try. Will report back in a couple days with an update.

As far as taking longer to process, that is expected (independently of higher CPU use during testing). The new agent needs to post-process the videos and trace logs, while the old agent could observe everything directly, so there’s a fair bit of work the new agent does that the previous one didn’t.

As far as Linux goes, I could actually switch the Chrome and Firefox in the default config to Linux so it would be transparent but I wanted people to be able to opt-in to Linux.

FWIW, the public WebPageTest locations mostly use the Linux agent (except for the Thinkpads) and so does SpeedCurve, so it gets quite a bit of testing and eyeballs inspecting it.

We do need IE and would like to support Edge but right now the priority is performance metric parity between the old wptdriver and the new wptagent instances in Firefox on Windows.

In our side-by-side comparisons, wptagent reports our website is 2x slower than wptdriver does in both Firefox and Chrome on c5.large instances. In many cases, this is partially due to SSL negotiation taking longer. CPU utilization of the Firefox process itself spikes and hangs the server at 100% for a lot longer than Chrome does on Windows, but both peg the CPU at 100% while the test is running.
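For anyone wanting to run the same kind of side-by-side check, one simple way is to compare the median load time across several runs on each agent. This is only a sketch: the function name and the sample timings below are hypothetical placeholders, not our actual results.

```python
from statistics import median


def slowdown_ratio(baseline_ms, candidate_ms):
    """Return median(candidate) / median(baseline) for two lists of timings."""
    if not baseline_ms or not candidate_ms:
        raise ValueError("both sample lists must be non-empty")
    return median(candidate_ms) / median(baseline_ms)


# Hypothetical load times in milliseconds for the same URL and browser.
wptdriver_runs = [2000, 2100, 1900, 2050, 1950]
wptagent_runs = [4000, 4200, 3900, 4100, 3950]

ratio = slowdown_ratio(wptdriver_runs, wptagent_runs)
print(f"wptagent / wptdriver median ratio: {ratio:.2f}x")
```

Using the median rather than the mean keeps a single outlier run from skewing the comparison.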

For what it’s worth, I built a Win10 image using WPTAgent without the extra RDP session, and I did not see any performance difference in the results so I don’t think it’s related to the RDP session.

We’re going to test the Linux agents with Chrome and FF next. Will report back on that but ultimately if we can do all the tests on Windows and get IE and Edge support while we’re at it, that would be preferable for us.

I was searching around the documentation and couldn’t find any info on configuring WPT’s EC2 support for Spot, though I did see the ec2.sample.ini file had a few entries related to it. Am I missing the config doc somewhere?

Update - I switched to the Linux AMIs about a week ago, and I’m very happy with them. Thanks for the recommendation, Pat! Sorry this doesn’t help you, doc31.

I posted some before and after data if you’re interested in the details of Windows vs. Linux AMI performance.