Private instance of webpagetest slows down after a couple days of saturation

Hi everybody. For a few years now I’ve been monitoring our private instance of webpagetest (2.x, then 3.0, running on Windows). Every once in a while we need enough runs to keep the server busy for a couple of days in a row.

I’ve noticed that when that happens the server itself slows down. It doesn’t seem to alter our test results, but the test runs themselves take longer and longer to complete.

In the past it was relatively rare for us to run enough tests to saturate the server for that long, but lately our run count has ramped up a bit and I’m seeing it more often. We also just transitioned to a private instance in AWS (the latest version, I think, running on Ubuntu 14.04.1 LTS) and had the same problem.

So my questions are: Has anyone run into this before? Is there guidance I should be following? Is there a log I should check that would show what’s going on?

Thanks in advance!

We’ve run into it, but we also do a lot more on top of what WPT does, so it’s been hard to tell where the cause is. One thing that helped a lot was increasing the agents’ polling interval. I think it defaults to 5s; setting it to 10s made a noticeable difference. I suspect that since all of the queues and test info are stored on disk, every check for jobs is quite heavy, so polling less often reduced the load on the server.
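To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes every agent polls independently at a fixed interval; the agent count of 20 is made up for illustration, and the 5s default is only my recollection.

```python
def polls_per_minute(agent_count, poll_interval_seconds):
    """Approximate job checks per minute hitting the server.

    Assumes each agent polls independently at a fixed interval.
    """
    return agent_count * 60.0 / poll_interval_seconds


if __name__ == "__main__":
    agents = 20  # hypothetical fleet size, not taken from any real setup
    for interval in (5, 10):
        rate = polls_per_minute(agents, interval)
        print(f"{interval}s interval: {rate:.0f} job checks per minute")
```

Doubling the interval halves the number of disk-backed job checks, at the cost of agents picking up new work a few seconds later on average.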

There are also instances where it fails to load these files from disk, so I’ve added a simple retry around those reads.
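The retry idea looks roughly like the sketch below. This is not the actual change (the WPT server is not written in Python); it is a generic illustration, and the function name, attempt count, and delays are all hypothetical.

```python
import time


def read_with_retry(path, attempts=3, delay=0.1):
    """Read a text file, retrying a few times if the read fails.

    Hypothetical helper: waits `delay` seconds after the first failure
    and doubles the wait after each further failure.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            with open(path, "r", encoding="utf-8") as handle:
                return handle.read()
        except OSError as error:
            last_error = error
            # Sleep only if another attempt is still coming.
            if attempt < attempts - 1:
                time.sleep(delay * (2 ** attempt))
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

A retry like this only papers over transient failures, such as a file being rewritten at the moment it is read. If the reads fail consistently, the error still surfaces after the last attempt.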

Also, since I don’t often get to talk to others who run a private instance with the allowed number of runs raised above 9: have you run into any cases where an agent keeps running already completed tests after it should be done? I submitted the issue here: "Agents can get assigned all runs in sharded tests erroneously" (Issue #1074, WPO-Foundation/webpagetest on GitHub), but it doesn’t seem like anyone else has run into it.

Thank you! We’ll take a look at that. We haven’t hit your issue so far, but if we do I’ll comment on the issue thread.