API results too variable to be useful

Hello to the group.

Firstly, thank you for making this resource, and the accompanying forum, available.

I am new to performance testing of websites and am evaluating how useful webpagetest.org may be going forward. I’ve written a Python script to call the WebPageTest API and fetch results, and I spent a while getting it working well enough that I can just run the script and see the Speed Index.
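For reference, here is a minimal sketch of the kind of script I’m using (simplified, and assuming the public runtest.php / testStatus.php / jsonResult.php JSON endpoints and an API key):

```python
import time
import requests

WPT = "https://www.webpagetest.org"
API_KEY = "YOUR_API_KEY"  # placeholder

def run_test(url, runs=3):
    # Submit the test; f=json asks for a JSON response containing the test ID.
    submit = requests.get(f"{WPT}/runtest.php", params={
        "url": url, "runs": runs, "f": "json", "k": API_KEY,
    }).json()
    test_id = submit["data"]["testId"]

    # Poll until the test completes (statusCode 200 means finished).
    while True:
        status = requests.get(f"{WPT}/testStatus.php",
                              params={"f": "json", "test": test_id}).json()
        if status["statusCode"] == 200:
            break
        time.sleep(10)

    # Fetch the full result JSON and return the average first-view Speed Index.
    result = requests.get(f"{WPT}/jsonResult.php", params={"test": test_id}).json()
    return result["data"]["average"]["firstView"]["SpeedIndex"]

print(run_test("https://example.com/"))
```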

But the results I see make me wonder how useful WebPageTest could be at all. Running the script several times in a row showed me Speed Index values of:

1909
1935
1860
1675
1707
1318
1388
1388

These were all the average Speed Index over 3 runs, using the first-view value, with all other parameters left at their defaults. The tests were run back to back, within seconds of each other (the only gap between them was the time it took me to kick off the next one).

From those numbers it seemed that 3 runs was leaving too much to chance, so I upped the test to average over 9 runs instead. The Speed Index results were then:

2144
2031
1846
1328
1751

And then I ran into the daily limit on the number of tests.

I’m at a loss as to how to interpret the fact that the fastest Speed Index is only ~60% of the slowest. That’s a helluva variation for no change to the website.

There’s a general downward trend in the results, but not monotonic. Is this expected?

Is the Speed Index not a reliable stat - should I be relying on TTFB, or some other stat instead?

Is variation being introduced by leaving “all other parameters set to defaults” (i.e. unspecified)? Do I need to specify location / browser / etc. explicitly for each test? And if so, which params need to be set to reduce this apparent variability?
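For example, would I need to be explicit along these lines when submitting the test? (A guess at the relevant parameters; the location string and URL here are just examples.)

```python
import requests

WPT = "https://www.webpagetest.org"
API_KEY = "YOUR_API_KEY"  # placeholder

params = {
    "url": "https://example.com/",      # page under test (example)
    "runs": 9,                          # average over more runs
    "fvonly": 1,                        # first view only
    "video": 1,                         # capture video, needed for Speed Index
    "location": "Dulles:Chrome.Cable",  # pin location, browser and connectivity (example)
    "f": "json",
    "k": API_KEY,
}
submit = requests.get(f"{WPT}/runtest.php", params=params).json()
```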

Thanks again for the resource, and in advance for any advice you can offer this newb.

Do you have links to the test results? Variability usually comes from the site itself having variable performance. If the server response time varies or if there are ads that serve different content you will see swings like that. It’s hard to say without seeing what actually varied between the tests.

FWIW, with a static page served from Google infrastructure we have been able to get results within a few ms of each other across runs, so that level of stability is possible, but it’s not usually THAT stable.

No ads, very little dependence on 3rd party anything at all really.

And I was testing against our dev server on a Sunday night, so very little else happening on the server.

I was hoping you’d be able to make a claim along those lines :slight_smile:

How are you aggregating the results from each run? I clicked on a bunch of them and didn’t see any 1300-range Speed Indexes for the first-view tests. Any chance you are mixing and matching first/repeat view? Also, make sure to throw out any 0 values. It looks like 2 of the VMs had a display issue and the video capture was blank, returning 0s for the visual metrics.
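If you’re aggregating yourself, something along these lines (a sketch, assuming the per-run firstView layout of the jsonResult.php output) sidesteps both issues:

```python
import statistics

def first_view_speed_indexes(result_json):
    # Per-run first-view Speed Index values, skipping runs where the
    # video capture was blank and the visual metrics came back as 0.
    values = []
    for run in result_json["data"]["runs"].values():
        si = run.get("firstView", {}).get("SpeedIndex", 0)
        if si > 0:  # throw out 0s from failed captures
            values.append(si)
    return values

def robust_speed_index(result_json):
    # The median is less sensitive to a single slow (or broken) run than the mean.
    return statistics.median(first_view_speed_indexes(result_json))
```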

The one case where I saw a few long outliers was runs 7 and 8 from this one: http://www.webpagetest.org/result/170213_0Y_GY/ and in that case it looks like the JS took longer than usual to download. Could have been a lost packet somewhere in the path or something else, but for the most part I was seeing results in the 1500-1700 range.

A couple of things to be aware of with the visual metrics:

  • The video capture on desktop runs at 10 fps, so there is ±100 ms of quantization (it is 60 fps on the mobile devices, ±16 ms)
  • The display itself runs vsync at 60 Hz, so no matter what you do there will always be ±16 ms
  • The actual drawing can have a multi-modal effect if there are scripts that need to run after the initial render (async, in the body, etc.). The browser can sometimes paint before running the script, but if the paint happens to get delayed past a vsync it will run the script before painting, so the render time can vary by the length of the script execution.

Generally, look for consistency in the non-visual metrics first: First Byte, DOM Content Loaded and Load time (in that order, as more dependencies are added at each step). If you can get those to within the consistency levels you are looking for, then move on to the visual metrics. Start with “Start Render”, as that is usually the one that shows vsync-tied variability and will also usually have the biggest impact on Speed Index.
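If it helps, pulling those out of the result JSON per run is straightforward (a sketch; the field names are the ones I’d expect in the firstView blocks of the jsonResult.php output):

```python
def non_visual_metrics(result_json):
    # Per-run first-view timings (ms), for checking consistency
    # before moving on to the visual metrics.
    rows = []
    runs = result_json["data"]["runs"]
    for run_id in sorted(runs, key=int):
        fv = runs[run_id]["firstView"]
        rows.append({
            "run": int(run_id),
            "ttfb": fv.get("TTFB"),
            "dom_content_loaded": fv.get("domContentLoadedEventStart"),
            "load_time": fv.get("loadTime"),
            "start_render": fv.get("render"),
            "speed_index": fv.get("SpeedIndex"),
        })
    return rows
```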

My apologies, that URL was too general; my query arose only from a series of results last night, running from 23:28 to 00:14 in this list: WebPageTest - Test History

I shall indeed start with those.

Also, thanks for the comprehensive nature of the API. I had thought last night that I had thrown away the results by not saving them as they happened, but I got the test IDs back from the testlog.php listing, and then all the individual results back as full JSON files.

I shall spend some time pulling out time-series of the non-visual metrics and see what they tell me.