How to use WPT to evaluate builds when times vary on same code?


I hope this is the right forum. Apologies if not…

I’m hoping you might be able to help us. As loyal WPT servants, we are trying to use it in our QA process to ensure that every single build is faster than the previous one.

However, what we are finding is that we get such different results at different times on the same code that we can’t rely on comparisons against the new code. Is it really faster or slower?

We are wondering what to do. Should we run our own instance of WPT? Should we be using it differently? How do others ensure that every version of their code is faster than the previous one?

We use the filmstrip and record the average time at which the page shows its main content.

Here’s an example:

Old 1: average 1.86
New 1: average 2.17

Old 2: average 2.47
New 2: average 1.88

Old 1 and Old 2 are identical code tested at different times, but with dramatically different results.
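For what it’s worth, part of the problem may be the averaging itself: a single slow outlier run skews the mean badly. A quick sketch with made-up per-run times (the numbers below are hypothetical, not from our tests) shows how the median resists an outlier that drags the mean up:

```python
import statistics

# Hypothetical per-run "main content visible" times (seconds) for two
# tests of the same build. One outlier run in old_1 drags its mean up.
old_1 = [1.7, 1.8, 1.8, 1.9, 4.1]  # contains one outlier run
old_2 = [1.7, 1.8, 1.8, 1.9, 2.0]

for label, runs in (("Old 1", old_1), ("Old 2", old_2)):
    mean = statistics.mean(runs)
    median = statistics.median(runs)
    print(f"{label}: mean={mean:.2f}s median={median:.2f}s")
```

The means differ noticeably (2.26 vs. 1.84) even though the medians are identical, which is exactly the kind of spread we’re seeing between identical builds.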

All advice very gratefully received :slight_smile:

Looks like there are a couple of things going on:

1 - You have 3rd-party content on the page (analytics, twitter, facebook, ads). You will want to block all of those or stub them out. Granted, most of them load after onload, so if you focus on the load time rather than the fully-loaded time it won’t be as big of an issue.
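As a sketch, the blocking can be done with a WPT script using the `blockDomains` scripting command (the URL and domain list below are placeholders — substitute your own page and the 3rd-party hosts you see in your waterfall):

```
blockDomains	www.google-analytics.com platform.twitter.com connect.facebook.net
navigate	https://www.example.com/
```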

2 - Dynatrace adds quite a bit of overhead and I wouldn’t be surprised if it added some variability. You should try the same tests on the regular Dulles agents.

3 - Make sure you only look at “successful” tests (result of 0 or 99999). Some of the load-time outliers were error runs.
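If you are pulling results programmatically, a minimal filtering step might look like this — the dict layout below is illustrative, so treat the exact field names as an assumption rather than WPT’s canonical JSON keys:

```python
# Filter WebPageTest runs down to "successful" ones before comparing builds.
# 0 = success, 99999 = content error but the page still loaded and timed.
SUCCESS_CODES = {0, 99999}

def successful_runs(runs):
    """Keep only runs whose result code indicates a usable measurement."""
    return [r for r in runs if r.get("result") in SUCCESS_CODES]

runs = [
    {"run": 1, "result": 0,     "loadTime": 1860},
    {"run": 2, "result": 99999, "loadTime": 2170},
    {"run": 3, "result": 12999, "loadTime": 8470},  # error run: discard
]
print(successful_runs(runs))  # keeps runs 1 and 2 only
```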

4 - The page seems to behave quite differently with timing variations (sometimes the after-onload requests look like they are getting loaded before onload). If your lazy-load/async logic uses a timeout, it could be introducing variability when it fires before onload.

5 - The TTFB seems to vary by ~100ms. Depending on how tight you’re looking to get, that could be a problem.

If you click the “plot full results” link below the data table you can see all of the metrics for all of the runs and how they line up. You want ALL of the metrics to be consistent (request count and bytes are absolutely critical for starters).
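A rough consistency check you could run over the per-run data: if the request count or byte count differs between runs, the page itself changed, not just the timing, and the timing comparison is meaningless. The field names here are illustrative assumptions, not WPT’s exact JSON keys:

```python
# Check that a metric is stable across runs before trusting any
# timing comparison built on top of those runs.
def is_consistent(runs, field, tolerance=0.0):
    """True if `field` is identical across runs (or within `tolerance`,
    expressed as a fraction of the minimum value)."""
    values = [r[field] for r in runs]
    spread = max(values) - min(values)
    if tolerance:
        return spread <= tolerance * min(values)
    return spread == 0

runs = [
    {"requests": 42, "bytesIn": 512_000},
    {"requests": 42, "bytesIn": 512_000},
    {"requests": 57, "bytesIn": 734_000},  # extra 3rd-party content loaded
]
print(is_consistent(runs, "requests"))
```

Here the third run pulled in extra content, so the check fails and the runs shouldn’t be averaged together in the first place.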

A private instance will give you a little more control over things like DNS time (you can hard-code the hosts file to eliminate it) and network variability, but you need the page to be absolutely consistent between tests first.
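For example, the hosts file on a private test agent might look like this (all IPs and hostnames below are placeholders — pin your own origin to a fixed address to skip the DNS lookup, and point 3rd-party hosts at a non-routable address so they fail fast):

```
# /etc/hosts on the test agent (example entries only)
203.0.113.10   www.example.com            # your origin, pinned
0.0.0.0        www.google-analytics.com   # 3rd-party hosts stubbed out
0.0.0.0        connect.facebook.net
```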