Results of bulk test analysis

I’ve been promising to do this for a while, so I finally sat down and crunched some numbers by looking at all of the tests that have been run on WebPagetest over the last year.

There have been a little over 62,000 tests run over the last year. Of those, 24,763 unique URLs were tested using the default Dulles DSL location, so I picked that as the sample set. For any given URL, I used the most recent successful test and only looked at “first view” results.

At a high-level, here are the average statistics across the whole sample set:

Load Time: 10.1 seconds
Time to First Byte: 1.1 seconds
Time to Start Render: 3.8 seconds

Page Size: 510 KB
Number of Requests: 50
Number of Redirects: 1

I was a little shocked by how long the average Start Render time was, as well as how big the average page was. Things really start to get interesting when you look at the distribution of the results. For all of the charts below (also available in the Excel workbook at the bottom of the post), what is being graphed is a cumulative distribution of the results. I’ll walk through reading the graph for the first one but then mostly just talk about the results themselves.

[size=x-large]Load Time[/size]
Here is a cumulative distribution plot of the load times across all of the tests. As the times get larger, the percentile is the % of tests that loaded in under that amount of time. To see where you land against the other sites, take your load time and find it on the x-axis, then go up and see where you cross the line. You are slower than that % of web sites. WebPagetest has a 2 minute timeout so anything longer than that would have been an error.

[align=center][/align]

I was a little surprised at how long pages take to load, given that this is over a reasonably fast DSL connection (1.5Mbps). A full 35% of the pages took more than 10 seconds to load with only 33% coming in under 5 seconds. For reference, Yahoo’s portal (which is a fairly rich portal with lots of content) loads in under 3 seconds.
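If you want to do the same lookup against the raw data instead of eyeballing the chart, here is a minimal sketch of the calculation; the file name and column name are just placeholders for whatever export you are working from, not the actual field names.

[code]
import csv

# Placeholder file/column names -- substitute whatever export you are using.
with open("loadtimes.csv", newline="") as f:
    times = sorted(float(row["loadTime"]) for row in csv.DictReader(f))

def percent_slower_than(my_time, samples):
    """% of sampled pages that loaded in less time than my_time."""
    faster = sum(1 for t in samples if t < my_time)
    return 100.0 * faster / len(samples)

# Example: where does a 7.5 second load time land?
print(f"Slower than {percent_slower_than(7.5, times):.0f}% of the sampled sites")
[/code]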

[size=x-large]Time to First Byte[/size]
The “Time to First Byte” measurement is how long it takes from when the user tries to access your site until the first bit of a response is sent by your server. This includes the DNS lookup, the socket connection, and the time for the server to start responding with the base page. There is roughly 50ms of round-trip latency on the DSL connection, so there is a baseline of 150ms just for the 3 round trips; anything above that is fodder for improvement.
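Spelling out that baseline arithmetic as a quick sketch (the 50ms figure is the measured latency of the test connection; the three round trips are the DNS lookup, the TCP connect, and the HTTP request itself, and the 500ms measurement is just an example value):

[code]
# Back-of-the-envelope floor for Time to First Byte on the test connection.
RTT_MS = 50        # measured round-trip latency of the DSL line
ROUND_TRIPS = 3    # DNS lookup + TCP connect + HTTP request

baseline_ms = RTT_MS * ROUND_TRIPS           # 150 ms network floor
measured_ttfb_ms = 500                       # example measurement
improvable_ms = measured_ttfb_ms - baseline_ms
print(f"Network floor: {baseline_ms} ms, room to improve: {improvable_ms} ms")
[/code]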

[align=center][/align]

Here things actually look reasonably good. 76% of the sites start to respond in under 500ms which is pretty good considering sites from around the world have been tested (and the further from Dulles you are the longer each round trip will take). The 4% of sites that take over 4 seconds definitely have some back-end work to do though.

[size=x-large]Start Render[/size]
The “Start Render” measurement is a pretty interesting one, and one unique to Pagetest as best I can tell. This measures the time at which something can actually be displayed on the screen. It doesn’t guarantee that the page is displaying anything interesting (it could just be a banner ad spot), but up until this point the user is guaranteed to be staring at a blank page.

[align=center][/align]

There is quite a bit of room for improvement here. A full 60% of the sites tested take over 2 seconds to display ANYTHING to the user with 20% of the sites taking over 5 seconds. That’s a loooooong time to be staring at a blank screen.

[size=x-large]Page Size[/size]
The “Page Size” measurement keeps track of the number of bytes downloaded to load the web site being tested.

[align=center][/align]

Definitely eye-opening that 30% of the sites were over 500KB. That’s freaking HUGE!

[size=x-large]Number of Requests[/size]
Here we count the number of requests required to load the page.

[align=center][/align]

This is probably the single most important reason for the load times being as high as they are. 33% of the sites require over 50 requests with 12% over 100.

[size=x-large]Number of Redirects[/size]
This counts the number of redirects that occurred during the page load (not necessarily just for the base page).

[align=center][/align]

It’s not as interesting as some of the other metrics but I wanted to see what the distribution looked like. It’s good to see that 66% of the sites have no redirects at all and another 14% only have 1. That last 2% with > 8 redirects could probably use some work though.

[size=xx-large][align=center]Optimization Checks[/align][/size]
Here we look more at the whys behind the page performance and see how well optimized the sites are in aggregate. I didn’t pull stats for ALL of the optimization checks but I got the most important ones. For these graphs you want to be as far to the right as possible.

[size=x-large]Enable Persistent Connections (Keep-Alives)[/size]
This checks to make sure keep-alives are enabled for any connections that would benefit from them (if there is only a single request to a domain you will not be penalized for that request, since it doesn’t need a keep-alive). The total score is the % of requests that are properly using keep-alives (so a score of 80 would mean that 20% of the requests on your page need to have keep-alives turned on).
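If it helps to see the rule as code, here is a simplified sketch of that scoring logic; it is my illustration rather than the production Pagetest code, and the ‘host’ and ‘keepalive’ fields are made-up names.

[code]
from collections import Counter

def keepalive_score(requests):
    """requests: list of dicts with (made-up) 'host' and 'keepalive' keys.

    Only requests to hosts that receive more than one request are eligible;
    the score is the % of eligible requests that reused a connection.
    """
    per_host = Counter(r["host"] for r in requests)
    eligible = [r for r in requests if per_host[r["host"]] > 1]
    if not eligible:
        return 100  # a single request per domain is never penalized
    good = sum(1 for r in eligible if r["keepalive"])
    return round(100 * good / len(eligible))

# A score of 80 means 20% of the eligible requests still need keep-alives.
[/code]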

[align=center][/align]

50% of the sites are fully benefiting from keep-alives. 80% of the sites are doing pretty well, with only 20% room for improvement. This is by far the easiest thing to fix, so there is no reason everybody shouldn’t be at 100%.

[size=x-large]GZip Text Content[/size]
The GZip compression test looks at all text (non-xml) requests and makes sure they are gzip compressed. If they are not already compressed, it compresses them and uses the savings in bytes to calculate the score. This way, the more compression you are missing out on (especially for larger objects), the lower your score. If you could save 60% of the bytes across all of the text resources by compressing, your score would be 40.
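A simplified sketch of that scoring math (again, an illustration rather than the actual check; the input format is something I made up for the example):

[code]
def gzip_score(text_resources):
    """text_resources: (current_bytes, bytes_if_gzipped) per text response.

    For responses that are already compressed the two numbers are equal,
    so they contribute no potential savings.
    """
    total = sum(current for current, _ in text_resources)
    saved = sum(max(current - gzipped, 0) for current, gzipped in text_resources)
    if total == 0:
        return 100
    return round(100 * (1 - saved / total))

# Example from the text: 60% of the text bytes could be saved -> score of 40.
print(gzip_score([(100000, 40000)]))  # -> 40
[/code]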

[align=center][/align]

Pretty interesting (and smooth) distribution. 22% of the sites are properly using gzip compression; however, 50% of the sites could save over 50% of their text bytes by enabling gzip compression.

[size=x-large]Properly Compress Images[/size]
The other side of the compression coin is image compression. The most critical piece here is usually picking appropriate quality levels for JPEG compression, but smushing your PNGs and GIFs is helpful as well.

[align=center][/align]

Once again, 22% of the sites are properly compressing all of their images. The fall-off here is pretty rapid, with only 15% of the sites needing to re-compress 50% of their images. The byte savings from properly compressing images can be pretty significant though, so it is worth the effort.

[size=x-large]Combine Multiple CSS and JS Files[/size]
This is probably the single most important optimization for improving your start render times. Here we look at all of the JS and CSS files that come before the start render and, if there is more than one of each type, we start deducting points: 5 points for each extra CSS file and 10 points for each extra JS file (since JS blocks other content, the impact is larger). The goal is to get down to at most one of each.
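As a simplified sketch of that deduction rule (an illustration, not the real scoring code):

[code]
def combine_score(css_files, js_files):
    """css_files/js_files: counts of CSS and JS files loaded before start render."""
    penalty = 5 * max(css_files - 1, 0) + 10 * max(js_files - 1, 0)
    return max(100 - penalty, 0)

# 3 CSS + 4 JS before start render: 2 extra CSS (-10) and 3 extra JS (-30) -> 60
print(combine_score(3, 4))
[/code]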

[align=center][/align]

41% of the sites do a good job here. The other 59% could see some real benefit to their user experience, and for 20-30% of the sites the benefit could be quite substantial.

[size=x-large]Enable Browser Caching of Static Content[/size]
Here we see how well sites are using Expires or Cache-Control headers to allow browsers to cache static content and avoid making wasteful If-Modified-Since requests. Improvements here don’t really help first-time visitors, but they can be very substantial for repeat visits (upwards of a 90% improvement in load time is not unusual).
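For reference, a static object that caches well typically comes back with response headers along these lines (the roughly one-year lifetime here is just an example value):

[code]
HTTP/1.1 200 OK
Content-Type: image/png
Cache-Control: public, max-age=31536000
Expires: Thu, 31 Dec 2009 23:59:59 GMT
[/code]

With a far-future expiration like that, repeat visits can reuse the object straight from cache without even an If-Modified-Since round trip.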

[align=center][/align]

LOTS of room for improvement here. 25% of the sites are completely broken and don’t cache anything that they could. 80-90% of the sites could gain significantly and only 6% of sites get a perfect score.

[size=x-large]Use a CDN for Static Content[/size]
This is one of the more controversial checks but I thought I’d include it anyway. A Content Distribution Network (CDN) may not make sense for smaller sites and if you do everything else right the gains may not be as significant (though if you can’t do some of the other optimizations it will help hide those sins).

[align=center][/align]

Not really surprising - 56% of sites don’t use a CDN for their static content at all. I expect most of the remaining sites, until you get up into the 90% range, only get CDN credit because they use AdSense or something else external served by a CDN. 5% of sites tested make full use of a CDN.

[size=xx-large][align=center]Raw Data[/align][/size]
I am also making the raw data (URLs anonymized) available for download. I’m sure there are a lot more interesting ways to look at the data (correlation of performance to optimization, etc.):

CSV (just the raw test data): [attachment=1]

Excel Workbook with all of the charts and data: [attachment=2]

Hi Patrick,

First, I’d like to say thank you for putting the tool on SourceForge and making it accessible. I’m a professional web tester and this is a one-of-a-kind tool; I’m already thinking of enhancements, which is a good indicator of how highly I rate this tool…

With regards to your number crunching, how did you go about that? I’ve run 200+ tests and I’d like to crunch the numbers too. I was at the point of changing the code to use DBI and re-running the tests so I could get the figures into a MySQL DB (that would be enhancement No. 1). Is that what you did? Or have you parsed all the results files into one big data set using Perl/PHP?

Many thanks

Stuart.

There is an XML interface into several of the web site functions, including the test history, that makes it easy to retrieve bulk results (I’ll firm up the docs a bit and get something posted today). I basically wrote a quick app that queried the test history and then went through every test for the last year, retrieving the results for each test in CSV format. At that point you can database it if you want; I just put it into a giant CSV and crunched it with Excel.
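The general shape of it is something like the sketch below. The endpoint URLs, parameters, and element names here are placeholders rather than the real interface (check the docs once they’re posted for the actual ones):

[code]
import csv
import urllib.request
import xml.etree.ElementTree as ET

# Rough shape of the "quick app": pull the test history from the XML
# interface, then fetch each test's results as CSV and append everything to
# one big file for Excel.  The URLs, parameters and element names below are
# placeholders, NOT the documented interface.
HISTORY_URL = "http://example.com/testlog.php?f=xml&days=365"    # hypothetical
RESULT_URL  = "http://example.com/result.php?test={id}&f=csv"    # hypothetical

history = ET.parse(urllib.request.urlopen(HISTORY_URL)).getroot()
test_ids = [t.findtext("id") for t in history.iter("test")]      # hypothetical layout

with open("all_tests.csv", "w", newline="") as out:
    writer = None
    for test_id in test_ids:
        body = urllib.request.urlopen(RESULT_URL.format(id=test_id)).read()
        rows = list(csv.reader(body.decode("utf-8").splitlines()))
        if not rows:
            continue
        if writer is None:
            writer = csv.writer(out)
            writer.writerow(rows[0])       # keep the header once
        writer.writerows(rows[1:])
[/code]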

If you’re working completely locally you can automate testing with URLBlast, and part of what you get as output is a tab-delimited file with the test results. If you’re interested in doing it this way, let me know and I can write up some docs on the various ways to use URLBlast. It is used to drive the testing for the hosted version, but it can also run in several automated modes where it walks through lists of URLs for testing or even does a full site crawl.

Ahh yes, I had seen the XML interface but not looked into it further; I wondered if it formed part of an API for a testing grid. Yes, I would be interested. Would you make your “quick app” available?

Ah yes, there are a couple of options in urlblast I didn’t understand; I was going to post separately about that. I haven’t got a copy of Visual Studio on this machine, so I haven’t dug into the code to see what it does.

Being able to walk through a list of URLs would be a great help right now. I’ve been using wget to automate test requests.

BTW do you need help with this project? I always like to give a little back to the community.

regards,

Stuart.

So I’ve figured out how to do a crawl (that’s pretty neat, btw), but I’m a little unsure about having the crawler and pagetest urlblast on the same machine.

A pagetest urlblaster won’t crawl, but will a crawler pagetest? I guess what I’m asking is: can the one node do two jobs?

Cheers,

Stuart.

Sorry Patrick, more questions I couldn’t find the answer to.

In the logfile from the crawler, what do the columns represent?

Cheers,

Stuart.

Which file? The crawler should output a _IEWTR and _IEWPG file. The PG file is page-level stats and the TR file has all of the requests (the two can be matched up with the GUID).

Full documentation on the fields is available here: http://pagetest.wiki.sourceforge.net/Automation in the section on “Results files”

I’ve done a fair amount of work so that URLBlast can multitask and do pagetest runs in the middle of crawling and running URL lists but, to be honest, I haven’t tested it much since we have dedicated systems for each function. The crawling functionality is also not terribly mature since I tend to run crawls on demand, but let me know if you are seeing any issues or would like to see it behave differently.

Doh! I thought I had searched all over. I figured out the proxy stuff; I edited the registry for user1 and user2 and it works great. I had already taken a sneak peek at the AOL keys, so the documentation has helped me understand that further, thank you.

Re crawling and pagetesting, in the config we have

; Type of test (cached, uncached or both) - setting is global for all threads
; 0 = Both (clears cache, loads web page, then loads it again)
; 1 = uncached (clears the cache before every page)
; 4 = pagetest-only mode
; 6 = Crawler mode

If I set it to 4, urlblast does not pick up the URL file, but will pick up work from the work URL. If I set it to 6, it picks up the URL file.
Hence my question about whether it can do both. I was further confused by the “Crawler Config” setting: what should be in that?

Once again, many thanks,

Stuart.

Sorry, the back-end logic got re-done since those comments were put into the ini file and the logic got a little bit more complex…

Regardless of what mode it is configured for, it will ALWAYS be able to do pagetest testing. It will not do any crawling unless it is set to mode 6, but if it is, it will still pick up pagetest runs in between crawler URLs as pagetest tests come in. Modes 0 and 1 are for working with a URL list (a text file with one URL per line) and will run through the list testing URLs (but will also still take pagetest tests as they come in).

The plan was to be able to leverage the systems that are doing continuous monitoring/testing to also pick up one-off pagetest requests as they come in, so we don’t have to dedicate test machines (since the load for doing one-off tests is usually very low).

Ah excellent, that’s what I was hoping you would say!

Is it possible to get urlblast to take a screen shot in crawling mode when there isn’t an error and it’s only checking performance?

I ask because I’m getting lines in my logs where it appears some websites have:
a load time of 0
a bytes in of 575
and a document complete time of 153

I suspect what I’ve actually got is an Apache 500 or similar, but without a screenshot I don’t know what the browser saw.

BTW, what do you guys use to munge the log files? I naively tried Excel; I won’t do that again!

Cheers,

Stuart.

If you got a 500 or something like that it will show up as the result code for the page (which is normally 0 for success or 99999 for a content error) and a screen shot could be taken because it was an error (I assume you figured out how to grab screen shots on errors). If the page returned a 200 but gracefully failed then it wouldn’t.

Right now pagetest can only either grab screen shots on errors or do a full dump of screen shots and graphics (used for one-off testing). It wouldn’t be hard to add the ability to grab a screen shot for every page, but be warned that the storage requirements could get out of hand pretty quickly if you’re crawling a complicated site.

Are you talking about the page-level or request-level data? For bulk processing of the request-level data I wrote an app that parses the log files and splits the results out by domain so we could see what all of the broken requests were for a given domain, regardless of property. Most of the bulk analysis has been targeted at looking for specific things (broken content, missing gzip, etc.), so it wasn’t too hard to throw together a script that could just look for all instances.
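A stripped-down version of that split-by-domain step looks something like the sketch below; it assumes the tab-delimited request log has a header row and a “URL” column, which are assumptions on my part (the real field names are in the Automation docs):

[code]
import csv
from collections import defaultdict
from urllib.parse import urlparse

# Sketch of the split-by-domain idea.  Assumes the tab-delimited request log
# (_IEWTR file) has a header row and a "URL" column -- adjust to the real
# field names documented on the Automation wiki page.
by_domain = defaultdict(list)

with open("results_IEWTR.txt", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        host = urlparse(row.get("URL") or "").hostname or "unknown"
        by_domain[host].append(row)

for host, rows in sorted(by_domain.items(), key=lambda kv: -len(kv[1])):
    print(f"{host}\t{len(rows)} requests")
[/code]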

For non-crawled testing we database the results and have a front-end for plotting the results and doing drill-downs. That’s a fairly large and complex system though.

Hi Patrick, many thanks for your quick response.

I am loving urlblast.

Thank you for telling me what result code 99999 was. I had worked out the others from the PHP code, but noticed you read in records that were 0 or 999999, so I was a little confused.

Re screenshots: I think the problem I have is that for certain URLs in my URL file, the application is throwing an error server-side and JBoss returns a 500, but on its way out of Apache it gets munged into a 200. So urlblast sees a tiny page returned in a few ms. Without a screen shot I can’t prove that. I agree, the storage would be a consideration; I think overwriting the screen shots would be an acceptable tradeoff.

I was talking about page-level data. I’m scanning 100 URLs, I’m only interested in performance, and I’m running it around the clock so I can build up a trend over the month. I’m seeing very fast times early in the morning and they slow down through the day until about 9pm, when they begin to speed up again. I want to see the patterns over weekends vs. weekdays, etc.

By doing this I can factor in the latency of the ISP used for the collection of data and the latency of the Internet in the UK, and get a real handle on how well those websites were performing.

I could just do with 20 more nodes!!

I’m going to read the TSV page file into MySQL, and then drill down.
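As a first pass at the hourly trend before it goes into MySQL, I’m thinking of something along these lines; the column names here are guesses I still need to check against the docs:

[code]
import csv
from collections import defaultdict
from datetime import datetime

# First pass at the hourly trend straight from the tab-delimited page file.
# "Time" and "Load Time (ms)" are guessed column names -- check the field
# list in the Automation docs and adjust.
by_hour = defaultdict(list)

with open("results_IEWPG.txt", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        hour = datetime.strptime(row["Time"], "%H:%M:%S").hour
        by_hour[hour].append(float(row["Load Time (ms)"]))

for hour in sorted(by_hour):
    samples = by_hour[hour]
    print(f"{hour:02d}:00  avg {sum(samples) / len(samples) / 1000:.1f} s  ({len(samples)} runs)")
[/code]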

Cheers,

Stuart.

P.S.

Once I’ve completed my testing, I’d like to offer my node out for use from the main pagetest site.

Hello!

I read this post with great interest. I was interested in trying to generalize these results. Although we know that website optimization is important, from a telco or operator perspective it is relevant to understand what the common experience per webpage is. I hope you can provide your feedback on whether this makes sense or not.

Your stats provide really interesting information. However, out of all the unique URLs, is it possible to know how many are, for instance, in the top 100 as defined by Google? My guess is that most people use the same sites frequently.

In my attempt to generalize your results I have taken metrics from Google, specifically requests per page and web page size, and tried to filter your data with those two items:
http://code.google.com/speed/articles/web-metrics.html
My intention is to make the data from your set as similar as possible to the top 100 defined in this list, and also to take some outliers off.

I did the following:
(1) Filter out all of the results that are

  • below the Top 100’s low-10% page size
  • above the Top 100’s high-10% page size

(2) Then filter out the remaining results that are

  • below the Top 100’s low-10% number of objects
  • above the Top 100’s high-10% number of objects

(3) Re-plot the chart and recalculate the averages for load time and page size (a rough sketch of this filtering is below).
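To make the steps concrete, here is a rough sketch of the filtering I have in mind; the percentile bounds are placeholders I would still need to fill in from the linked page, and the column names (“bytesIn”, “requests”, “loadTime”) are guesses about your CSV export:

[code]
import csv

# Sketch of the filtering described above.  The percentile bounds are
# placeholders to be filled in from the Google web-metrics page, and the
# column names are guesses about the CSV export, so rename as needed.
SIZE_LOW, SIZE_HIGH = 50_000, 1_000_000     # bytes: placeholder 10th/90th percentile
REQ_LOW, REQ_HIGH = 10, 100                 # requests: placeholder 10th/90th percentile

kept = []
with open("all_tests.csv", newline="") as f:
    for row in csv.DictReader(f):
        size, reqs = int(row["bytesIn"]), int(row["requests"])
        if SIZE_LOW <= size <= SIZE_HIGH and REQ_LOW <= reqs <= REQ_HIGH:
            kept.append(row)

if kept:
    avg_load = sum(float(r["loadTime"]) for r in kept) / len(kept)   # assuming ms
    avg_size = sum(int(r["bytesIn"]) for r in kept) / len(kept)
    print(f"{len(kept)} rows kept; avg load {avg_load / 1000:.1f} s, "
          f"avg size {avg_size / 1024:.0f} KB")
[/code]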

Would you be so kind as to give any thoughts on whether you believe this is representative/useful, or likely not to be the experience of an average user?

Best regards
-Andreas
Solution Architect, Japan


Nice graphs.

I’d love to see what the separation is between the top 100 sites and the not-in-the-top 100 sites. I’m not sure you could draw any conclusions from it but it would be an interesting data point.

I don’t know that there really is such a thing as “the average user”. We see huge variations in browser installs, with even the most prevalent version usually not having more than 30-40% of the share for a given site. Put variations in geography and connectivity on top of that and, at least for the timings, there is a huge distribution.

The page size and request count information should be fairly consistent (assuming the sites aren’t doing much browser-specific work like data URIs). Even then the “averages” might be interesting to look at over time, but I bet the distribution would be even more interesting (the average may stay the same, but are the heavy sites getting heavier and the light sites getting lighter?).