I’ve been promising to do it for a while, so I finally sat down and crunched some numbers by looking at all of the tests that have been run on WebPagetest over the last year.
There have been a little over 62,000 tests run over the last year. Of those, there were 24,763 unique URLs that were tested using the default Dulles DSL location, so I picked that as the sample set. For any given URL, I used the most recent successful test and only looked at “first view” results.
At a high-level, here are the average statistics across the whole sample set:
Load Time: 10.1 seconds
Time to First Byte: 1.1 seconds
Time to Start Render: 3.8 seconds
Page Size: 510 KB
Number of Requests: 50
Number of Redirects: 1
I was a little shocked by how long the average Start Render time was as well as how big the average page was. Things really start to get interesting when you look at the distribution of the results. All of the charts below (also available in the Excel workbook at the bottom of the post) graph a cumulative distribution of the results. I’ll walk through reading the graph for the first one but then mostly just talk about the results themselves.
[size=x-large]Load Time[/size]
Here is a cumulative distribution plot of the load times across all of the tests. As the times get larger, the percentile is the % of tests that loaded in under that amount of time. To see where you land against the other sites, take your load time and find it on the x-axis, then go up and see where you cross the line. You are slower than that % of web sites. WebPagetest has a 2 minute timeout so anything longer than that would have been an error.
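If you would rather do the chart lookup in code, here is a minimal sketch of the same idea. The function name and the sample load times are made up for illustration; they are not taken from the test data.

```python
from bisect import bisect_left


def percent_loaded_faster(load_times, my_time):
    """Return the % of tests that loaded in under my_time seconds.

    This is the y-value of the cumulative distribution at x = my_time.
    """
    ordered = sorted(load_times)
    # bisect_left counts how many entries are strictly less than my_time
    return 100.0 * bisect_left(ordered, my_time) / len(ordered)


# Hypothetical load times in seconds, purely for illustration
sample = [2.1, 3.4, 4.8, 6.0, 7.5, 9.9, 12.3, 15.0, 22.4, 41.0]
print(percent_loaded_faster(sample, 10.0))  # 60.0
```

With this made-up sample, a page that loads in 10 seconds is slower than 60% of the sites.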
I was a little surprised at how long pages take to load, given that this is over a reasonably fast DSL connection (1.5Mbps). A full 35% of the pages took more than 10 seconds to load with only 33% coming in under 5 seconds. For reference, Yahoo’s portal (which is a fairly rich portal with lots of content) loads in under 3 seconds.
[size=x-large]Time to First Byte[/size]
The “Time to First Byte” measurement measures how long it takes from when the user tries to access your site until the first bit of a response is sent by your server. This includes the DNS lookup, socket connection and the time for the server to start responding with the base page. There is roughly 50ms of round-trip latency on the DSL connection, so there is a baseline of 150ms just for the 3 round trips. Anything above that is fodder for improvement.
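The baseline arithmetic is simple enough to sketch. This assumes the three round trips map onto the DNS lookup, the socket connection and the request for the base page, and the function name is just something I picked for the example.

```python
RTT_MS = 50  # approximate round-trip latency on the test DSL connection

# Assumed mapping of the 3 round trips mentioned above
ROUND_TRIPS = ("DNS lookup", "socket connect", "base page request")


def ttfb_above_baseline_ms(measured_ttfb_ms, rtt_ms=RTT_MS):
    """Return how much of a measured TTFB is not explained by network latency."""
    baseline_ms = rtt_ms * len(ROUND_TRIPS)  # 3 * 50 = 150ms
    return measured_ttfb_ms - baseline_ms


print(ttfb_above_baseline_ms(500))  # 350
```

So a site with a 500ms time to first byte has about 350ms that could, in principle, be improved on the back end.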
Here things actually look reasonably good. 76% of the sites start to respond in under 500ms which is pretty good considering sites from around the world have been tested (and the further from Dulles you are the longer each round trip will take). The 4% of sites that take over 4 seconds definitely have some back-end work to do though.
[size=x-large]Start Render[/size]
The “Start Render” measurement is a pretty interesting one and one unique to Pagetest as best as I can tell. This measures the time at which something can actually be displayed on the screen. It doesn’t guarantee that the page is displaying anything interesting (could just be a banner ad spot) but up until this point it is guaranteed that the user is staring at a blank page.
There is quite a bit of room for improvement here. A full 60% of the sites tested take over 2 seconds to display ANYTHING to the user with 20% of the sites taking over 5 seconds. That’s a loooooong time to be staring at a blank screen.
[size=x-large]Page Size[/size]
The “Page Size” measurement keeps track of the number of bytes downloaded to load the web site being tested.
Definitely eye-opening that 30% of the sites were over 500KB. That’s freaking HUGE!
[size=x-large]Number of Requests[/size]
Here we count the number of requests required to load the page.
This is probably the single most important reason for the load times being as high as they are. 33% of the sites require over 50 requests with 12% over 100.
[size=x-large]Number of Redirects[/size]
This counts the number of redirects that occurred during the page load (not necessarily just for the base page).
It’s not as interesting as some of the other metrics but I wanted to see what the distribution looked like. It’s good to see that 66% of the sites have no redirects at all and another 14% only have 1. That last 2% with > 8 redirects could probably use some work though.
Here we look more at the why’s behind the page performance and see how well optimized the sites are in aggregate. I didn’t pull stats for ALL of the optimization checks but I got the most important ones. For these graphs you want to be as far to the right as possible.
[size=x-large]Enable Persistent Connections (Keep-Alive’s)[/size]
This checks to make sure keep-alives are enabled for any connections that would benefit from them (if there is only a single request to a domain you will not be penalized for that request since it doesn’t need keep-alives). The total score is the % of requests that are properly using keep-alives (so a score of 80 would mean that 20% of the requests on your page need to have keep-alives turned on).
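As a rough illustration of the scoring described above, here is a sketch. It is not the actual Pagetest code: the input format (a list of domain/keep-alive pairs), the rounding, and the choice to count a lone request to a domain as passing are all assumptions.

```python
from collections import Counter


def keep_alive_score(requests):
    """Score = % of requests that use keep-alive or don't need it.

    requests: list of (domain, keep_alive_enabled) tuples (hypothetical format).
    """
    if not requests:
        return 100
    per_domain = Counter(domain for domain, _ in requests)
    # A single request to a domain is not penalized (assumed to count as passing)
    passing = sum(
        1 for domain, keep_alive in requests
        if keep_alive or per_domain[domain] == 1
    )
    return round(100 * passing / len(requests))


# Made-up page: 7 requests with keep-alive, 2 without, 1 lone request to an ad domain
page = (
    [("www.example.com", True)] * 7
    + [("img.example.com", False)] * 2
    + [("ads.example.net", False)]
)
print(keep_alive_score(page))  # 80
```

Here the two requests to img.example.com are the 20% that need keep-alives turned on.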
50% of the sites are fully benefiting from keep-alives. 80% of the sites are doing pretty well with only 20% room for improvement on them. This is by far the easiest thing to fix so there is no reason that everybody shouldn’t be at 100%.
[size=x-large]GZip Text Content[/size]
The GZip compression test looks at all text (non-xml) requests and makes sure they are gzip compressed. If they are not already compressed, it will compress them and the savings in bytes are used to calculate the score. This way, the more compression you could be getting (and the larger the objects), the lower your score. If you could save 60% of the bytes for all of the text resources by compressing, your score would be 40.
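Here is a sketch of how a score like that could be computed. Treat it as an approximation of the idea rather than the real check: the input format, the use of Python's default gzip settings, and measuring savings against the total bytes of all text responses are my assumptions.

```python
import gzip


def gzip_score(text_responses):
    """Score = 100 minus the % of text bytes that gzip could still save.

    text_responses: list of (body_bytes, already_gzipped) tuples
    (a hypothetical format for this example).
    """
    total_bytes = 0
    savable_bytes = 0
    for body, already_gzipped in text_responses:
        total_bytes += len(body)
        if not already_gzipped:
            # Compress it ourselves to see what the server could have saved
            compressed_size = len(gzip.compress(body))
            savable_bytes += max(0, len(body) - compressed_size)
    if total_bytes == 0:
        return 100
    return round(100 * (1 - savable_bytes / total_bytes))


# Everything already compressed means nothing left to save
print(gzip_score([(b"<html>...</html>", True)]))  # 100
```

A page serving large uncompressed HTML, CSS or JS would land much lower, since most of those bytes would count as savable.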
Pretty interesting (and smooth) distribution. 22% of the sites are properly using gzip compression; however, 50% of the sites could save over 50% of their text bytes by enabling gzip compression.
[size=x-large]Properly Compress Images[/size]
The other side of the compression coin is image compression. The most critical here is usually picking appropriate quality levels for jpeg compression but smushing your png’s and gifs is helpful as well.
Once again, 22% of the sites are properly compressing all of their images. The fall-off here is pretty rapid with only 15% of the sites needing to re-compress 50% of their images. The bytes savings by properly compressing images can be pretty significant though so it is worth the effort.
[size=x-large]Combine Multiple CSS and JS Files[/size]
This is probably the single most important optimization that can improve your start render times. Here we look at all of the js and css files that come before the start render and if there is more than one of either type we start deducting points. 5 points are deducted for each extra css file and 10 points for each extra js file (since js blocks other content the impact is larger). The goal is to get down to at most one of each.
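The deductions above work out like this. The starting score of 100 and the floor at 0 are assumptions on my part for the sketch, as is the function name.

```python
def combine_score(css_before_render, js_before_render):
    """Deduct 5 points per extra css file and 10 per extra js file.

    Only files requested before start render are counted.
    Assumes the score starts at 100 and cannot go below 0.
    """
    extra_css = max(0, css_before_render - 1)
    extra_js = max(0, js_before_render - 1)
    return max(0, 100 - 5 * extra_css - 10 * extra_js)


print(combine_score(1, 1))  # 100
print(combine_score(3, 4))  # 60
```

So a page with 3 css files and 4 js files ahead of start render loses 10 points for the css and 30 for the js.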
41% of the sites do a good job here. The other 59% could get some real benefit to their user experience and for 20-30% of the sites the benefit could be quite substantial.
[size=x-large]Enable Browser Caching of Static Content[/size]
Here we see how well sites are using Expires or cache-control headers to allow browsers to cache static content and not make wasteful if-modified-since requests. Improvements here don’t really help first-time visitors but they can be very substantial for repeat visits (upwards of 90% improvement in load time is not unusual).
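To give a feel for what is being checked, here is a simplified sketch that looks at the response headers of each static resource. It is not how Pagetest scores this: the "% of static responses that allow caching" formula is an assumption, and the sketch only checks that an Expires header is present without parsing its date.

```python
import re


def allows_caching(headers):
    """Return True if the response headers let a browser cache the resource."""
    lowered = {name.lower(): value for name, value in headers.items()}
    cache_control = lowered.get("cache-control", "").lower()
    if "no-store" in cache_control or "no-cache" in cache_control:
        return False
    match = re.search(r"max-age=(\d+)", cache_control)
    if match:
        return int(match.group(1)) > 0
    # Simplification: any Expires header counts, the date is not parsed
    return "expires" in lowered


def caching_score(static_responses):
    """Assumed score: % of static responses that allow browser caching."""
    if not static_responses:
        return 100
    cacheable = sum(1 for headers in static_responses if allows_caching(headers))
    return round(100 * cacheable / len(static_responses))


# Made-up headers for four static resources
responses = [
    {"Cache-Control": "max-age=31536000"},
    {"Expires": "Thu, 31 Dec 2037 23:55:55 GMT"},
    {"Cache-Control": "no-cache"},
    {},  # no caching headers at all
]
print(caching_score(responses))  # 50
```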
LOTS of room for improvement here. 25% of the sites are completely broken and don’t cache anything that they could. 80-90% of the sites could gain significantly and only 6% of sites get a perfect score.
[size=x-large]Use a CDN for Static Content[/size]
This is one of the more controversial checks but I thought I’d include it anyway. A Content Distribution Network (CDN) may not make sense for smaller sites and if you do everything else right the gains may not be as significant (though if you can’t do some of the other optimizations it will help hide those sins).
Not really surprising - 56% of sites don’t use a CDN for their static content at all. I expect most of the remaining sites, until you get up into the 90% range, only get CDN credit because they use AdSense or something else external served by a CDN. 5% of sites tested make full use of a CDN.
I am also making the raw data (URLs anonymized) available for download. I’m sure there are a lot more interesting ways to look at the data (correlation of performance to optimization, etc):
CSV (just the raw test data): [attachment=1]
Excel Workbook with all of the charts and data: [attachment=2]