Gaps in waterfall specifically in IE

I use the following WPT environment:

  1. WPT server is hosted on a linux machine
  2. Community AMI windows server 2008 for test agents(California and Virginia)

I notice a huge gap in the waterfall which is not present in some of the old builds. This is IE specific.This gap doesn’t show up in firefox or chrome.

I have attached the screenshots. I updated the wpt test agents(latest version on May14th) and that doesn’t help. Is some one aware what could be the root cause?
[hr]
Both the screenshots are from the same URL, same test environment and same browser(IE10). It just got worst.

Can you enable tcpdump capture and grab a tcpdump? Gaps in that area tend to be SSL certificate revocation checks.

Enabling tcpdump and reading the captures helped me to figure out what was going on in that gap. Thanks a tonne Patrick!

Hi Sundeep,

I am wondering if you can share what you found. I am investigating the exact same problem.

Thanks!
-Matt

We got in touch with our CA as the OSCP verification was slow. It consistently took 400-500 ms. We compared the OSCP verification time with other CA’s and shared the findings with the vendor.

FWIW, if you want to get rid of the gaps entirely you can look into OCSP stapling (if your SSL termination point supports it): How To Configure OCSP Stapling on Apache and Nginx | DigitalOcean

Thanks, Patrick!

We do have OSCP stapling enabled on our CDN (or rather AWS CloudFront claims to have OSCP stapling enabled - New SSL Features for Amazon CloudFront – Session Tickets, OCSP Stapling, Perfect Forward Secrecy | AWS News Blog).

However, we still see enormous, 2+ second gaps in the waterfall for CDN requests. For example, take a look here: http://www.webpagetest.org/result/140929_9M_13KR/1/details/. cdn[0/1/2/3].apartmentlist.com is powered by AWS CloudFront.

SSL Labs is showing OCSP Stapling: https://www.ssllabs.com/ssltest/analyze.html?d=cdn0.apartmentlist.com&s=54.192.34.16

On the physical thinkpads it looks like the gaps are smaller but still way longer than I’d expect: http://www.webpagetest.org/result/140929_BE_19QJ/1/details/ (the gaps don’t look big enough to be validation checks but the SSL time is still really long)

The results from a Thinkpad look way more reasonable.

The gap for the www.apartmentlist.com is about 10x smaller than in the test I linked about, and is about 200ms. I would expect that to be normal OCSP delay, since www.apartmentlist.com does not support OSCP stapling (due to Heroku not supporting it at the moment). The gaps for cdnX.apartmentlist.com are even shorter (presumably because CloudFront supports OSCP stapling).

However, what I see from Denver location looks terrible. We have RUM (Real User Monitoring) enabled via NewRelic on our app, and we see that there is a large number of users that are experiencing similar performance performance in real life (20+ second page loads).

At first I thought there is something wrong with out certificate (we had one from RapidSSL), but I replaced it with the one from Comodo (total shot in the dark!) and absolutely nothing has changed in the waterfall and these gaps.

Any idea what I can do to get to the bottom of it?

FYI: OSCP stapling is supported on *.cloudfront.net domain but not on custom domains in AWS. One of our sys engineers reported this to AWS.

Thanks, Sundeep.

So you have a support link / number about the OSCP stapling not being supported for custom domains on CloudFront? SSL Labs showing OSCP stapling enabled on cdn0.apartmentlist.com (which is powered by cloudfront). Perhaps the issue is already fixed?

Hello Matt,

Yes we have a ticket number and the latest update says AWS is investigating the issue and their product teams are on the case. It is an internal support case so you might not be able to view the ticket history/details.

What you say is correct. For apartmentlist.com OSCP stapling works even for custom domains. In our case OSCP stapling doesn’t work for custom domains. We cross checked both the URL’s in SSL Labs.

We also shared this finding with AWS(for apartmentlist.com OSCP stapling works).

I work with Sundeep, and we finally figured out what was wrong.

The AWS support team checked on their side and confirmed that everything worked well, but the way we did our tests for OCSP stapling was actually flawed.

The OCSP stapling happens at the CloudFront edge level so every node from an edge location needs to do it. The first request returns immediately a non-stapled answer and the node fires an OCSP request to our CA and then caches the answer which is then used for the subsequent requests hitting that edge node. (The nodes currently don’t share these caches, but I filed a feature request so that they try to use a shared storage for that content, like they do for static content.)

During our test we only executed a relatively small number of requests, so we never hit the same edge node twice and that’s why it appeared the stapling was broken. When testing you need to fire a few hundreds of requests until consistently getting stapled responses.

As for explaining the rest of the gap from our waterfall graph, we saw that it often happens that the CPU is maxed out while loading our app, for multiple reasons:
[list]
[]sometimes due to multiple SSL connections started at the same time, since they are quite heavy during the negotiation phase
[
] while other gaps can be explained due to heavy CPU use during the parsing of Javascript and CSS content (our webapp code is around 400K when minified and compressed)
[/list]
When these overlap, the gap can grow even up to a few seconds, and it’s exacerbated if the testing machine is not powerful enough.

The CPU usage is the bottleneck. I think you have too many SSL connections being set up in parallel, those are quite heavy on the CPU.

If possible, try to switch to a CDN provider that supports SPDY, so you’ll only have an SSL connection to each of your domains.