Congestion Window issues

Hi there

I think it would be useful to surface potential congestion window issues for content downloads in the UI.

I have a report that is mostly ‘green’ in the top right area which is nice:
http://www.webpagetest.org/result/140903_Y8_17TS/

However, on closer inspection I felt that some of the images were taking longer than would have been expected.

Specifically: http://www.webpagetest.org/result/140903_Y8_17TS/9/details/

and this info from the waterfall view:
URL: http://s.games.iwin.com/m/alex_harrison/bugmatch/v_5/banner.jpg
Content Download: 352 ms
Bytes In (downloaded): 29.9 KB

To troubleshoot this I ran the test again with tcpdump enabled and then filtered the capture for this request (on port 53859) in CloudShark.
https://www.cloudshark.org/captures/a0ef29f69bff?filter=tcp.port%3D%3D53859

I was initially expecting to see errors, but what I saw was a repeating pattern of two TCP segments being delivered followed by an ACK, which strikes me as a really low congestion window setting.
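For anyone who wants to check the same pattern in their own capture, here is a minimal sketch of counting data segments between ACKs. It assumes the capture has already been reduced to a list of (sender, payload length) pairs for one connection; that record format and the function name are hypothetical, not something WPT or CloudShark produces.

```python
from typing import Iterable, List, Tuple

# Hypothetical record: (sender, tcp_payload_length_in_bytes),
# where sender is "server" or "client".
Packet = Tuple[str, int]


def segments_per_ack(packets: Iterable[Packet]) -> List[int]:
    """Count server data segments seen before each pure client ACK.

    A "pure ACK" is taken to be a client packet with no payload.
    Trailing segments that were never ACKed in the input are ignored.
    """
    runs: List[int] = []
    count = 0
    for sender, payload_len in packets:
        if sender == "server" and payload_len > 0:
            count += 1
        elif sender == "client" and payload_len == 0 and count > 0:
            runs.append(count)
            count = 0
    return runs


# Made-up example mirroring the pattern described above:
# two full-size segments, then an ACK, repeated.
example = [
    ("server", 1460), ("server", 1460), ("client", 0),
    ("server", 1460), ("server", 1460), ("client", 0),
]
print(segments_per_ack(example))
```

A steady run of small counts across the whole transfer is what I would flag for a closer look; a window that is growing normally should show the counts increasing.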

We’re now following up with the relevant party to address this issue.

If WPT is calculating the download time then I imagine it might know how many segments are being delivered per ACK.

A value of 2 seems very low compared to the initcwnd settings of major CDN providers (as listed by CDN Planet), so it might be useful to surface this as a red / amber / green value.

Hope this helps,

Michael Ewins
mewins@iwin.com

Got this reply from our CDN provider. I’m sharing this not to disagree with their analysis but to see if anyone can understand why we might be getting these results.

response below >>

As this is a free testing website, they are under heavy use over a changing network path, so it’s difficult to tell what is occurring and where.

I’ve run into other customers using this tool and have found widely varying results which differ greatly from other testing solutions. The service we are using to test is a dedicated, pay service, so the results seem to be much more reliable. Regarding the CloudShark analysis you’ve provided, it seems to be based on the same webpagetest.org results, which are again, difficult to rely upon. Allow me to illustrate this with some comparative test results below:


iWin - http://s.m.iwin.com

Webpage - 1581ms
http://www.webpagetest.org/result/140903_Y8_17TS/9/details/

Catchpoint:Cogent - 435ms
http://p.catchpoint.com/ui/Entry/PW/I/DH-hMA-CI-I818-jRlyeOBVAC-AV0AR0-jRlyeOBVAC-hMA

Catchpoint:Level3 - 293ms
http://p.catchpoint.com/ui/Entry/PW/I/DH-hMA-i-I880-jRlyeODtlq-AV0AR0-jRlyeODtlq-hMA

Catchpoint:NTT - 327ms
http://p.catchpoint.com/ui/Entry/PW/I/DH-hMG-CF-I490-jRlygInFmg-AV0AR0-jRlygInFmg-hMG


Google - http://google.com

Webpage - 1227ms
http://www.webpagetest.org/result/140905_2P_JD4/1/details/

Catchpoint:Cogent - 203ms
http://p.catchpoint.com/ui/Entry/PW/I/DH-hMC-CI-I865-jRlyfIdhQQ-AV0AR0-jRlyfIdhQQ-hMC

Catchpoint:Level 3 - 202ms
http://p.catchpoint.com/ui/Entry/PW/I/DH-hMC-i-I896-jRlyfIetjE-AV0AR0-jRlyfIetjE-hMC

Catchpoint:NTT - 187ms
http://p.catchpoint.com/ui/Entry/PW/I/DH-hMC-CF-I927-jRlyfIf514-AV0AR0-jRlyfIf514-hMC


CNN - http://cnn.com

Webpage - 4364ms
http://www.webpagetest.org/result/140905_PD_JXS/1/details/

Catchpoint:Cogent - 590ms
http://p.catchpoint.com/ui/Entry/PW/I/DH-hMM-CI-I755-jRlyuJ0aga-AV0AR0-jRlyuJ0aga-hMM

Catchpoint:Level3 - 768ms
http://p.catchpoint.com/ui/Entry/PW/I/DH-hMP-i-I833-jRlyvJpHtc-AV0AR0-jRlyvJpHtc-hMP

Catchpoint:NTT - 786ms
Error

As you can see from the above results, webpagetest.org shows much higher results for any tested webpage, including popular pages like google and cnn, while Catchpoint shows significantly lower results for all sites tested. This leads me to believe that the information gathered by webpagetest.org is not reliable for in depth network analysis. The capture you’ve used seems to have been collected from the same site, which leads me to question the reliability of this information.

We are open to investigating this issue further, but the information you’ve provided so far doesn’t agree with the tests that we’re seeing when we test from our dedicated testing solution. Have you tried testing against assets on s.m.iwin.com, then analyzing the output through CloudShark to see if the segment size is the same? If so, would it be possible to provide that information for further analysis?

The test differences they are seeing largely come down to the connectivity. The catchpoint tests are running directly on backbone network links (gigabit with zero latency) which is why they are so much faster.

The webpagetest result is running with a Cable connection profile which limits bandwidth to 5Mbps (and introduces some latency). If you look at the bandwidth chart at the bottom of the waterfall, the bandwidth is pegged at the full 5Mbps while downloading the images, so it’s not really an origin delivery problem but a problem of getting the image bytes through the pipe.
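As a rough sanity check on the numbers, here is the back-of-the-envelope arithmetic for the banner image from the first post. It assumes the link runs at exactly 5Mbps for the whole interval and ignores latency and protocol overhead; the function names are just for illustration.

```python
LINK_BPS = 5_000_000  # Cable profile downlink: 5 Mbps


def transfer_ms(size_bytes: float, link_bps: int = LINK_BPS) -> float:
    """Time to push size_bytes through the link if it had the link to itself."""
    return size_bytes * 8 / link_bps * 1000


def bytes_in_window(duration_ms: float, link_bps: int = LINK_BPS) -> float:
    """Total bytes a saturated link can carry in duration_ms."""
    return duration_ms / 1000 * link_bps / 8


image_bytes = 29.9 * 1024          # "Bytes In (downloaded): 29.9 KB"
alone_ms = transfer_ms(image_bytes)
carried = bytes_in_window(352)     # "Content Download: 352 ms"

print(f"banner.jpg alone at 5Mbps: about {alone_ms:.0f} ms")
print(f"a saturated link carries about {carried / 1024:.0f} KB in 352 ms")
```

Under those assumptions the image needs roughly 49 ms on its own, while a saturated link moves roughly 215 KB in 352 ms. That gap is what you would expect if the image is sharing the pipe with several other downloads running in parallel.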

You can also run the WPT tests with no traffic shaping (connectivity selection under advanced settings): http://www.webpagetest.org/result/140910_0R_KDK/ and the images all load really quickly.

At that point the site is mostly CPU constrained, and running it on the Dulles Thinkpad machines, which are a lot faster hardware, it comes in right around the 500-600ms that they see in Catchpoint: http://www.webpagetest.org/result/140910_AP_KFP/

I didn’t look closely enough to see if a larger congestion window would let you ramp up to the full bandwidth a few round trips faster, but at best you’re talking < 100ms overall on the images because the parallel loading is saturating the link pretty quickly.
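To put a rough number on "a few round trips", here is a sketch of an idealized slow start for a single ~30 KB image. The assumptions are mine: the window doubles every round trip, there is no loss and no other limit on the sender, the MSS is 1460 bytes, and the round-trip time is 28 ms (the RTT I am assuming for the Cable profile). An initcwnd of 10 is used only as a comparison point.

```python
import math


def slow_start_round_trips(size_bytes: int, initcwnd: int, mss: int = 1460) -> int:
    """Round trips needed to send size_bytes under idealized slow start.

    The window starts at initcwnd segments and doubles each round trip.
    """
    segments = math.ceil(size_bytes / mss)
    cwnd = initcwnd
    sent = 0
    rounds = 0
    while sent < segments:
        sent += cwnd
        cwnd *= 2
        rounds += 1
    return rounds


IMAGE_BYTES = int(29.9 * 1024)  # banner.jpg from the first post
RTT_MS = 28                     # assumed round-trip time

low = slow_start_round_trips(IMAGE_BYTES, initcwnd=2)
high = slow_start_round_trips(IMAGE_BYTES, initcwnd=10)
saved_ms = (low - high) * RTT_MS

print(f"initcwnd=2: {low} round trips, initcwnd=10: {high} round trips")
print(f"idealized saving: about {saved_ms} ms")
```

With these assumptions the image takes 4 round trips at initcwnd 2 and 2 round trips at initcwnd 10, a saving of about 56 ms on an otherwise idle link. Since the link is already saturated by parallel downloads, the real-world gain would be no larger than that.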

With zero latency they also won’t see any impact from congestion window settings, so the Catchpoint testing isn’t really going to show anything one way or the other.