Site loads but says timeout?

Anyone have any ideas why a site would load normally in my own browser, but report ‘Timed out’ via WebPageTest?

Check out this result: WebPageTest - Running web page performance and optimization tests...

If you load it in your own browser you’ll see ~150 resources successfully load (definitely needs improvement), but it completes.

Yet the 99997 result code from WPT indicates things are failing to load, and it stops after 47 resources.

Is this an issue with the URL, or with WPT?

It timed out here on the first visit but loaded correctly on a second try. Possibly the media and JS CDN domains timing out when first retrieving the files from the origin?

If I had to guess, there may be a critical request whose connections are failing, and the failed connections aren’t showing up in the waterfall (maybe the origin for those requests has firewalled EC2 access). A failed connection should only cause a 20-second delay, but if there are several then maybe it gets triggered multiple times.

A tcpdump should help track down the connection (if that is the issue), and comparing against a test run from a non-EC2 location may help reveal what the failed connection is.
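If it is a connection issue, a rough tcpdump sketch along these lines could capture it during a test run (the interface name, filter, and output-field handling are assumptions; adjust to the agent’s setup):

```shell
# Capture connection setup/teardown packets during the test run
# ("eth0" is a placeholder for the agent's network interface):
sudo tcpdump -i eth0 -nn -w wpt-run.pcap \
  'tcp[tcpflags] & (tcp-syn|tcp-rst) != 0'

# Afterwards, count outbound SYNs per destination; hosts with many
# retried SYNs but no traffic in the waterfall are the suspects:
tcpdump -nn -r wpt-run.pcap 'tcp[tcpflags] == tcp-syn' \
  | awk '{print $5}' | sort | uniq -c | sort -rn | head
```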

It could also be doing something that is confusing either Chrome or the agent. The new Linux agents show a bunch of requests that never completed:

Possibly a problem with the HTTP/2 server or something else going on?

Likely you won’t like hearing this…

Looks like your problem is a combination of CloudFlare + Magento.

[ … rant on … ]

It appears your primary problem is you’re using CloudFlare.

CDNs add complexity, creating difficult-to-debug problems + CDNs only help sites which are abysmally tuned… meaning the LAMP stack is untuned (running default config files). Way better to invest in tuning your LAMP stack, rather than using CloudFlare.

First thing I always do with new clients is strip out all CDNs (CloudFlare) + Proxies (NGINX/Varnish) + caching nonsense (memcached).

[ … rant off … ]

Here’s the problem with CloudFlare.

  1. Take a look at these two waterfalls…

Notice they are completely different, which means either CloudFlare, your hosting, or how CloudFlare interacts with your hosting is glitchy…

Glitchy… tech term for inconsistent times to serve content.

Repeat tests of a site should give somewhat similar results.

  2. You can easily see the problems by drilling down on specific assets.

I do this by curl’ing (copying a file via curl or wget to one of my servers) + comparing the time to serve the asset.
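A sketch of that per-asset comparison using curl’s built-in timing variables (the URL and origin IP are placeholders):

```shell
# Time a single asset: DNS, connect, TLS, first byte, total.
curl -o /dev/null -s \
  -w 'dns=%{time_namelookup}s connect=%{time_connect}s tls=%{time_appconnect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
  https://www.example.com/path/to/asset.js

# Run the same fetch against the origin directly (bypassing the CDN)
# by pinning the hostname to the origin IP with --resolve:
curl -o /dev/null -s \
  -w 'ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
  --resolve www.example.com:443:203.0.113.10 \
  https://www.example.com/path/to/asset.js
```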

  3. Notice asset #34

951ms for CloudFlare…

243ms on one of my servers.

  4. Notice asset #27

680ms for CloudFlare…

225ms for one of my servers…

Suggested fix.

Remove all cruft (CloudFlare + NGINX + anything else you’re running).

The other challenge is you appear to be using Magento, which is extremely difficult to tune.

If you’re generating substantial cash from this site, best tuning approach…

  1. Relocate /tmp to tmpfs (off disk, into memory). This will cause PHP session files + MySQL temporary datasets (a side effect of complex SELECTs) to be memory resident (running at memory speed, rather than disk seek + disk I/O speed).
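A minimal sketch of that relocation, assuming a 2G cap (the size is an assumption; note /tmp contents are lost at reboot):

```shell
# One-off, for the current boot (needs root):
sudo mount -t tmpfs -o size=2G,mode=1777 tmpfs /tmp

# Or persist it across reboots with an /etc/fstab entry:
# tmpfs  /tmp  tmpfs  size=2G,mode=1777  0  0
```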

  2. If you’re running any downlevel LAMP code, upgrade to latest everything.

Currently this means Apache 2.4.25 + PHP 7.1.5 and the latest OpenSSL + configuring Apache to run HTTP/2 + ALPN + OCSP Stapling + Strict Transport Security.
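A sketch of the relevant Apache 2.4 directives (module paths and the stapling cache size are placeholders; ALPN needs no directive of its own once Apache is built against OpenSSL 1.0.2+):

```apache
LoadModule http2_module   modules/mod_http2.so
LoadModule headers_module modules/mod_headers.so

# Prefer HTTP/2, fall back to HTTP/1.1:
Protocols h2 http/1.1

# OCSP stapling:
SSLUseStapling On
SSLStaplingCache "shmcb:logs/ssl_stapling(32768)"

# Strict Transport Security (one year):
Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
```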

  3. If you’re running MySQL (shudder), remove the MySQL software (leave the data) + install the latest MariaDB.

  4. If you’re running MyISAM tables, convert them to InnoDB (after MariaDB is installed).

  5. Run mysqltuner every few days + implement any suggestions it emits.
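The MyISAM-to-InnoDB conversion can be generated mechanically from information_schema; a sketch, assuming shell access to the database server with credentials in ~/.my.cnf:

```shell
# Emit one ALTER TABLE per MyISAM table outside the system schemas:
mysql -N -e "SELECT CONCAT('ALTER TABLE \`', table_schema, '\`.\`', table_name, '\` ENGINE=InnoDB;')
  FROM information_schema.tables
  WHERE engine = 'MyISAM'
    AND table_schema NOT IN ('mysql','information_schema','performance_schema');" \
  > convert_to_innodb.sql

# Review the generated file, then apply it:
mysql < convert_to_innodb.sql
```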

If you’re not generating substantial cash from your site yet, dump Magento + go with WordPress.

With Magento you’ll always spend more for slower throughput.

I’ve never been able to tune a Magento site to come close to WordPress speed.

  6. Finally, if you simply must use Magento (horror of horrors), then you’ll have to run a memory-resident database subsystem if you have any appreciable traffic.

This means at boot time of your machine or LXD container, copy /var/lib/mysql into a tmpfs filesystem. Start MySQL/MariaDB pointing at the memory-resident database. Run an rsync every minute from tmpfs to /var/lib/mysql to pick up any changes. Then at host or container shutdown, as part of the systemd MySQL/MariaDB shutdown action, after the daemon stops, do one final rsync to bring all data consistent.
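A rough sketch of that boot/sync cycle (the paths, the 8G size, and the service name are assumptions; note that rsync of a live InnoDB datadir is crash-consistent at best):

```shell
# At boot (e.g. from an init script or a systemd ExecStartPre):
mount -t tmpfs -o size=8G tmpfs /run/mysql-tmpfs
rsync -a /var/lib/mysql/ /run/mysql-tmpfs/
# my.cnf must point datadir at /run/mysql-tmpfs before starting:
systemctl start mariadb

# Cron entry to sync memory back to disk every minute:
# * * * * * rsync -a --delete /run/mysql-tmpfs/ /var/lib/mysql/

# At shutdown, after the daemon has stopped, one final sync:
systemctl stop mariadb
rsync -a --delete /run/mysql-tmpfs/ /var/lib/mysql/
```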

  7. Also, you’re running PHP 5.4.16, which is highly hackable.

Best first step, update your LAMP stack to latest.

If you’re hacked, your site speed will likely become a secondary issue.

Thanks Patrick and David for your responses.

Patrick - I have tried this from a non-EC2 location (Dulles), with the same result (WebPageTest - Running web page performance and optimization tests...). I’ve also tried countless times from my own browser and with other tools such as GTmetrix and Pingdom’s page speed tool, and it loads just fine.

I am aware that this is being routed through Cloudflare, which has HTTP/2 enabled, allowing many concurrent downloads. In fact, on GTmetrix I can see that at one point there are 80+ concurrent downloads from the same IP address; perhaps that’s tripping up WPT? Either way, I’m beginning to think this is a WPT agent-related issue, and I’ll make my way over to the GitHub repo to see if it would be better handled as an issue there.

David - I appreciate the time you spent writing out your diagnosis, but I have to disagree with you on a few things. First, loading a single resource on its own cannot be compared to loading that same resource as part of a full page load; as mentioned above, there are many, many concurrent requests happening on this page, and loading many requests at once (which may make each individual request appear slower) has benefits for the page load as a whole.

Also, while I do agree that fine-tuning things at the server level is great, Cloudflare has shown me (in many tests verified by MachMetrics) that it does help 90% of sites on shared environments that cannot afford dedicated environments. Additionally, there’s a reason CDNs have been shown to help page speed, and suggesting that any website strip out all CDNs is a bold statement.

This is not my website, but I will pass on the other suggestions to the owner.

I have done my fair share of MySQL/MariaDB tuning, and definitely agree with you on the gains with InnoDb, mysqltuner, and having adequate memory - thank you for the reminder.

Still digging into what is going on, but it looks to be timing-sensitive and Chrome-specific. Without the traffic shaping it succeeds, and Firefox and IE don’t have issues.

I’ll keep most of the investigation over on the GitHub issue, but I think the site may be triggering a Chrome bug somehow, and getting a reproducible case would be great (so I can file a crbug on it). Current best guess is that a list of resources start to fetch but get removed, and for some reason part of Chrome still thinks they are pending, blocking the document from completing.