TTFB constantly fluctuates from under 1s to minutes

This has been driving me crazy all day.

Results for homepage: WebPageTest - Running web page performance and optimization tests...

Sometimes the site will work perfectly, returning pages in less than a second. Other times, random static assets will hang for over a minute, the page won't return its first byte for 30-60 seconds, or the connection will poop out completely and CloudFlare will show a 520 error page.

The site is hosted on a cloud VPS. top shows very low CPU usage, disk space/quota isn’t close to being fully used, and upgrading memory doesn’t seem to affect the issue whatsoever. Site traffic is low and bandwidth isn’t being saturated at all.

Even requests for blank static assets sometimes take 6-10 seconds to first byte, seemingly at random.

Any suggestions?
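To see the fluctuation outside a browser, one option is to request the same URL repeatedly and log how long it takes for the response headers to arrive, alongside CloudFlare's CF-Cache-Status header. This is a minimal sketch assuming Python 3.9+ and only the standard library. The sample count and pause are arbitrary, and because a fresh connection is opened per request the timing includes DNS, TCP and TLS setup.

```python
import http.client
import math
import statistics
import time
from urllib.parse import urlsplit


def measure_ttfb(url: str, timeout: float = 120.0) -> tuple[float, int, str]:
    """Return (seconds until headers arrive, HTTP status, CF-Cache-Status or '-').

    The timer starts before the connection is opened, so DNS, TCP and TLS
    setup are included. 5xx responses such as a 520 are returned, not raised.
    """
    parts = urlsplit(url)
    cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = cls(parts.netloc, timeout=timeout)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        start = time.perf_counter()
        conn.request("GET", path)
        resp = conn.getresponse()  # returns once the status line and headers are read
        ttfb = time.perf_counter() - start
        resp.read()  # drain the body before closing
        return ttfb, resp.status, resp.getheader("CF-Cache-Status", "-")
    finally:
        conn.close()


def summarize(ttfbs: list[float]) -> dict[str, float]:
    """Min / median / nearest-rank p95 / max of a list of TTFB samples."""
    ordered = sorted(ttfbs)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


def sample_ttfb(url: str, count: int = 30, pause: float = 2.0) -> list[tuple[float, int, str]]:
    """Request `url` repeatedly, printing each result so slow outliers stand out."""
    results = []
    for _ in range(count):
        ttfb, status, cache = measure_ttfb(url)
        print(f"{ttfb:8.3f}s  HTTP {status}  CF-Cache-Status: {cache}")
        results.append((ttfb, status, cache))
        time.sleep(pause)
    return results
```

Pointing sample_ttfb() at one of the blank static assets and passing the first element of each result to summarize() gives min/median/p95/max. A median well under a second with a max in the tens of seconds would match the pattern described above. Timeouts are not caught here, so a request that never answers will raise.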

It looks like in Run 1 both of those requests for static assets (requests 38 and 46) were edge cache misses (see the Response tab on request 38, for example: it contains the response header CF-Cache-Status: MISS), so they hit the origin server(s). In the other runs, the static assets have a mix of cache hits and misses, but some of the misses take a long time.
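One way to check whether the slow requests line up with cache misses is to group TTFB samples by their CF-Cache-Status value. If HIT stays fast while MISS is slow, the delay is most likely at the origin rather than at the CloudFlare edge. This is a minimal sketch assuming you already have (ttfb_seconds, cache_status) pairs, for example from repeated requests or copied out of the WebPageTest request details. The function name is hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable
from statistics import median


def ttfb_by_cache_status(
    samples: Iterable[tuple[float, str]],
) -> dict[str, tuple[int, float, float]]:
    """Group (ttfb_seconds, cf_cache_status) pairs by cache status.

    Returns {status: (count, median_ttfb, max_ttfb)}. Statuses are
    upper-cased so "miss" and "MISS" land in the same bucket.
    """
    groups: dict[str, list[float]] = defaultdict(list)
    for ttfb, status in samples:
        groups[status.upper()].append(ttfb)
    return {status: (len(v), median(v), max(v)) for status, v in groups.items()}
```

For instance, a result where HIT has a median around 0.06s and MISS has a median of 12s would point at the origin.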

Maybe consider looking at your origin access logs or web app logs to see whether they give more insight into what is happening and where. If multiple VMs/processes are serving traffic, one of them could be having trouble. In several of these runs, the first request (the dynamic request for the homepage) also periodically takes multiple seconds, which seems to indicate an origin issue.

Are static assets served directly from disk by the web server, or are they served by the web application? If they're served from disk, tuning is likely going to focus on the web server itself, the OS, the network, etc. If they're served by the web application, the application itself is the more likely place to look.

Using ApacheBench with concurrency, as described by @dfavor in this thread https://www.webpagetest.org/forums/showthread.php?tid=13696, helped me troubleshoot a similar issue.

In my case it was a problem with an AWS Elastic File System: most requests were fast, but under concurrency some took several seconds.
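ApacheBench itself is the simplest tool here: ab -n 200 -c 20 followed by a URL sends 200 requests with 20 in flight at a time. Where ab isn't available, a rough standard-library Python equivalent might look like the sketch below (assuming Python 3.9+). It times how long urlopen() takes to return, which happens once the status line and headers have arrived. It is not a reimplementation of ab, and the function names are hypothetical.

```python
import statistics
import time
import urllib.error
import urllib.request
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def timed_get(url: str, timeout: float = 120.0) -> float:
    """Seconds until urlopen() returns (headers received; redirects are followed)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            elapsed = time.perf_counter() - start
            resp.read()  # drain the body before closing
    except urllib.error.HTTPError as err:
        # Error pages (e.g. a 520) still count; record how long they took to arrive.
        elapsed = time.perf_counter() - start
        err.close()
    return elapsed


def run_load(fetch: Callable[[], float], total: int, concurrency: int) -> list[float]:
    """Call fetch() `total` times with at most `concurrency` calls in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(fetch) for _ in range(total)]
        return [f.result() for f in futures]


def report(latencies: list[float], slow_after: float = 2.0) -> str:
    """One-line latency summary, counting requests at or above `slow_after` seconds."""
    slow = sum(1 for t in latencies if t >= slow_after)
    return (
        f"{len(latencies)} requests, median {statistics.median(latencies):.3f}s, "
        f"max {max(latencies):.3f}s, {slow} slower than {slow_after}s"
    )
```

For example, report(run_load(lambda: timed_get(url), total=200, concurrency=20)) roughly mirrors the ab invocation above. Running it once with concurrency=1 and once with a higher value shows whether the slow outliers only appear under concurrency, as they did with EFS. Connection failures and timeouts are not caught, so they surface as exceptions from run_load().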