Sometimes the site works perfectly, returning pages in less than a second. Other times, random static assets hang for over a minute, the page takes 30-60 seconds to return its first byte, or the connection drops completely and Cloudflare shows a 520 error page.
The site is hosted on a cloud VPS. top shows very low CPU usage, disk space/quota isn’t close to being fully used, and upgrading memory doesn’t seem to affect the issue whatsoever. Site traffic is low and bandwidth isn’t being saturated at all.
Even blank static assets sometimes take 6-10 seconds to first byte (TTFB), seemingly at random.
It looks like in Run 1 both of those static asset requests (requests 38 and 46) were edge cache misses, so they hit the origin server(s). See the Response tab on request 38, for example: it contains the response header CF-Cache-Status: MISS. In the other runs, the static assets show a mix of cache hits and misses, and some of the misses take a long time.
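If you want to check the hit/miss pattern yourself, something like the rough Python sketch below could help. It times how long the response headers take to arrive (a close proxy for TTFB) and groups the timings by the CF-Cache-Status header. The function names, the User-Agent string, and the 90-second timeout are my own placeholders, not anything from your setup.

```python
import time
import urllib.error
import urllib.request
from collections import defaultdict
from statistics import median


def sample(url, timeout=90):
    """Return (seconds until response headers arrived, cache status label).

    Timeouts and connection errors are not caught here; they propagate.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "ttfb-probe"})
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            elapsed = time.perf_counter() - start
            return elapsed, resp.headers.get("CF-Cache-Status", "NONE")
    except urllib.error.HTTPError as err:
        # Error pages such as a 520 still count as a (bad) data point.
        return time.perf_counter() - start, f"HTTP {err.code}"


def summarize(samples):
    """Group timings by cache status -> {status: (count, median, max)}."""
    groups = defaultdict(list)
    for elapsed, status in samples:
        groups[status].append(elapsed)
    return {s: (len(t), median(t), max(t)) for s, t in groups.items()}


# Example use (makes real requests, so the URL is left for you to fill in):
#   results = [sample("https://your-site.example/blank.gif") for _ in range(20)]
#   print(summarize(results))
```

If the slow samples all land in the MISS group while HIT stays fast, that points at the origin rather than the edge.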
Maybe consider looking at your origin access logs or webapp logs to see if they can give you more insight into what is happening and where. If multiple VMs/processes are serving traffic, it could be that one of them is having trouble. In several of these runs, the first request, the dynamic request to the homepage, also periodically takes multiple seconds, which seems to point to an origin issue.
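For the access logs, a small script can pull out the slow requests. This is only a sketch and it assumes something I can't see from here: that your log format has the request duration in seconds as the last field on each line (for example by appending nginx's $request_time to the log_format; the default combined format has no timing at all). The regex and the 1-second threshold are placeholders to adjust.

```python
import re

# Assumed line shape (hypothetical, adapt to your own log format):
#   ... "GET /path HTTP/1.1" 200 512 "-" "agent" 0.004
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*".*\s(?P<secs>\d+\.\d+)\s*$'
)


def slow_requests(lines, threshold=1.0):
    """Return (seconds, path) for requests at or above threshold, slowest first."""
    slow = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # line does not follow the assumed format
        secs = float(match.group("secs"))
        if secs >= threshold:
            slow.append((secs, match.group("path")))
    return sorted(slow, reverse=True)


# Example use:
#   with open("/var/log/nginx/access.log") as fh:
#       for secs, path in slow_requests(fh)[:20]:
#           print(f"{secs:8.3f}s  {path}")
```

If the origin logs show the same requests as fast while the client sees them as slow, the delay is likely before the webserver (network, proxy, connection limits) rather than in it.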
Are the static assets served straight from disk by the webserver, or are they generated by a web application? If served from disk, tuning is likely going to focus on the webserver itself, OS, network, etc. If from the web application, the tuning would focus on the application instead.
In my case it was a problem with an AWS Elastic File System: most requests would be fast, but under concurrency some would take several seconds.
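One way to check for this kind of storage problem is to time file reads directly on the server, first one at a time and then in parallel. Below is a minimal sketch with made-up function names. Keep in mind that on a local disk the OS page cache can hide latency, so the comparison is most telling on a network file system.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_read(path):
    """Open and fully read one file, returning the elapsed seconds."""
    start = time.perf_counter()
    with open(path, "rb") as fh:
        fh.read()
    return time.perf_counter() - start


def read_latencies(paths, workers=1):
    """Read every path using `workers` threads; return latencies sorted ascending."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(pool.map(timed_read, paths))


# Example use: compare the slowest read with and without concurrency.
#   files = ["/path/to/static/a.css", "/path/to/static/b.js"] * 50
#   print("serial max:  ", read_latencies(files, workers=1)[-1])
#   print("parallel max:", read_latencies(files, workers=16)[-1])
```

If the parallel run has a few reads that take seconds while the serial run stays fast, the file system is a likely suspect.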