Problem with irregular TTFB

We have been struggling with substantial, seemingly random loading delays.
Sometimes the site is fast; at other times it has wait times of up to 20 seconds (!)

When I run tests, I get TTFB values ranging from excellent to F.
This happens when I load a simple text file with just a few digits as well as when I test the full homepage, www.learnthat.org.

We are hosting on AWS with plenty of medium EC2 servers and have added another database server, so there is no load to speak of.

Both my system admin and developer are out of ideas, so I’m delighted to discover this forum where people seem to be really knowledgeable about these matters.

Here is a sample result: http://www.webpagetest.org/result/121021_J3_7RJ/
I’ve seen the TTFB as bad as 24 seconds and as good as 0.2 seconds.

Ideas, anyone??

Do you have a link to a bad test result? Everything I could see from the test logs looked like the TTFB was anywhere from 100ms to 1 second but none that were that long.

20+ seconds can be caused by packet loss if it happens to hit during a socket connect (or if you are making any back-end calls to services that may be having similar packet loss or timeout issues).
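
As a rough illustration of why packet loss during connect produces delays of this size: TCP retries a lost SYN after a timeout that doubles on each attempt. The sketch below assumes an initial retransmission timeout of 3 seconds (a common default on older stacks) or 1 second (the value recommended in RFC 6298). The real values depend on the client OS, and the function name is only for illustration.

```python
def syn_retry_delay(initial_rto, lost_syns):
    """Seconds spent waiting before the SYN that finally gets through is sent.

    Assumes the retransmission timeout doubles after every lost SYN.
    """
    delay = 0
    rto = initial_rto
    for _ in range(lost_syns):
        delay += rto   # wait out the current timeout
        rto *= 2       # exponential backoff for the next attempt
    return delay


# Assumed 3-second initial timeout: 1, 2 or 3 lost SYNs
print([syn_retry_delay(3, n) for n in (1, 2, 3)])        # [3, 9, 21]
# Assumed 1-second initial timeout: 1 to 5 lost SYNs
print([syn_retry_delay(1, n) for n in range(1, 6)])      # [1, 3, 7, 15, 31]
```

Under the 3-second assumption, three lost SYNs in a row already add 21 seconds before the request is even sent, which is in the range described above.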

New Relic has a free offering for AWS users: aws-monitoring | New Relic

I recommend installing it and watching the results to see where the back-end time is going. From the outside we can’t really tell much about what causes slow responses; we can only identify them. Something like New Relic, which profiles the back-end and can run in production, will give you a lot more visibility there.

Hi Patrick, thank you for the prompt response!
The link I included (WebPageTest Test - Running web page performance and optimization tests...) shows a 5.7 sec. TTFB, and it’s really all over the place, constantly.

We’ll take a look at the tool you suggested and hopefully will learn more. Thanks!

It actually shows a 5.7 sec load time (for the full page). TTFB was ~1 second.
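
If it helps to see the numbers separately for a single URL, here is a minimal sketch using only the Python standard library. It times the connect, the first byte (approximated as the moment the response headers have arrived) and the complete download of one resource over plain HTTP. The function name and defaults are my own, and it does not load images, scripts or other sub-resources, so its total is not the same thing as the full-page load time in the test result.

```python
import http.client
import time


def time_request(host, port=80, path="/", timeout=30.0):
    """Return (connect_s, ttfb_s, total_s) for one plain-HTTP GET."""
    start = time.perf_counter()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.connect()                  # DNS lookup + TCP handshake
        connected = time.perf_counter()
        conn.request("GET", path)
        resp = conn.getresponse()       # returns once status line and headers are read
        first_byte = time.perf_counter()
        resp.read()                     # download the rest of the body
        done = time.perf_counter()
    finally:
        conn.close()
    return connected - start, first_byte - start, done - start
```

Calling something like time_request("www.learnthat.org") a few dozen times and comparing the three values should show whether the slow runs are slow in the connect phase (which points at the network) or between connect and first byte (which points at the back-end).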