[Solved] Occasional long Time to First Byte for the first request

Hi,

For the last couple of weeks I have been working on improving the performance of my website. After several hundred tests via webpagetest.org there is one thing I still cannot understand.

If I test my website 9 times in a row, the First Byte Time varies widely, both between individual requests and between test runs. You can see that in the following test series:
http://www.webpagetest.org/result/131020_0V_74f08fb0aa2b3a0b54607644e14345b9/

Can anyone explain this behavior to me? The website runs on a virtual server; could this be related to how the backend is configured?

UPDATE: The solution (at least in my case) was posted at the end of this thread.

No ideas about what the error / performance issue could be? In the meantime I reduced the number of HTTP GET requests a little, but the problem still exists.

E.g. http://www.webpagetest.org/result/131101_N9_6a81e7d37ee2672cf50c58758072497e/

I have now moved most of the site’s content to a CDN, and the problem still occurs sometimes. My provider says everything is OK (they have checked twice in the meantime). But I still have problems with the TTFB.

At the moment the VPS only has to handle one(!!) request and is obviously not able to do that in a reasonable time. Any ideas what the problem could be?

Here is my latest test: http://www.webpagetest.org/result/131110_AM_f62b28678c30793005b4a5707a1321b5/

Best

A CDN for your static resources has no effect on the TTFB of your pages. In my experience, it loads well most of the time (all of the time, actually, but your test results clearly indicate otherwise). Perhaps something like a scheduled (cron) job is straining the CPU at intervals, thus interfering with page load times?

Hi robzilla,

Thank you very much for your answer. Yes, I know that the CDN only affects static resources. But if you look at my first post, I also had trouble with the TTFB of static resources on my VPS.

I just monitored my CPU usage via “top -d 1” while running the webpagetest.org tests and noticed one thing: the long TTFB occurred only once in about 40 tests. At that moment the process list showed the php-cgi command using about 28% CPU.

I think this could be the reason, but I have no idea what to do about it.

28% shouldn’t be a problem unless your share of CPU on this particular VPS is less than or equal to 28% of a full (virtual) core. Also, if it’s consistently at 28%, all your requests should suffer from the longer TTFB.

Have you noticed any other patterns? Perhaps it’s always the first request that has a longer TTFB? In that case, you could look into how caching is done for your site; perhaps the first request takes longer to process because the cached version is invalidated.

Unfortunately I didn’t find any other patterns. Before I used a CDN, in most cases there was a long TTFB for static files like pictures (see the test link in my first post). That doesn’t fit the theory of a caching problem.

Nevertheless I set the cache time in WP Super Cache from 1 hour to 24 hours.

What I understand least is that static resources also show a long TTFB when I don’t use a CDN.

Could be poor disk performance. When you’re testing the site and php-cgi takes roughly 28% of CPU, what’s the CPU doing? Is it waiting for the disk? (%wa in top) You can use tools like hdparm (e.g. hdparm -tT /dev/vda1) or ioping to get a basic idea of disk performance, or something like fio for a more thorough analysis.

Thanks again for the answer. I did a few more tests with top and also with ioping. I also temporarily deactivated the CDN.

The result was: http://www.webpagetest.org/result/131114_NP_23d432b9a0724745e4fb9ec5129e3894/

But there was no high CPU usage, and %wa stayed at 0 the whole time. Logging with top is a bit awkward because you only get one sample per second. ioping didn’t show high latencies either: sometimes about 15-20 ms, but that’s nothing compared to a TTFB of about 1000 ms.

Do you have any other ideas?

Further question: Is there any possibility that this is not a server-side issue? If not, then I will probably change my provider.
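Regarding the one-sample-per-second limit of top mentioned above, here is a minimal sketch that samples the Linux /proc/stat counters at a shorter interval instead. It is not from the thread itself: the function names, the 0.2 s interval and the choice to also print steal time (the %st column in top, which on a VPS reflects time the hypervisor gave to other guests) are assumptions for illustration.

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the cumulative tick counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            # Very old kernels report fewer columns; treat missing ones as 0.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def percentages(before, after):
    """Share of each CPU state between two samples, in percent."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * d / total for k, d in deltas.items()}


def sample(interval=0.2, count=50, path="/proc/stat"):
    """Print iowait and steal percentages every `interval` seconds."""
    with open(path) as f:
        previous = parse_cpu_line(f.read())
    for _ in range(count):
        time.sleep(interval)
        with open(path) as f:
            current = parse_cpu_line(f.read())
        pct = percentages(previous, current)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} iowait={pct['iowait']:.1f}% steal={pct['steal']:.1f}%")
        previous = current
```

Calling sample() on the server while a test run is in progress gives a timestamped log that can be lined up with the request that showed the long TTFB.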

You always seem to use the Falkenstein test instance, so I would suggest trying a few other WPT test locations to see if they show similar results. If they do, then I would probably contact the hosting provider about this first, to see if they might have a clue.

I tried Ireland and Amsterdam as well, still the same result:
http://www.webpagetest.org/result/131114_00_0a23e2aabf3c4df78b4a9f1d138e86c3/
http://www.webpagetest.org/result/131115_D7_54968c441ce14d57bf175e0548fd965e

Upload a blank HTML file to your host and test it to see if you get the same TTFB. If it’s the same, the cause is the server; if it’s not, the cause is your website.
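To repeat that comparison many times without going through WebPageTest, something like the following rough sketch could be run from any machine. It uses only the Python standard library; the function names and the "3x the median" outlier rule are assumptions for illustration, and the measured time is from sending the request until the status line and headers have arrived, which is only an approximation of what WebPageTest reports as TTFB.

```python
import http.client
import statistics
import time
from urllib.parse import urlsplit


def measure_ttfb(url, timeout=10.0):
    """Seconds between sending a GET and receiving the response headers."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.connect()  # keep DNS/TCP/TLS setup out of the measurement
        start = time.perf_counter()
        conn.request("GET", path)
        response = conn.getresponse()  # returns once status line and headers are read
        elapsed = time.perf_counter() - start
        response.read()  # drain the body before closing
        return elapsed
    finally:
        conn.close()


def summarize(samples, factor=3.0):
    """Median, maximum, and samples slower than `factor` times the median."""
    median = statistics.median(samples)
    outliers = [s for s in samples if s > factor * median]
    return {"median": median, "max": max(samples), "outliers": outliers}
```

Collecting, say, 50 values of measure_ttfb() for the blank file and 50 for a normal page, then passing each list to summarize(), shows whether the occasional spikes appear for both.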

One more thing you could try is profiling your PHP with XDebug or New Relic. Run your tests again with one of those installed and you should be able to see where the backend spends most of its time during those requests with a long TTFB.

Thank you for this tip. I just uploaded an HTML page with a headline and a few pictures. The result: sometimes there are long TTFBs like this:

http://www.webpagetest.org/result/131115_RD_4e6b44ed99bc39fea047fbdddde51014/8/details/

@robzilla: Thank you for your answer. I will try one of these tools in the next couple of days.

I see no increase in TTFB for the HTML file, but I do see an increase in TTFB for images. Maybe your server has something that optimizes images, so that on every image request the image gets resized or smushed. Do you also see an increase in TTFB for .css and .js files?

Right, sorry, I forgot you were also experiencing long TTFBs on static resources. It’s looking more and more like a hardware issue, but since it’s very irregular, your provider is not going to notice it. Try hosting your site elsewhere temporarily to see if it makes a difference.

It could also be an Apache configuration issue if you are running out of clients to handle the requests. If you are running the prefork MPM, then each connection ties up one of your clients for the life of the connection, even if it is just sitting around (which is a problem with keep-alive).

Some of the other MPMs are more sane but, depending on how you have PHP integrated, may not work. These days I usually run nginx or Varnish in front of Apache (on the same box) if I absolutely HAVE to run Apache.
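One way to check whether the clients are actually running out is Apache’s mod_status, whose machine-readable output (normally at /server-status?auto) includes BusyWorkers and IdleWorkers lines. The sketch below only parses that text; it assumes mod_status is enabled and reachable on the server, and the helper names and the max_clients parameter are made up for illustration.

```python
def parse_status(text):
    """Turn 'Key: value' lines from /server-status?auto into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def worker_usage(text, max_clients):
    """Compare busy workers against the configured client limit."""
    fields = parse_status(text)
    busy = int(fields["BusyWorkers"])
    idle = int(fields["IdleWorkers"])
    return {
        "busy": busy,
        "idle": idle,
        "utilization": busy / max_clients,
        "saturated": busy >= max_clients,
    }
```

If the busy count sits at or near the configured limit while a slow request is observed, new connections are probably waiting for a free client; if it stays low, the prefork settings are unlikely to be the cause.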

Thanks for the responses.

@Wu4D: Sometimes the long TTFB also appears for JavaScript and CSS Files.

@robzilla: I asked my provider to move my website to another server if they could not find any problem. They will now check the performance issue for the third time.

@pmeenan: Apache is working with prefork. I adjusted the settings for prefork a few weeks ago to:

<IfModule prefork.c>
    StartServers 3
    MinSpareServers 3
    MaxSpareServers 10
    ServerLimit 100
    MaxClients 100
    MaxRequestsPerChild 4000
</IfModule>

I am doing all my tests during the night at about 3-5am, so there is not much traffic on the site.

You have a very slow server, and WordPress is a very inefficient page generator.

I checked your server-to-server transmission time and it is very slow at 30-40 KBytes/sec.

If you are on a shared server, large variations are normal, especially during busy times of the day like early evening. You may get more consistent results at 3:00am. It’s not just the server speed. It can be the service provider’s infrastructure: too much traffic on the local network.

Your DNS resolution is very fast so that’s not an issue.

Your HTML (Request #1) is very fast for a WordPress page. I have to assume the CPU has plenty of capacity and your caching plug-in is working well at that point in time. WordPress is not the issue, and a CPU processing bottleneck is unlikely.

Now I see the problem.

Where it becomes evident that it is a data server issue is when you look at requests #24 & #25 in the results in your first post. I presume there was something going on with the server at that point in time. Request #24 has a 1.8 sec. TTFB, which for an image file is abominable. Either the CPU is being heavily overutilized by another process or, more likely, the data bus from the hard drive is blocked. The connection time is 39 ms, followed by the 1.8 sec. wait. Only those two images have long TTFB issues. It is not file-size dependent either: request #25 is a 300-byte file with a 925 ms TTFB. So there must have been something going on on the server at that point in time and, to a lesser degree, around the time of request #7 with its 727 ms TTFB. Most requests typically have a 70 ms TTFB.

My best guess, without knowing how many sites are hosted on the server and whether there is a local drive or it’s using Network Attached Storage, is that there is an intermittent data access issue. All indicators point to a data access delay. I just do not know the cause.

Hi iSpeedLink,

Sorry for the very late response and for reviving this thread. After I reduced the number of GET requests, the measured delays decreased. But they still occur sometimes, even if rarely.

I also don’t know how many sites are hosted on the server, but I will ask my provider how many VPSs the hardware has to manage. Do you have any ideas how to measure a data access delay?
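As one possible starting point for that question, here is a rough sketch that times small random reads from a file and lists the slow ones. It is an illustration, not a replacement for ioping or fio (both mentioned earlier in the thread): the function names and the 100 ms threshold are assumptions, and because it does not bypass the operating system’s page cache, reads of an already cached file will come back from memory and look fast.

```python
import os
import random
import time


def read_latencies(path, count=100, block_size=4096, pause=0.0):
    """Time `count` random block reads from `path`; returns seconds per read."""
    size = os.path.getsize(path)
    latencies = []
    with open(path, "rb", buffering=0) as f:  # unbuffered at the Python level only
        for _ in range(count):
            offset = random.randrange(0, max(1, size - block_size + 1))
            start = time.perf_counter()
            f.seek(offset)
            f.read(block_size)
            latencies.append(time.perf_counter() - start)
            if pause:
                time.sleep(pause)
    return latencies


def slow_reads(latencies, threshold=0.1):
    """Return (index, seconds) for every read slower than `threshold`."""
    return [(i, t) for i, t in enumerate(latencies) if t > threshold]
```

Running read_latencies() with a pause over a longer period on a large file, and checking slow_reads() afterwards, may show whether stalls in the hundreds of milliseconds occur at the storage level at all.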