Now I have switched most of the site's content to a CDN, and the problem still occurs occasionally. My provider says everything is OK (they have checked it twice in the meantime). But I still have problems with the TTFB.
At the moment the VPS only has to handle one(!!) request at a time, and it is obviously not able to do that properly. Any ideas what the problem could be?
A CDN for your static resources has no effect on the TTFB of your pages. In my experience the site loads quickly most of the time (all of the time when I try it, actually, but your test results clearly indicate otherwise). Perhaps something like a scheduled (cron) job is straining the CPU at intervals and interfering with page load times?
Thank you very much for your answer. Yes, I know that the CDN only affects static resources. But if you look at my first post, I also had TTFB trouble with static resources served directly from my VPS.
I just monitored my CPU usage via “top -d 1” while running the webpagetest.org tests, and I noticed one thing: the long TTFB only occurred once in about 40 tests, and at that moment top showed the php-cgi process using about 28% of the CPU.
I think this could be the reason, but I have no idea what to do about it.
28% shouldn’t be a problem unless your share of CPU on this particular VPS is less than or equal to 28% of a full (virtual) core. Also, if it’s consistently at 28%, all your requests should suffer from the longer TTFB.
Have you noticed any other patterns? Perhaps it’s always the first request that has a longer TTFB? In that case, you could look into how caching is done for your site; perhaps the first request takes longer to process because the cached version is invalidated.
Unfortunately, I didn’t find any other patterns. Before I used a CDN, there was in most cases a long TTFB for static files like pictures (see the test link in my first post), and that fact doesn’t fit the theory about a caching problem.
Nevertheless, I increased the cache time in WP Super Cache from 1 hour to 24 hours.
The thing I understand least is that there is also a long TTFB for static resources even when I don’t use a CDN.
Could be poor disk performance. When you’re testing the site and php-cgi is taking roughly 28% of the CPU, what is the CPU actually doing? Is it waiting for the disk (the %wa column in top)? You can use tools like hdparm (e.g. hdparm -tT /dev/vda1) or ioping to get a basic idea of disk performance, or something like fio for a more thorough analysis.
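If hdparm isn’t available or you don’t have root on the VPS, a rough sequential-write check with dd can serve as a first sanity test. This is only an illustrative sketch; the file path and sizes are arbitrary:

```shell
# Rough sequential-write sanity check with dd (no root required, unlike
# hdparm). conv=fsync forces the data to disk so the reported speed
# reflects the drive, not the page cache. Path and size are arbitrary.
dd if=/dev/zero of=./disktest.tmp bs=1M count=64 conv=fsync 2>&1 | tail -n 1
rm -f ./disktest.tmp
```

The last line dd prints includes the elapsed time and throughput; single-digit MB/s on a write like this would support the slow-disk theory, while hundreds of MB/s would point elsewhere.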
But there was no high CPU usage, and %wa was 0 the whole time. Logging with top is a little complicated because you only get results once per second. There were also no high latencies via ioping: sometimes about 15–20 ms, but that’s nothing compared to a TTFB of about 1000 ms.
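If watching top interactively is too fiddly, the kernel’s counters can be sampled directly. A minimal Linux-only sketch that reads the aggregate “cpu” line of /proc/stat and reports the iowait share over a one-second interval:

```shell
# Sample the aggregate CPU line of /proc/stat twice, one second apart,
# and print the iowait share for that interval. The fields on the "cpu"
# line are: user nice system idle iowait irq softirq ...
read -r _ u1 n1 s1 i1 w1 _ < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 w2 _ < /proc/stat
total=$(( (u2 + n2 + s2 + i2 + w2) - (u1 + n1 + s1 + i1 + w1) ))
echo "iowait over last second: $(( 100 * (w2 - w1) / total ))%"
```

Wrapped in a `while` loop with the output redirected to a file, this gives a timestamp-free but continuous log you can correlate with the moments WebPageTest reports a long TTFB.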
Do you have any other ideas?
Further question: Is there any possibility that this is not a server-side issue? If not, then I will probably change my provider.
You always seem to use the Falkenstein test instance, so I would suggest trying a few other WPT test locations to see if they show similar results. If they do, then I would probably contact the hosting provider about this first, to see if they might have a clue.
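To take WebPageTest out of the equation entirely, you could also measure the TTFB repeatedly from another machine using curl’s timing variables. A sketch, where the URL is a placeholder to replace with one of your own pages or images:

```shell
# Measure connect time and TTFB (time_starttransfer) a few times in a
# row. The URL is a placeholder; substitute one of your own resources.
URL="https://example.com/"
for i in 1 2 3 4 5; do
  curl -o /dev/null -s \
    -w "connect: %{time_connect}s  TTFB: %{time_starttransfer}s\n" "$URL"
done
```

If the occasional 1 s+ TTFB also shows up here from a different network, that would strengthen the case that the problem is on the server side rather than in any one test location.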
One more thing you could try is profiling your PHP with XDebug or New Relic. Run your tests again with one of those installed and you should be able to see where the backend spends most of its time during those requests with a long TTFB.
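For XDebug specifically, the profiler can be switched on per run without editing php.ini. A hedged configuration sketch, assuming XDebug 3 is installed (the option names differ in XDebug 2, where the setting was xdebug.profiler_enable):

```shell
# Run one request through PHP with XDebug's profiler enabled (XDebug 3
# option names; requires the xdebug extension to be installed).
php -d xdebug.mode=profile \
    -d xdebug.output_dir=/tmp/xdebug \
    index.php
# The resulting cachegrind.out.* files can be opened in KCachegrind,
# QCachegrind, or webgrind to see where the backend spends its time.
```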
I see no increase in TTFB for the HTML file, but I do see an increase in TTFB for images. Maybe your server runs something that optimizes images, so that on every image request the image gets resized or smushed. Do you also see an increase in TTFB for .css and .js files?
Right, sorry, I forgot you were also experiencing long TTFBs on static resources. It’s looking more and more like a hardware issue, but since it’s very irregular, your provider is not going to notice it. Try hosting your site elsewhere temporarily to see if it makes a difference.
It could also be an Apache configuration issue if you are running out of clients to handle the requests. If you are running the prefork MPM, then each connection ties up one of your clients for the life of the connection, even if it is just sitting around (which is a problem with keep-alive).
Some of the other MPMs are saner, but depending on how you have PHP integrated, they may not work. These days I usually run nginx or varnish in front of Apache (on the same box) if I absolutely HAVE to run Apache.
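As a rough illustration of why prefork plus keep-alive hurts: each kept-alive connection pins a whole child process, so the RAM budget effectively caps concurrency. A back-of-the-envelope sketch with assumed numbers (neither value is measured from this server):

```shell
# Back-of-the-envelope prefork sizing. Both numbers are assumptions,
# not measurements from the server in this thread:
PER_CHILD_MB=40       # assumed RSS of one mod_php prefork child
APACHE_BUDGET_MB=1000 # assumed RAM left over for Apache on the VPS
echo "MaxRequestWorkers should be about $(( APACHE_BUDGET_MB / PER_CHILD_MB ))"
```

With a keep-alive timeout of several seconds, even a pool of a few dozen workers can fill up with idle-but-pinned connections, which is exactly why the event/worker MPMs or an nginx front end tend to help.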
You have a very slow server, and WordPress is a very inefficient page generator.
I checked your server-to-server transmission time, and it is very slow at 30–40 KB/s.
If you are on a shared server, large variations are normal, especially during busy times of the day like early evening. You may get more consistent results at 3:00 am. It’s not just the server speed; it can also be the service provider’s infrastructure, such as too much traffic on the local network.
Your DNS resolution is very fast so that’s not an issue.
Your HTML (request #1) is very fast for a WordPress page, so I have to assume the CPU has plenty of capacity and your caching plug-in was working well at that point in time. WordPress is not the issue, and a CPU processing bottleneck is unlikely.
Now I see the problem.
Where it becomes evident that this is a data-access issue is in requests #24 and #25 in the results from your first post. I presume something was going on with the server at that point in time. Request #24 has a 1.8 s TTFB, which is abominable for an image file. Either the CPU is being heavily over-utilized by another process or, more likely, the data path from the hard drive is blocked. The connection time is 39 ms, followed by the 1.8 s wait. Only those two images have long TTFBs, and it is not file-size dependent either: request #25 is a 300-byte file with a 925 ms TTFB. So something must have been going on on the server at that moment, and similarly, to a lesser degree, around the time of request #7 with its 727 ms TTFB. Most requests have a typical TTFB of about 70 ms.
My best guess, without knowing how many sites are hosted on the server or whether it uses a local drive or network-attached storage, is that there is an intermittent data-access issue. All indicators point to a data-access delay; I just do not know the cause.