What next???

http://www.webpagetest.org/result/130628_AE_6F6/

My suggestions are:

  1. Use a CDN (we are, but it’s just another server in the same DC).
  2. Sort out which JS can be deferred (not sure what impact the ga.js download has) - see the sketch below.
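
A sketch of what (2) might look like - the script path is hypothetical, and whether ga.js itself can be deferred is exactly the open question:

    <!-- hypothetical: defer non-critical scripts so they don't block HTML parsing -->
    <script src="/js/site.js" defer></script>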

It’s a Magento site, and the server platform seems to be running fine…

Any others?

Cheers,

Steve

Reduce the quantity of JS and CSS in the head.

Depending on where your customers are, consider serving the CSS from the same domain so that the browser doesn’t need to resolve a new domain and can reuse the TCP connection that most modern browsers open speculatively.

Establish where PIE.HTC is coming from.

Test with a more modern browser - unless you have lots of customers on IE8, of course.

I don’t really have that much control over the content of the site - I’m just tuning the server…

What’s the problem with pie.htc? Ah… need to find out why it’s downloaded twice, and why they are different files!
I’m serving static files from static… as it is cookie-free, and it spreads the network bandwidth over two (100Mbit) network connections.

I’ve only had a quick look, but your TTFBs are quite big. For the initial page the TTFB is more than 3 times the time to download the content.

I can see you’re using nginx and PHP, so I’d have a look at the PHP configuration for a start, but more worrying for me is the TTFB off your static server. If these are truly static assets that you are reading off the disc, they should be served very quickly unless: some rewrite logic is being fired (in which case review it); the disc that the assets are stored on is slow; or you don’t have enough CPU to manage the NIC (you need to allow enough CPU to service the interrupts from the NIC as well as run the OS, web server and application stack). The last point matters especially if you are using SAN storage rather than local disc.
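
One way to see where that time is going is to log nginx’s own timers and compare total request time with upstream time - a sketch, where the format name and log path are made up:

    # hypothetical http-context fragment: log total vs upstream (PHP) time per request
    log_format timing '$remote_addr "$request" $status '
                      'req=$request_time up=$upstream_response_time';
    access_log /var/log/nginx/timing.log timing;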

If you can’t touch the app, take a look at implementing a Varnish cache tier, and give it plenty of RAM.
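
Something like this as a starting point - a minimal sketch in Varnish 3 VCL, where the backend address and the asset regex are assumptions:

    backend default {
        .host = "127.0.0.1";
        .port = "8080";
    }

    sub vcl_recv {
        # strip cookies on static asset requests so Varnish will cache them
        if (req.url ~ "\.(js|css|jpe?g|gif|png|ico|woff)$") {
            unset req.http.Cookie;
        }
    }

    sub vcl_fetch {
        # force a long TTL on static assets, whatever the backend sends
        if (req.url ~ "\.(js|css|jpe?g|gif|png|ico|woff)$") {
            unset beresp.http.Set-Cookie;
            set beresp.ttl = 30d;
        }
    }

The “plenty of RAM” part is just how you start it, e.g. varnishd -s malloc,4G.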

@hsiboy

How is your nginx+PHP wired up (CGI or FPM)? If you’re not already, use php-fpm, and make sure APC is also enabled and has plenty of RAM.
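
In case it’s useful, the usual wiring looks something like this on the nginx side (the socket path is a guess):

    # hand .php requests to a local php-fpm pool over a unix socket
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php-fpm.sock;
    }

and in php.ini (size is a guess - make it big enough to hold the whole codebase):

    ; enable APC and give it room
    apc.enabled = 1
    apc.shm_size = 512M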

And if you aren’t already, channeling Artur Bergman a bit (Velocity 2011: Artur Bergman, "Artur on SSD's" - YouTube), throw SSDs at the server. Even consumer SSDs have insane IOPS and match the access patterns of the web very well (assuming the database and all the static files don’t fit in RAM). They can often hide sins, but they also completely eliminate entire classes of optimization and can negate the need for several different caching layers.

@hsiboy.
I’ve run another benchmark, at native speed and with Chrome.
http://www.webpagetest.org/result/130702_RJ_2Q4/

I see an initial TTFB of 115ms, down from 178ms. TBH, seeing as this is a multilingual site with tens of thousands of products, I’m pretty happy with this. Whether the cache performance could be improved is another question, but I’m not in the slightest worried by these results.

The static server has 2GB spare for cache usage; network interrupts peak at c. 750/sec, connections at c. 1,750, and CPU (primarily usermode) at c. 25% - there is a staging server on here too…

I’ve updated the nginx config on the static site, upping worker_rlimit_nofile to 100k and increasing the open_file_cache to 20k, but it doesn’t seem to have made an appreciable difference.
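
For reference, the fragment I changed looks roughly like this (the inactive/valid timings here are illustrative, not a recommendation):

    # main context: raise the per-worker open-file limit
    worker_rlimit_nofile 100000;

    http {
        # keep up to 20k file descriptors cached for hot static assets
        open_file_cache max=20000 inactive=60s;
        open_file_cache_valid 120s;
        open_file_cache_errors on;
    }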

The nginx config for static is:

    location / {
            # known static asset types get a 30-day expiry...
            if ($request_filename ~ "\.(js|jpe?g|css|docx?|gif|png|txt|pdf|swf|ico|mp3|woff)$") {
                    expires 30d;
                    break;
            }
            # ...anything else is a 404
            return 404;
    }

I sort of can’t really pare it down much further.
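
For what it’s worth, the same behaviour can also be expressed without the if block that the nginx docs discourage - an untested sketch:

    # known static extensions get a 30-day expiry...
    location ~* \.(js|jpe?g|css|docx?|gif|png|txt|pdf|swf|ico|mp3|woff)$ {
        expires 30d;
    }

    # ...everything else is a 404
    location / {
        return 404;
    }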

@Patrick,

Server platform is multiple VPSes, with nginx talking to mainly remote (LAN) PHP servers. I’ve got to the stage where - once up and running - there is practically zero disk IO: the InnoDB buffer and query cache handle 99%+ of DB IO (which is almost all reads, as you’d expect), the PHP servers use c. 50% of available memory even with 1GB APC segments, and sessions and cache are managed through redis data stores. Artur’s presentation was interesting, but I still cringe when people throw away all of those decades of development with a new toy… combine the two, and surely you’ll gain even more?? (But yes, in my case, everything fits into memory.)
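
By way of illustration, the MySQL side of “keep it all in memory” is essentially a couple of my.cnf knobs - these sizes are invented, ours are tuned to the working set:

    # my.cnf fragment: size the InnoDB buffer pool to hold the whole working set
    innodb_buffer_pool_size = 4G
    # query cache carries the read-heavy Magento traffic
    query_cache_type = 1
    query_cache_size = 256M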

So I’ve sort of done exactly the opposite of what you’re suggesting, because high-performance IO is exactly what can’t be expected here: keep it all in memory instead. Same sort of result, just a bit more fragile…

I was a bit worried about bandwidth - my 5-min averages are showing 50+Mbit/s - but apparently the connections are now rated at 250Mbit/s, so I probably shouldn’t be (: