Sitecore Platform Benchmarking

Hello folks,

I’ve been doing some extensive testing on my firm’s domain, and am trying to get a better sense of whether the data I’m getting is common for my platform.

We’re on Sitecore (v8, I believe), and I’m noticing fairly long TTFB on many requests (as much as 4.5s for international, and even 1.5s domestic). Is this common in Sitecore applications? I’m not sure whether our app pool and config are just poorly optimized for speed (we do have lots of custom components and rules), or whether this is still highly unusual.

Some details:
Transfer times are lightning fast once the assets are sent, and the TTFB seems to correlate more with the order things are pulled than with specific assets or size. I’m noticing some degradation, where early requests run about 100-400ms, then hit a point where they start taking much longer (though usually 1-2 requests end up shorter). My running guess is this comes from a bottleneck in multiplexing on the HTTP/2 connection, as it seems to start degrading at every 5th-6th asset pulled through the connection.

I’m still not 100% sure that’s a good analysis though, and even if it is, is there a fix for that? Seems like we’d be able to stand up a second, cookieless CD server/domain to provide another open connection, but that could be expensive, especially getting it into our CDN. I’m also considering combining images and positioning them through CSS, though that’s going to be unpopular with the devs.

I’ve also struggled to find optimization guides on how Sitecore processes inbound requests. If anyone has a good resource, I’d love to dig in.

Do you have a test result to share? Long TTFB for resources later in the page is normal for HTTP/2. All of the requests are sent immediately and then the server sends the responses back in priority order so they should be staggered one after the other.
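To make the staggering concrete, here is a rough sketch of the behaviour described above. It assumes a deliberately simplified model: every request goes out at time zero, and the server sends whole responses strictly one after another over a single connection that is already using all available bandwidth. Real servers may interleave frames, and the sizes, bandwidth and round-trip time below are made-up numbers, not measurements from the site in question.

```python
def simulate_h2_waterfall(sizes_kb, bandwidth_kbps, rtt_ms=50.0):
    """Return a (ttfb_ms, done_ms) pair for each resource.

    Simplified model (an assumption, not how every server behaves):
    all requests are sent at t=0 and responses are delivered back to
    back, in order, on one bandwidth-limited connection.
    """
    results = []
    clock_ms = rtt_ms  # the first byte of the first response needs one round trip
    for size_kb in sizes_kb:
        # Time on the wire for this response body: kilobytes -> kilobits -> ms.
        transfer_ms = size_kb * 8 / bandwidth_kbps * 1000
        ttfb_ms = clock_ms        # first byte arrives when the previous body finishes
        clock_ms += transfer_ms   # the connection is busy until this body is done
        results.append((round(ttfb_ms), round(clock_ms)))
    return results


if __name__ == "__main__":
    # Hypothetical page: six 80 KB images over a 5 Mbps link.
    timings = simulate_h2_waterfall([80] * 6, bandwidth_kbps=5000)
    for number, (ttfb, done) in enumerate(timings, start=1):
        print(f"image {number}: TTFB {ttfb} ms, finished at {done} ms")
```

With these made-up numbers the sixth image shows a TTFB of about 690 ms even though the server started answering straight away. The wait is queueing behind earlier responses, not server think time.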

As @pmeenan suggested…

Post the URL you’re testing + likely someone can assist you.

Hey @pmeenan

Thanks for jumping in. My org is a bit security cagey, so I’m anonymizing these for the public side (happy to share a direct link in PM).

If security is an issue, best you hire someone to assist you… as there’s very little anyone can do without being able to dig into various asset data + see if anything odd stands out.

From what little data you provided.

  1. Your HTML component serves relatively fast.

  2. Your HTTP2 looks correctly configured.

  3. Time to serve images looks highly suspect, unless they’re massive images.

If your images are small, then my guess is one of these…

  • Your images are massive. Compress them aggressively.

  • Your actual Webserver is running some sort of throttling module. Most of these fail abysmally, so disabling this might help.

  • Your actual connection is being throttled via iptables or similar. Remove this.

  • Your TCP/IP stack requires tuning for your workload. The simple approach is to use a recent Linux kernel, 4.15 or better.

  • Your connection is saturating. Linux provides many tools to test this + highly unlikely this is occurring, unless this site runs on a machine with many other sites.

Best to hire someone to assist you.

@dfavor

Thanks for the thorough run-through as there are definitely avenues there we haven’t run down yet. We’ve had a few consultants come through, but no one who seems able to really parse out the issue here. Images are compressed, and we’re under 100 KB on all of them (max 89kb). I thought that was pretty solid, but maybe my yardstick is off.

Thanks for the tips! This was super useful.

The waterfall looks correct for HTTP/2 and that is the cause for what looks like long TTFB (except for the base page which is a bit slow at 1.2s). If you look at the dark bars they pretty much stagger and immediately follow each other. If you look at the bandwidth graph below the waterfall you will probably see that it is bandwidth-constrained (only way to go faster is to send less data or re-arrange it).
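One rough way to check the "bars immediately follow each other" observation outside the waterfall view is to take the content-download start and end time of each request and work out how much of the total span had a download in flight. The sketch below assumes you have exported those times yourself; the function name and the sample numbers are hypothetical. A value near 1.0 is consistent with the connection being kept busy the whole time, though only the bandwidth graph tells you whether it was actually saturated.

```python
def busy_fraction(download_intervals):
    """Fraction of the span from the first download byte to the last
    during which at least one response body was downloading.

    download_intervals: iterable of (start_ms, end_ms) pairs.
    """
    intervals = sorted(download_intervals)
    if not intervals:
        return 0.0

    span_start = intervals[0][0]
    span_end = max(end for _, end in intervals)
    span = span_end - span_start
    if span <= 0:
        return 0.0

    # Merge overlapping or touching intervals and add up the merged lengths.
    busy = 0.0
    current_start, current_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= current_end:
            current_end = max(current_end, end)
        else:
            busy += current_end - current_start
            current_start, current_end = start, end
    busy += current_end - current_start

    return busy / span


if __name__ == "__main__":
    # Hypothetical back-to-back downloads, as in a staggered HTTP/2 waterfall.
    print(busy_fraction([(50, 178), (178, 306), (306, 434)]))
    # Hypothetical downloads with a long idle gap between them.
    print(busy_fraction([(0, 100), (300, 400)]))
```

If the fraction comes out well below 1.0, the gaps are idle time and something other than raw bandwidth (server processing, throttling) deserves a closer look.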

One test I’ve been running lately, which is why I listed the oddball items I listed, is dropping a file of similar weight onto a well tuned server + testing time required to serve the object.

For example, I picked a random 120k file off one of my well tuned sites.

Shows roughly 100ms to serve a 120k size file.

Since you say your images are 100K + taking 1.5-2 seconds, then the suggestions I made above do relate… somehow…
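For anyone wanting to repeat that kind of single-file test, here is a minimal sketch using only the Python standard library. The function name `time_fetch` is made up for illustration. Two caveats: `http.client` speaks HTTP/1.1, so this times one isolated request rather than a multiplexed HTTP/2 stream, and the TTFB figure includes DNS lookup, TCP connect and any TLS handshake because the connection is opened inside the timed section.

```python
import http.client
import time
from urllib.parse import urlsplit


def time_fetch(url, timeout=10.0):
    """Fetch one URL and return status, body size, TTFB and total time in ms."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)

    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    try:
        start = time.perf_counter()
        conn.request("GET", path)      # connects lazily, then sends the request
        response = conn.getresponse()  # returns once the status line and headers arrive
        ttfb_ms = (time.perf_counter() - start) * 1000
        body = response.read()         # pull the whole body
        total_ms = (time.perf_counter() - start) * 1000
        status = response.status
    finally:
        conn.close()

    return {
        "status": status,
        "bytes": len(body),
        "ttfb_ms": ttfb_ms,
        "total_ms": total_ms,
    }
```

Point it at a static file of known size on the server under test and on a reference server, run it a handful of times from the same location, and compare `ttfb_ms` and `total_ms`. A single run proves little, since network jitter can easily swamp the difference.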

What’s required to debug + fix this will be root ssh into the machine to do an audit of your LAMP Stack + simply implement better tuning. Likely take a few hours of log analysis + running synthetic traffic to find + fix problem point(s).

Any decent Server Savant can accomplish this.

If you’re using non-LAMP tech… then fixing problems like this can take a very long time + a very large budget.

@dfavor

Unfortunately, I believe Sitecore is an ASP.NET platform so probably not the easy solve. I think we can probably get the budget to dig into this, but I don’t think our primary contractor would be able to handle it (as they’re who set it all up in the first place).

Really appreciated the tips and help understanding this from everyone.

ASP… Shudder…

Sigh… Likely you won’t like hearing this + others will surely disagree…

My conversations with clients over the years with ASP performance problems could never come to any true resolution… until…

They dozed under ASP + moved to using a plain vanilla LAMP Stack.

A move like this may be simple or complex, depending on what ASP features you’re really using.