Etag/Last Modified - the numpty question I've always wanted to know.

So, using Google-speak, cache-control: max-age/s-maxage and expires headers are “strong caching headers” while etags and last-modified headers are “weak”.

Google recommend only one of each kind - which makes perfect sense - but they also recommend one weak and one strong.

2 questions, chaps - and I’ve only just plucked up the courage on this one as I’m sure I’m asking a bonehead question:

  1. Which strong and which weak one do we choose? (Google recommend expires over cache-control: max-age, but they are vague on the reasons.)

  2. Why do we need a weak cache header at all? The behaviour I think I get (but I’d love to know others’ observations) is endless conditional GETs each of which gets a 304; where what I think I’d like is just a single new unconditional GET and then no traffic for as long as the strong header mandates.

Be gentle chaps :slight_smile:

N

I do not know why, but my Firefox ignores the Expires header. When I visit my website, I always find some 304s in my logfile. Possibly the reason you should use a strong and a weak header is that there is no rule that a browser must use its local cache the way the header says.

So the second (weak) header is just a fallback to keep the speed up and the transferred data low.

Do you have a reference for the Google recommendation? All I could find was the Page Speed optimization docs: http://code.google.com/speed/page-speed/docs/caching.html#LeverageBrowserCaching and they don’t seem to say anything specific about having both strong and weak headers.

Pat, this sentence is on that page:
“It is important to specify one of Expires or Cache-Control max-age, and one of Last-Modified or ETag, for all cacheable resources.”

So yeah, send 1 strong one (Expires or CC) and one weak one (Last-Modified/ETag).
Imo, this is a good best practice.

The weak one will only kick in if the Expires has … expired. The browser will then have a file in cache but is not sure if it may use it. So it sends a conditional request to the server, asking the server “hey dude, may I use this file or not?” and the server will respond with either a 304 “yeah man, go ahead” or a 200 “nope, here is a new one, use that and throw away the old”.
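To make the "may I use this file or not?" exchange concrete, here is a minimal sketch of the decision a server makes when it gets a conditional GET with If-Modified-Since. The function name `respond` and its arguments are made up for illustration; a real server handles more cases (If-None-Match, ranges and so on).

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def respond(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return the status code for a GET, given the file's modification time (UTC)."""
    if if_modified_since is None:
        return 200  # unconditional request: send the full file
    try:
        cached_at = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the condition, send the file
    if cached_at.tzinfo is None:
        cached_at = cached_at.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= cached_at:
        return 304  # "yeah man, go ahead": no body is sent
    return 200  # "nope, here is a new one"


modified = datetime(2011, 2, 4, 12, 0, 0, tzinfo=timezone.utc)
print(respond(modified, format_datetime(modified, usegmt=True)))
```

Note that the 304 branch only ever runs when the browser actually asks; while the Expires or max-age lifetime is still valid, a well-behaved cache does not send the request in the first place.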

Thanks for going easy chaps. The thing that I think I don’t like is that I can’t work out whether the 304 “carry on” response itself persists. Is it just a “yeah man, carry on this time”, or a “yeah man, carry on and BTW there’s no need to bug my ass again for another n seconds”?

I suppose what I should really do is get off my backside and wireshark this stuff across browsers with various combinations of headers :-D. (What I see in Firebug really confuses me - as jabubo finds as well, I think. I see conditional GETs before CC expiry, but only on every second request. Maybe that’s what Google mean by the statement “the browser applies a heuristic to determine whether to fetch the item from cache or not. (The heuristics are different among different browsers.)”).

BTW Pat, you recommend against Etags: ‘ETag headers should generally not be used unless you have an explicit reason to need them’. Can you elucidate?

Thanks again everyone.

Neil

The E-tag recommendation comes from here: http://developer.yahoo.com/blogs/ydn/posts/2007/07/high_performanc_11/

Basically, if you don’t need the dynamic capabilities that e-tags give you then you’re better off just sticking with a last-modified. It’s not a rule that I feel particularly strongly about (along with the cookies check) which is why they aren’t in the grades at the top of the page but once you’ve done everything else they’re worth considering.

Thanks for the quick reply, Pat.

“Even if your components have a far future Expires header, a conditional GET request is still made whenever the user hits Reload or Refresh.” Hmmm… interesting.

Should mention my other reason for being interested in this. On my site, a lot of far future expires resources are quick to produce server side and also are within the standard 2 packet window size - so returning bytes is not going to affect latency much over and above an empty 304.

Now what I really want to know is what having a CDN in between adds to the mix. Lol.

Ok, question to see if I understand correctly: if I only use Expires header, and I do not use a Last Modified header also, in some cases things can go wrong? Eg. some clients cannot react properly? Is there an example use case for that? And does this apply to all different kind of files?

It’s actually a little more fringe than that. You need to be running a site that has multiple servers serving the same files and those servers would have to be using an Apache config with the default etag behavior (that includes machine-specific information in the hash or at least used to).

In this worst-case scenario, it’s possible that when the conditional e-tag check is made, instead of a “not modified”, the server will send the full file.
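A rough sketch of why that happens, assuming an ETag built from inode, size and modification time (loosely modelled on the old Apache default, not its exact format). The helper names `etag_for` and `conditional_status` are hypothetical.

```python
from typing import Optional


def etag_for(size: int, mtime: int, inode: Optional[int] = None) -> str:
    """Build a quoted ETag from file metadata; the inode part is optional."""
    parts = ([inode] if inode is not None else []) + [size, mtime]
    return '"' + "-".join(format(p, "x") for p in parts) + '"'


def conditional_status(if_none_match: str, current_etag: str) -> int:
    """304 if the client's ETag still matches, otherwise 200 with the full file."""
    return 304 if if_none_match == current_etag else 200


# Same file copied to two servers: identical size and mtime, different inodes.
from_server_a = etag_for(1024, 1296820800, inode=111)
on_server_b = etag_for(1024, 1296820800, inode=222)
print(conditional_status(from_server_a, on_server_b))  # the tags differ

# Without the inode, both servers produce the same tag for the same file.
print(conditional_status(etag_for(1024, 1296820800), etag_for(1024, 1296820800)))
```

The browser got its ETag from server A, the load balancer sends the revalidation to server B, and B sees a tag it does not recognise, so it returns the whole file instead of a 304.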

It’s such a fringe condition that it’s not really worth investing any time into and I’ll be removing it from the checks to eliminate confusion (or premature optimization).

Thanks,

-Pat

We have a big site with multiple site servers. We disabled Etag everywhere, exactly because of those problems :-). We now use far future expires headers but I wonder if I should add Last Modified headers.

Does no one have an idea?

I’ve always heard (askapache.com) that for static content (JS/CSS), it’s best to set far futures headers by setting both cache-control and expires headers. And then unsetting both etags and last modified.

Anybody have any thoughts on this?

The idea is that it is supposed to prevent any request at all for the file if the file is already in the browser cache.

This is exactly what we do, also for images and binaries (jpg, gif, pdf, odf, doc). But I don’t have any info on whether this is a better strategy than 1 ‘strong’ (e.g. Expires) and 1 ‘weak’ (e.g. Last-Modified)… :-(.

Cache-Control is a “strong” caching header, in the sense that it’s unconditional. When you instruct caches via the cache-control header, they are allowed to cache the content for the indicated time without checking back with you. Expires acts in the same way.

Cache-Control is an HTTP 1.1 header, and as such requires an HTTP 1.1 compliant cache. Expires is an HTTP 1.0 header.

Personally, I don’t serve both HTTP 1.1 and 1.0 caching headers anymore, I exclusively use the HTTP 1.1 Cache-Control.

ETags and Last-Modified are not “weak” caching headers, they are conditional download / content headers. ETag is a header used for conditional downloads, with a clear meaning and pretty consistent implementations. Last-Modified is, well, the timestamp for when the content was last modified. But since it is timestamp-based, and quite old, you can find some inconsistent behavior when Last-Modified is used as the sole criterion for conditional downloading.

If you want to manage caching, and have something that is simple to reason about, then there are IMHO 2 good solutions:
  • Serve content with the HTTP 1.1 Cache-Control header as the only ‘caching’ header.
  • Serve content with both Cache-Control and Expires, with both headers indicating the exact same expiry time.
In both cases, removing ETag and Last-Modified just makes things a little simpler to reason about, especially if ancient and weird proxy caches are involved.
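For the second option, the two headers are easiest to keep in step when both are derived from one number. A minimal sketch, with `caching_headers` as a made-up helper name and the current time passed in explicitly so the result is reproducible:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def caching_headers(max_age: int, now: datetime) -> dict:
    """Build Cache-Control and Expires so both name the same expiry moment."""
    expires = now + timedelta(seconds=max_age)
    return {
        "Cache-Control": f"max-age={max_age}",
        # usegmt=True gives the "GMT" suffix HTTP dates use; `now` must be UTC.
        "Expires": format_datetime(expires, usegmt=True),
    }


now = datetime(2011, 2, 4, 12, 0, 0, tzinfo=timezone.utc)
print(caching_headers(600, now))
```

Dropping the "Expires" entry from the returned dict gives the first option.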

I base my suggestion of not using 304’s in part on the following quote:
“Cache Updates:
Caches are required to be updated by the headers in 304 responses,[…]
In practice, updates were spotty; a lot of the time, the test suite couldn’t get the cache into a state where it could tell, but when it could, there were failures. As a result, it’s probably not a good idea to rely on 304 responses or HEAD requests to update headers; better to just send a 200 back with a whole new representation.”

Mark Nottingham has a really good write-up on HTTP mechanisms for controlling caches here:
http://www.mnot.net/cache_docs/

Thanks for the great info. I now get the ‘unconditional’ and ‘conditional’ bit :-).

Some check questions:

  • Etag is difficult to use with a multi-server config, right?
  • is the setting Cache-control: public important?
  • we use the Cache-control Max-age as an absolute value (600 seconds) instead of a relative value, next to a relative Expires setting. Can that be a problem? Should I fix that? It works well as far as I can see…

Well, not really. You would just have to change the standard configuration to drop the INode part of the ETag. But if you have your caching headers configured properly, I do not see a reason why you should use ETags at all.

Cache-Control: public serves two purposes:
a) It allows a shared cache to store the object as well. As an example, AOL users coming to your site are using a shared cache when using the built-in browser. So once AOL user A has visited your site, the AOL caches store the object. If AOL user B then visits your site, the object is served from the shared AOL cache, not from your site.
This might become a grey area if it is “cookied” content. Some shared caches might decide not to store the asset if it is cookied. Others will.

b) It allows Firefox when using an SSL connection to store the object in the disk cache. If the header is missing, it would be stored in memory cache and purged, when the user closes Firefox.

Shouldn’t be a problem, especially as, if I recall correctly, according to the RFC, Cache-Control beats Expires. So if you have both, and the browser is talking HTTP 1.1, the browser would simply ignore the Expires header. If it is HTTP 1.0, it would ignore the Cache-Control header.
But 600 seconds seems rather short. Might be a good idea to start versioning your assets and have a much larger TTL like 1 year.
See also here:
Souders on Versioning
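The "Cache-Control beats Expires" rule can be sketched as the calculation an HTTP 1.1 cache does to decide how long a response stays fresh. This is a simplification: `freshness_lifetime` is a hypothetical name, and it measures Expires against the time the response was received rather than against the Date header.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def freshness_lifetime(headers: dict, response_time: datetime) -> Optional[float]:
    """Seconds a cache may reuse the response without revalidating, or None."""
    match = re.search(r"(?:^|,)\s*max-age=(\d+)",
                      headers.get("Cache-Control", ""), re.IGNORECASE)
    if match:
        # max-age is present, so Expires is ignored entirely.
        return float(match.group(1))
    expires = headers.get("Expires")
    if expires is None:
        return None  # no explicit lifetime; the cache falls back to heuristics
    try:
        remaining = (parsedate_to_datetime(expires) - response_time).total_seconds()
    except (TypeError, ValueError):
        return 0.0  # an unparseable Expires is treated as already expired
    return max(remaining, 0.0)


now = datetime(2011, 2, 4, 12, 0, 0, tzinfo=timezone.utc)
both = {"Cache-Control": "max-age=600",
        "Expires": "Sat, 04 Feb 2012 12:00:00 GMT"}
print(freshness_lifetime(both, now))  # max-age wins over the one-year Expires
```

So a mismatch between the two headers does not break anything for HTTP 1.1 clients, but it does mean HTTP 1.0 caches and HTTP 1.1 caches keep the file for different lengths of time.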

Hope that helps a bit!

Kind regards,
Markus

Markus, thanks. We only use 600 seconds for the .htm files. For all assets we use an expire of 1 year, and we use versioning for new ones.

Hi,

  • Use Cache-Control with high values (far future) on static assets; do versioning in the file name (e.g. all-20110204.css)
  • No need to then also send an Expires, but if you need to or must, give it the same ‘value’ as the Cache-Control
  • Cache-Control beats Expires
  • Cache-Control:Public is important, for reasons @Jesper_Mortensen outlines
  • Use Last-Modified (not ETags) to enable conditional requests.
    If you do the far future Cache-Control thing right, you will see very few to no conditional requests, because people have the file either in cache (and then use from cache) or not (200 response).
    But why not, for that very small chance, send the Last-Modified?
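As a rough illustration of the first and last points, here is a sketch that puts the version in the file name and builds far future headers plus Last-Modified for it. The helper names, the one-year value and the date-based naming scheme are assumptions for the example, taken from the all-20110204.css pattern above.

```python
from datetime import date, datetime, timezone
from email.utils import format_datetime
from pathlib import PurePosixPath

ONE_YEAR = 365 * 24 * 60 * 60  # far future lifetime in seconds (assumed value)


def versioned_name(path: str, released: date) -> str:
    """Insert the release date before the extension: all.css -> all-20110204.css."""
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}-{released:%Y%m%d}{p.suffix}"))


def static_asset_headers(last_modified: datetime) -> dict:
    """Far future Cache-Control plus Last-Modified for the rare conditional GET."""
    return {
        "Cache-Control": f"public, max-age={ONE_YEAR}",
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }


print(versioned_name("css/all.css", date(2011, 2, 4)))
print(static_asset_headers(datetime(2011, 2, 4, 12, 0, 0, tzinfo=timezone.utc)))
```

Because a changed file gets a new name, the old URL never needs to be revalidated, which is what makes the one-year lifetime safe.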

Ok, I think I have all headers covered now… Still have to let our tech-guys add ‘cache-control public’ and maybe ‘last-modified’ for those corner-cases if ‘cache-control max-age’ and ‘expires’ don’t work :-).