To ETag or not to ETag

(thread migrated from sourceforge forum - originally posted 6/25/2008)

This one’s been bothering me for a while.

ETags exist for a reason. They're part of the standard HTTP protocol (HTTP/1.1: Header Field Definitions, http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.19) and are designed to help optimize cache utilization.

Caching is generally good, so why is ETag bad?

It can only mean that either:

  1. the spec is broken
  2. its implementation is broken

Now, I have seen some discussion that says it's easy to break (server-side), so the same resource may have different ETag values, which could lead to a cache miss. But what if you're a web guy who knows what he's doing and has consistent ETags?

Why should my sites get penalized for using correctly-implemented W3C standards?

We have the recommendations in order of importance (left to right in the checklist, top to bottom in the report), at least based on the sites we're familiar with, and ETags are the lowest. The check is really only there because ETags are fairly easy to mess up and there are specific use cases where they are needed (dynamic content where you hash the response, for example). Outside of those use cases there's really no benefit beyond what you get from Last-Modified.

As with just about all of the other recommendations, there are usually specific use cases where it makes sense to not follow the recommendations. Take http://www.google.com for example. We usually frown pretty strongly on serving static content from a domain that has cookies on it but since there are only 2 requests to display the page (plus some pre-caching onLoad) the savings from not having to do another DNS lookup and socket connect more than make up for it.

ETags are very useful for checking if the browser cache is valid. By default, Apache creates the ETag out of the file size, the inode and the modified date of the file. It's the inode that can cause trouble, but that doesn't kick in until your website is spread out over multiple servers. (The inode is specific to the file system that the file is stored on, so the same file will have a different ETag on each server, leading to cache confusion.)

So if you're using multiple servers, either disable ETags or do not use the inode to create them. Until then, go ahead and use them; they do help.
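To make the inode point concrete, here is a minimal Python sketch that builds an ETag loosely modeled on the inode-size-timestamp scheme described above. The function name `apache_style_etag` and the `use_inode` flag are made up for illustration, and the exact formatting Apache uses may differ; the only point is that two copies of the same file agree on size and modification time but not on inode.

```python
import os

def apache_style_etag(path, use_inode=True):
    """Build a quoted ETag from file metadata (hypothetical helper).

    Loosely imitates an inode-size-mtime layout; this is an
    illustration, not Apache's actual implementation.
    """
    st = os.stat(path)
    parts = []
    if use_inode:
        # The inode number is specific to one file system, so it
        # differs between copies of the same file on different servers.
        parts.append(format(st.st_ino, "x"))
    parts.append(format(st.st_size, "x"))
    parts.append(format(int(st.st_mtime), "x"))
    return '"' + "-".join(parts) + '"'
```

Called with `use_inode=False`, two copies of a file with the same size and modification time yield the same tag, which is the behavior you want behind a load balancer.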

The question is “do they help more than Last-Modified”? For the cases where you are actually doing something interesting with the ETag (hashing some data that isn’t tracked by modification date, for example) I could see where it would be useful, but if it’s essentially replicating the same behavior as Last-Modified, why bother?
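For the "hash the response" case mentioned above, a minimal Python sketch might look like the following. The helper name `content_etag` is hypothetical and SHA-1 is just one possible choice of hash; the idea is that the tag depends only on the bytes sent, so it stays the same on every server and changes whenever the content does.

```python
import hashlib

def content_etag(body: bytes) -> str:
    """Derive a quoted ETag from the response body (hypothetical helper)."""
    # Hashing the bytes ties the tag to the content itself rather than
    # to file metadata such as the modification date or inode.
    return '"' + hashlib.sha1(body).hexdigest() + '"'
```

The trade-off is that the server has to generate (or cache) the full response before it can compute the tag.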

Pat,
The following blog post has an answer for you (see the part in bold).

Detail as follows:

Configure ETags

tag: server

Entity tags (ETags) are a mechanism that web servers and browsers use to determine whether the component in the browser’s cache matches the one on the origin server. (An “entity” is another word for a “component”: images, scripts, stylesheets, etc.) ETags were added to provide a mechanism for validating entities that is more flexible than the last-modified date. An ETag is a string that uniquely identifies a specific version of a component. The only format constraint is that the string be quoted. The origin server specifies the component’s ETag using the ETag response header.

  HTTP/1.1 200 OK
  Last-Modified: Tue, 12 Dec 2006 03:03:59 GMT
  ETag: "10c24bc-4ab-457e1c1f"
  Content-Length: 12195

Later, if the browser has to validate a component, it uses the If-None-Match header to pass the ETag back to the origin server. If the ETags match, a 304 status code is returned, reducing the response by 12195 bytes for this example.

  GET /i/yahoo.gif HTTP/1.1
  Host: us.yimg.com
  If-Modified-Since: Tue, 12 Dec 2006 03:03:59 GMT
  If-None-Match: "10c24bc-4ab-457e1c1f"

  HTTP/1.1 304 Not Modified
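The validation step above can be sketched in Python roughly as follows. This is a simplified illustration, not a real server: the function name `respond` and its return shape are assumptions, and it ignores details of the spec such as weak validators, `*`, and comma-separated lists in If-None-Match.

```python
def respond(request_headers, current_etag, body):
    """Return (status, headers, body) for a GET (hypothetical helper)."""
    client_etag = request_headers.get("If-None-Match")
    if client_etag == current_etag:
        # The cached copy is still valid: send headers only, no body.
        return 304, {"ETag": current_etag}, b""
    # No tag sent, or the tags differ: send the full component.
    headers = {"ETag": current_etag, "Content-Length": str(len(body))}
    return 200, headers, body
```

If a second server computes a different tag for the same file, the comparison fails and the full 200 response is sent even though nothing changed.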

The problem with ETags is that they typically are constructed using attributes that make them unique to a specific server hosting a site. ETags won’t match when a browser gets the original component from one server and later tries to validate that component on a different server, a situation that is all too common on Web sites that use a cluster of servers to handle requests. By default, both Apache and IIS embed data in the ETag that dramatically reduces the odds of the validity test succeeding on web sites with multiple servers.

The ETag format for Apache 1.3 and 2.x is inode-size-timestamp. Although a given file may reside in the same directory across multiple servers, and have the same file size, permissions, timestamp, etc., its inode is different from one server to the next.

IIS 5.0 and 6.0 have a similar issue with ETags. The format for ETags on IIS is Filetimestamp:ChangeNumber. A ChangeNumber is a counter used to track configuration changes to IIS. It’s unlikely that the ChangeNumber is the same across all IIS servers behind a web site.

The end result is ETags generated by Apache and IIS for the exact same component won’t match from one server to another. If the ETags don’t match, the user doesn’t receive the small, fast 304 response that ETags were designed for; instead, they’ll get a normal 200 response along with all the data for the component. If you host your web site on just one server, this isn’t a problem. But if you have multiple servers hosting your web site, and you’re using Apache or IIS with the default ETag configuration, your users are getting slower pages, your servers have a higher load, you’re consuming greater bandwidth, and proxies aren’t caching your content efficiently. Even if your components have a far future Expires header, a conditional GET request is still made whenever the user hits Reload or Refresh.

If you’re not taking advantage of the flexible validation model that ETags provide, it’s better to just remove the ETag altogether. The Last-Modified header validates based on the component’s timestamp. And removing the ETag reduces the size of the HTTP headers in both the response and subsequent requests. This Microsoft Support article describes how to remove ETags. In Apache, this is done by simply adding the following line to your Apache configuration file:

  FileETag none

Yep, sorry, the question didn’t originate from me, I was transplanting some posts from the forums on sourceforge so they didn’t get lost :-). Thanks for bringing the detail over though.

-Pat