Blog Objective

This is a blog that attempts to make life easier by noting down the author's accrued knowledge and experiences.
The author has dealt with several IT projects (in Java EE and .NET) and is a specialist in system development.

29 February 2012

Best Practices for HTTP Servers

How can web servers be optimised to respond faster? Can resources be cached, compressed, and so on?
There are a number of HTTP headers that one needs to understand in order to deal with performance optimisations.
  1. Last-Modified – The origin server indicates the date and time at which it believes the resource was last modified.
  2. ETag – Entity tag: an opaque identifier for a specific version of a resource. How it is generated is up to the server; for static files it is usually derived from file attributes such as the file location (inode), file size and last-modified date.
  3. Expires – The origin server indicates the date and time after which the resource becomes stale, i.e. how long it may be kept in cache. Introduced in HTTP/1.0 and still valid in HTTP/1.1, although Cache-Control: max-age takes precedence when both are present.
  4. Cache-Control – The origin server indicates to the browser and to intermediaries whether or not to cache the resource and, if so, for how long (Cache-Control: max-age). Applicable to HTTP/1.1.
Notice the similarity in some definitions.
In practice, Last-Modified and ETag are both validators and can be grouped together. Expires and Cache-Control both describe freshness and can similarly be grouped together.
Expires and Cache-Control indicate to the browser that a cached resource is still fresh and therefore does not need to be fetched again until it expires.
Last-Modified and ETag, on the other hand, let the browser check with the server before reusing a cached copy that is stale or has no explicit expiry. The browser sends a conditional request (If-Modified-Since carrying the Last-Modified date, or If-None-Match carrying the ETag). If the origin server ascertains that the resource has not changed, it returns status 304 Not Modified without a body, indicating that the cached copy is still valid. Otherwise it returns the actual resource.
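The validation step can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular server implements it: the function name conditional_get is made up for this post, and the request headers are assumed to arrive as a plain dict with exactly these key spellings.

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def _opaque(tag):
    """Drop the weak-validator prefix (W/) so tags can be compared weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(request_headers, etag, last_modified_ts):
    """Return (status, headers) for a GET on a resource (hypothetical helper).

    etag is the current entity tag including its quotes; last_modified_ts is
    the resource's modification time as a Unix timestamp.
    """
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(last_modified_ts, usegmt=True),
    }

    # If-None-Match takes precedence over If-Modified-Since when both are sent.
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        matched = "*" in candidates or _opaque(etag) in candidates
        return (304 if matched else 200), headers

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200, headers  # unparseable date: ignore the condition
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so compare whole seconds.
        if int(last_modified_ts) <= int(since.timestamp()):
            return 304, headers

    return 200, headers
```

A 304 result means the caller sends only the headers. A 200 result means the caller sends the headers plus the full resource body.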
It is a good practice to use both groups where appropriate:
  1. For content that seldom changes (e.g. CSS, images) – use Expires and Cache-Control
  2. For others (e.g. JavaScript files, HTML, dynamic pages) – use Last-Modified and ETag
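As a rough sketch of the two groups, the helpers below build the response headers for each case. The function names and the one-year default are illustrative choices for this post, not recommendations from any specification. The second helper also sends Cache-Control: no-cache, which is an assumption added here so that caches revalidate before every reuse instead of guessing a freshness lifetime.

```python
from email.utils import formatdate

ONE_YEAR = 365 * 24 * 60 * 60  # seconds; an arbitrary "far future" lifetime


def freshness_headers(now_ts, max_age=ONE_YEAR):
    """Headers for content that seldom changes (e.g. CSS, images)."""
    return {
        # HTTP/1.1 caches use max-age; Expires covers HTTP/1.0 caches.
        "Cache-Control": f"public, max-age={max_age}",
        "Expires": formatdate(now_ts + max_age, usegmt=True),
    }


def revalidation_headers(etag, last_modified_ts):
    """Headers for content that should be checked with the server before reuse."""
    return {
        # Assumption: no-cache allows storing but forces revalidation.
        "Cache-Control": "no-cache",
        "ETag": etag,
        "Last-Modified": formatdate(last_modified_ts, usegmt=True),
    }
```

For example, freshness_headers(time.time()) could be applied to stylesheets and images, and revalidation_headers(...) to HTML pages.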
Note that using ETag in a web farm environment can be tricky, as the computed ETag value for the same resource may differ between machines in the farm. A browser that validates against a different machine then never gets a 304.
Apache allows the ETag value to be computed using only the last-modified date and size, without the inode value (FileETag MTime Size).
IIS 6 allows administrators to set the Etag_change_number – e.g. to 0 – to synchronise the servers in a web farm (incidentally, IIS 7 already has 0 as the immutable default value).
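To illustrate why leaving out machine-specific inputs helps, here is a small sketch of an ETag built only from file size and modification time. The function name farm_safe_etag and the tag format are invented for this post and do not reproduce the exact format Apache or IIS emit. It also assumes the deployment process preserves modification times across the servers.

```python
import os


def farm_safe_etag(path):
    """Build an ETag from size and mtime only (hypothetical format).

    The inode is deliberately left out: it differs between machines even
    when the file contents are identical.
    """
    st = os.stat(path)
    mtime_us = st.st_mtime_ns // 1000  # microsecond resolution
    return '"{:x}-{:x}"'.format(st.st_size, mtime_us)
```

Two servers holding the same file with the same size and modification time would then produce the same tag, so a browser validating against either one gets a 304.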

More information regarding Cache-control can be found here
