HTTP caching is a key part of what makes the web usable, and draft standards like HTTPbis add further refinements to the existing HTTP/1.1 caching features. At WebPerfDays 2012, Mark Nottingham (@mnot) and Josh Bixby (@joshuabixby) gave some useful tips on how we can tune our web applications to take advantage of the existing and forthcoming HTTP cache features.
WebPerfDays is an excellent one-day unconference organised by the London Web Performance meetup group, timed in 2012 to follow Velocity EU in London and attract much of the same crowd. I was at WebPerfDays to share some experiences with Continuous Delivery at thetrainline.com, and in particular to hear what Mark Nottingham and Josh Bixby had to say on HTTP caching, a subject which seems to baffle or bore many software development teams, leading to suboptimal, non-cacheable web applications.
Changes to HTTP caching in HTTPbis
Few people understand HTTP caching better than Mark Nottingham, who works in the Akamai web engineering team, is a maintainer of the Squid cache, and has been writing on HTTP caching for many years. Mark gave a talk on the proposed changes to HTTP caching that form part of the new draft HTTPbis standards.
The most important change (more of a clarification, actually) is one which should intrigue or worry most web developers:
[with HTTPbis,] HTTP caches will be free to cache most HTTP 200 responses
Let’s think about that. At present, due to vagueness in the HTTP/1.1 spec, HTTP caches will typically ignore HTTP 200 responses which do not have cache control headers set, which effectively means that web sites are non-cacheable by default. If I read this correctly, the new standards will turn that situation on its head: HTTPbis clarifies that websites are cacheable by default. The implications for web applications are large, and mean that web application development teams need to start understanding and coding for HTTP caching behaviours.
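To make that behaviour concrete, here is a rough sketch in Python of the freshness decision a conforming HTTP cache makes. This is deliberately simplified (real caches also honour s-maxage, Expires, Vary, and heuristic freshness), so treat it as an illustration of the idea rather than the actual rules:

```python
def parse_cache_control(value):
    """Parse a Cache-Control header value into a dict of directives."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            name, _, arg = part.partition("=")
            directives[name.lower()] = arg.strip('"')
        else:
            directives[part.lower()] = None
    return directives

def is_fresh(cache_control, age_seconds):
    """Very simplified freshness check: a stored response is fresh while
    its age is below max-age. (no-cache really means "revalidate before
    reuse"; we simplify it to "not fresh" here.)"""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False
    if "max-age" not in directives:
        # No explicit lifetime; a real cache may fall back to heuristics.
        return False
    return age_seconds < int(directives["max-age"])
```

With explicit directives, the cache’s decision is mechanical: `is_fresh("max-age=3600", 60)` is true, while the same response two hours later is stale.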
So, how do we persuade budget-holders to sponsor the additional work needed to address HTTP caching? The good news is that
almost all HTTP/1.0 libraries understand max-age (even though it’s not part of the HTTP/1.0 spec)
which means that we can begin by setting max-age in the Cache-Control header to expire responses with little effort, and expect almost every client device to respond to it correctly. We can use tools like REDbot (co-authored by Mark) to help diagnose HTTP caching issues (e.g. http://redbot.org/?uri=http%3A%2F%2Fblog.matthewskelton.net%2F for my blog); you can add request headers to test cache and other settings.
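Setting max-age from application code can be a very small change. Here is a minimal WSGI sketch; the one-hour lifetime is an arbitrary choice for illustration, not a recommendation:

```python
MAX_AGE = 3600  # hypothetical freshness lifetime: one hour

def app(environ, start_response):
    # An explicit Cache-Control header makes the cacheability decision
    # ours, instead of leaving it to each cache's default behaviour.
    body = b"Hello, cacheable world\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
        ("Cache-Control", "max-age=%d" % MAX_AGE),
    ])
    return [body]
```

Any WSGI server (e.g. `wsgiref.simple_server`) can serve this; downstream caches will then treat the response as fresh for an hour without revalidating.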
One thing I hadn’t realised before was that “Varnish is not a [conforming] HTTP cache – Varnish has its own rules for caching HTTP content”. So although Varnish can be useful in some scenarios, it is not the same kind of component as Squid, Nginx, or even a CDN. Mark also advised against using must-revalidate for HTTP caching, and predicted that Content Aware Networking would be a future topic for WebPerfDays.
CDNs are not (necessarily) the answer
Josh gave a keynote session on web performance [slides], and two points he mentioned struck me as interesting:
There is no strong correlation between use of CDNs [Content Delivery Networks] and site performance; people are misusing CDNs.
Judging by the plot on slide 15, the strongest correlation is a general one between use of CDNs (shown as blue diamonds) and a high (numerically low) Alexa Retail Rank, a proxy indicator of how much money the organisation has to spend on, err, CDNs. (Interestingly, Steve Souders (@souders), also at WebPerfDays, noted that the aspect of a webpage which correlates best with page load time is the number of DOM elements, which implies that simplifying page structure is the best route (in 2012/2013) to faster pages.)
There is a significant number of retailers who do not use CDNs (red crosses) but whose page load times are still decent; likewise, there are many retailers who do use CDNs whose page load times are quite a lot worse than their non-CDN-using counterparts.
Given the simplicity and benefits of setting up and using a reverse caching HTTP proxy in the data centre, I am always surprised by how quickly folks jump to “CDN!!” as the solution to poor page load times, especially for websites whose user base is primarily in a single geography (such as the UK).
The second surprise from Josh was that:
One in four top websites does not use cache control headers
It turns out that many organisations have been burnt by bad HTTP cache setups (see slide 22).
It seems a bit crazy in 2012-2013 that HTTP cache control headers are still causing so much confusion and concern. My sense is that this is because in many software development shops HTTP caching is seen as a “production” or “ops” problem, and the development team does not get involved. However, some simple changes to a web application are often all that is needed in order to take advantage of HTTP cache control [more on this from me in a future post].
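One example of such a simple application-side change is supporting conditional requests, so that repeat visits can be answered with a cheap 304 Not Modified instead of the full body. The sketch below is illustrative only: the `make_etag` helper and the `respond` signature are my own inventions, not from any framework:

```python
import hashlib

def make_etag(body):
    # A strong ETag derived from a hash of the representation
    # (one simple approach; any stable fingerprint would do).
    return '"%s"' % hashlib.sha1(body).hexdigest()

def respond(body, request_headers):
    """Sketch of conditional-request handling: if the client's
    If-None-Match matches our ETag, answer 304 with no body."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

The first request pays full price; every revisit with a matching `If-None-Match` costs only headers, and intermediaries can use the same mechanism to revalidate stale cached copies.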
(Josh helpfully noted down all the unconference post-it note questions from WebPerfDays on his blog!)
Special thanks to Stephen Thair (@TheOpsMgr) and all the London Web Performance meetup crowd for such a great tech day; I’m looking forward to WebPerfDays 2013!
- 2013-02-04: Minor corrections to HTTPbis details, @mnot’s biog, and REDbot (thanks, Mark!)