Improving page speed: CDN vs Squid/Varnish/nginx/mod_proxy

Too few people understand the benefit of using a caching reverse proxy server to improve web page delivery speeds, and instead go straight to a CDN solution, which can be costly and complex to administer.

Conversations about web page speeds can often go like this:

My website is slow – how can I make it faster?

Just use a CDN!

Although using a CDN can help with page load speeds, a CDN is not automatically the right solution.

Content Delivery Networks

A Content Delivery Network (CDN) can be an important tool in achieving decent page load and web application speeds. Providers such as Akamai and EdgeCast, and more recently Amazon (with CloudFront), Rackspace (CloudFiles), Google (PageSpeed) and Microsoft (Azure CDN), provide the means to distribute your content to locations geographically closer to your customers/users, which improves the responsiveness of the application or website.

End of story, no?

Not quite. Using a CDN can be a bit like using a sledgehammer to crack a nut: overkill. This is especially true if your customers or users are not geographically spread (i.e. most or all users are in a single country or region).

HTTP Caching Reverse Proxy

If most of your users are in the same region, you should consider using an HTTP caching reverse proxy, such as Squid, Varnish, nginx (with HttpProxy module) or Apache with mod_proxy. My personal favourite is Squid, which I have been using for over 10 years, both as a forward proxy and a reverse proxy, but many people rave about Varnish. Which proxy cache you choose depends on your exact caching requirements.

In a typical/simple configuration, the caching reverse proxy sits within your own infrastructure, in front of your web application server. That is, the first server which sees inbound requests is the caching proxy. The proxy talks to other servers behind it when it needs to fulfil a request for content not in its cache, and serves the content back to the user. The types of content cached can be finely controlled, so that (for example) only images, Flash movies and CSS files are cached, and other content is always requested afresh from the application servers behind the proxy.
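The request flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Squid or Varnish are implemented: the class name, the `fetch_from_origin` callback and the list of cacheable suffixes are all assumptions made up for this example, and a real proxy would also honour HTTP cache headers, query strings and expiry.

```python
from typing import Callable, Dict, Tuple

# Hypothetical policy for this sketch: only these static asset types are cached.
CACHEABLE_SUFFIXES: Tuple[str, ...] = (".png", ".jpg", ".gif", ".css", ".swf")


class CachingReverseProxy:
    """Toy model of a caching reverse proxy sitting in front of an origin server."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # fetch_from_origin stands in for the web/application server behind the proxy.
        self._fetch_from_origin = fetch_from_origin
        self._cache: Dict[str, bytes] = {}

    def handle(self, path: str) -> bytes:
        # Non-cacheable content is always requested afresh from the origin.
        if not path.lower().endswith(CACHEABLE_SUFFIXES):
            return self._fetch_from_origin(path)
        # Cacheable content goes to the origin only on a cache miss.
        if path not in self._cache:
            self._cache[path] = self._fetch_from_origin(path)
        return self._cache[path]
```

With this sketch, repeated requests for `/logo.png` reach the origin once, while every request for a page such as `/account` is passed through.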

Why Use a Reverse Proxy?

The result of using a well-configured caching reverse proxy is usually a huge speed-up in page load times for end-users. This is due to several factors:

  1. The cache can typically serve assets more rapidly than an application server, as the workload on the application or web server is more mixed than on the cache server.
  2. Off-loading the serving of common, seldom-changing assets to the cache server frees up the web/application server to handle more “useful” requests, which in turn makes for speedier page response times. The application/web server can render the page more quickly, as it is not spending time serving static assets.
  3. The web/application servers receive fewer inbound connections, so they are more responsive in general, spending less time on thread context switching.
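To put rough numbers on the off-loading in points 2 and 3, the helper below computes how many requests still reach the web/application server for a given cache hit ratio. The function name and the figures in the example are illustrative assumptions, not measurements.

```python
def origin_requests(total_requests: int, hit_ratio: float) -> int:
    """Requests that still reach the origin when a fraction hit_ratio is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Every cache miss is forwarded to the origin; round to a whole number of requests.
    return round(total_requests * (1.0 - hit_ratio))


# Illustrative figures only: 1000 requests at an 80% hit ratio leave 200 for the origin.
remaining = origin_requests(1000, 0.8)
```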

Properties of a CDN

A CDN typically has the following properties:

  • A set of “edge” servers which are located in various distinct geographic locations
  • Suitable for slowly-changing content, because content propagation times are relatively high (hours)
  • Owned by a third party
  • Usually combined with custom DNS solutions (with low DNS TTL values) to effect the geo-direction
  • Disconnected (by design) from the web application
  • Typically serve “static” content such as images, Flash, video, etc.
  • Cannot effectively cache dynamically-generated content
  • URLs or applications often need to be modified to work with the CDN

Update: Recent developments in the CDN space (with Akamai and Amazon CloudFront, for example) have added support for custom origins, allowing them to cache dynamic content. However, most uses of CDNs still fall into the “edge cache” model described above.
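As a rough illustration of the geo-direction idea, the sketch below picks the edge location closest to a user by great-circle distance. The edge names and coordinates are hypothetical, and real CDNs typically make this decision through DNS and network measurements rather than a simple distance calculation.

```python
import math
from typing import Dict, Tuple

# Hypothetical edge locations (latitude, longitude) for this sketch.
EDGES: Dict[str, Tuple[float, float]] = {
    "london": (51.50, -0.13),
    "new-york": (40.71, -74.00),
    "singapore": (1.35, 103.82),
}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_edge(user_lat: float, user_lon: float) -> str:
    """Name of the edge location with the smallest distance to the user."""
    return min(EDGES, key=lambda name: haversine_km(user_lat, user_lon, *EDGES[name]))
```

If all of your users sit near a single location, every one of them maps to the same edge, which is the case where a CDN adds little over a local cache.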

Properties of Caching Reverse Proxy

A caching reverse proxy server typically has these properties:
  • Local (close) to the web application, usually in the same DC
  • Reduces the load on web/application servers for cacheable content
  • Can cache many kinds of content, including dynamically-generated content
  • Full control of cache flushes is with you
  • The web application is ‘unaware’ of the caching taking place and does not need to be modified to benefit from reverse proxy caching
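The point about controlling cache flushes yourself can be made concrete with a small sketch: entries expire after a time-to-live, and you can also purge them on demand, for example right after a deployment. The class and method names are invented for this example and do not correspond to the purge interface of any particular proxy.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FlushableCache:
    """Toy cache with time-based expiry and explicit, operator-controlled purging."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def put(self, key: str, body: bytes) -> None:
        # Store the body together with the time at which it stops being fresh.
        self._entries[key] = (self._clock() + self._ttl, body)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, body = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller refetches from the origin.
            del self._entries[key]
            return None
        return body

    def purge(self, key: str) -> bool:
        """Flush one entry immediately; returns True if something was removed."""
        return self._entries.pop(key, None) is not None

    def purge_all(self) -> None:
        """Flush the whole cache, for example after a release."""
        self._entries.clear()
```

Contrast this with the CDN model described earlier, where propagation of changed content can take hours and is largely outside your control.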

Which Is Right For Me: CDN or Caching Reverse Proxy?

In some respects, this is a false dichotomy. There is no reason why you cannot combine a reverse proxy with a CDN, because they solve different problems. In a nutshell:
  • A CDN locates static content geographically close to end-users to avoid transmission delay
  • A caching reverse proxy reduces load on web/application servers and avoids unnecessary trips to a database or other content store for frequently-accessed content
(As noted above, some CDNs now offer custom origin support, allowing them to cache dynamic content. You’d still likely want a caching reverse proxy in front of the true origin, however, to reduce load on those servers.)

So, if your users are geographically spread, use a CDN. If you need to reduce load on web or application servers for common content, use a caching reverse proxy. If you need to address both issues, use both a CDN *and* a caching reverse proxy.
