Speed up Web Applications with SSL Offloading

Web sites and web applications are increasingly using secure connections (HTTPS) for all traffic, not just obviously sensitive data, as a way to guard against security threats. However, HTTPS requires encryption and decryption of data, which is computationally intensive. Web applications can therefore benefit from “offloading” the encryption/decryption processing required for HTTPS to specialised hardware devices.

Secure Connections with SSL/TLS

The HTTPS scheme uses SSL or TLS to “wrap” a secure, encrypted channel around the HTTP connection between browser and server. The abbreviations SSL (Secure Sockets Layer) and TLS (Transport Layer Security) are often used interchangeably; TLS is in effect the newer, more secure successor to SSL. In the words of RFC 5246 (the specification of TLS 1.2),

The protocol allows client/server applications to communicate in a way that is designed to prevent eavesdropping, tampering, or message forgery

The key thing here is that HTTPS provides point-to-point security: the protection exists only between the two directly communicating endpoints. This is unlike other secure messaging schemes such as WS-Security, which can provide end-to-end security even across untrusted intermediaries.


Encryption using Public Key Algorithms

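HTTPS makes use of the SSL/TLS standards for establishing a secure session, which ensures that the data sent between two computers cannot be read or modified by intervening parties. SSL/TLS in turn relies on public-key algorithms (not to be confused with PKI, the certificate infrastructure that vouches for who owns a given public key). During the initial handshake the server presents its public key, which the client uses to authenticate the server and to agree a shared secret; the bulk of the traffic is then encrypted with a faster symmetric cipher keyed from that secret. The advantage of this approach is that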

a public key algorithm does not require a secure initial exchange of one, or more, secret keys between the sender and receiver. [Wikipedia]

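In the classic RSA key exchange, only the holder of the private key can decrypt the secret that the client encrypts with the server’s public key, so an eavesdropper watching the handshake cannot recover it, and the two-way communication that follows is secure.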
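To make the public/private key relationship concrete, here is a toy sketch of textbook RSA using the tiny primes common in teaching examples. It is illustrative only: real TLS certificates use 2048-bit or larger moduli, and unpadded “textbook” RSA like this is insecure in practice.

```python
# Textbook RSA with toy parameters (insecure: tiny primes, no padding).
# Shows that data encrypted with the public key (e, n) can only be
# recovered with the private exponent d.

p, q = 61, 53                 # two (tiny) primes, kept secret
n = p * q                     # public modulus: 3233
phi = (p - 1) * (q - 1)       # Euler's totient of n: 3120
e = 17                        # public exponent, coprime with phi
d = pow(e, -1, phi)           # private exponent via modular inverse (Python 3.8+): 2753


def encrypt(m: int) -> int:
    """Anyone holding the public key (e, n) can encrypt an integer m < n."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Only the holder of the private exponent d can decrypt."""
    return pow(c, d, n)


message = 65
ciphertext = encrypt(message)
print(ciphertext)             # 2790
print(decrypt(ciphertext))    # 65
```

In TLS the “message” encrypted this way would be a random session secret, not the page content itself.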

The Compute Cost of SSL Crypto

Although there is some debate about exactly how expensive HTTPS crypto is (F5 and Adam Langley of Google present two opposing views), crypto is clearly not a cost-free operation: it takes a significant number of CPU cycles.

Exploits such as ‘BEAST’ increasingly target older and weaker versions of SSL/TLS, and faster commodity hardware means that a 56-bit DES key can be found by brute force within a week. Both trends mandate increasingly long encryption keys (NIST recommends 2048-bit RSA keys).
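The numbers behind that paragraph are easy to check with a little arithmetic. The sketch below computes the trial rate needed to search every 56-bit DES key in a week, then how long a 112-bit search would take at that same rate; 112 bits is the security strength NIST SP 800-57 assigns to a 2048-bit RSA key. Treating “bits of security” as a plain brute-force key space is a simplification.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60          # 604,800 seconds
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def trials_per_second(key_bits: int, seconds: float) -> float:
    """Key trials per second needed to search all 2**key_bits keys in `seconds`.

    On average the key is found after searching half the space.
    """
    return 2 ** key_bits / seconds


# Rate needed to exhaust the 56-bit DES key space in one week.
des_rate = trials_per_second(56, SECONDS_PER_WEEK)
print(f"DES in a week: {des_rate:.2e} keys/s")

# At that same rate, how long would a 112-bit search take?
# (Simplification: treats "112 bits of security" as a 2**112 key space.)
years_112 = 2 ** 112 / des_rate / SECONDS_PER_YEAR
print(f"112-bit search at the same rate: {years_112:.1e} years")
```

Each extra bit doubles the work, which is why moving to longer keys is the standard response to faster hardware.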

Whilst there are efforts such as Google’s Overclocking SSL aimed at optimising SSL performance, the move to longer keys to counter these threats means that each new HTTPS connection consumes more CPU cycles.
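Why do longer keys cost more CPU? The core RSA private-key operation is modular exponentiation. This minimal square-and-multiply sketch (the textbook algorithm; real libraries use faster variants such as sliding windows and Montgomery multiplication) counts modular multiplications, which grow linearly with the exponent’s bit length. Each multiplication of n-bit numbers costs roughly O(n²) with schoolbook arithmetic, so under this crude, assumed cost model doubling the key size makes a private-key operation about 2³ = 8 times more expensive; measured ratios vary by implementation.

```python
import random
from typing import Tuple


def modexp_counting(base: int, exponent: int, modulus: int) -> Tuple[int, int]:
    """Left-to-right square-and-multiply; returns (result, modular multiplications).

    The count includes one redundant initial squaring of 1, which keeps the loop simple.
    """
    result, mults = 1, 0
    for bit in bin(exponent)[2:]:               # most significant bit first
        result = (result * result) % modulus    # square for every exponent bit
        mults += 1
        if bit == "1":
            result = (result * base) % modulus  # extra multiply for each 1 bit
            mults += 1
    return result, mults


def relative_cost(key_bits: int, baseline_bits: int = 1024) -> float:
    """Crude model: ~key_bits multiplications, each costing ~key_bits**2."""
    return (key_bits / baseline_bits) ** 3


rng = random.Random(2011)  # fixed seed so the demo is repeatable
for bits in (1024, 2048):
    # Random stand-ins for a real modulus and private exponent of the given size.
    modulus = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
    exponent = rng.getrandbits(bits) | (1 << (bits - 1))
    value, mults = modexp_counting(12345, exponent, modulus)
    print(f"{bits}-bit: {mults} multiplications, "
          f"matches pow(): {value == pow(12345, exponent, modulus)}, "
          f"modelled cost ~{relative_cost(bits):.0f}x")
```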

SSL Offload Basics

Because HTTPS affords point-to-point security only, if the HTTPS connection is terminated at the web application server, that server must itself decrypt the HTTPS payload before it can respond to the request, as shown here:

[Figure: HTTPS terminated at the web application server]

This means that while the payload is being decrypted, the web application server’s execution thread is unavailable to serve other web requests, potentially leading to thread-pool exhaustion or, at least, longer response times.

The solution is to “offload” this processing effort to a dedicated piece of hardware (or software), either on the web server or (more effectively) in front of the web tier, as shown here:

[Figure: SSL offloaded in front of the web tier]

The result is higher throughput from the web application servers, particularly if those servers run on ARM/RISC hardware (such as the new HP ProLiant servers), which may be less well optimised for crypto processing than some other CPUs.

Because the HTTPS tunnel has been terminated at the load-balancer layer, the web application servers receive “plain” HTTP on port 80. To tell the web servers that the original connection was encrypted (say, for a login page), the load-balancer device can insert an additional HTTP header, such as:

X-Secure-Connection: true

The web application can then distinguish secure and non-secure connections, even though it never sees HTTPS traffic directly.
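As a sketch of the receiving side, assuming a Python WSGI application, a small middleware can translate the header into the request’s URL scheme. The load-balancer address 10.0.0.5 is hypothetical; checking the source address matters because any client could otherwise send X-Secure-Connection: true itself. Many load balancers use the de facto X-Forwarded-Proto: https header for the same purpose.

```python
from typing import Callable, Iterable, Set

# Hypothetical address of the SSL-offloading load balancer.
TRUSTED_PROXIES: Set[str] = {"10.0.0.5"}


def secure_connection_middleware(app: Callable, trusted: Set[str] = TRUSTED_PROXIES) -> Callable:
    """Mark requests as HTTPS when a trusted offload device says the client used HTTPS."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # WSGI exposes the "X-Secure-Connection" request header as HTTP_X_SECURE_CONNECTION.
        # The header is removed in every case, so the app relies on wsgi.url_scheme instead.
        flag = environ.pop("HTTP_X_SECURE_CONNECTION", "")
        if environ.get("REMOTE_ADDR") in trusted and flag.strip().lower() == "true":
            environ["wsgi.url_scheme"] = "https"
        return app(environ, start_response)

    return wrapped


def demo_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Tiny app that echoes the scheme it believes the client used."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["wsgi.url_scheme"].encode()]


app = secure_connection_middleware(demo_app)


def call(remote_addr: str, header: str = None) -> bytes:
    """Simulate a request arriving on plain HTTP from remote_addr."""
    environ = {"REMOTE_ADDR": remote_addr, "wsgi.url_scheme": "http"}
    if header is not None:
        environ["HTTP_X_SECURE_CONNECTION"] = header
    return b"".join(app(environ, lambda status, headers: None))


print(call("10.0.0.5", "true"))     # b'https'
print(call("203.0.113.9", "true"))  # b'http' (spoofed header from an untrusted client)
```

The same trust check applies whatever header name the offload device is configured to insert.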

SSL Offload on the Server

There are broadly five ways to offload SSL on the server:

Software such as stunnel (http://www.stunnel.org/), Stingray Traffic Manager, and Kemp Virtual LoadMaster
SSL Accelerator add-in cards for servers (using hardware encryption/decryption). There are interesting Open Source projects for SSL chips at opencores.org, and hybrid SSL devices from Freescale. PC Engines’ ALIX boards have on-board AES-128 crypto too.
Dedicated hardware devices (more details below)
Build your own SSL accelerator using Nginx – the original post on o3magazine is no longer around, but the discussion on Slashdot has some interesting background on SSL acceleration, and archive.org has most of the details.
Use CPUs with Intel’s AES-NI instruction set (see below) – this accelerates AES symmetric encryption only (128-, 192- and 256-bit keys), not the public-key operations of the handshake.
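Software offload (#1 above) amounts to a process that accepts TLS connections, decrypts them, and relays plain bytes to the web server. The following is a minimal asyncio sketch of that idea, not a substitute for stunnel or nginx: it has no timeouts, logging, or header injection, and the function names are illustrative.

```python
import asyncio
import ssl
from typing import Optional


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes one way until EOF or a connection error, then close the destination."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    except OSError:          # includes ConnectionResetError and ssl.SSLError
        pass
    finally:
        writer.close()


async def start_terminator(listen_host: str, listen_port: int,
                           backend_host: str, backend_port: int,
                           tls: Optional[ssl.SSLContext]) -> asyncio.AbstractServer:
    """Accept client connections (TLS if `tls` is set) and relay plaintext to a plain-HTTP backend."""

    async def handle(client_r: asyncio.StreamReader, client_w: asyncio.StreamWriter) -> None:
        # When `tls` is set, asyncio performs the handshake and decryption,
        # so client_r yields plaintext, which is forwarded unencrypted.
        try:
            backend_r, backend_w = await asyncio.open_connection(backend_host, backend_port)
        except OSError:
            client_w.close()  # backend unavailable: drop the client connection
            return
        await asyncio.gather(pipe(client_r, backend_w), pipe(backend_r, client_w))

    return await asyncio.start_server(handle, listen_host, listen_port, ssl=tls)
```

To terminate real TLS, pass a server-side context built with ssl.create_default_context(ssl.Purpose.CLIENT_AUTH) followed by load_cert_chain() on your own certificate and key files; passing None relays plain TCP, which is handy for testing the relay logic.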

In practice, most enterprise SSL offload is handled by dedicated hardware devices known as web application accelerators (aka content switches or application delivery controllers, ADCs), such as the Citrix NetScaler, F5 BIG-IP, Kemp LoadMaster, and Blue Coat ProxySG – #3 in the list above.

These devices are examples of multi-layer switches: network devices which operate at several different layers of the OSI model, including layer 6, where SSL/TLS is usually placed. These web ADCs can provide significant functionality and intelligence, for instance by optimising TCP connections from client browsers, compressing/decompressing HTTP data, and offloading the SSL/TLS decryption workload.

Such devices become particularly crucial when HTTPS traffic needs to be terminated then re-encrypted for onward transmission to another server inside the (possibly untrusted) network; financial and military networks typically employ this strategy:

[Figure: HTTPS terminated, then re-encrypted for the onward hop]
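In a terminate-and-re-encrypt design, the offload device needs two TLS roles: a server role facing clients and a client role facing the next internal server. A minimal sketch using Python’s ssl module follows; the certificate paths and the internal CA bundle are hypothetical, and the TLS 1.2 floor is an assumed policy choice. The key point is that the onward hop should still verify the backend’s certificate, so it is authenticated as well as encrypted.

```python
import ssl
from typing import Optional


def make_front_context(certfile: str, keyfile: str) -> ssl.SSLContext:
    """Server-side context for terminating client HTTPS at the offload device."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.load_cert_chain(certfile, keyfile)  # hypothetical public-facing cert and key
    return ctx


def make_backend_context(internal_ca: Optional[str] = None) -> ssl.SSLContext:
    """Client-side context for re-encrypting traffic to the next internal hop.

    Verifies the backend certificate and hostname, against an (assumed)
    internal CA bundle if one is given, else the system trust store.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=internal_ca)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The backend context would be passed as the ssl argument when opening the onward connection, for example asyncio.open_connection(host, port, ssl=ctx).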

Hardware and Virtualisation for SSL Offloading

For high-traffic sites, pure software implementations (#1 above) may struggle to keep up: the best hardware implementation of SSL encryption/decryption will usually be substantially faster than the best software implementation on general-purpose CPUs. Therefore, either add-in cards (difficult to scale and manage) or dedicated hardware are likely to be the choice for large web sites.

The latest hardware SSL accelerators (such as the AX series from A10) offer an additional benefit: virtualisation. By running a special hypervisor on top of the crypto-tuned hardware, a single device can host multiple virtual appliances, isolating changes to one from the others whilst still giving each direct access to the specialist encryption/decryption hardware via the hypervisor.

SSL Offload on the Client

For mobile devices, reducing compute cycles leads to power savings. Therefore, if computationally expensive operations such as encryption/decryption can be performed in hardware on the device, the result will be extended battery life. Some processors now include dedicated hardware support for crypto, such as Intel’s AES-NI (Advanced Encryption Standard New Instructions).

This “client-side” support for hardware crypto is still in its infancy, but seems likely to grow as the power-saving and speed advantages become apparent (and HTTPS yet more widely used).

Beyond SSL Offload

Offloading SSL to dedicated hardware-backed application accelerators helps, but the performance gains are limited if other aspects of the web application are not optimised. This article over at HTTPWatch describing HTTPS performance tuning makes three excellent points to help improve application performance over HTTPS, and I’d add two more:

Use HTTP 1.1 Connection Keep-Alive – this avoids the extra round trips the browser needs to set up and tear down the underlying TCP connection (and TLS session) on every request. Separately, TLS supports session resumption (the “abbreviated handshake”), which lets a returning client skip the expensive public-key part of the handshake when it does open a new connection.
Avoid mixed content warnings – the dreaded “This page contains both secure and non-secure items” dialog warning in older IE browsers. This is achieved by aligning the protocols used for content delivery (http/https), or by serving all requests over HTTPS.
Use persistent caching for static content – set the cache-control headers on non-sensitive content to allow browsers to cache it at least in memory, if not on disk. Michael Hoisie talks about JavaScript over HTTPS here, and here is advice from Google on caching HTTPS content.
Use HTTPS-aware Content Delivery Networks or proxies – by delivering static content over HTTPS via CDNs or caching proxies, you reduce the workload on your servers and improve page load speed.
Compress HTTPS content – because the cost of encryption/decryption depends on the length of the data, compressing the HTTPS data stream before encryption can help to reduce the compute cost.
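A quick way to see why compression helps is to measure how many fewer bytes would reach the cipher. This sketch uses Python’s zlib (deflate, the algorithm behind gzip HTTP compression) on a made-up, highly repetitive HTML fragment; real pages compress less dramatically, and already-compressed content such as JPEG images barely compresses at all. One later caveat: the CRIME (2012) and BREACH (2013) attacks showed that compressing secrets together with attacker-influenced data before encryption can leak those secrets, so browsers disabled TLS-level compression, and HTTP compression of responses containing secrets such as CSRF tokens needs care.

```python
import zlib


def compression_savings(payload: bytes, level: int = 6) -> float:
    """Fraction of bytes saved by deflate-compressing `payload` before encryption."""
    compressed = zlib.compress(payload, level)
    return 1 - len(compressed) / len(payload)


# Repetitive markup, typical of HTML tables, compresses very well.
page = b"<tr><td class='cell'>value</td></tr>\n" * 500
print(f"{compression_savings(page):.0%} fewer bytes to encrypt")
```

Since bulk symmetric encryption cost grows roughly with the number of bytes, fewer bytes means fewer cycles, though compression itself also costs some CPU.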

The Future – SSL Proxies?

The value of web acceleration appliances was highlighted in December 2011 by the $1.3 billion acquisition of Blue Coat Systems by a private equity firm. Blue Coat offers an SSL Proxy device that gives organisations the “power to define, enforce and audit intelligent policy controls over user/application interactions”; in other words, SSL proxies can transparently “sniff” encrypted HTTPS traffic in real time.

In a future of “HTTPS everywhere”, it seems that HTTPS proxies will become more prevalent. Clearly, optimising the decrypt/encrypt of proxied traffic will be crucial, so dedicated hardware for SSL offload is likely to be with us for some time.

9 thoughts on “Speed up Web Applications with SSL Offloading”

mylesmcdonnell says:

February 17, 2012 at 10:56

Excellent post. Are GPUs a good candidate for optimised encryption/decryption algorithms?

Matthew Skelton (@matthewpskelton) says:

February 19, 2012 at 18:32

That’s an interesting point, Myles.

If you had a large quantity of crypto calculations (encryption/decryption) to perform, then yes, you could see some benefit in lower OpEx from using Amazon’s GPU compute cloud over standard compute instances.

However, I am not sure whether HTTPS offload would be a good fit, unless the network pipe between the offload point and the GPU compute were very fast; the time taken to push the data via TCP to the GPU compute might exceed the time needed simply to do the crypto directly on the standard compute instances.

There’d obviously be a sweet spot, though – I wonder if anyone has tried it in the real world. There is some academic interest in SSL Offload using GPUs:
* http://www.gpucomputing.net/?q=node/2702
* http://dl.acm.org/citation.cfm?id=1851250
* http://shader.kaist.edu/sslshader/ – SSLShader – “the GPU implementation of RSA shows a factor of 22.6 to 31.7 improvement over the fastest CPU implementation.”

So we’d expect to see some real-world appliances using GPUs coming to market soon.

Matthew Skelton (@matthewpskelton) says:

February 19, 2012 at 20:35

Reblogged this on Matthew Skelton and commented:
Originally posted at Four Nines (http://fournines.wordpress.com/2011/12/08/speed-up-web-applications-with-ssl-offloading/)

sanaz says:

August 2, 2012 at 12:37

Could you explain more about the paragraph that mentions “Such devices become particularly crucial when HTTPS traffic needs to be terminated then re-encrypted for onward transmission to another server inside the (possibly untrusted) network; financial and military networks typically employ this strategy”?
I think it’s very dangerous for people who live in countries where the government monitors the activity of all users to find out who is sending critical information, especially over SSL.
For example, in Syria.

1. Matthew Skelton (@matthewpskelton) says:
  
  August 18, 2012 at 20:16
  
  Hi sanaz,
  
  There are legitimate uses of SSL re-encryption: for example, where an organisation trusts the peers (client/server) in a communications exchange, but not the channel or network across which they communicate. In this scenario, decrypting HTTPS traffic and then re-encrypting for the next ‘hop’ makes sense. This is a decent use of SSL decryption/encryption.
  
  However, the use of transparent SSL proxies within enterprises is potentially open to question, as the enterprise could in theory intercept (say) traffic to an employee’s personal Gmail account or even an internet banking session. Why organisations use transparent SSL proxies is understandable: to ensure data is not being leaked out of the organisation. Concerned employees should undertake personal communications at home in such cases. I’d call this use of SSL proxying neutral, on balance.
  
  As you point out, however, if ISPs or even governments introduce HTTPS proxying as a matter of routine, and use this to intercept and block communications, that is indeed a worrying situation. To counter this, email can be encrypted with PGP, and further security could be achieved with pre-shared keys if needed.
  
  Matthew
  
Guido Leenders says:

July 19, 2013 at 07:37

It took me some time to learn how to do it on Apache Tomcat for servlets. I have put the experiences in the form of a step-by-step guide on http://www.invantive.com/invantive/news/entryid/897/ssl-offloading-for-apache-tomcat

1. Matthew Skelton (@matthewpskelton) says:
  
  July 19, 2013 at 11:29
  
  Thanks for the link, Guido
  
Pingback: Operability can Improve if Developers Write a Draft Run Book | Software Operability
Pingback: SSL Offload Testing with HAProxy and Stunnel | Loadbalancer.org Blog