Has HTTPS finally been cracked? Five researchers deal SSL/TLS a biggish blow…

Cryptographers have once again put SSL/TLS (that’s the padlock in HTTPS) in their gunsights and opened fire.

This time, they’ve done some severe damage.

The attack they’ve devised doesn’t work against all possible ways that TLS can be used; it requires you to capture somewhere between millions and billions of connections that all contain the same plaintext; and it only works well for the first 200 bytes or so of the transmitted data.

Nevertheless, it reveals a deep-rooted problem in using the RC4 encryption algorithm to secure your TLS traffic.

“Wait a moment,” I hear you saying. “RC4 is a symmetric cipher, meaning that it uses the same key to encrypt and decrypt. TLS relies on public key cryptography, based on public/private key pairs. So how can RC4 affect TLS?”

The answer is that public key encryption is much too slow for scrambling all your network traffic, so TLS uses a hybrid approach.

You use your public/private key pair only when setting up a TLS connection, as a secure way to negotiate a random session key you can use with a symmetric cipher.

Once both ends of the connection have secretly agreed on a secret key, the actual data you want to exchange over TLS is conventionally encrypted using a regular, symmetric cipher.

There are many ciphers to choose from: OpenSSL, for example, supports AES, Blowfish, DES, Triple-DES, RC4 and many more.

“Wait a moment,” I hear you saying. “RC4 has known flaws sufficiently serious that they blew apart the WiFi encryption system known as WEP. So how can RC4 still be around for securing web traffic?”

The answer is that RC4 shouldn’t be around.

Experts have recommended avoiding it completely, at least for any newly-written applications, for several years.

But replacing or banning RC4 in existing cryptographic implementations is a much trickier problem.

Indeed, according to the authors of this latest research, RC4 is the cipher chosen for about half of all TLS traffic.

So it’s the part of TLS they decided to attack.

The researchers also decided not to give their attack a groovy name like BEAST, or Lucky Thirteen, claiming that “naming one’s attacks after obscure Neil Young albums is now considered passé.”

Instead, the paper they’re working on (the full details aren’t out yet, as the researchers are still working with vendors on countermeasures) is known as AlFardan-Bernstein-Paterson-Poettering-Schuldt (AlFBPPS), being the authors’ names in alphabetical order.

RC4 is a stream cipher, so it is basically a keyed cryptographic pseudo-random number generator (PRNG). It emits a stream of cipher bytes that are XORed with your plaintext to produce the encrypted ciphertext.
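In fact, the whole algorithm fits in a few lines of Python. Here’s a toy sketch for illustration only (never deploy home-grown crypto): a key-scheduling step shuffles a 256-byte state array, and then the generator keeps swapping entries and emitting pseudo-random cipher bytes.

```python
def rc4_keystream(key):
    """RC4 keyed PRNG: key scheduling (KSA), then output generation (PRGA)."""
    # KSA: use the key bytes to shuffle a 256-entry state array
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # PRGA: emit one pseudo-random cipher byte per iteration, forever
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        yield S[(S[i] + S[j]) % 256]

def rc4_crypt(key, data):
    """Encrypt or decrypt: XOR the data against the keystream."""
    return bytes(p ^ k for p, k in zip(data, rc4_keystream(key)))
```

The classic published test vector checks out: encrypting `b"Plaintext"` under the key `b"Key"` gives the ciphertext `bbf316e8d940af0ad3`.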

To decrypt the ciphertext, you initialise RC4 with the same key, and XOR the ciphertext with the same stream of cipher bytes. XORing twice with the same value “cancels out”, because k XOR k = 0, and because p XOR 0 = p.
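You can see that cancellation at work in a couple of lines of Python; here a random byte string stands in for the RC4 keystream, since the property holds for any stream of cipher bytes:

```python
import os

plaintext = b"ATTACK AT DAWN"
keystream = os.urandom(len(plaintext))  # stand-in for RC4 output

# Encrypt: XOR each plaintext byte with the matching keystream byte
ciphertext = bytes(p ^ k for p, k in zip(plaintext, keystream))

# Decrypt: XOR again with the same keystream; k XOR k = 0, p XOR 0 = p
recovered = bytes(c ^ k for c, k in zip(ciphertext, keystream))

assert recovered == plaintext
```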

Stream ciphers are handy for general-purpose network protocols because they can encrypt a single byte at a time, rather than processing only fixed-size multibyte blocks, so input data never needs to be padded.

A PRNG can offer high-quality randomness without being cryptographic. Mersenne Twister, for instance, produces excellent random numbers from a starting key, known as a “seed”. But if you know any 624 successive outputs of the algorithm for any given seed, you can reconstruct the internal state of the PRNG at that point and predict all future outputs, without ever knowing the seed. A cryptographic PRNG sequence can only be reconstructed if you know the starting key.

The problem is that although RC4 is a cryptographic PRNG, it’s not a very high-quality one.

For more than a decade, we’ve known that it produces statistically anomalous output, at least early on in each stream of cipher bytes.

In 2001, Israeli cryptographers Itsik Mantin and Adi Shamir published a seminal paper entitled “A Practical Attack on Broadcast RC4”.

(Adi Shamir is the S in RSA; the R in RC4 is Ron Rivest, who’s the R in RSA.)

Their paper is brief, but more than enough to undermine RC4’s claim to randomness.

In particular, Mantin and Shamir examined the second output byte produced in any RC4 cipher stream, and found that the value zero turned up twice as often as it should:

You should see a zero as the second RC4 output once for every 256 keys on average; Mantin and Shamir showed that you would see it with a probability of 1/128.
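You can reproduce that anomaly yourself with a toy RC4 implementation and a pile of random keys. This Python sketch (for illustration only) counts how often the second output byte comes out as zero; the answer lands near 1 in 128, not the 1 in 256 that a truly random stream would give:

```python
import random

def rc4_keystream(key, n):
    """Toy RC4: return the first n keystream bytes for the given key."""
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out, i, j = [], 0, 0
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return out

random.seed(1)                # deterministic demo
TRIALS = 50_000
zeros = 0
for _ in range(TRIALS):
    key = bytes(random.randrange(256) for _ in range(16))
    if rc4_keystream(key, 2)[1] == 0:       # look at the second output byte
        zeros += 1

# Expect roughly TRIALS/128 hits, about double the TRIALS/256 of a fair stream
print(f"P(second byte = 0) is about 1/{TRIALS / zeros:.0f}")
```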

This result, incidentally, was the basis of the attack that broke WEP, the original encryption protocol used in Wi-Fi networking, and forced its replacement with a newer encryption system called WPA.

AlFBPPS went much further than anyone else had done with RC4.

They produced statistical tables for the probability of every output byte (0..255) for each of the first 256 output positions in an RC4 cipher stream, for a total of 65,536 (256×256) measurements.

By using a sufficiently large sample size of differently-keyed RC4 streams, they achieved results with sufficient precision to determine that almost every possible output was biased in some way.

The probability tables for a few of the output positions (which are numbered from 1 to 256) are shown below.

(In a truly random distribution, each probability would be 1/256. The numbers here are multiplied by 256, so that each value ought to be 1, and the lines in the graphs should be perfectly horizontal at Y=1. Given a large enough sample size, any deviation from 1 reveals a statistically-exploitable anomaly in RC4.)

The authors realised that if you could produce TLS connections over and over again that contained the same data at a known offset inside the first 256 bytes (for example an HTTP request with a session cookie at the start of the headers), you could use their probability tables to guess the cipher stream bytes for those offsets.

As Dan Bernstein very concisely put it at the recent Fast Software Encryption 2013 conference:

Force target cookie into many RC4 sessions. Use RC4 biases to find cookie from ciphertexts.

Here’s how it works.

Imagine that you know that the 48th plaintext byte, P48, is always the same, but not what it is.

You provoke millions of TLS connections containing that fixed-but-unknown P48; in each connection, which will be using a randomly-chosen session key, P48 will end up encrypted with a pseudo-random cipher byte, K48, to give a pseudo-random ciphertext byte, C48.

And you sniff the network traffic so you capture millions of different samples of C48.

Now imagine that one value for C48 shows up more than 1% (1.01 times) more frequently than it ought to. We’ll refer to this skewed value of C48 as C’.

From the probability table for K48 above, you would guess that the cipher byte used for encrypting P48 to produce C’ must have been 208 (0xD0), since K48 takes the value 208 more than 1% too often.

In other words, C’ must be P48 XOR 208, so that P48 must be C’ XOR 208, and you have recovered the 48th byte of plaintext.
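Here is a minimal Python simulation of the broadcast-attack idea. To keep the demo quick and reliable, it uses the strong second-byte bias (keystream byte 2 is zero about twice as often as chance) rather than the subtler 1% bias at position 48, but the principle is identical: encrypt one fixed-but-unknown secret byte under many random keys, then guess that the most frequent ciphertext value is the secret XORed with the most likely keystream byte (here, zero).

```python
import random
from collections import Counter

def rc4_byte2(key):
    """Toy RC4: return only the second keystream byte for the given key."""
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = out = 0
    for _ in range(2):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out = S[(S[i] + S[j]) % 256]
    return out

random.seed(7)                 # deterministic demo
SECRET = 0x42                  # the fixed-but-unknown plaintext byte

# The attacker sniffs the same secret encrypted under many random session keys
counts = Counter()
for _ in range(50_000):
    key = bytes(random.randrange(256) for _ in range(16))
    counts[rc4_byte2(key) ^ SECRET] += 1    # ciphertext = keystream XOR secret

# The most frequent ciphertext value is almost surely SECRET XOR 0
guess = counts.most_common(1)[0][0] ^ 0x00
print(f"recovered byte: {guess:#04x}")
```

With 50,000 captured samples the expected hit count for the right value is roughly double that of any wrong value, so the mode points straight at the secret.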

The guesswork gets a little harder for cipher stream offsets where the skew in frequency distribution is less significant, but it’s still possible, given sufficiently many captured TLS sessions.

AlFBPPS measured how accurate their plaintext guesses were for varying numbers of TLS sessions, and the results were worrying, if not actually scary:

However, given the huge number of TLS sessions required, The Register’s provocative URL theregister.co.uk/tls_broken might be going a bit far.

Initiating 2³² (4 billion), or even 2²⁸ (roughly 270 million), TLS sessions, and then sniffing and post-processing the results to extract a session cookie is unlikely to be a practicable attack any time soon.

If nothing else, the validity of the session cookie might reasonably be expected to be shorter than the time taken to provoke hundreds of millions of redundant TLS connections.

On the other hand, the advice to avoid RC4 altogether because of its not-so-random PRNG can’t be written off as needlessly conservative.

If you can, ditch RC4 from the set of symmetric ciphers that your web browser is willing to use, and that your web servers are willing to accept.

Go for AES-GCM instead.

GCM, or Galois/Counter Mode, is a comparatively new way of using block ciphers that gives you encryption and authentication all in one, which not only avoids the risky RC4 cipher, but neatly bypasses the problems exposed in the Lucky 13 attack, too.

Easy for me to say, to be sure, but dropping old ciphers, especially those with known problems, is always the best plan.

PS. If you run a website and you have already dropped TLS-RC4 support, please leave us a comment below to say whether any of your visitors were inconvenienced as a result. Did anyone complain? Did it cost you any transactions?