Cryptographers have once again put SSL/TLS (that’s the padlock in HTTPS) in their gunsights and opened fire.
This time, they’ve done some severe damage.
The attack they’ve devised doesn’t work against all possible ways that TLS can be used; it requires you to capture somewhere between millions and billions of connections that all contain the same plaintext; and it only works well for the first 200 bytes or so of the transmitted data.
Nevertheless, it reveals a deep-rooted problem in using the RC4 encryption algorithm to secure your TLS traffic.
“Wait a moment,” I hear you saying. “RC4 is a symmetric cipher, meaning that it uses the same key to encrypt and decrypt. TLS relies on public key cryptography, based on public/private key pairs. So how can RC4 affect TLS?”
The answer is that public key encryption is much too slow for scrambling all your network traffic, so TLS uses a hybrid approach.
You use your public/private key pair only when setting up a TLS connection, as a secure way to negotiate a random session key you can use with a symmetric cipher.
Once both ends of the connection have privately agreed on that session key, the actual data you want to exchange over TLS is encrypted in the conventional way, using a regular, symmetric cipher.
There are many ciphers to choose from: OpenSSL, for example, supports AES, Blowfish, DES, Triple-DES, RC4 and many more.
“Wait a moment,” I hear you saying. “RC4 has known flaws sufficiently serious that they blew apart the WiFi encryption system known as WEP. So how can RC4 still be around for securing web traffic?”
The answer is that RC4 shouldn’t be around.
Experts have recommended avoiding it completely, at least for any newly-written applications, for several years.
But replacing or banning RC4 in existing cryptographic implementations is a much trickier problem.
Indeed, according to the authors of this latest research, RC4 is the cipher chosen for about half of all TLS traffic.
So it’s the part of TLS they decided to attack.
The researchers also decided not to give their attack a groovy name like BEAST, or Lucky Thirteen, claiming that “naming one’s attacks after obscure Neil Young albums is now considered passé.”
Instead, the paper they’re working on (the full details aren’t out yet, as the researchers are still working with vendors on countermeasures) is known as AlFardan-Bernstein-Paterson-Poettering-Schuldt (AlFBPPS), being the authors’ names in alphabetical order.
RC4 is a stream cipher, so it is basically a keyed cryptographic pseudo-random number generator (PRNG). It emits a stream of cipher bytes that are XORed with your plaintext to produce the encrypted ciphertext.
To decrypt the ciphertext, you initialise RC4 with the same key, and XOR the ciphertext with the same stream of cipher bytes. XORing twice with the same value “cancels out”, because k XOR k = 0, and because p XOR 0 = p.
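The XOR round trip described above can be sketched in a few lines of Python. This is a minimal, illustrative implementation of the publicly documented RC4 algorithm, not code from any TLS library; the function names `rc4_keystream` and `rc4_xor` are made up for this example, and the key and plaintext are the usual toy test values.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    """Return the first n RC4 cipher bytes for the given key."""
    # Key scheduling: shuffle a 256-entry table under control of the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    # Output generation: keep shuffling, emitting one cipher byte per step.
    out = bytearray()
    i = j = 0
    for _ in range(n):
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def rc4_xor(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: the very same XOR operation does both."""
    stream = rc4_keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, stream))


ciphertext = rc4_xor(b"Key", b"Plaintext")   # p XOR k
recovered = rc4_xor(b"Key", ciphertext)      # (p XOR k) XOR k = p
print(ciphertext.hex(), recovered)
```

Because encryption works one byte at a time, `rc4_xor` accepts input of any length with no padding, which is the convenience the next paragraph refers to.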
Stream ciphers are handy for general-purpose network protocols because they can encrypt a single byte at a time, rather than processing only fixed-size multibyte blocks, so input data never needs to be padded.
→ A PRNG can offer high-quality randomness without being cryptographic. Mersenne Twister, for instance, produces excellent random numbers from a starting key, known as a “seed”. But if you know any 624 successive outputs of the algorithm for any given seed, you can reconstruct the internal state of the PRNG at that point and predict all future outputs, without ever knowing the seed. By contrast, the output of a cryptographic PRNG can only be reconstructed if you know the starting key.
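To make the Mersenne Twister point concrete, here is a rough sketch using Python's built-in `random` module, which is based on the MT19937 variant of the algorithm. That variant keeps 624 32-bit words of state, so 624 successive 32-bit outputs are enough to copy it. The helper names (`undo_right_xor`, `undo_left_xor`, `untemper`, `clone_from_outputs`) are invented for this illustration, and the layout of the state tuple passed to `setstate` is a CPython detail that is assumed here rather than guaranteed.

```python
import random


def undo_right_xor(y: int, shift: int) -> int:
    """Invert y = x ^ (x >> shift) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x


def undo_left_xor(y: int, shift: int, mask: int) -> int:
    """Invert y = x ^ ((x << shift) & mask) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x & 0xFFFFFFFF


def untemper(y: int) -> int:
    """Undo MT19937's output scrambling to get the raw state word back."""
    y = undo_right_xor(y, 18)
    y = undo_left_xor(y, 15, 0xEFC60000)
    y = undo_left_xor(y, 7, 0x9D2C5680)
    return undo_right_xor(y, 11)


def clone_from_outputs(outputs):
    """Build a generator that continues the sequence after 624 outputs."""
    if len(outputs) != 624:
        raise ValueError("need exactly 624 successive 32-bit outputs")
    state = tuple(untemper(y) for y in outputs)
    clone = random.Random()
    # Index 624 means "all words used": the next call regenerates the state.
    clone.setstate((3, state + (624,), None))
    return clone


victim = random.Random()  # seeded by the OS; we never look at the seed
observed = [victim.getrandbits(32) for _ in range(624)]
clone = clone_from_outputs(observed)
predicted_ok = all(
    clone.getrandbits(32) == victim.getrandbits(32) for _ in range(1000)
)
print(predicted_ok)
```

The clone never sees the seed, only outputs, which is exactly what a cryptographic PRNG is supposed to make useless to an observer.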
The problem is that although RC4 is a cryptographic PRNG, it’s not a very high-quality one.
For more than a decade, we’ve known that it produces statistically anomalous output, at least early on in each stream of cipher bytes.
In 2001, Israeli cryptographers Itsik Mantin and Adi Shamir published a seminal paper entitled “A Practical Attack on Broadcast RC4”.
(Adi Shamir is the S in RSA; the R in RC4 is Ron Rivest, who’s also the R in RSA.)
Their paper is brief, but more than enough to undermine RC4’s claim to randomness.
In particular, Mantin and Shamir examined the second output byte produced in any RC4 cipher stream, and found that the value zero turned up twice as often as it should:
You should see a zero as the second RC4 output once for every 256 keys on average; Mantin and Shamir showed that you would see it with a probability of 1/128.
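You can check the doubled-up zero for yourself with a small experiment. The sketch below is an illustration only: it assumes random 16-byte keys, a sample of 30,000 keys and a fixed seed so the run is repeatable, and the names `second_output_byte` and `zero_ratio` are made up for this example. It reports how often a zero appears as the second cipher byte, relative to the 1/256 you would expect from a truly random stream.

```python
import random


def second_output_byte(key: bytes) -> int:
    """Run RC4 key setup, then return the second cipher byte."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    for _ in range(2):  # step the generator twice; keep the second result
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
    return s[(s[i] + s[j]) & 0xFF]


def zero_ratio(n_keys: int, seed: int = 1) -> float:
    """Observed frequency of a zero second byte, as a multiple of 1/256."""
    rng = random.Random(seed)
    zeros = 0
    for _ in range(n_keys):
        key = bytes(rng.getrandbits(8) for _ in range(16))
        if second_output_byte(key) == 0:
            zeros += 1
    return zeros / (n_keys / 256)


ratio = zero_ratio(30_000)
print(f"zero turns up {ratio:.2f} times as often as 1/256 would predict")
```

With a sample this small the figure will wobble a bit, but it should land near 2 rather than near 1.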
A closely related result published the same year by Fluhrer, Mantin and Shamir, on weaknesses in RC4’s key scheduling, was the basis of the attack that broke WEP, the original encryption protocol used in Wi-Fi networking, and forced its replacement with a newer encryption system called WPA.
AlFBPPS went much further than anyone else had done with RC4.
They produced statistical tables for the probability of every output byte (0..255) for each of the first 256 output positions in an RC4 cipher stream, for a total of 65536 (256×256) measurements.
By using a sufficiently large sample size of differently-keyed RC4 streams, they achieved results with sufficient precision to determine that almost every possible output was biased in some way.
The probability tables for a few of the output positions (which are numbered from 1 to 256) are shown below.
(In a truly random distribution, each probability would be 1/256. The numbers here are multiplied by 256, so that each value ought to be 1, and the lines in the graphs should be perfectly horizontal at Y=1. Given a large enough sample size, any deviation from 1 reveals a statistically-exploitable anomaly in RC4.)
The authors realised that if you could produce TLS connections over and over again that contained the same data at a known offset inside the first 256 bytes (for example an HTTP request with a session cookie at the start of the headers), you could use their probability tables to guess the cipher stream bytes for those offsets.
As Dan Bernstein very concisely put it at the recent Fast Software Encryption 2013 conference:
Force target cookie into many RC4 sessions. Use RC4 biases to find cookie from ciphertexts.
Here’s how it works.
Imagine that you know that the 48th plaintext byte, P48, is always the same, but not what it is.
You provoke millions of TLS connections containing that fixed-but-unknown P48; in each connection, which will be using a randomly-chosen session key, P48 will end up encrypted with a pseudo-random cipher byte, K48, to give a pseudo-random ciphertext byte, C48.
And you sniff the network traffic so you capture millions of different samples of C48.
Now imagine that one value for C48 shows up more than 1% more frequently than it ought to (that is, at more than 1.01 times the expected rate). We’ll refer to this skewed value of C48 as C’.
From the probability table for K48 above, you would guess that the cipher byte used for encrypting P to produce C’ must have been 208 (0xD0), since K48 takes the value 208 more than 1% too often.
In other words, C’ must be P XOR 208, so that P must be C' XOR 208, and you have recovered the 48th byte of plaintext.
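The same reasoning can be tried out on a laptop if you pick a position where the skew is much bigger. The sketch below is a simplified stand-in for the attack, not the researchers' code: it targets the second byte, where the cipher byte zero is roughly twice as likely as it should be, instead of the 48th, so that 20,000 simulated sessions are enough. The secret `b"id"`, the 16-byte session keys, the fixed seed and the function names are all assumptions made for the example.

```python
import random
from collections import Counter


def rc4_keystream(key: bytes, n: int) -> bytes:
    """Return the first n RC4 cipher bytes for the given key."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for _ in range(n):
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def recover_second_byte(ciphertexts) -> int:
    """Guess P2 from many ciphertexts of the same plaintext."""
    counts = Counter(c[1] for c in ciphertexts)   # tally every observed C2
    skewed_c2, _ = counts.most_common(1)[0]       # this plays the role of C'
    likely_k2 = 0                                 # the over-represented K2
    return skewed_c2 ^ likely_k2                  # P = C' XOR K


rng = random.Random(2013)
secret = b"id"  # hypothetical fixed-but-unknown plaintext
captured = []
for _ in range(20_000):
    session_key = bytes(rng.getrandbits(8) for _ in range(16))
    stream = rc4_keystream(session_key, len(secret))
    captured.append(bytes(p ^ k for p, k in zip(secret, stream)))

guess = recover_second_byte(captured)
print(chr(guess))
```

At position 48 the logic is identical, with 208 in place of 0 for the likely cipher byte, but the skew is so much smaller that you need vastly more captured sessions before the right value stands out from the noise.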
The guesswork gets a little harder for cipher stream offsets where the skew in frequency distribution is less significant, but it’s still possible, given sufficiently many captured TLS sessions.
AlFBPPS measured how accurate their plaintext guesses were for varying numbers of TLS sessions, and the results were worrying, if not actually scary:
However, given the huge number of TLS sessions required, The Register’s provocative URL theregister.co.uk/tls_broken might be going a bit far.
Initiating 2^32 (4 billion), or even 2^28 (260 million), TLS sessions, and then sniffing and post-processing the results to extract a session cookie is unlikely to be a practicable attack any time soon.
If nothing else, the validity of the session cookie might reasonably be expected to be shorter than the time taken to provoke hundreds of millions of redundant TLS connections.
On the other hand, the advice to avoid RC4 altogether because of its not-so-random PRNG can’t be written off as needlessly conservative.
If you can, ditch RC4 from the set of symmetric ciphers that your web browser is willing to use and that your web servers are willing to accept.
Go for AES-GCM instead.
GCM, or Galois/Counter Mode, is a comparatively new way of using block ciphers that gives you encryption and authentication all in one, which not only avoids the risky RC4 cipher, but neatly bypasses the problems exposed in the Lucky 13 attack, too.
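If you write your own TLS clients, the same advice can be applied in code. The following is a minimal sketch using Python's standard `ssl` module, assuming an OpenSSL build that understands the cipher string shown; the function names `make_client_context` and `negotiated_cipher` are invented for this example, and the host you pass in is up to you.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client settings that offer ECDHE with AES-GCM and never RC4."""
    ctx = ssl.create_default_context()
    # OpenSSL cipher-list syntax: keep AES-GCM suites, explicitly ban RC4.
    # TLS 1.3 suites are configured separately; none of them use RC4.
    ctx.set_ciphers("ECDHE+AESGCM:!RC4")
    return ctx


def negotiated_cipher(host: str, port: int = 443):
    """Connect and report (cipher name, protocol version, secret bits)."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.cipher()


ctx = make_client_context()
offered = [c["name"] for c in ctx.get_ciphers()]
print(offered)
```

Calling `negotiated_cipher("www.example.com")` needs network access and will fail with an SSL error if the server insists on a cipher you have refused to offer, which is one way to find out how a site would treat RC4-free visitors.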
Easy for me to say, to be sure, but dropping old ciphers, especially those with known problems, is always the best plan.
PS. If you run a website and you have already dropped TLS-RC4 support, please leave us a comment below to say whether any of your visitors were inconvenienced as a result. Did anyone complain? Did it cost you any transactions?
I do wish these guys would speak in plain English. And before you go off on one, not everyone knows what a RC4 is or a lot of the other bloody double dutch words mean. I am by no means thick but bloody hell. Rant over. 🙂
Err, the heart of this attack on TLS is the bias in the RC4 encryption algorithm.
And, to stick up for myself a bit, my first mention of RC4 refers to it as "the RC4 encryption algorithm," later augmented by a note to remind you that it's a symmetric stream cipher, which is info that will come in handy later in the article.
I have changed the text "RC4 encryption algorithm" into a hyperlink to Wikipedia, though I don't doubt many readers would probably go there (or to a search engine) off their own bats if they were stuck…
…but I'm not sure how to write a mildly technical article of this sort without making some assumptions that readers will have a passing familiarity with crypto, or be willing to look stuff up as they go along.
I'm with you, Paul. There are plenty of non-technical articles on this blog, and this time it's nice to see something with lots of meat. Dumbing it down in any way, shape, or form would detract from the article.
Despite being a technical person myself, I find mathematics and cryptography often overwhelming. I found this post to be very well-written, and very informative.
Paul, your article was NOT overly technical. You did a nice write-up. I left a lengthier comment below.
Tiny error: Fourth paragraph from the end, I think you meant "TLS" rather than "TSL". I got confused for a moment, thinking it had all gone over my head!
Fixed, thanks.
I've only had the two chapters of crypto from a mathematics text, and I was able to follow the text just fine. You explain all of the necessary concepts needed for basic understanding, and drop concept names that can be researched for further understanding.
If you want the dumbed-down versions, try reading other websites?
Not sure how you would expect Paul to explain how the encryption was vulnerable if you don’t understand the encryption itself.
If he tried to just talk in “plain English” then his article would be the kind of thing that you read in The Sun rather than a useful explanation of the issue.
Most of Sophos’ articles are non-technical. I prefer their technical articles. So they cater for you more than they do for me.
Nice trolling. Congratulations sir, that went almost undetected. You won one internet for trying. Don't spend it all at once.
Cryptography is hard. It is intentionally hard. There is only so much you can do to take one of the most theoretical and advanced branches of mathematics and turn it into something vaguely resembling English. Frankly, this article is leaps and bounds simpler than several "beginners" books on cryptography I have tried to read. There's only so much complexity you can strip out before you're not talking about the subject anymore.
Good Thing Ethical Hackers Discovered It First…
All un-ethical cryptographers already work for government agencies and already exploit this (and have GCM mode ciphers) 😉
One always wonders who actually discovered what first in crypto attacks. There is a lot of motivation to not talk for many of the top experts. I would hardly be shocked if it comes out some day that the NSA, GCHQ or Spetsnaz had broken RSA decades ago.
Agencies like those will of course claim that they already discovered the attack/weakness years ago. They have to. And we will never really know the truth. We can only hope to use better encryption in future, and by better I mean practically secure by today's known standards. Public efforts in cyber security are therefore of utmost importance and are the foundations of online privacy.
We published the first plaintext recovery attack on RC4 in the broadcast setting, where the same plaintext is encrypted under different user keys, at FSE 2013 (earlier than the AlFardan-Bernstein-Paterson-Poettering-Schuldt results).
A summary of the results of our paper is as follows:
Our attack can recover ANY byte of the first 257 bytes of the plaintext by using around 2^32 ciphertexts, and can also recover later bytes (after 258 bytes) by using 2^34 ciphertexts. In addition, we give theoretical reasons why the first 255 bytes of the keystream have such strong biases.
Pre-proceedings version of this paper is available at pdf file as follows: http://home.hiroshima-u.ac.jp/ohigashi/rc4/Full_P…
http://home.hiroshima-u.ac.jp/ohigashi/rc4/
Dan Bernstein's paper at FSE 2013 (see the link above to the Royal Holloway website) explicitly acknowledges the work you guys did, and acknowledges it as "slightly earlier."
Bernstein et al's paper is somewhat different, inasmuch as:
* It quantifies all possible biases for every byte at every offset in the first 256 bytes of cipher stream, allowing a more subtle probabilistic attack. (65536 biases.)
* It examines the situation of TLS using RC4, not just RC4 itself.
But thanks for the link to your paper. It's a great read, though it doesn't directly address the issue I mentioned in the headline (whether this breaks HTTPS or not).
Rather than saying HTTPS has been cracked, do you really mean "HTTPS using RC4 is weak, but we have known that for years, please use a stronger symmetric algorithm"?
Well, I *didn't* say "HTTPS has been cracked" 🙂
The headline says, "Has HTTPS finally been cracked? Cryptographers deal biggish blow…"
I like your headline, but it's more of a conclusion than a headline, I'd suggest. Probably better at the end than at the beginning.
(And headlines are supposed to/allowed to grab a bit of attention 🙂
I for one am glad for the technical detail. Just enough for someone who needs to understand the risks, but not so much you need a PhD in mathematics to understand it. I think a lot of people are having to learn more about this simply because the PCI scanning companies are now flagging sites for BEAST.
How can you ditch RC4 when Facebook and Google use it to encrypt most of their services?
They don't only support RC4. When your browser fronts up to a TLS server, it sends a list of ciphers it's able/willing to use. If you leave out the RC4-based ones, the server will either accept one of the others you *are* willing to use, or will drop your request.
(You could try this with a neat FF plugin called CipherFox, except the author's latest version removed the "block RC4" feature…though he's now planning to put it back :-
You can try various ciphers against specific sites with:
$ openssl s_client -cipher ONE-TO-TRY -connect www.example.com:443
The web site for CipherFox states RC4 changes in FF must be altered in “about:config” – RC4 Settings.
Does that mean every mention of RC4 in “about:config” should be set to “FALSE”?
Errr….it looks that way, doesn't it?
(The CipherFox author suggests he's going to produce a new release of his plugin that does the RC4 enable/disable tweaks from the plugin itself.)
This is not a terribly technical article for those who complained. Don't you learn about these things when you become CISSP? I had to learn about PRNG and Mersenne Twister in my statistics classes. This was a pleasant article. Thank you, Paul Ducklin!
Question remains though: Who is using AES-Galois (GCM?) encryption? What are the trade-offs for websites, or rather users, I wonder. Lots of sites don't use SSL at all because the trade-off isn't considered worthwhile, i.e. not the cost of the SSL, but the user backlash over response time.
No, you actually don't "learn" about these things when you become a CISSP.
Very interesting and very informative to the non-technical. We have known this for some time, but it was never explained in a way that I could understand without my head exploding, while also pointing a way forward. Thank you.
I like technical articles, and articles with at least more technical meat than most industry blogs.
But I also like, in conjunction with the meat, a practical analysis, and maybe even a brief discussion about what this means to an attacker and (more importantly) a defender. Thank you for including that. 🙂
For my own public-facing websites, we use a popular enterprise-level load balancer that provides SSL termination. In the past, we've disabled cipher redirects and SSLv2 support, without any apparent effect to our users (it's hard to tell since auditors forced me to turn it fully off, even though I was originally redirecting users who tried it to a friendly landing page to update their browser; but auditors aren't that thorough and mgmt not that interested in arguing technical topics). We had/have not disabled RC4 specifically, though the parans story likely will play out that way again and we'll disable it, despite the low value my company has and the low practical risk of an actual attack. :
I think this is a very basic question: how do I check what encryption my online banking is using? I thought it was AES.
Nice article! thank you Paul.
Although this is very secondary to the subject, I can't help but notice a small missing element in your mini-demonstration that "XORing twice with the same value cancels out".
In addition to "because k XOR k = 0, and because p XOR 0 = p", I believe you need to add the associative property of XOR, meaning "and because (a XOR b) XOR c = a XOR (b XOR c)".
Regards
I think the issue isn't that Paul wrote an article on encryption (Yes the lingo will look like Stephen Hawking hieroglyphics to anyone outside of computers). I think the issue is that there is an article written without the aid of something that will be comprehensible to everyone. In other words, there could be a short video to go along with this article to demonstrate visually what he is talking about.
I do wish one day the OS we use will be able to show us parts of what goes on on the backend by default offline and online without risking virus infection or hacking.