Rudest man in Linuxdom rants about randomness – “We actually know what we are doing. You don’t.”


Linus Torvalds is a very clever man – he invented Linux, after all – but he seems to struggle with simple human decency.

(He recently expressed the wish that the designers of some hardware he doesn’t like might “die in some incredibly painful accident“, and invited you to puncture the brake lines on their car as a way to make it so.)

So it’s hardly surprising that when he heard a cryptographic suggestion he thought was silly, he let rip like this:

Where do I start a petition to raise the IQ and kernel knowledge of people? Guys, go read drivers/char/random.c. Then, learn about cryptography. Finally, come back here and admit to the world that you were wrong. Short answer: we actually know what we are doing. You don't. Long answer: we use RDRAND as _one_ of many inputs into the random pool, and we use it as a way to _improve_ that random pool. So even if RDRAND were to be back-doored by the NSA, our use of RDRAND actually improves the quality of the random numbers you get from /dev/random. Really short answer: you're ignorant.

That’s right – yet more “NSA cracked my crypto” conspiracy, and this time, the rudest man in Linuxdom is in the thick of it!

Interestingly, there are some useful lessons to be learned here – and they’re more about how to deal will technical issues well than they are about surveillance or digital snooping.

So, at the risk of receiving a Royal Rant from Torvalds himself (me for writing this, and you for reading it), let me explain.

Linux has a special file called /dev/random that doesn’t exist as a real file.

If you open it in a program, and read from it, you get a stream of pseudorandom numbers, generated right inside in the kernel.

The idea of doing the work in the kernel is to end up with randomess of a very high quality.

That means minimal bias (the next bit is always zero with a probability of 50%), minimal predictability (even if you have a detailed history of recent outputs), and minimal repeatability (you can’t trick the system into giving the same sequence twice).

The way this works, very loosely speaking, is that the kernel continually sucks in pseudorandom data from various hard-to-predict sources – how much did the mouse move last time? how quickly did you type? how much time elapsed between two hardware events? – and stirs it all together into a bucket of digital slurry.

Along the way, the pseudorandom inputs are each shovelled through a non-cryptographic hash function to hasten the slurrification.

An estimate is kept – expressed in bits – of the amount of randomness that has been mixed in so far.

When you need random numbers, some of the slurry is fed into a cryptographic hash function in order to extract a pseudorandom bitstream from it.

The amount of randomness extracted is never allowed to exceed the amount currently swilling around in the sludge-bucket: if necessary, your reads from /dev/random are slowed down until the bucket fills up again.

This stops an attacker diluting the sludginess of /dev/random simply by reading wastefully from it until the metaphorical water runs clear.

The question, in the light of recent implications that the NSA has tainted the cryptographic sanctity of everything it could get its tentacles into or onto, is whether it is acceptable for the Linux kernel to use random numbers generated by the CPU itself as part of its official pseudorandom stream.

After all, modern Intel CPUs have an instruction called RDRAND which is supposed to use thermal noise, generally considered an unpredictable byproduct of the fabric of physics itself, to generate high quality random numbers very swiftly.

Sounds like just the ticket, but what if Intel tainted RDRAND, by order of the NSA?

Linus’s school of thought, which is entirely understandable, is that mixing a tainted data stream with a pseudorandom one can’t reduce randomness: even if you stir your bucket of sludge in a really careful and ordered fashion, you still end up with sludge.

(I’m not sure I agree with Linus that mixing in a known-tainted RDRAND stream would nevertheless invariably improve randomness, but on the surface, it shouldn’t reduce it.)

Of course, you can counter that claim – and some concerned digerati have done just that – by postulating an actively hostile RDRAND instruction.

This RDRAND might monitor the state of the rest of the CPU in order to produce “random” data that is specially matched to the existing contents of the sludge-bucket so as to cancel out some of its randomness.

But how likely is that, given the cryptographic and non-cryptographic hash-churning that goes on inside the Linux kernel to stir in new pseudorandom input?

Can you “cancel out” randomess under those circumstances?

How much of the state of the CPU, and even the computer as a whole, would the tainted RDRAND instruction have to track in order to produce a real time active cancellation stream that could predictably tweak the overall output of /dev/random?

Well, here’s the thing.

If you take Linus’s advice, and go read drivers/char/random.c, as an interested spectator called Taylor Hornby did, you won’t find quite the clarity that the rant-master seems to suggest.

For example, the core function get_random_bytes() says that it “does not use the hw random number generator” (which would handily render this whole discussion moot), yet calls a function which does just that:

Furthermore, the hardware-generated random data (that the algorithm isn’t supposed to be using at all, remember?) is consumed after both the non-cryptographic and the cryptographic hash-churning described above.

The RDRAND data is merely XORed into the already-hashed output of the random number generator as the last step of the process.

In theory, then, a hostile RDRAND instruction wouldn’t need to keep track of much CPU state at all, since you can cancel out an XOR merely by repeating it. (X XOR X = 0; X XOR 0 = X; and so Y XOR X XOR X = Y.)

As Taylor Hornby notes, in a mock dialogue amusingly modelled on Galileo’s Dialogue Concerning the Two Chief World Systems:

Ironically, the random.c source code suggests that a tainted source of randomess is a problem – even at the stage when the bucket of sludge is still being filled, let alone after it has been drained – when it says:

So, if I were King, what would I do to sort this out?

  • I’d order my subjects to stop worrying about a tainted RDRAND, at least for now, and concentrate on all the other problems in my Kingdom, such as IE 6, browser Java, unencrypted USB keys, XP’s forthcoming funeral, and sources of randomness that really are broken.
  • I’d have some King’s Messengers fix the comments in random.c so that they matched the code, like good documentation should, and actually helped prove the Rude Man’s assertion that “we know what we are doing.”
  • I’d fold in the data from RDRAND earlier in the process, along with all the other sources of entropy, so that no-one would need to answer the question, “Who asked you to leave RDRAND until an XOR right at the very end?”
  • I’d sentence Mr Torvalds to 200 hours of community service in a hospital orthopaedic ward, helping those who can’t help themselves because of serious injuries sustained in automobile accidents.

Right now, we could do with a bit more clarity in cryptography.

Sadly, ranting that people should read a bunch of historically inconsistent comments in a source code file in order to conquer their ignorance is not a means to that end.