We recently wrote about a security advisory from Cryptocat, an open-source web-based secure messaging project.

Cryptocat’s hacktivist credibility was cemented in 2012 when its Canadian developer, Nadim Kobeissi, was stopped at the US border and interviewed about his Cryptocat-related programming activities.

The software enjoyed a surge in popularity as a result, which made last week’s Fourth-of-July vulnerability announcement all the more worrying.

It turned out that the Cryptocat code had for some time contained a number of cryptographic flaws that made it measurably less secure than its users might reasonably have assumed.

One of these flaws was found in the heart of Cryptocat’s PRNG (pseudo-random number generator).

#### PRNGs are a key component of cryptographic software

For the same reason that you don’t use the same password on more than one website, so that one cracked password doesn’t open up your whole online life, Cryptocat generates new keys every time you chat to someone.

So, to generate a steady supply of unpredictable, unguessable, one-use encryption keys, Cryptocat needs a steady supply of high-quality random numbers.

With this in mind, I thought it might be interesting, and educational, to drill into one of Cryptocat’s now-patched PRNG bugs.

#### Cryptocat random numbers, old-style

Cryptocat is written in JavaScript, which represents numbers internally as 8-byte IEEE-754 floating point numbers.

Usefully, 8-byte floats allow you to represent integers up to and beyond 32 bits, though annoyingly this numeric range doesn’t extend all the way to 64 bit integers.

For technical reasons, IEE-754 numbers run out of integer precision at 2^{53}, which corresponds to nearly, but not quite, 16 decimal digits of precision.

Since Cryptocat’s core PRNG function, `Cryptocat.random()`, returns a JavaScript number, the coders made sure they extracted all possible variability every time it was called.

They constructed a text string representing a random 16-digit (53 bit) decimal number somewhere from 0 to 0.999999999999999 inclusive. In pseudocode:

// Start building a number (0<=N<1) as a text string numstr = '0.' // tack on 16 random ASCII digits for i = 1 to 16 do numstr = numstr + randomASCIIdigit() end // Convert the string to an 8-byte float return stringtonumber(numstr)

Curiously, the coders generated each random ASCII digit by extracting a random byte from a stream cipher of excellent repute known as Salsa20, and converting that byte into one of the characters from ‘0’ to ‘9’.

But a byte (8 bits) can take on 256 (2^{8}) different values, and 256 isn’t a multiple of 10.

Since they couldn’t divide 256 bytes into 10 equally-sized groups of values, the coders resorted to a trick:

repeat byte250 = randomSalsaByte() until byte250 <= 250 randomdigit = remainder(byte250,10)

The idea is simple: 250 is evenly divisible by 10, so it’s easy to map the numbers from 1..250 onto the digits 0..9.

And 250 is very close to 256, so by simply throwing away random bytes from 251..255, the problem of indivisibility by 10 is sidestepped.

#### The problem

The code above is certainly an inelegant solution, since it is, in theory, at least, a potentially infinite loop.

→ If the `randomSalsaByte()` function is truly random, there is no mathematical *guarantee* that it won’t continue to return values above 250 for seconds, or minutes, or months.

It is also flat-out wrong.

The less-than-or-equal-to sign should be a less-than sign.

There are 251 different integers such that 0 ≤ X ≤ 250.

The programmer meant to specify that 0 ≤ X < 250 or, equivalently, that 0 ≤ X ≤ 249.

This is known as a fencepost error, by analogy with the fact that it takes seven fenceposts one metre apart to create a fence six metres long.

That’s because you need an extra fencepost at the far end of the sixth section:

#### Quantifying the error

When you divide numbers from 0..249 by 10 and take the remainder, the values 0 to 9 come out on average 25 times each, or 10% of the time, for a uniform distribution of digits.

But when you do the same with numbers from 0..250, there are 25 ways to come up with each of the digits 1..9, and 26 ways to come up with 0.

Instead of coming up on average 10% of the time (25/250, or 0.1), 0 will come up approximately 10.36% of the time (26/251, or 0.1036), and 1..9 will come up 9.96% each (25/251, or 0.0996).

The difference is easily visualised.

Here is a graph showing three tests, each generating 1,000,000 random digits using a good PRNG (the file `/dev/urandom` on OS X).

You’d expect each digit to come up about 100,000 times, and that’s what happened:

But the Cryptocat generator, with its one-part-in-251 inaccuracy, gives a graph that on casual inspection doesn’t look right.

Each time, the count for 0 is somewhere around 103,500 – approximately 10.36% of the 1,000,000 digits in the each sample:

Of course, the visible differences between the red graphs and the blue graphs might not be statistically significant, since they could have arisen by chance.

There is, after all, no law that prevents unusual looking results, just as there is nothing that stops a casino’s roulette wheel from landing on red 20 times in a row.

But there are statistical tools we can use to measure the likelihood that a set of non-random-looking results really did happen by chance, not due to some flaw in the PRNG.

#### Chi-squared

One common method is known as the Chi-squared test.

You compute the difference between each reading and the average value you expected.

Then you square the differences, divide each one by the expected value, and add those scaled-squares-of-differences together.

Because you square the differences, readings that are unusually far from what you expected have a much larger effect on the Chi-squared value than readings that were close to correct.

You then do some fancy mathematical calculations (or look up the result in a pre-computed table) to convert the Chi-squared value into a probability figure; the lower the probability, the worse your PRNG looks.

Loosely speaking, the Chi-squared probability tells you the chance that the variation in results that you observed really did happen by accident.

So, the lower the figure, the less likely that the inaccuracies you observed can be explained away as natural and reasonable.

→ Lower than 0.1% is considered very suspicious, since it suggests that your observed result should only happen “against the odds” once in every 1000 tries. So there’s a 99.9% chance that your weird results are down to a flaw in your assumptions. In this case our assumption is that the numbers come from a reliable PRNG; if we *reject* that hypothesis, we effectively *accept* that the PRNG is faulty.

Generating 30 sets of 100,000 samples using the `/dev/urandom` generator produced a wide range of probability figures, none particularly alarming:

But similar results for 30 sets of Cryptocat PRNG digits have a much shabbier look:

Stepping up the number of samples in each test – simply put, giving more time for genuinely random variations to even themselves out – polarises the results even further.

With 30 sets of 1,000,000 samples, `/dev/urandom` seems to satisfy our assumption, namely that it is a decent PRNG:

Cryptocat’s fencepost error, however, has really put a spanner in the works:

#### The error propagated

Clearly, the PRNG’s bias towards 0 is consistent and quantifiable: it’s off-kilter by 1/251, as we showed above.

Does that sort of bias still cause a problem when we combine sixteen digits into a floating point string and convert it back to a 53-bit number?

What if we generate 53 bits of floating point precision, and then do this:

number = rounddown(Cryptocat.random()*100) + 1

Since the `random()` function produces a number from 0 to 0.9999999999999999, multiplying by 100 and rounding down limits us to numbers from 0..99, and adding 1 produces an integer from 1..100.

We’ve converted 16 flawed digits into 53 bits of IEEE-754 floating point fraction, then converted that again to fewer than seven bits’ worth of positive integer.

Perhaps that is enough to hide the bias we saw above?

It isn’t.

#### Visualising the flaw

Below, I created a 100×100 grid and then generated 10,000,000 pairs of Cryptocat-style floats.

I mapped each pair of 53-bit numbers to a pair of integers from 1..100, as explained above, and used that pair as an (X,Y) co-ordinate in my 100×100 grid, keeping tally of how many times each point in the grid got a “hit”.

I then used those counts to choose a colour for each point in the grid; more “hits” meant a brighter colour.

Using `/dev/urandom` to produce my digits from 0..9 in my PRNG, I ended up with what looks like (and, indeed, is) just so much random noise:

But using the flawed Cryptocat PRNG produced a ghostly grid of colours that clearly reveals the systematic prejudices in the PRNG:

And that, my friends, is why PRNGs are important, and why they should be tested thoroughly every time you build and ship the latest version of your software.

A PRNG which passes the Chi-squared test may very well still be deeply flawed, but one that fails, and which produces such clear visual regularities, is definitely unfit for purpose.

Random number generators, as the joke goes, are far too important to be left to chance.

It isn't often I am enlightened and entertained and even less often that I get to the end of a 1400+ word blog post and feel happier for it. It's clear that it wasn't by chance and should be even more apparent by my public commentary. I don't hand out public praise, even for friends, very often. Thanks for the excellent article Duck.

Chester

Aw, thanks, Chester :-)

Apart from the colorful plot, I'm curious what the impact of this bug really was. If it only reduced the randomness in their passwords by a fraction of a percent, is it really practically less secure?

I'm also curious what the author's preferred method would be for reducing the range of random integers. I'd always been taught that this was a good method. If you wanted to be really deterministic, I guess you could collect the remainder bits until they add up to a multiple of 10, to guarantee that you can terminate at that point. But the odds of holding up that loop for even a tenth of a second are so vanishingly small that I wonder if it's really worth the added complexity.

Hey Jack, please refer to this blog post by sc00bz (Steve Thomas) to read about the real impact of this bug: http://tobtu.com/decryptocat.php

The potential severity of this bug paled into insignificance compared to another bug in Cryptocat – the code acquired 32 bytes of random data (used by the crypto key pair generator for each message) but inadvertently requested hex digits (4 bits per byte) instead of full bytes, thus effectively using 16 bytes where the crypto algorithm specified 32.

As for the whole multiple-of-10 thing, you can avoid it altogether. The only reason for having it is to generate ten decimal digits to convert into 53 bits of floating point number.

But base 2 and base 10 never play nicely togther. Mixing them, as here, is unnecessary.

You can just generate two 32-bit integers from Salsa and use multiplication and addition to combine them into exactly 53 bits of float.

To convert from 53 bits of float in the range 0 to 0.9999999999999999 into an integer in the range 1..N, just multiply by N, do math.floor(), i.e. round down, and add 1.

Actually, no. This will divide the float values into N buckets of unequal size, unless N is a power of 2 less than 2 **53. To see this, imagine for a moment the float has a 7 bit mantissa, and N is 100. There are 128 possible float values and 100 integers, which is resolved by giving 28 regularly spaced integers an extra ticket in the lottery. With 53 bits, the effect is smaller because the buckets are much bigger (around 2** 46 rather than 1 or 2), but it is still there.

The correct thing to do is what e.g. GSL does:

don’tconvert to floating point at all. Pick the power of two just bigger than N, and poll those many bits at a time until you get a number in your range. Along these lines:And don’t worry about the indeterminacy of the loop, for goodness sake! You’re no more likely to wait a hundred cycles than you are win a lottery or guess an RSA key. Amortise.

Cool. I don't know if this applies here, but another friend pointed me to this article (http://rdist.root.org/2009/05/17/the-debian-pgp-disaster-that-almost-was/) about how a biased PRNG can actually leak your (even properly generated) private key through signatures alone. Holy cow.

Debian cut the entropy of its crypto keys to 15 bits, I think.

The Debian story is a really intriguing one – IIRC, developers ran an automated code analysis tool that said, "Here's a bunch of code that seems to produce unpredictable results and might therefore be wrong," which usually it would be.

So they removed it "for safety," except that it was the code to put randomness into the PRNG, and it was supposed to produce unpredictable results!

In the end, the only data used to seed the PRNG was the process ID (an integer from 0..32767). Oops!

Wow I agree…. very interesting and entertaining article regarding the use of random numbers in security software. I am interested in cryptography software and so this was informative as well.

Excellent post! I would like to point out the dieharder test suite (http://www.phy.duke.edu/~rgb/General/dieharder.php ) , which is a great way to test whether a RNG is producing random numbers or not.

The while(randomnumber > 250) generatenewnumber() also serves as an example of why getting cryptography right is difficult. If the attacker can accurately measure the time it takes to run that function (but not get the result), and it takes longer than one iteration, that's a tiny fraction of a bit of information that could be useful in a brute force attack. Probably totally academic in this scenario, but sometimes it's really not :-)

For each bit of information the attacker knows, the amount of brute force required goes down 50%, so even a small failure in randomness can ruin your day.

The National Institute for Standards and Technology (NIST) also has random number validation software:

http://csrc.nist.gov/groups/ST/toolkit/rng/index….

Dieharder and dieharder2 are great. Even if you don't use them (dieharder2 won't compile on OS X, for example, because the code is, ahhhhhh, a bit of a mess), the docs are well worth reading for how to go well beyond the basics of just doing a Chi-squared test.

Agree about the timing problems of the not-provably-terminating loop.

As I explained somewhere above, you don't need decimal digits for the purpose they're put to here (constructing a float) – you can do the same thing with powers of two instead, with no loops needed at all.

This was an awesome article. Please do more of this!

Thanks.

This is part of what I call an "occasional series," of which you can find other examples by putting "Anatomy" in the [search articles] box at the top right of the page…

Very interesting! Thanks for this; it's a useful lesson for all of us who write software that's used in a secure environment. Interesting, though, that no-one thought to test this during development – frustrating through traditional software development paradigms are (and by no means error-proof), I do suspect that someone would have stuck 'test algorithm statistically' on to the testing team's work list somewhere along the line.

Like Jack, though, I'd be interested in the real-world impact of the bug. Any crypting is effective protection against typical chat being intercepted – if you're going to hunt for Facebook passwords or credit card details, there's easier places to get them than crypted chat traffic. If I was Edward Snowden, however, I'd be scared if I'd used Cryptocat before the bug was fixed.

Nice article. Good to see a visualization of chi squared!

Javascript stores all numbers as double-precision floating-point digits ("binary64" in IEEE754), aka 'double' in C. The 'double' type can actually only store the equivalent of 15 decimal (base 10) digits — the constant DBL_DIG in the system header float.h — not 16. (The 16th digit is either truncated or rounded away.)

So there is only 120 bits of entropy there, even with the fence-post bug fixed.

Most programmers can rely on IEEE754 arithmetic most of the time, but people who have not studied numerical analysis should not touch floating-point for serious work.

"The code above is certainly an inelegant solution, since it is, in theory, at least, a potentially infinite loop."

I don't see how this indicates a lack of elegance. The algorithm they used (without the off-by-one error, of course) has a crystal-clear implementation and, assuming it is backed by a good PRNG, won't be slow with extremely high probability. To be more explicit: Suppose I have a fixed favorite atom. Now choose one atom uniformly at random out of all atoms in the universe. The probability that you chose my favorite atom is *larger* than the probability of the mentioned loop running for 50 iterations with an uniform RNG!

The very successful field of Randomized Algorithms should convince you that such tricks often make algorithms *more* elegant, not less so.

The thing is, they've created a bad PRNG out of a good one, and needlessly included a loop of unpredictably variable length.

(Many cryptographers spend ages trying to remove timing variations from their code to prevent timing attacks, where you learn more than is necessary about internal state by measuring how long things take.)

I'm not sure how generating a 53-bit string by numeric conversion of 16 base-10 digits generated non-primitive-recursively from a full byte *that already came out of a PRNG* is making anything more elegant.

Especially when those 53-bit random numbers are being generated, for the most part, as a source from which to generate random strings of exactly N bytes (i.e. multiples of 8 bits) in length.

If what you want is a 32-byte random string, e.g. as a nonce to generate a keypair, why not just call the randomSalsaByte() function precisely 32 times? *That* would be elegance in my book.

I agree, the general idea of the algorithm is not elegant at all: The fact that base-10 is even used is a big hint that things weren't well thought out at all. No argument from me there. Feel free to disregard the rest of the post if you feel anything smaller than this is just nitpicking (it very much is); I'm only commenting on the line I quoted in my previous post and on the one I'm about to quote now.

Here's the thing, though: The specific two-line algorithm to get an uniform number from {1, …, n} from a uniform {1, …, m} RNG you criticized in the post is fine, and "potentially infinite-running loop" is a red herring in my opinion. The article states:

"If the randomSalsaByte() function is truly random, there is no mathematical guarantee that it won't continue to return values above 250 for seconds, or minutes, or months."

That's technically true, but very misleading: Under those conditions, there is a mathematical guarantee that, with probability higher than 99.99999999999999999999999999% (< 10^-28 probability of failure), the loop will run in less than 30 iterations for all of the next 10^30 runs (which is more than 1000 runs every second since the Big Bang). I would guess that lightning striking someone a billion times this year is much much more likely, but I haven't actually done this particular calculation.

Besides, the distribution of the times when you get more than 10 cycles out of the loop is a) Uniformly random b) Very very very very sparse. I would be very surprised to see timing attacks coming out of it. But I'm writing this post mainly because I want to be surprised, so I'd definitely like to hear any timing attack ideas.

Oh, and by the way: I really enjoyed the article! Thanks for the entertaining and informative read (and sorry for not saying thanks earlier :)).