Cross words take weak GSM crypto into 2010

The Summer Holiday Crossword Competition is open until the end of 2009 in any regular timezone, so you have until 2009-12-31T23:59:59-12:00 to enter. (That’s 11pm on NYD in Sydney.)

The guaranteed prizes for the first three correct entries were snapped up within a few hours, but you can still enter solutions for the draw.

Quite a few solvers couldn’t finish without resorting to brute force against the answer hashes in the puzzle’s JavaScript. They found that the hash algorithm used was very poor, producing unexpectedly large numbers of collisions.

Since there are 98503 possible hashes (ranging from 0 to 98502 – see code below), and since there are 456,976 four-letter answers (26x26x26x26), you would expect an average of about 4.6 collisions per hash value for each word. But the first four-letter word in the puzzle has a hash which collides with a whopping 112 other answers.

The blue bars in the graph show the number of collisions for the first 70 hashes of four-letter words, using an algorithm with an output distribution which matches that of a random number generator. The red bars show the collision statistics for the crossword hash. Most red hash values never appear, causing statistically absurd numbers of collisions for those which do.

Interestingly, the blue bars above are generated by an algorithm which looks very similar to the crossword hash, as indicated in the pseudo-code for the original and the modified hash on the right.

As always, in matters of hashing and indexing, as in cryptography, the devil is in the details. Choices made when designing and implementing an algorithm can have a significant impact on its effectiveness and durability.

Algorithms which were originally thought to mix their inputs sufficiently to resist analytical attacks may be found to have flaws which limit their safety. Similarly, key or hash lengths which were resistant to brute force attacks five or ten years ago may have insufficient keyspace for today.

Algorithms may look statistically attractive even though they were not designed for any sort of operational correctness – like the blue hash algorithm above. For four-letter words, this produces a distribution of collision counts which almost perfectly matches the Poisson distribution expected from a good random number generator. Yet there was no science at all to my tweak. It was made up on the spot.

So it is rather surprising that the GSM Association is still in denial about the cryptographic weaknesses of the 64-bit A5/1 encryption algorithm, which is used to this day by most GSM phone handsets.

In December 1999, when the credulous amongst us still believed that the billions of dollars poured into the so-called millennium bug was money well-spent, two Israeli cryptanalysts were already warning about problems with A5/1. In 2003, further research in Israel led to the headline Technion team cracks GSM cellular phone encryption.

Now, cryptograper Karsten Nohl has taken these long-held concerns about GSM one step further, publishing the results of team work which suggests that “a full GSM interceptor to collect GSM data could hypothetically be built from open source components”.

Nevertheless, the GSM Association insists that there is no cause for concern, claiming that Nohl’s work is “a long way from being a practical attack on GSM”. According to the GSMA, “a hacker would need a radio receiver system and the signal processing software necessary to process the raw radio data. The complex knowledge required to develop such software is subject to intellectual property rights, making it difficult to turn into a commercial product.” GSMA also points out that “intercepting a mobile call is likely to constitute a criminal offence in most jurisdictions.”

So we can all rest easy. The fear of civil liability from the intellectual property violations alone will surely be enough to dissuade even the most scofflaw cybercriminal, to just the same degree that fear of the courts has greatly diminished phishing, piracy, spam and malware creation.

(As both the GSMA and Nohl point out, a superior A5 algorithm variant, A5/3, based on 128-bit keys rather than 64-bit keys, has been available for more than a decade. At the rate new phone handsets are churned onto the market, and old ones cast aside in the name of progress, it does seem to beggar belief that handset crypto is still stuck back in 99. Sorry, 1999.)