We’ve already written about the ability of the Flame malware to trick you into installing apparently-trusted software signed with a fraudulent digital certificate.
Microsoft has now gone public with additional information about the cryptographic trickery used in this case.
For pre-Vista versions of Windows, it seems that the certificate spoofing didn’t rely on any sort of cryptographic forgery.
But for Vista and later, the attackers needed to forge a certificate. They did this using an MD5 collision.
The details of what the forged certificate looked like can be found on the Technet blog; here, we’ll simply review what we mean by “an MD5 collision.”
Digital certificates rely on public-key cryptography. You use a private key to encrypt the certificate; the recipient can use your public key – and only your public key – to decrypt it and thus to validate it. If the certificate doesn’t decrypt with your public key, then clearly you didn’t sign it, and vice versa.
Except that public-key algorithms aren’t good at encrypting and decrypting arbitrarily-sized blocks of data such as certificate files, not least because they’re very much slower than non-public-key cryptographic algorithms.
So, to digitally sign a file, you first generate a cryptographic hash of it – generating a fixed-length fingerprint – and then use public-key encryption on just the hash.
This makes computation much more convenient, but at the cost that the digital signature is now only as strong as the weaker of the hash algorithm and the public-key algorithm.
In the Flame case, the attackers took a legitimate Microsoft certificate using MD5 for its hash and RSA-2048 for its public-key encryption. They then generated a similar-but-different certificate with the same MD5 hash. This means that the RSA-2048 signature from Microsoft’s genuine certificate could be grafted into their forged certificate to make it appear valid.
Of course, this isn’t supposed to happen. For a hashing algorithm to be considered cryptographically sound, there should be no method better than chance (i.e. no alternative to brute force) for creating two objects with a matching hash.
But this isn’t true for MD5. Back in 2004, Chinese researchers showed how to create colliding MD5 messages in about one hour on a (then) high-end IBM server.
And by the end of 2008, European researchers had successfully generated a rogue Certificate Authority (CA) digital certificate. They used a network of 200 PS3s for the computational heavy lifting.
The moral of the story?
Don’t use digital certificates which rely on MD5. In fact, avoid MD5 as far as you can. You should not use MD5 in any new IT project.
In 2012, use SHA-1 as a minimum whenever a cryptographic hash is called for, and prefer SHA-2 if practicable. (SHA-2 comes in four flavours, giving hashes of 224, 256, 384 or 512 bits long.)
–
and at a minimum, use SHA-1 *with a salt* too 🙂
How does one do that? Use either or any? Is there a program to be used? The "how" is as important as the story. For non-computer professionals I mean.
Avoiding MD5 certs offered to you (e.g. into your browser) by other people isn't easy.
Firefox, for example, has had an on-again-off-again effort to make "reject MD5-hashed certificates" a default setting for over a year now. Indeed, in FF 13, which I acquired a couple of days back, it is now an official setting, but it is _off_ by default (you can use about:config to change "security.enable_md5_signatures" to "false").
If Mozilla can dither for that long…then what are the rest of us to do 🙂
My advice above was meant to be more restricted, and thus easier to achieve but less useful overall: give up using MD5 in anything _you_ do, including minting digital certificates.
Having said that, you can check your own (and other people's) certs, assuming you know how to save them to disk, with a command like "openssl x509 -in certname -text" (UNIX/Linux/OSX) or "certtool.exe -dump certname" (Windows).
Most Linux distros are using MD5 on isos and updates. Should they change this? Or is the risk low enough that the lower overhead is more desirable than the security?
Download sites I've used lately seem to carry both SHA1s and MD5s.
FWIW, If you just want to verify "did the download complete without errors" then you are OK with the MD5 hash – it's not a digital signature, after all. (If hackers could alter the ISOs, they could probably alter the page containing the MD5s and other hashes, too. So even a SHA-512 would be no more useful, given that you can't validate the authenticity of the hash. It could be just as "hacked" as the ISO 🙂
Many (most? all?) distros these days also provide signed checksums which you can verify using GPG and a copy of the distro-maker's public key – which you can acquire from a separate source if you like. In the case of software updates, that's typically done automatically.
"Except that public-key algorithms aren't good at encrypting and decrypting arbitrarily-sized blocks of data such as certificate files, not least because they're very much slower than non-public-key cryptographic algorithms."
I'm puzzled as to in what way being "arbitrarily-sized" poses a problem, particularly as the context suggests that "arbitrarily-sized" is not a loose substitute for "very large". If the algorithm is much more effective when handling blocks of 1k (say), surely we can pad to the next multiple with random noise?
The speed comparison with non-PKI is, not, in itself, informative. What's important is how slow is decryption of public-key encrypted data/code, on non-archaic hardware, and is the consequent delay to the update process a price worth paying for the potential avoidance of malware? (I'm assuming the encryption time, at the head end, will be far from beyond what Microsoft and et al can support.)
I haven't got current measurements, but RSA is typically about 1000x slower than a symmetric encryption algorithm, which is in turn generally slower than a designed-for-the-purpose cryptographic hash. And RSA and friends don't lend themselves to enciphering data in blocks or streams (in the context of which, I do consider 1KByte "large" 🙂
Actually, I failed to mention another reason for hash-and-sign, namely that you may not actually want to encrypt the data you are presenting. In other words, the raw data of a certificate is public information – you want it to be in clear. You also want the cryptographic hash to be in clear. And you want a separate signature by which the hash can be authenticated, and thus the content of the certificate validated. (Nice thing about that is it means you can start processing the certificate, computing and checking its hash, and validating the authenticity of the hash, all in parallel.)
This would be a great time to discuss last month’s #sophospuzzle, eh?
Er, yes 🙂
How about SHA-5 or WHIRLWIND? Also, if MD5 is weak why do antiviruses use it to detect viruses?
I've not heard of SHA-5.
Don't forget that NIST is currently getting to the end of a public contest to choose and standardise on a hash that will be designated "SHA-3" and that will replace its currently-officially-recommended SHA-2. Since we're not yet even at SHA-3, anything called SHA-5 sounds rather confusing at best. Perhaps SHA-5 is somebody's unofficial name for a hash function or some sort? If so, then on those grounds alone, I'd avoid it, whatever it might be.
In my book, there is only: SHA-0 (a retrofitted name for a NIST hash function which was "unrecommended" immediately and reissued as SHA-1), SHA-1 and SHA-2 (which comes in four flavours, as mentioned in the article). SHA-3 will appear Real Soon Now.
As for Whirlpool (which is what I presume you meant by "Whirlwind") – it's derived in part from AES, and one of the designers is Vincent Rijmen, co-designer of AES. So it has good heritage 🙂 Use it if you like, but bear in mind that keeping to the the current NIST-approved hash _might_ be a simpler option when it comes to justifying your choice to customers.
And finally, the way hashes are used in malware detection doesn't generally require a _cryptographic_ hash – merely a hash with quantifiably satisfactory dispersion properties. I won't try to explain here (maybe another time 🙂 but there's a hard-to-find research paper on this very issue from way back by the late anti-virus guru Yisrael Radai, if you care to go digging.
There are tons of hashes. Which is the "strongest"?