Today I came across an interesting article written by a group of researchers at Eindhoven University of Technology.
The group has been researching collisions in the MD5 one-way hash algorithm. By using an improved (and as yet unpublished) variant of their chosen-prefix collision algorithm, in combination with the computing power of the Cell processor as implemented in the PlayStation 3, they have managed to create an arbitrary number of PDF documents with meaningful content that all share an identical MD5 checksum.
In a "Nostradamus" future-prediction attack they created a document containing "predictions" of the 2008 US Presidential election results. The authenticity of the document is supposed to be proven by its MD5 checksum. Until now, it has been safe in practice to assume that a document's MD5 checksum changes whenever the document is modified, and therefore that an unchanged checksum means an unchanged document. To prove that a prediction was not altered after the fact, it would be sufficient to publish the document's checksum before the event and the document itself after the event. In the case of the 2008 Presidential election prediction, however, several documents exist, all with the identical checksum, each containing the name of one of the possible winners together with the structural changes that make the checksum collision possible.
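The publish-then-reveal scheme described above can be sketched in a few lines of Python. This is only a minimal illustration, not the researchers' code: the function names `commit` and `verify` and the sample prediction text are assumptions made for the example, and it relies solely on the standard `hashlib` module.

```python
import hashlib


def commit(document: bytes) -> str:
    """Return the MD5 checksum that would be published before the event."""
    return hashlib.md5(document).hexdigest()


def verify(document: bytes, published_checksum: str) -> bool:
    """Check a revealed document against the checksum published earlier."""
    return hashlib.md5(document).hexdigest() == published_checksum


# Hypothetical prediction text, used only for illustration.
prediction = b"The winner of the election will be candidate A."
checksum = commit(prediction)

# The unmodified document matches the published checksum.
print(verify(prediction, checksum))  # True

# An ordinary edit changes the checksum, so the check fails.
tampered = prediction.replace(b"candidate A", b"candidate B")
print(verify(tampered, checksum))  # False
```

The attack defeats this check because the colliding documents are not ordinary edits of one another: each was specially constructed so that a check like `verify` would accept every one of them against the same published checksum.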
The computational power required to mount the attack is still significant, so we do not expect malware to use similar techniques to infect files any time soon. Furthermore, anti-malware software often does not rely on checksums for detection, so a similar attack would be unlikely to succeed even if it were feasible.
On the other hand, the computing power required to create these MD5 collisions is widely available, so it is easy to see how organised cyber-crime gangs could use the technique to launch data-diddling attacks. To check the integrity of a document, it is advisable to use a more collision-resistant checksum algorithm such as SHA-1. Here at SophosLabs, we have been using SHA-1 as the main hashing algorithm for our malware samples ever since the MD5 collision problems were disclosed.
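For readers who want to check a file's integrity with SHA-1, a minimal Python sketch might look like the following. The helper name `sha1_of_file`, the chunk size, and the temporary sample file are assumptions made for this example; it is not a description of how SophosLabs computes its checksums.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-1 checksum of a file, reading it in chunks."""
    hasher = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Write a small sample file purely for demonstration purposes.
with tempfile.NamedTemporaryFile(delete=False) as sample:
    sample.write(b"abc")
    sample_path = sample.name

digest = sha1_of_file(sample_path)
os.remove(sample_path)
print(digest)  # a9993e364706816aba3e25717850c26c9cd0d89d
```

Reading in chunks keeps memory use constant regardless of file size, and swapping `hashlib.sha1` for another `hashlib` constructor changes the algorithm without touching the rest of the function.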