The code monkey's guide to cryptographic hashes for content-based addressing
Trivial or no effect
In some systems, the consequences of a collision are trivial or unnoticeable. If the color value of a single pixel is wrong, or the wrong font is displayed, many people will simply not notice. Search engines make absolutely no guarantees about accuracy and users expect none; returning the wrong page for a search may grate on the programmer's conscience but have no practical impact.
Only trusted users
Many systems today use CBA with thoroughly broken cryptographic hashes; rsync uses MD4 and many archival document storage systems use MD5. Yet these systems continue to be practical and useful, because the only people allowed to introduce data into them are trusted users who have no incentive to break the system. True, there will be certain useful data which your users will be unable to store; a real-world example is when cryptographers publish colliding inputs to prove a cryptographic hash function has been broken. (My laptop file system has contained files with colliding MD5 checksums since about the day after the first MD5 collision was made public.) The various CBA-based version control systems like git and Monotone also fall into this category, since only trusted users can add data to them. If a user can create hash collisions in the system, they can also directly check in code to accomplish the same effects, so why worry about a fancy hash collision attack? Educate your users not to store colliding data intentionally and the problem is solved.
One way to judge if your system truly falls into this category is to imagine using it with a broken cryptographic hash function (with a sufficient number of bits) and see if you find any problems. In fact, in this case it makes sense to use the fastest appropriate cryptographic hash function regardless of whether it has been broken.
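To put a rough number on "a sufficient number of bits", the standard birthday approximation estimates how likely an accidental collision is among honestly generated inputs. The sketch below is illustrative only: the function name is made up for this example, and it assumes the hash output is uniformly distributed over inputs that nobody chose adversarially, which is exactly the trusted-users situation and says nothing about deliberate attacks.

```python
import math


def accidental_collision_probability(num_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_items distinct inputs
    share a hash value, assuming uniformly distributed hash output."""
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).
    exponent = -num_items * (num_items - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    # Compare a billion stored blocks under a few hash widths.
    for bits in (64, 128, 160):
        p = accidental_collision_probability(10**9, bits)
        print(f"{bits}-bit hash, 1e9 items: p = {p:.3e}")
```

Under these assumptions the hash width, not whether the function has been cryptographically broken, is what governs accidental collisions.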
Note that UNIX-style operating systems do not assume users are trusted. File systems and other operating system services must assume that users are untrusted and may deliberately introduce hash collisions (sometimes not even maliciously, as in the case of verifying published collisions). In general, systems software will have many untrusted or downright malicious users, and will be used in unpredictable ways. Applications can specify valid users and inputs much more narrowly than systems software.
Corruption or security problems
In other systems, the ability to deliberately generate hash collisions results in data corruption or security holes. If untrusted users can add data to the system, then they can, in effect, replace one piece of data with another piece of data, with varying consequences. The degree to which the hash is broken affects the kind of attacks that are feasible (read more about collision resistance, preimage resistance, and second preimage resistance on Wikipedia), but simple thought experiments demonstrate the possibility of significant security problems from even the easiest attack, finding two random inputs that collide. The simplest version is a binary which checks a single bit value and does either the right thing or the wrong thing based on that bit; distribute it with one input that causes it to do the right thing and later replace that input with a colliding input that differs in that bit value (the rest of it can be junk). (Getting people to download your binary, run it, etc. are left as an exercise for the reader.)
The above contrived example is amenable to the easiest hash collision attack, one that can find only arbitrary collisions, but real-world examples of meaningful colliding inputs for MD5 are now legion. A few:
- PostScript documents: http://www.cits.rub.de/MD5Collisions/
- Executables (for both Windows and Linux): http://www.mscs.dal.ca/selinger/md5collision/
- X.509 certificates: http://www.win.tue.nl/hashclash/TargetCollidingCertificates/
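
The single-bit thought experiment from this section can be played out in a toy content-addressed store. This is a minimal sketch, not a reproduction of any real attack: `weak_hash` (SHA-256 truncated to 16 bits) is a hypothetical stand-in for a broken hash so that brute force finds a collision in a fraction of a second, and `ContentStore`, `find_collision`, and the `mode=safe` / `mode=evil` inputs are invented for illustration. At 16 bits every kind of collision is cheap, which is not true of real attacks on MD5; those need specialized tools like the ones linked above.

```python
import hashlib
from itertools import count


def weak_hash(data: bytes) -> str:
    """Hypothetical stand-in for a broken hash: SHA-256 cut to 16 bits."""
    return hashlib.sha256(data).hexdigest()[:4]


class ContentStore:
    """Toy content-addressed store: a block's hash is its only name."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = weak_hash(data)
        # Deduplication: if the key already exists, assume identical content.
        self.blocks.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self.blocks[key]


def find_collision(prefix_a: bytes, prefix_b: bytes) -> tuple[bytes, bytes]:
    """Brute-force two inputs, one per prefix, with the same weak hash.
    The appended counter plays the role of the junk bytes."""
    seen: dict[str, bytes] = {}
    for i in count():
        junk = str(i).encode()
        a = prefix_a + junk
        seen.setdefault(weak_hash(a), a)
        b = prefix_b + junk
        if weak_hash(b) in seen:
            return seen[weak_hash(b)], b


if __name__ == "__main__":
    good, evil = find_collision(b"mode=safe;junk=", b"mode=evil;junk=")
    store = ContentStore()
    key = store.put(good)
    # The colliding input maps to the same address and is silently dropped.
    print(store.put(evil) == key, store.get(key) == good)
    # Anyone who verifies a download only by its hash accepts either input.
    print(weak_hash(evil) == key)
```

Whichever of the two colliding inputs reaches the store first wins, and a client that checks only the hash cannot tell which one it received.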



