The code monkey's guide to cryptographic hashes for content-based addressing
Serious but fixable
Systems in which hash collisions have serious effects can still safely use CBA if they have a mechanism for upgrading the hash function when a successful attack appears imminent. Peer-to-peer file-sharing systems like BitTorrent are vulnerable to attack, since any untrusted user can serve pieces of files. But if the protocol allows for a change in cryptographic hash function, users can protect themselves by switching to the new function and refusing to accept data from peers still using the old one. Other systems are harder to retrofit. File systems or archival services using CBA that store data from untrusted users would need to re-index their storage to change cryptographic hash functions, which requires reading every block in use, computing a new hash, and writing the new value out - a time-consuming and potentially risky operation. Just ask your sysadmin what she thinks about re-indexing your company's long-term archival storage in place.
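To make the upgrade path concrete, here is a minimal sketch in Python, assuming each content address carries the name of the hash function that produced it. The names `ACCEPTED`, `address`, `verify_piece`, and `reindex` are hypothetical, invented for illustration; this is not how BitTorrent or any particular file system actually encodes its hashes.

```python
import hashlib

# Hypothetical policy: the hash functions this node still trusts.
# Retiring a broken function means removing its name from this set.
ACCEPTED = frozenset({"sha256"})

def address(data: bytes, algo: str = "sha256") -> str:
    """Return a content address of the form '<algo>:<hex digest>'."""
    return f"{algo}:{hashlib.new(algo, data).hexdigest()}"

def verify_piece(addr: str, data: bytes, accepted=ACCEPTED) -> bool:
    """Accept data only if its address uses a trusted hash function
    and the recomputed digest matches."""
    algo, _, digest = addr.partition(":")
    if algo not in accepted:
        return False  # refuse peers still using a retired hash function
    return hashlib.new(algo, data).hexdigest() == digest

def reindex(store: dict, new_algo: str) -> dict:
    """Rebuild a store under a new hash function: read every block,
    compute the new hash, and write the block out under the new address."""
    return {address(block, new_algo): block for block in store.values()}
```

Because the address names its own hash function, a peer can reject old-style addresses outright. The `reindex` function still has to touch every stored block, which is exactly why this operation is expensive for a large archive.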
Summary
So when does content-based addressing make sense for your application? Use CBA if:
- Computing and comparing cryptographic hashes is faster than direct comparison or traditional hash tables.
- Cryptographic hashes are needed for other reasons anyway.
- Only trusted users can introduce data, or untrusted users can introduce data but the hash function can be upgraded when necessary.
Conversely, don't use CBA if:
- Content-based addressing is slower or otherwise less desirable than other methods.
- Cryptographic hashes are not needed for other purposes (for error detection alone, faster non-secure hashes make more sense).
- Untrusted users can add data to the system and the hash function is difficult to upgrade.
Acknowledgements
This article could not have been written without the advice and criticism of many programmers and cryptographers over the years. In particular, I would like to thank Fred Douglis, Armando Fox, Yongdae Kim, and Aaram Yun for corrections and many improvements in format and clarity. All errors and inopportune turns of phrase remain, of course, my own.
About the author
Valerie Henson is the founder of VAH Consulting, a company specializing in Linux file systems consulting. She first became interested in cryptographic hash functions at OSDI '02 and would start a Fantasy Hash Function League if anyone else would play it with her. Her first published paper, "An analysis of compare-by-hash", sparked debate about content-based addressing but offered little useful advice for programmers, for which she is trying to make amends.



