Thursday | 8 January, 2009
LinuxWorld.com.au

The code monkey's guide to cryptographic hashes for content-based addressing

Keeping track of large blocks of content using hash values only can be a a way to save bandwidth and build exciting new applications. Follow some important rules to use hashes correctly and securely.
Valerie Henson 28/11/2007 12:09:39

Within a few years of the first theoretical weakness, stronger weaknesses are discovered, with the result that a slightly simplified version of the hash function is vulnerable to an attack using the equivalent of several months of time on a large supercomputer. At this point, the experts recommend dropping the use of this hash immediately, although no practical attack yet exists. Non-experts point out that several months of supercomputer time are hard to get and that normal people have nothing to worry about. Within a few months or years of the discovery of a strong weakness, some ambitious researcher discovers a practical way to generate collisions between fairly meaningless random-looking inputs. Experts declare the hash very broken indeed, open bottles of champagne, and clap the researcher on the shoulder. Non-experts point out that collisions can only be generated in random-looking data, not data that looks like contracts or certificates, so there's still nothing to worry about. Not long after that, improvements in the attack allow some bored programmer to write a program to generate collisions between meaningful inputs on a home computer and release it as open source. Experts don't have any official reaction because they are too busy reviewing new cryptographic hash functions.

Life cycles of popular cryptographic hashes

You might have noticed a lot of changes in 2004; this is the year that Xiaoyun Wang, et al. published a paper named, simply enough, Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD. (I'd heard rumors that MD5 had been broken and watched the webcast of the announcement while eating popcorn.) One of my favorite quotes from this paper: "Our attack can find collision [in MD4] with hand calculation." This introduces another stage in the life cycle of a cryptographic hash function: The stage when finding collisions doesn't even require a pocket calculator.

Unbroken cryptographic hash functions

At present, the only hash widely recommended for secure use is the SHA-2 family, which includes SHA-256 and other variants. Most of the other popular cryptographic hashes have been broken or weakened.

Some of the less popular algorithms have not been broken, but often these algorithms are less popular for good reasons. For example, several methods for securely converting a block cipher into a cryptographic hash function are known, such as the Davies-Meyer technique. Unfortunately, the output is the size of the output of the block cipher, usually 64 bits, although modifications exist for doubling the the length to 128 bits. 160 bits is considered the current rock bottom for safe hash output length, with some advocating for 256 bits. Most of the schemes for both converting block ciphers into cryptographic hash functions and increasing their output length have been broken. The performance of block cipher-based hashes tends to be significantly slower than purpose-built cryptographic hashes; Table 18.2 on page 456 of Applied Cryptography compares the performance of Davies-Meyer (using DES), a 128-bit variant of Davies-Meyer (using IDEA), another block cipher-based hash named GOST, and 12 other hash functions. The block cipher-based hashes are the three lowest performing hashes in this evaluation.

Back in the realm of purpose-built cryptographic hash functions, RIPEMD-128 is considered too short by many experts. RIPEMD-160 is a worthy competitor to the SHA-2 family, but has received comparatively little scrutiny and is less trusted. The RIPEMD functions have the simultaneous advantage and disadvantage of being designed entirely outside the U.S.-controlled NSA.

Amateur cryptography

Be wary of inventing a new hash function - that is, adding a salt, double hashing, concatenating hashes, XORing the results, steganographically embedding them into LOLcats pictures, etc. Designing cryptographic hashes is extremely hard, as evidenced by the names MD5 (following MDs 1 through 4), RIPEMD-160 (RIPEMD and RIPEMD-128), and SHA-2 family (SHA and SHA-1). Adding a few operations to a hash function or combining existing functions is a tempting do-it-yourself fix, but usually adds only an insignificant amount of complexity to the attack, and can easily reduce the complexity of attack. From the abstract of the CRYPTO 2004 paper Multicollisions in iterated hash functions by Joux (one of the major contributors to breaking SHA-0):

[...] We solve a long standing open problem and prove that concatenating the results of several iterated hash functions in order to build a larger one does not yield a secure construction.

Additional Resources
Newsletter Subscription
Sign up for our LinuxWorld newsletters!
RSS Feeds
 
Sponsored Links