Imperfect but Still Useful: Data Destruction and MD5

We techies sometimes have an unfortunate tendency to be absolutists.

For example, consider secure data destruction. Ask a group of techies how to securely dispose of a disk full of sensitive data, and you’ll get a discussion about Gutmann, magnetic force microscopes, massive electromagnets, 35-pass overwrites, shredding, drilling, crushing, melting—pretty much everything up to and including throwing it into the fires of Mount Doom. We get caught up in the extreme cases—how to protect data from shadowy figures with infinite time and infinite resources. But unless you’re a government and your hard drive has vital state secrets on it (in which case, go ahead and use the Mount Doom method), it just doesn’t matter that much. Almost any method of data destruction is so much better than nothing that any differences between methods are usually insignificant.

Plenty of data breach announcements have come from companies that improperly disposed of media. In each of these cases, the problem was not that the media was only overwritten once instead of thirty-five times, but that the media hadn’t been erased or encrypted at all. It’s similarly hard to imagine a court holding someone negligent for “merely” using a three-pass wipe to erase data. We shouldn’t get so caught up in edge cases that we ignore the center.
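To make the "center" case concrete, here is a minimal sketch of a single-pass overwrite of one file with zeros. The function name `overwrite_file_once` and the chunk size are my own choices for illustration. It also assumes the storage actually writes in place, which is not guaranteed on SSDs or on journaling and copy-on-write filesystems, so treat it as an illustration of the idea rather than a vetted wiping tool.

```python
import os


def overwrite_file_once(path, chunk_size=1024 * 1024):
    """Overwrite a file's contents with zero bytes in a single pass.

    Hypothetical helper for illustration only. It assumes in-place writes,
    which SSD wear leveling and copy-on-write filesystems may not honor.
    Returns the number of bytes overwritten.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(b"\x00" * n)
            remaining -= n
        # Push the zeros out of Python's and the OS's buffers to the device.
        f.flush()
        os.fsync(f.fileno())
    return size
```

Even something this crude would have prevented the kind of breach described above, where the media was never erased at all.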

The MD5 certificate hack is another example. MD5 has been “broken” for a while, but the term “broken” gets tossed around so much for crypto algorithms that it’s meaningless. There’s a difference between the way MD5 is “broken” and, say, the way a Caesar cipher is “broken.” MD5 is “broken” in that a cluster of 200 PS3s can create a fake CA certificate in a few days. A Caesar cipher is “broken” in that a kid with a pencil can solve it in a few minutes. Treating both as equally “broken” is silly. Cryptographic strength is not a binary question of whether something is “valid” or “broken,” but a matter of the computational power needed to defeat it: to recover a plaintext without the key for a cipher, or to find a collision for a hash. “Broken” implies that an algorithm is either perfect or useless, when most flaws merely lower the computational cost of working around the algorithm.
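For a sense of scale on the Caesar end of that spectrum, here is a rough sketch of the entire attack: try every shift and read the results. The function names are mine, and it only handles ASCII letters.

```python
import string


def caesar_shift(text, shift):
    """Shift each ASCII letter by `shift` positions; leave everything else alone."""
    result = []
    for ch in text:
        if ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            result.append(ch)
    return "".join(result)


def all_caesar_candidates(ciphertext):
    """Return the 25 possible plaintexts, one for each nonzero shift."""
    return [caesar_shift(ciphertext, -k) for k in range(1, 26)]


if __name__ == "__main__":
    # One of the 25 printed lines will be readable English.
    for candidate in all_caesar_candidates("Dwwdfn dw gdzq"):
        print(candidate)
```

The whole search space is 25 candidates, which is why a pencil is enough. Nothing comparable exists for MD5, where the attack described above took a cluster of machines and days of computation.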

An interesting question is what effect, if any, the certificate hack has on other uses of MD5 as a hashing algorithm. The exploit focuses on web certificates, and some (but not all) of the cleverness is in how the researchers craft signing requests that get a CA to sign a “real” certificate with a signature that also fits a counterfeit certificate. A web authentication certificate has a certain structure that makes it hard to create a meaningful collision, and the researchers worked out how to do it anyway. But they also developed a “sophisticated and highly optimized method for computing MD5 collisions,” which might have broader implications than just certificates.

For example, it’s not clear what this means for the use of MD5 in forensics, where one-way hashes are used to show that a hard drive hasn’t been modified. There are obvious differences between certificates and hard drives. Certificates are small (about 1 KB), and hard drives are big. Certificates have a carefully defined structure. Hard drives also have a structure, but that structure has more free space that might be used to hold collision blocks. If one has to look at all the data to see any signs of hash trickery, it will be easier to do that with a small certificate than with a large hard drive. Someone with better crypto knowledge than I have could opine on these factors, but they illustrate that getting a CA to sign a certificate that collides with a fake certificate is different from modifying a hard disk and keeping the same hash.

Assume, however, that the exploit is equally useful for hard disks—that with 200 PS3s, one could create an entire disk with the same MD5 hash as another disk, then use that fake disk as evidence against someone in court. Would a drive hashed with MD5 be thrown out of evidence because of that weakness?

I don’t think so. When used to authenticate a copy of a hard drive, the purpose of the MD5 hash is twofold: (1) to show that the data was not accidentally modified from the original, and (2) to prove that the data was not maliciously modified. MD5 is still good enough for the first purpose—if it takes 200 PS3s to create a collision, that’s enough to show that a hashed hard drive wasn’t accidentally modified. It’s a little weaker as proof against intentional modification, but 200 PS3s would still involve a lot of work to forge evidence. An opposing party would probably have to do more than merely allege the possibility of forged evidence; the burden would probably still be on that party to show inauthenticity. The best choice, of course, is to use something other than MD5 in forensics when possible. But MD5 is still a whole lot better than nothing.
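In practice, "use something other than MD5 when possible" can be as cheap as computing a second digest in the same pass over the data. The sketch below is my own illustration, not any forensic tool's actual procedure: the name `file_digests` is hypothetical, and it reads a disk image (or any file) in chunks using Python's standard `hashlib`. Note that some hardened Python builds restrict `hashlib.md5`.

```python
import hashlib


def file_digests(path, chunk_size=1024 * 1024):
    """Return (md5_hex, sha256_hex) for a file, reading it in chunks.

    Hypothetical helper for illustration. Chunked reads keep memory use
    flat even when the file is a multi-gigabyte disk image.
    """
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```

Calling `file_digests("disk.img")` (the path is just a placeholder) when the copy is made, and again before it is presented, covers both purposes: either digest catches accidental changes, and a forger would need to collide both hashes at once to slip a malicious change past the pair.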

Not everything falls into an easy distinction between “perfect” and “broken.” Technical measures can be less than perfect, but still useful, and sometimes even the best option in some circumstances.

Published on January 8, 2009 at 6:44 pm
