Sen. Feinstein Reintroduces Federal Data Breach Notification Bill

Senator Dianne Feinstein reintroduced her federal data breach notification bill this week. This is the Senator’s fourth attempt to pass a data breach bill; she introduced similar bills in 2003, 2005, and 2007.

The bill looks a lot like Sen. Feinstein’s previous bills. Most importantly, this year’s bill, like its predecessors, would preempt state data breach laws. That would be good for businesses, which currently have to track forty-seven data breach notification laws enacted by states, the District of Columbia, and two territories. A federal data breach notification law would replace all these different laws with a single standard for notification. Consumers, however, might be better off without a national breach notification law, because companies usually comply with whichever state law demands the most of them, rather than adjusting their notifications by state. A federal data breach notification law therefore has to be carefully written so that it doesn’t end up reducing consumer protection by replacing the strictest state requirements with a weaker national standard.

Preemption is almost certain to be part of any national data breach notification law. Only 6.7% of Americans live in states without data breach notification laws, and they probably get breach notifications as a side effect of other states’ laws. So a national law is no longer necessary to ensure notification, but it could still be used to create uniform requirements—which means preemption.

Sen. Feinstein’s previous bills didn’t get far, even when TJX and ChoicePoint were in the news. This year, we have no data breach poster child and lots of other priorities. It’s a new Congress and a new administration, so anything could happen, but unless there’s another big public breach, I’d be surprised if this bill gets much attention this year.

Published on January 10, 2009 at 10:34 am

Imperfect but Still Useful: Data Destruction and MD5

We techies sometimes have an unfortunate tendency to be absolutists.

For example, consider secure data destruction. Ask a group of techies how to securely dispose of a disk full of sensitive data, and you’ll get a discussion about Gutmann, magnetic force microscopes, massive electromagnets, 35-pass overwrites, shredding, drilling, crushing, melting—pretty much everything up to and including throwing it into the fires of Mount Doom. We get caught up in the extreme cases—how to protect data from shadowy figures with infinite time and infinite resources. But unless you’re a government and your hard drive has vital state secrets on it (in which case, go ahead and use the Mount Doom method), it just doesn’t matter that much. Almost any method of data destruction is so much better than nothing that any differences between methods are usually insignificant.

Plenty of data breach announcements have come from companies that improperly disposed of media. In each of these cases, the problem was not that the media was only overwritten once instead of thirty-five times, but that the media hadn’t been erased or encrypted at all. It’s similarly hard to imagine a court holding someone negligent for “merely” using a three-pass wipe to erase data. We shouldn’t get so caught up in edge cases that we ignore the center.
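To make the "almost anything beats nothing" point concrete, here is a minimal sketch of a single-pass overwrite of one file. The function name `overwrite_file` is my own, and the sketch assumes a conventional hard disk and a filesystem that rewrites blocks in place; on SSDs, journaling, or copy-on-write filesystems the old data may survive elsewhere, so treat this as an illustration rather than a sanitization tool.

```python
import os


def overwrite_file(path: str, chunk_size: int = 1 << 20) -> None:
    """Overwrite every byte of an existing file with zeros, once.

    Hypothetical helper for illustration only. It does not touch file
    slack, backups, or copies the storage device may keep internally.
    """
    remaining = os.path.getsize(path)
    zeros = bytes(chunk_size)  # a reusable buffer of zero bytes
    with open(path, "r+b") as f:  # open for in-place update, no truncation
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the writes to the device
```

Calling `overwrite_file("old-customer-list.csv")` on a hypothetical file and then deleting it is already far more than was done in the breaches described above, where the media was not erased at all.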

The MD5 certificate hack is another example. MD5 has been “broken” for a while, but the term “broken” gets tossed around so much for crypto algorithms that it’s meaningless. There’s a difference between the way MD5 is “broken” and, say, the way a Caesar cipher is “broken.” MD5 is “broken” in that a cluster of 200 PS3s can create a fake CA certificate in a few days. A Caesar cipher is “broken” in that a kid with a pencil can solve it in a few minutes. Treating both as equally “broken” is silly. Cryptographic strength is not a binary question of whether something is “valid” or “broken,” but a matter of the computational power needed to defeat it: to recover a plaintext without the key, in the case of a cipher, or to find two inputs with the same hash, in the case of a hash function like MD5. “Broken” implies that an algorithm is either perfect or useless, when most flaws merely lower the computational cost of working around the algorithm.
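For a sense of scale at the cheap end of that spectrum, here is a small sketch of what “broken” means for a Caesar cipher: there are only 26 possible keys, so trying all of them is trivial. The function names and the sample message are my own, chosen for illustration.

```python
import string


def caesar_shift(text: str, shift: int) -> str:
    """Shift each ASCII letter by `shift` places; leave other characters alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def brute_force(ciphertext: str) -> list[tuple[int, str]]:
    """Return all 26 candidate decryptions; a reader can spot the real one."""
    return [(key, caesar_shift(ciphertext, -key)) for key in range(26)]


if __name__ == "__main__":
    secret = caesar_shift("Attack at dawn", 3)
    for key, guess in brute_force(secret):
        print(key, guess)
```

The whole attack is 26 guesses. The MD5 attack, by contrast, needed a cluster of game consoles running for days, which is the difference the word “broken” hides.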

An interesting question is what effect, if any, the certificate hack has on other uses of MD5 as a hashing algorithm. The exploit focuses on web certificates, and some (but not all) of the cleverness is in how they craft signing requests that get a CA to sign a “real” certificate with a signature that also fits a counterfeit certificate. A web authentication certificate has a certain structure that makes it hard to create a meaningful collision, and the researchers figured this out. But they also developed a “sophisticated and highly optimized method for computing MD5 collisions,” which might have broader implications than just certificates.

For example, it’s not clear what this means for use of MD5 in forensics, where one-way hashes are used to show that a hard drive hasn’t been modified. There are obvious differences between certificates and hard drives. Certificates are small (about 1 KB), and hard drives are big. Certificates have a carefully defined structure. Hard drives also have a structure, but that structure has more free space that might be used to create collision blocks. If one has to look at all the data to see any signs of hash trickery, it will be easier to do that with a small certificate than with a large hard drive. Someone with better crypto knowledge than me could opine on these factors, but they illustrate that getting a CA to sign a certificate that collides with a fake certificate is different from modifying a hard disk and keeping the same hash.

Assume, however, that the exploit is equally useful for hard disks—that with 200 PS3s, one could create an entire disk with the same MD5 hash as another disk, then use that fake disk as evidence against someone in court. Would a drive hashed with MD5 be thrown out of evidence because of that weakness?

I don’t think so. When used to authenticate a copy of a hard drive, the purpose of the MD5 hash is twofold: (1) to show that the data was not accidentally modified from the original, and (2) to prove that the data was not maliciously modified. MD5 is still good enough for the first purpose—if it takes 200 PS3s to create a collision, that’s enough to show that a hashed hard drive wasn’t accidentally modified. It’s a little weaker as proof against intentional modification, but 200 PS3s would still involve a lot of work to forge evidence. An opposing party would probably have to do more than merely allege the possibility of forged evidence; the burden would probably still be on that party to show inauthenticity. The best choice, of course, is to use something other than MD5 in forensics when possible. But MD5 is still a whole lot better than nothing.
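As a rough sketch of what authenticating a copy looks like in practice, the snippet below hashes a file in chunks with both MD5 and SHA-256 using Python's standard `hashlib`. The function name `hash_image` and the stand-in "image" (a small temporary file with made-up contents) are my own assumptions; real forensic tools do more, such as logging and write-blocking.

```python
import hashlib
import os
import tempfile


def hash_image(path, algorithms=("md5", "sha256"), chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so a large image need not fit in memory."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


if __name__ == "__main__":
    # Stand-in for a disk image: a small temporary file with made-up contents.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"sector data " * 4096)
        image = tmp.name
    try:
        before = hash_image(image)
        # Simulate accidental corruption by changing a single byte.
        with open(image, "r+b") as f:
            f.seek(100)
            f.write(b"\x00")
        after = hash_image(image)
        print("md5 changed:   ", before["md5"] != after["md5"])
        print("sha256 changed:", before["sha256"] != after["sha256"])
    finally:
        os.remove(image)
```

Recording a second, stronger hash alongside MD5 costs almost nothing, since the data is read only once, and it is one way to follow the "use something other than MD5 when possible" advice without giving up compatibility with older records.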

Not everything falls into an easy distinction between “perfect” and “broken.” Technical measures can be less than perfect but still useful, and sometimes they are even the best option available.

Published on January 8, 2009 at 6:44 pm