Geeky fun idea of the day – I want to pre-calculate and copyright all un-salted hashes of common words the moment the SHA-3 algorithm is approved.
As some of you may know, there are plans afoot for creating a new SHA-3 one-way-hash algorithm.
One-way hashes are often used in cryptography/security – the idea is that you can take some data (like a password, or an entire book/DVD/CD) and run it through an algorithm which returns a single number that, for practical purposes, uniquely identifies the input data. However, given the number alone, you can’t get back to the data (hence it’s “one way”). The returned number is huge, but it is easily represented as a simple text string of a consistent length.
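To make the "consistent length" point concrete, here is a minimal sketch. It assumes Python's standard `hashlib` module and uses SHA3-256 purely as a stand-in for whichever hash you have in mind; the helper name `sha3_hex` is made up for illustration.

```python
import hashlib


def sha3_hex(data: bytes) -> str:
    """Return the SHA3-256 digest of `data` as a hex string (hypothetical helper)."""
    return hashlib.sha3_256(data).hexdigest()


short_digest = sha3_hex(b"fred")
long_digest = sha3_hex(b"an entire book's worth of text " * 10_000)

# Both digests are 64 hex characters (256 bits), whatever the input size.
print(len(short_digest), len(long_digest))

# Hashing the same input again gives exactly the same digest.
print(sha3_hex(b"fred") == short_digest)
```

The digest is just a big number written in hexadecimal, which is why it fits neatly into a fixed-width text column.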
Many people use hash algorithms for storing people’s passwords in databases, so that they can avoid having a massive list of users’ passwords in easily-stealable form. The idea is that instead of storing the password in the database directly, you store a hashed version of it. When someone tries to authenticate themselves, the code takes the password the user supplied and re-hashes it. The code then compares the new hash against the database hash – if they are the same, the person entered the correct password.
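The store-then-compare flow described above might look roughly like this. It is a sketch under assumptions, not a recommended design: the in-memory `password_db` dictionary and the function names are hypothetical, SHA3-256 stands in for the hash, and the hashes are deliberately unsalted to match the scheme being described.

```python
import hashlib
import hmac

# Hypothetical in-memory "database" mapping username -> stored password hash.
password_db = {}


def hash_password(password: str) -> str:
    """Unsalted SHA3-256 hash of the password, as a hex string."""
    return hashlib.sha3_256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    # Only the hash is stored, never the password itself.
    password_db[username] = hash_password(password)


def authenticate(username: str, password: str) -> bool:
    stored = password_db.get(username)
    if stored is None:
        return False
    # Re-hash the supplied password and compare it with the stored hash.
    # compare_digest does the comparison in constant time.
    return hmac.compare_digest(stored, hash_password(password))
```

Calling `register("alice", "fred")` and then `authenticate("alice", "fred")` returns `True`, while any other password returns `False`.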
This is all fine and dandy, except when the supplied password is a common word. If you use the word “fred” as your password, a given hash algorithm will always return the same value for it, for example 570a90bfbf8c7eab5dc5d4e26832d5b1. So, if I find a hashed password entry in a database which is 570a90bfbf8c7eab5dc5d4e26832d5b1, then I know the input value (their password) is “fred”.
The OED has, according to Wikipedia, 500,000 words. So it’s entirely possible to take every English word, hash it, and look for matching hashes in the database. Voilà – you have now figured out a huge number of passwords in the database.
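Here is a small sketch of that pre-calculation trick. The four-word list is a stand-in for a real dictionary, the function names are hypothetical, and SHA3-256 is assumed as the unsalted hash.

```python
import hashlib


def unsalted_hash(word: str) -> str:
    """Unsalted SHA3-256 hash of a word, as a hex string."""
    return hashlib.sha3_256(word.encode("utf-8")).hexdigest()


# Stand-in for a full dictionary of common words.
common_words = ["fred", "password", "dragon", "monkey"]

# Pre-calculate every hash once, keyed by hash so lookups are instant.
lookup = {unsalted_hash(word): word for word in common_words}


def crack(stolen_hash: str):
    """Return the original word if its hash was pre-calculated, else None."""
    return lookup.get(stolen_hash)


stolen = unsalted_hash("fred")  # pretend this came out of a leaked database
print(crack(stolen))  # prints: fred
```

The expensive part (hashing the whole dictionary) happens once; after that, every stolen hash is a single dictionary lookup.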
So what I want to do is pre-calculate (and copyright) all the hashed words in SHA-3. If someone stores them in a database, they are effectively using my copyrighted information.
The only problem is that it’s not possible to copyright a number … or is it? (Oh, and the fact that it’s only theoretically possible to copyright longer texts…)
There’s a technique called salting, which helps avoid the problem described above. In short, a random string is added to the supplied password – so now instead of having to calculate the hash for “fred”, I have to calculate the hash for every possible combination of “fred” and all possible random strings. This is effectively equivalent to pre-calculating AAAAfred, AAABfred, AAACfred, etc., all the way to ZZZZfred. By choosing enough “salt” characters, I can make pre-storing all possible password/salt combinations impossible.
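A rough sketch of salting, again assuming Python's `hashlib` with SHA3-256 as the stand-in hash. The function names and the 16-byte salt length are my own choices for illustration, and a plain fast hash plus salt is shown only to demonstrate the idea, not as a finished password scheme.

```python
import hashlib
import hmac
import os


def hash_with_salt(password: str, salt: bytes) -> str:
    """Hash the salt followed by the password, returning a hex string."""
    return hashlib.sha3_256(salt + password.encode("utf-8")).hexdigest()


def make_record(password: str):
    """Create a (salt, hash) pair to store for a new password."""
    salt = os.urandom(16)  # fresh random salt for every stored password
    return salt, hash_with_salt(password, salt)


def verify(password: str, salt: bytes, stored_hash: str) -> bool:
    """Re-hash the supplied password with the stored salt and compare."""
    return hmac.compare_digest(stored_hash, hash_with_salt(password, salt))


salt_a, hash_a = make_record("fred")
salt_b, hash_b = make_record("fred")

# Same password, different salts, so the stored hashes differ.
print(hash_a != hash_b)
print(verify("fred", salt_a, hash_a))
```

The salt is stored next to the hash in plain view. It doesn't need to be secret; its job is to make one pre-calculated table useless against every other entry.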
Oh, if you’re interested – check out this page for more info on storing passwords in your database. In short, use bcrypt – more info here