March 6, 2012

Do not use MD5 or SHA1 to simply hash DB passwords

Note: cracking speed has increased dramatically since this article was written. 8 characters are just seconds now...

Recently the amount of ‘copied’ or ‘stolen’ databases with user accounts has risen dramatically. Some databases have the user’s password in plaintext, others have a seemingly random string called a hash.

The latter may seem ‘secure’, but let’s see if it really is.

Suppose the developer chose SHA1 to ‘encrypt’ his user’s passwords.

The string mypassword would hash into 91dfd9ddb4198affc5c194cd8ce6d338fde470e2 and I can not reverse the process to get mypassword back because SHA1 is a one-way hash.

Or is there? Well, since SHA1 (and MD5) were designed to be very computationally efficient, one could quite easily use brute force to calculate it by trying literally millions of hash calculations.

Ok, then what is the problem? It will take a lot of time then to try out all these millions of combinations…

Well, http://www.golubev.com/hashgpu.htm does around 2300.000.000 SHA1 hashes per second and about 5600.000.000 MD5 hashes per second (and these numbers are with 2 year old graphics cards… so that number should have risen to almost double by now, take a look at Implementations of Hash Functions SHA-1 and SHA-512).

So, let’s do some math… and then you will simply see what kind of security you have implemented (we deal with the ‘salt’ later on):

- Most passwords only use alphanumeric characters with upper and lowercase mixed (though more often than not it is all lower). That gives us 62 characters.

- Most passwords are only the minimum required length of 8 characters.

- This gives us 62^8 = 218.340.105.584.896 combinations

- A single brute force search will take about 26.37 hours (remember, this is just one single GPU… a hacking team probably has more).

- That means in about 1 day we have the full complement of possible combinations and we can ‘decrypt’ most of the passwords

- This is all done without a dictionary… a dictionary contains most commonly used words and can speed up the process quite a bit. But why would we want to?

As you can see, the numbers are even worse for MD5.

But the bad news does not end there… this is just the most blunted method of attack; Rainbow Tables and Collision Attacks are also in the arsenal of the data-thief.

Some ‘security experts’ will tell you that ‘salt’ is the answer.

Salt means adding extra data to the password before you hash it. But here is a flaw to it as well; the salt needs to be reproduced reliably or a user can not use a hash to compare it with his password.

This means that if a hacker has the database and he has access to the code (which is not very uncommon if the site is open enough to let the hacker steal the database) then he can see the method used to generate and store the salt, and reproduce it in the brute force attack.

But salts are not meant to slow down hackers on your specific system, they are meant to prevent precomputed hashtables floating around the Internet so all that is needed is a lookup (O(1) in complexity).

Also, they are meant to give different results for the same password, so a hacker can not use the results gained in cracking one password for another entry.

The main problem here is actually not the whole password hashing or the salt thing. As always it is the ‘not really knowing what it is all about’ that makes the ‘little knowledge’ dangerous.

The algorithms SHA1 and MD5 have been designed to be computationally very efficient and fast. They were meant to be used for verification purposes, not for authentication purposes.

And you can make variations to the input like hash(username+salt+password) but at the end of the day it is just a thing like the millennium bug; don’t try to fudge it away, just fix it now.

How to fix it?

Well you should use a cryptographically slow method. Slow, because that will make brute-forcing not a lot of fun.

And the answer is surprisingly simple… use bcrypt or Eksblowfish an algorithm (from 1999!) that you can make as slow as you want, I will get to that in a bit.

First, the Eks stands for Expensive Key Schedule… and that is the whole key to what we need; it takes a computationally long time to just set the algorithm up, on purpose.

And the setup time is configurable… it is called the ‘cost’; you can select yourself how long the algorithm should work on just setting the key up.

Now, don’t think that making the hashing take 100 times as long will impact your web application… even if you have 100 users per second logging in, the overhead of network, disk IO, database etc. will be so much that this hundredfold increase is not anywhere near noticable to the end-user.

A hacker though will notice… brute force hashing takes 100 times as long as well, and that means each password would take months instead of just hours.

And that gives you enough time to change all the passwords.