You probably didn’t miss the news – and the fallout that followed – about Adobe’s October 2013 data breach.
Not only was it one of the largest breaches of username databases ever, with 150,000,000 records exposed, it was also one of the most embarrassing.
The leaked data revealed that Adobe had been storing its users’ passwords ineptly – something that was surprising, because storing passwords much more safely would have been no more difficult.
Following our popular article explaining what Adobe did wrong, a number of readers asked us, “Why not publish an article showing the rest of us how to do it right?”
Here you are!
Just to clarify: this article isn’t a programming tutorial with example code you can copy to use on your own server.
Firstly, we don’t know whether you’re using PHP, MySQL, C#, Java, Perl, Python or whatever, and secondly, there are lots of articles already available that tell you what to do with passwords.
We thought that we’d explain, instead.
Attempt One – store the passwords unencrypted
On the grounds that you intend – and, indeed, you ought – to prevent your users’ passwords from being stolen in the first place, it’s tempting just to keep your user database in directly usable form, like this:
If you are running a small network, with just a few users whom you know well, and whom you support in person, you might even consider it an advantage to store passwords unencrypted.
That way, if someone forgets their password, you can just look it up and tell them what it is.
Don’t do this, for the simple reason that anyone who gets to peek at the file immediately knows how to log in as any user.
Worse still, they get a glimpse into the sort of password that each user seems to favour, which could help them guess their way into other accounts belonging to that user.
Alfred, for example, went for his name followed by a short sequence number; David used a date that probably has some personal significance; Eric Cleese followed a Monty Python theme; while Charlie and Duck didn’t seem to care at all.
The point is that neither you, nor any of your fellow system administrators, should be able to look up a user’s password.
It’s not about trust, it’s about definition: a password ought to be like a PIN, treated as a personal identification detail that is no-one else’s business.
Attempt Two – encrypt the passwords in the database
Encrypting the passwords sounds much better.
You could even arrange to have the decryption key for the database stored on another server, get your password verification server to retrieve it only when needed, and only ever keep it in memory.
That way, users’ passwords never need to be written to disk in unencrypted form; you can’t accidentally view them in the database; and if the password data should get stolen, it would just be shredded cabbage to the crooks.
This is the approach Adobe took, ending up with something similar to this:
→ For the sample data above we chose the key DESPAIR and encrypted each of the passwords with straight DES. Using DES for anything in the real world is a bad idea, because it only uses 56-bit keys, or seven characters’ worth. Even though 56 bits gives close to 100,000 million million possible passwords, modern cracking tools can get through that many DES passwords within a day.
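To put those numbers in perspective, here is the arithmetic as a short Python sketch. The only inputs are the 56-bit key size and the "within a day" figure mentioned above; the variable names are ours.

```python
# Size of the DES keyspace: one key for every 56-bit value.
des_keys = 2 ** 56
print(f"{des_keys:,}")  # 72,057,594,037,927,936

# "100,000 million million" written out as a number, for comparison.
round_figure = 100_000 * 10**6 * 10**6

# Guessing rate needed to try every possible key within one day.
seconds_per_day = 24 * 60 * 60
keys_per_second = des_keys / seconds_per_day
print(f"about {keys_per_second:.2e} keys per second")
```

So the real keyspace is a little over 72,000 million million keys, and exhausting it in a day needs somewhat under one million million guesses per second.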
You might consider this sort of symmetric encryption an advantage because you can automatically re-encrypt every password in the database if ever you decide to change the key (you may even have policies that require that), or to shift to a more secure algorithm to keep ahead of cracking tools.
But don’t encrypt your password databases reversibly like this.
You haven’t solved the problem we mentioned in Attempt One, namely that neither you, nor any of your fellow system administrators, should be able to recover a user’s password.
Worse still, if crooks manage to steal your database and to acquire the decryption key at the same time, for example by logging into your server remotely, then Attempt Two just turns into Attempt One.
By the way, the password data above has yet another problem, namely that we used DES in such a way that the same password produces the same data every time.
We can therefore tell automatically that Charlie and Duck have the same password, even without the decryption key, which is a needless information leak – as is the fact that the length of the encrypted data gives us a clue about the length of the unencrypted password.
We will therefore insist on the following requirements:
- Users’ passwords should not be recoverable from the database.
- Identical, or even similar, passwords should have different hashes.
- The database should give no hints as to password lengths.
Attempt Three – hash the passwords
Requirement One above specifies that “users’ passwords should not be recoverable from the database.”
At first glance, this seems to demand some sort of “non-reversible” encryption, which sounds somewhere between impossible and pointless.
But it can be done with what’s known as a cryptographic hash, which takes an input of arbitrary length, and mixes up the input bits into a sort of digital soup.
As it runs, the algorithm strains off a fixed amount of random-looking output data, finishing up with a hash that acts as a digital fingerprint for its input.
Mathematically, a hash is a one-way function: you can work out the hash of any message, but you can’t go backwards from the final hash to the input data.
A cryptographic hash is carefully designed to resist even deliberate attempts to subvert it, by mixing, mincing, shredding and liquidising its input so thoroughly that, at least in theory:
- You can’t create a file that comes out with a predefined hash by any method better than chance.
- You can’t find two files that “collide”, i.e. have the same hash (whatever it might be), by any method better than chance.
- You can’t work out anything about the structure of the input, including its length, from the hash alone.
Well-known and commonly-used hashing algorithms are MD5, SHA-1 and SHA-256.
Of these, MD5 has been found not to have enough “mix-mince-shred-and-liquidise” in its algorithm, with the result that you can comparatively easily find two different files with the same hash.
This means it does not meet its original cryptographic promise – so do not use it in any new project.
SHA-1 is computationally quite similar to MD5, albeit more complex, and in early 2017, a collision – two files with the same hash – was found 100,000 times faster than you’d have expected.
So you should avoid SHA-1 as well.
We’ll use SHA-256, which gives us this if we apply it directly to our sample data (the hash has been truncated to make it fit neatly in the diagram):
The hashes are all the same length, so we aren’t leaking any data about the size of the password.
Also, because we can predict in advance how much password data we will need to store for each password, there is now no excuse for needlessly limiting the length of a user’s password. (All SHA-256 values have 256 bits, or 32 bytes.)
→ It’s OK to set a high upper bound on password length, e.g. 128 or 256 characters, to prevent malcontents from burdening your server with pointlessly large chunks of password data. But limits such as “no more than 16 characters” are overly restrictive and should be avoided.
To verify a user’s password at login, we keep the user’s submitted password in memory – so it never needs to touch the disk – and compute its hash.
If the computed hash matches the stored hash, the user has fronted up with the right password, and we can let them log in.
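As a rough illustration only, the store-and-verify flow of Attempt Three might look like this in Python. The `stored` dictionary stands in for the password database, and the function names are our own invention; this is a sketch of the idea, not code to deploy.

```python
import hashlib
import hmac

def sha256_hex(password: str) -> str:
    """Return the SHA-256 hash of a password as 64 hex digits (32 bytes)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical stand-in for the password database: username -> stored hash.
stored = {"charlie": sha256_hex("password")}

def check_login(username: str, submitted: str) -> bool:
    """Hash the submitted password in memory and compare with the stored hash."""
    expected = stored.get(username)
    if expected is None:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sha256_hex(submitted), expected)
```

Whatever the length of the password, the stored value is always 32 bytes, and the password itself is never written anywhere.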
But Attempt Three still isn’t good enough, because Charlie and Duck still have the same hash, leaking that they chose the same password.
Indeed, the text “password” will always come out as 5E884898DA28..EF721D1542D8, whenever anyone chooses it.
That means the crooks can pre-calculate a table of hashes for popular passwords – or even, given enough disk space, of all passwords up to a certain length – and thus crack any password already on their list with a single database lookup.
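A minimal Python sketch shows how little work such a lookup involves. The word list and the "stolen" table below are made up for illustration (Alfred's password in particular is our guess at the sort of thing he might pick); only the hash of the text "password" is taken from the example above.

```python
import hashlib

def sha256_hex(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# The crook's pre-calculated table: hash -> password, built once, reusable forever.
common_passwords = ["123456", "password", "qwerty", "letmein"]
lookup = {sha256_hex(p): p for p in common_passwords}

# Hypothetical stolen database of unsalted SHA-256 hashes.
stolen = {
    "alfred": sha256_hex("alfred19"),
    "charlie": sha256_hex("password"),
    "duck": sha256_hex("password"),
}

# One dictionary lookup per user is all it takes.
cracked = {user: lookup[h] for user, h in stolen.items() if h in lookup}
print(cracked)
```

Charlie and Duck fall immediately, because their shared hash is already in the table; Alfred survives only because his password isn't on this particular list.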
Attempt Four – salt and hash
We can adapt the hash that comes out for each password by mixing in some additional data known as a salt, so called because it “seasons” the hash output.
A salt is also known as a nonce, which is short for “number used once.”
Simply put, we generate a random string of bytes that we include in our hash calculation along with the actual password.
The easiest way is to put the salt in front of the password and hash the combined text string.
The salt is not an encryption key, so it can be stored in the password database along with the username – it serves merely to prevent two users with the same password getting the same hash.
For that to happen they would need the same password and the same salt, so if we use 16 bytes or more of salt, the chance of that happening is small enough to be ignored.
Our database now looks like this (the 16-byte salts and the hashes have been truncated to fit neatly):
The hashes in this list, being the last field in each line, are calculated by creating a text string consisting of the salt followed by the password, and calculating its SHA-256 hash – so Charlie and Duck now get completely different password data.
Make sure you choose random salts – never use a counter such as 000001, 000002, and so forth, and don’t use a low-quality random number generator like C’s random().
If you do, your salts may match those in other password databases you keep, and could in any case be predicted by an attacker.
By using sufficiently many bytes from a decent source of random numbers – if you can, use CryptoAPI on Windows or /dev/urandom on Unix-like systems – you as good as guarantee that each salt is unique, and thus that it really is a “number used once.”
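Here is a small Python sketch of Attempt Four, assuming Python's standard `secrets` module as the "decent source of random numbers" (it draws on the operating system's generator). The function names are ours, and the salt-then-password ordering follows the description above.

```python
import hashlib
import secrets

def salted_sha256(password: str, salt: bytes) -> bytes:
    """SHA-256 of the salt followed by the password."""
    return hashlib.sha256(salt + password.encode("utf-8")).digest()

def new_record(password: str) -> tuple[bytes, bytes]:
    """Create a fresh 16-byte random salt and return (salt, hash)."""
    salt = secrets.token_bytes(16)
    return salt, salted_sha256(password, salt)

# Same password, different users, different salts.
charlie_salt, charlie_hash = new_record("password")
duck_salt, duck_hash = new_record("password")
print(charlie_hash.hex())
print(duck_hash.hex())
```

The two printed hashes differ even though the passwords are identical, and a login check simply recomputes `salted_sha256()` using the salt stored alongside the user's hash.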
Are we there yet?
Nearly, but not quite.
Although we have satisfied our three requirements (non-reversibility, no repeated hashes, and no hint of password length), the hash we have chosen – a single SHA-256 of salt+password – can be calculated very rapidly.
In fact, even hash-cracking servers that cost under $20,000 five years ago could already compute 100,000,000,000 or more SHA-256 hashes each second.
We need to slow things down a bit to stymie the crackers.
Attempt Five – hash stretching
The nature of a cryptographic hash means that attackers can’t go backwards, but with a bit of luck – and some poor password choices – they can often achieve the same result simply by trying to go forwards over and over again.
Indeed, if the crooks manage to steal your password database and can work offline, there is no limit other than CPU power to how fast they can guess passwords and see how they hash.
By this, we mean that they can try combining every word in a dictionary (or every password from AA..AA to ZZ..ZZ) with every salt in your database, calculating the hashes and seeing if they get any hits.
And password dictionaries, or algorithms to generate passwords for cracking, tend to be organised so that the most commonly-chosen passwords come out as early as possible.
That means that users who have chosen uninventively will tend to get cracked sooner.
→ Note that even at one million million password hash tests per second, a well-chosen password will stay out of reach pretty much indefinitely. There are more than one thousand million million million 12-character passwords based on the character set A-Za-z0-9.
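The figures in that note can be checked with a few lines of Python. This is plain arithmetic under the stated assumptions (a 62-character alphabet, 12 characters, one million million guesses per second); the variable names are ours.

```python
# A-Z, a-z and 0-9 give 26 + 26 + 10 = 62 possible characters.
alphabet_size = 26 + 26 + 10
candidates = alphabet_size ** 12          # every 12-character password

guesses_per_second = 10 ** 12             # one million million per second
seconds_per_year = 365 * 24 * 60 * 60

years_to_try_all = candidates / guesses_per_second / seconds_per_year
print(f"{candidates:.2e} passwords, about {years_to_try_all:.0f} years to try them all")
```

On these assumptions an exhaustive search takes on the order of a century, and each extra character multiplies that time by 62.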
It therefore makes sense to slow down offline attacks by running our password hashing algorithm as a loop that requires thousands of individual hash calculations.
That won’t make it so slow to check an individual user’s password during login that the user will complain, or even notice.
But it will reduce the rate at which a crook can carry out an offline attack, in direct proportion to the number of iterations you choose.
However, don’t try to invent your own algorithm for repeated hashing.
Choose one of these three well-known ones: PBKDF2, bcrypt or scrypt.
We’ll recommend PBKDF2 here because it is based on hashing primitives that satisfy many national and international standards.
We’ll recommend using it with the HMAC-SHA-256 hashing algorithm.
HMAC-SHA-256 is a special way of using the SHA-256 algorithm that isn’t just a straight hash, but allows the hash to be combined comprehensively with a key or salt:
- Take a random key or salt K, and flip some bits, giving K1.
- Compute the SHA-256 hash of K1 plus your data, giving H1.
- Flip a different set of bits in K, giving K2.
- Compute the SHA-256 hash of K2 plus H1, giving the final hash, H2.
In short, you hash a key plus your message, and then rehash a permuted version of the key plus the first hash.
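To make those four steps concrete, here is a by-hand sketch in Python, checked against the standard library's `hmac` module. Two details are not spelled out above and come from the HMAC specification (RFC 2104) rather than from this article: the key is padded with zero bytes to SHA-256's 64-byte block size (or hashed first if it is longer), and the two "bit flips" are XORs with the constant bytes 0x36 and 0x5C. In real code you would call `hmac.new()` directly.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes its input in 64-byte blocks

def hmac_sha256_by_hand(key: bytes, data: bytes) -> bytes:
    # Normalise the key to exactly one block (per RFC 2104).
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")

    k1 = bytes(b ^ 0x36 for b in key)          # flip some bits, giving K1
    h1 = hashlib.sha256(k1 + data).digest()    # hash K1 plus the data, giving H1
    k2 = bytes(b ^ 0x5C for b in key)          # flip a different set, giving K2
    return hashlib.sha256(k2 + h1).digest()    # hash K2 plus H1, giving H2

key = b"example key"
message = b"example message"
by_hand = hmac_sha256_by_hand(key, message)
library = hmac.new(key, message, hashlib.sha256).digest()
print(by_hand == library)
```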
In PBKDF2 with 10,000 iterations, for example, we feed the user’s password and our salt into HMAC-SHA-256 and make the first of the 10,000 loops.
Then we feed the password and the previously-computed HMAC hash back into HMAC-SHA-256 for the remaining 9999 times round the loop.
Every time round the loop, the latest output is XORed with the previous one to keep a running “hash accumulator”; when we are done, the accumulator becomes the final PBKDF2 hash.
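The loop just described can be sketched in Python and compared with the standard library's own `hashlib.pbkdf2_hmac()`. One detail comes from the PBKDF2 specification rather than from the description above: in the first round, a 4-byte block number (here always 1, because we only want 32 bytes of output) is appended to the salt. This sketch is for understanding only; the function name is ours, and the library call is what you should actually use.

```python
import hashlib
import hmac

def pbkdf2_sha256_by_hand(password: bytes, salt: bytes, iterations: int) -> bytes:
    """Compute one 32-byte block of PBKDF2-HMAC-SHA-256 the long way round."""
    # First loop: HMAC keyed with the password, over the salt plus block number 1.
    u = hmac.new(password, salt + (1).to_bytes(4, "big"), hashlib.sha256).digest()
    accumulator = int.from_bytes(u, "big")

    # Remaining loops: feed the previous HMAC output back in, XORing as we go.
    for _ in range(iterations - 1):
        u = hmac.new(password, u, hashlib.sha256).digest()
        accumulator ^= int.from_bytes(u, "big")

    return accumulator.to_bytes(32, "big")

password = b"password"
salt = bytes(range(16))   # fixed salt purely so the example is repeatable
by_hand = pbkdf2_sha256_by_hand(password, salt, 10_000)
library = hashlib.pbkdf2_hmac("sha256", password, salt, 10_000, dklen=32)
print(by_hand == library)
```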
Now we need to add the iteration count, the salt and the final PBKDF2 hash to our password database:
As the computing power available to attackers increases, you can increase the number of iterations you use – for example, by doubling the count every year.
When users with old-style hashes log in successfully, you simply regenerate and update their hashes using the new iteration count. (During successful login is the only time you can tell what a user’s password actually is.)
For users who haven’t logged in for some time, and whose old hashes you now consider insecure, you can disable the accounts and force the users through a password reset procedure if ever they do log on again.
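A sketch of that upgrade-on-login idea in Python follows. The record layout (a dictionary with `iterations`, `salt` and `hash` fields), the function names and the choice to issue a fresh salt during the upgrade are our own assumptions for illustration; the iteration counts are the ones mentioned in this article.

```python
import hashlib
import hmac
import secrets

CURRENT_ITERATIONS = 200_000   # today's policy; raise it over time

def derive(password: str, salt: bytes, iterations: int) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                               salt, iterations, dklen=32)

def login(record: dict, submitted: str) -> bool:
    """Verify with the record's own iteration count; upgrade it on success."""
    candidate = derive(submitted, record["salt"], record["iterations"])
    if not hmac.compare_digest(candidate, record["hash"]):
        return False
    if record["iterations"] < CURRENT_ITERATIONS:
        # Only now, with the correct password in memory, can we rehash it.
        new_salt = secrets.token_bytes(16)
        record["hash"] = derive(submitted, new_salt, CURRENT_ITERATIONS)
        record["salt"] = new_salt
        record["iterations"] = CURRENT_ITERATIONS
    return True

# Hypothetical old-style record created back when 10,000 iterations was the policy.
old_salt = secrets.token_bytes(16)
record = {"iterations": 10_000, "salt": old_salt,
          "hash": derive("correct horse", old_salt, 10_000)}
```

A failed login leaves the record untouched; the first successful one silently brings it up to the current iteration count.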
The last word
In summary, here is our minimum recommendation for safe storage of your users’ passwords:
- Use a strong random number generator to create a salt of 16 bytes or longer.
- Feed the salt and the password into the PBKDF2 algorithm.
- Use HMAC-SHA-256 as the core hash inside PBKDF2.
- Perform 200,000 iterations or more [October 2022].
- Take 32 bytes (256 bits) of output from PBKDF2 as the final password hash.
- Store the iteration count, the salt and the final hash in your password database.
- Increase your iteration count regularly to keep up with faster cracking tools.
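Put together, the checklist above might be sketched in Python like this, leaning entirely on the standard library's `hashlib.pbkdf2_hmac()`. The `$`-separated text format for the database field and the function names are assumptions of ours, chosen only to show the three values being stored side by side.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000   # minimum suggested above; increase regularly
SALT_BYTES = 16
HASH_BYTES = 32

def make_entry(password: str) -> str:
    """Return 'iterations$salt$hash' (hex-encoded) for a new password."""
    salt = secrets.token_bytes(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, ITERATIONS, dklen=HASH_BYTES)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"

def check_entry(entry: str, submitted: str) -> bool:
    """Recompute the hash using the stored count and salt, then compare."""
    count, salt_hex, hash_hex = entry.split("$")
    digest = hashlib.pbkdf2_hmac("sha256", submitted.encode("utf-8"),
                                 bytes.fromhex(salt_hex), int(count),
                                 dklen=HASH_BYTES)
    return hmac.compare_digest(digest, bytes.fromhex(hash_hex))

entry = make_entry("correct horse battery staple")
print(entry)
```

Because the iteration count travels with each entry, old and new entries can sit in the same database while you raise `ITERATIONS` over time.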
Whatever you do, don’t try to knit your own password storage algorithm.
It didn’t end well for Adobe, and it is unlikely to end well for you.
“Also, because we can predict in advance how much password data we will need to store for each password, there is now no excuse for limiting the length of a user’s password. (All SHA-256 values have 256 bits, or 32 bytes.)”
Wouldn’t you limit the maximum password length to say 128 or so to prevent a denial of service attack? If the verification code allows for unbounded passwords and an attacker sends a megabyte of garbage in place of the password, then PBKDF2 with 10k rounds might well become the site’s Achilles’ heel.
Good points. I probably ought to have dealt with them in the article, but it had got long enough already, so I committed what I think you call a “sin of omission.”
I agree that allowing, say, a 1MB passphrase is pointless – and potentially time-consuming – especially as the hash you finally store if you follow my guidelines is only 32 bytes.
I think I shall change the wording to say something like, “There is no excuse for needlessly limiting password lengths as some sites do, e.g. to 10 or to 16 characters.” (I have seen both.)
I’d agree that, say, 128 or even 256 characters is hardly a limit – it’s more a sort of test of reasonableness 🙂
A nice and easy to digest explanation.
Good job Paul!
Excellent article!
Also see OWASP Password Storage Cheat Sheet (and linked “Secure Password Storage Threat Model”):
https://www.owasp.org/index.php/Password_Storage_Cheat_Sheet
Very good plain English explanation. Thank you!
Why couldn’t we use some of the user’s information as salt? For example, the hash could be calculated from the concatenation of information with one of these methods:
(login name)(password)
(subscription date)(password)
(subscription date)(a fixed key, like the name of the site)(password)
(login name minus x letters)(subscription date)(password)
…
Of course, this algorithm should be known only by the developer. What would be wrong with that method?
Also, would it be OK to store users details and the password hashes on separate servers? Thank you for your answers!
The point is that the salt isn’t supposed to mean anything – in fact, it shouldn’t mean anything, or be guessable, so that it really is a nonce – “number used once.”
Its purpose is to make the combination of salt+yourpassword unique *even if you choose a password that is commonplace*.
In your example, you suggest “salt = subscription date + a fixed key.” So everyone who signs up on that day will have the same salt.
In short: use a decent-quality random string, and that’s that.
And never rely on algorithms known only to the developer. Firstly, that’s poor software engineering, because it makes the code unmaintainable. Secondly, it’s security through obscurity, which fails totally as soon as the obscurity is removed.
Lastly, as for storing the usernames and the hashes in separate databases – why not? (As long as you remember that the combined security generally equals the security of the weaker server, not the stronger one 🙂
OK, thank you for your explanation! It’s very clear, as well as your article. I know that your article is meant for every language, but as for PHP, I found a function that already implements your advice for handling passwords: hash_pbkdf2(), so that’s perfect. Great article!
That was our plan – read, understand the back-story, and then find a trusted implementation that’s already available. You did the right thing.
So many people seem to read up about cryptography, realise it’s actually quite hard, start looking for ready-written libraries…and then say, “Nah, it’ll be more fun to try to carve my own algorithm out of string and some offcuts of wood I’ve been keeping at the back of the shed” 🙂
By the way, if you are looking for test output (to verify that your PBKDF2 calculations are probably working OK), you will only find “official” results for PBKDF2-HMAC-SHA-1, published as an RFC:
http://www.ietf.org/rfc/rfc6070.txt
You may have to shop around a bit – searching for “PBKDF2-HMAC-SHA-256 test vectors” is a way to start.
But the date could be much more granular: microsecond differences would make the salt for one subscription completely different from the salt for another.
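As a sketch of that sort of self-check, this Python fragment runs the standard library's PBKDF2 against the first two PBKDF2-HMAC-SHA-1 test vectors from RFC 6070 (password "password", salt "salt", 20 bytes of output, with 1 and 2 iterations). It checks the SHA-1 variant only, because that is the one the RFC covers.

```python
import hashlib

# (iteration count, expected output in hex) from RFC 6070.
vectors = [
    (1, "0c60c80f961f0e71f3a9b524af6012062fe037a6"),
    (2, "ea6c014dc72d6f8ccd1ed92ace1d41f0d8de8957"),
]

results = [
    hashlib.pbkdf2_hmac("sha1", b"password", b"salt", count, dklen=20).hex() == expected
    for count, expected in vectors
]
print(all(results))
```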
And if an attacker knows when you subscribed (perhaps by checking your email and reading the “Congratulations, you are now subscribed to X” welcome email that you received), what then?
The point of the nonce is that it should be used only once and it should be hard to guess. Your proposal fulfills only one of those prerequisites.
Coming to this comment several years later, I’d like to point out that the nonce is not the password. It is perfectly OK for the nonce to be public.
Hell it’s stored plaintext in the database.
Its only purpose is to ensure it is unique data that is added into the password during the hashing phase to ensure that a rainbow table is useless. Knowing what the nonce is will not allow you to hack a website. You need a database dump to hack a password; and the database dump will by definition include the nonce, because the nonce is included in the database in plain text!
The only reason not to use the exact time and date of signing up (down to the nanosecond or whatever) would be that it means you are rolling your own implementation of a password algorithm.
The choice of that exact timestamp as the source of nonce isn’t the problem.
For a decently random nonce, just read /dev/urandom on Linux (and many Unices, including macOS) or use the BCryptGenRandom() API call on Windows. (CryptGenRandom() is the old Windows function; it still exists [2020-09-25T12:30Z] but Microsoft has tagged it as “deprecated” and urges you not to use it any more. The new one follows current NIST standards for cryptographic random generators, just to dot that particular “i”.)
Hi! Although I agree that obscurity cannot be the only security method, as it may fall suddenly if the code is accessed, can’t it be used as an extra layer? One example is the way the salt is joined to the password, which can be done in several ways, and only by checking the code can someone find out which.
For example, I have my own algorithm for crypto and hashing (based on a mix of techniques). It may be weaker than I think, but it is “obscure”. I’m combining my own algorithm with SHA256, so I may have the advantages of both methods, the obscure one and the well-known, proven-safe one. Which should I use as the outside layer, to be seen by an eventual invader? Should I apply my method first, then SHA256, and pretend to a cracker that it is a “regular” SHA, or should I use SHA256 first, then apply my own method afterwards, obtaining a hash according to my pattern? (A cracker in this situation will realize that it is something alternative.)
If you think your algorithm may be weaker than you think, then don’t use it at all. Combining a bad hash with a good hash doesn’t necessarily give you a result that is at least as good as the good one alone. For example, if your hash has a bias and produces non-random output, there is a chance that this could help an attacker find shortcuts that provide a faster way to compute the good hash.
You lost me as soon as you mentioned the word “hash”. I thought it meant Number, symbol #.
What services does Adobe provide that require its customers to have a username and password? In other words, who exactly is at risk?
Software purchases, cloud storage, software as a service, discussion and support forums, training, conferences for software developers, graphic designers, artists, managers, etc. Anyone who writes software for Flash or ColdFusion, anyone designing with Illustrator or Photoshop, anyone in charge of purchasing said software for a company, etc. is likely to have an account.
Thanks Kelson. My only association with Adobe is the fact that I use their free reader and flash player. I did get an email from them recently warning me that my username/password may have been compromised and it contained a link to a page where I could reset my login credentials. It looked legit but the purpose of the email sounded like Phishing to me. Since I wasn’t aware that I even had an account with them I ignored their email.
Thanks for a well written explanation.
I am working on my first Joomla (3.2) site and I am not real familiar with how passwords are stored in Joomla. As I understand it, Joomla uses an MD5 hash and it is salted. If my understanding is correct, since the MD5 hash is not safe, does it become safe once it is salted?
MD5 becomes safe against collisions if it is used in an HMAC (key-hash-key-hash) construction.
Thing is, there is simply no need to use MD5 when less controversial alternatives exist, with or without HMAC. Since one of MD5’s stated design goals was to be collision resistant, the fact that it is not means that cryptographic prudence says: assume it is insecure in general.
That’s why any crypto expert (and any number of standards bodies) will tell you, “Do not use MD5 for anything new.”
As far as I can see, Joomla seems to consider salting passwords to be the sort of thing that deserves the name “Enhanced Password System,” which doesn’t fill me with much confidence about the cryptographic aptitude of the Joomla creators – to me, salting passwords deserves the name “heading in the right cryptographic direction but not there yet” 🙂
Nice article. Also, something tells me it’s no coincidence that Sourceforge’s Project of the month for November 2013 is Password Safe – well deserved – https://sourceforge.net/blog/november-2013-potm/
Well, between Adobe (150,000,000 password records ineptly stored) and Loyaltybuild (actually even worse – close to 500,000 records, including credit card numbers *and CVV codes* not encrypted at all) it is quite the month for revisiting data storage safety!
The LoyaltyBuild story is here if you want to indulge yourself in some righteous indignation and a spot of huffing-and-puffing:
http://nakedsecurity.sophos.com/loyaltybuild-attack
I would have thought storing the salt and iterations in the password file would make it pointless. If the bad guy knows the salt he could just add it himself.
As explained, the salt is not a secret or a “key”. It’s there so that if two users choose the same password, they get a different hash.
Yes, a crook can add it himself – but the point is he *has* to add it himself, a different salt for each user. So he can’t use a lookup table (e.g. a rainbow table).
You can certainly store the salts somewhere else if you like, though you need them accessible at the same time as the rest of the database, so they’re likely to end up on the same server, where they would likely be stolen at the same time by the same method. So it is not poor practice to keep them in with the hashes and the usernames.
Now, if you wanted to encrypt *the whole database as well*, with a remotely stored key, that would give additional protection in the case that a crook stole the database file without the key. But inside the database you’d still want salted hashes and a multi-iteration hashing algorithm.
Why do we store the salt? Shouldn’t this be secret? Again, let’s say the attacker guesses that the password is ‘password’. He can easily compute the hash for this (the same way we do when validating the password!) by using the salt that’s right there in the table, then compare his hash against the one that’s stored… he’ll know immediately if the user’s password was ‘password’.
You can separate the salt from the hash database if you like, and treat the process as some sort of keyed hash so that the crooks need to get both the hash database and the salt database to be able to crack passwords offline. That’s not the purpose for which we are proposing the salt here – the salt is used for two main purposes: [a] to prevent crooks precalculating a password-to-hash dictionary that would work for everyone and [b] so that if two users choose the same password, you can’t tell just from the hash database.
Hi Paul, excellent article!! Following Benjamin’s question, I still didn’t understand why using the login name couldn’t be an option for the salt, as login names are unique. Using them seems to accomplish your two goals: [a] to prevent crooks precalculating a password-to-hash dictionary that would work for everyone (as hashes will have a unique salt anyway and will not match any generic table) and [b] so that if two users choose the same password, you can’t tell just from the hash database (as their logins are different, there will be no equal hashes for the same passwords – this will even make the hash valid only for a specific login). As for the login being accessible if the database is stolen, it is the same level of threat as keeping the numeric salt in the database, as suggested. Using the login as the salt seems very practical to me, as it makes it possible to calculate the hash even before accessing the database.
The problem here is that usernames are neither randomly chosen nor secret, so they can be predicted (or, for many services, known with certainty) in advance of a successful attack to steal the password database. As you point out, you could calculate the hash before even accessing the database…so crooks could start preparing “crack lists” of likely passwords for some or even many already-known users of the service ahead of time. In return for giving the crooks that very significant advantage, you would save, what, 16 bytes of database storage per user?
He can add it himself, and this is actually a valid attack method against authentication schemes that use the same salt for each password in the database. The attacker simply needs to calculate a rainbow table using the single shared salt and then the database becomes compromised (if the table contains 500K passwords, he needs to compute a rainbow table once and he can recover most of those).
If this is properly implemented however (unique, high-entropy salt per password), this becomes an even longer attack, since the attacker has to brute force EVERY SINGLE HASH (in the above example, because the salt is unique per password, he has to compute a rainbow table FIVE HUNDRED THOUSAND TIMES before he can recover a majority of the passwords).
Storing a unique salt per password is not a problem, since if that salt is leaked, at most you compromise only one account.
A good summary of generally accepted practice today, Paul, but I think we need to go further.
[This comment edited for length]
Attempt Six – Protect from theft of database or backup tapes with a secret
Insiders or hackers may steal the entire file, and have all the information needed to crack it off-line with their parallel GPU systems. So make it harder, by not providing all the information. “Security by obscurity” should not be the entire defence, but it does make it harder for the attacker.
Attempt Seven – Protect additional information
The Adobe breach revealed far too much information as the email addresses and password hints were completely unprotected.
Attempt Eight – Protect and detect breach attempts
Only a small set of programs ought to be able to access the password database. So use operating system level protections to their fullest.
Attempt Nine – Isolate system and use hardware protection
Rather than keep the password hashes on the same system as the web server (etc), have a dedicated system that just performs authentication requests.
If done well, it should be infeasible for even the server administrators to get access to the password hash file. If you’ve gone to this level of protection, you could then consider reducing your password complexity rules, as those cracking attacks just won’t happen!
Ah! Perhaps this might be Part Two. (I know it’s a one part article. But if “The Matrix” can have a sequel – wasn’t that silly, though? they told you how it all ends at the close of the first film! – perhaps this one can too.)
I disagree strongly with one thing, though: “you could then consider reducing your password complexity rules, as those cracking attacks just won’t happen.”
Reducing the complexity rules just sets a low standard for your users. And never say never.
Rule complexity? Well, credit / debit card PINs are 4 digits and pragmatically work pretty well still. But they use hardware protection coupled with blocking access after a few wrong attempts.
It depends which threat you are defending against. For online systems, long complex passwords basically defend against the failure of suppliers to implement the above rules, i.e. defend against poor practice. (Complex passwords are still needed for static data, e.g. encrypted files, where blocking access isn’t possible.)
I keep seeing that Adobe lost credit card data too, but not a lot of sites are making a big deal out of the stolen credit cards, just the stolen passwords. After reading conflicting stories, now I don’t know if credit card numbers were actually stolen or not. If they were stolen, is there any reason to think Adobe did a better job encrypting my credit card than they did my password?
A very good question! (Sadly, a rhetorical one, I expect, even if you were hoping for an answer.)
For the record, we were pretty keen to remind people that the credit card data breach could be considered worse than that of the passwords. And we asked exactly the same question that you did – what if the CC data was encrypted as shabbily as the passwords? We don’t know the answer. We may never know, unless Adobe decide to tell us more.
Some of our thoughts on this can be found in amongst these (one article and one podcast):
http://nakedsecurity.sophos.com/adobe-owns-up-to-getting-pwned
http://nakedsecurity.sophos.com/sscc-119
Adobe claimed to be notifying people whose credit card details were stolen. But not the others in the stolen file.
Note also that some email addresses in the file contain typos. In these cases I doubt any notification would succeed, unless Adobe did it through the credit card companies.
Just a small note that nonce does not mean “number used once” but means … nonces (http://en.wikipedia.org/wiki/Nonce_word)
Errrrrrr, you can’t have a definition that says “nonce means…nonces” (it’s circular 🙂)
There are three distinct meanings that I know of for the word “nonce,” viz:
1. nonce [n]: short for “nonsense word”.
2. nonce [n]: (cryptography) contraction of “number used once” .
3. nonce [n]: (British slang) a child sex offender.
The correct meaning in this case ought to be obvious.
Did you click on the link?
What I meant is: the regular English word “nonce” exactly matches the cryptographic usage, making any other etymology highly suspect.
“A nonce word is a lexeme created for a single occasion to solve an immediate problem of communication.”
It’s an 800-year-old word (http://www.etymonline.com/index.php?term=nonce).
You are missing one definition (the good one) in your list. Number two is an “uncited” definition that people constantly copy without knowing…
The funny thing is that if a definition gets repeated enough and accepted by sufficiently many people then it becomes, by usage, a recognised and well-cited meaning. (Like “egregious”, which used to mean “super-extra-excellent” but now means exactly the opposite, i.e. “extremely bad”. Or “begging a question”, which now acceptably means “raising a question” rather than “jumping past it”.)
A nonce word in the sense of “nonsense” is in no way “a word that is only ever used once”, for all that it might have been coined for a specific occasion. Nonce words can enter our collective vocabulary because they sound cool. Edward Lear’s “runcible spoon” and Lewis Carroll’s “Jabberwock” are excellent examples.
In the cryptographic sense, “nonce” implies a random (and, indeed, “nonsense”) string of bytes that literally is only ever used once. The fact that the name “nonce” captures both the nonsense nature and the fact that it must not be used again (unlike a “nonce word”, that has no limitation on being referred to again) is presumably what led to it sticking as a simpler way of saying “random string never repeated”.
IMO that usage is sufficiently well-documented and well-accepted to be considered mainstream, for all that we had a word in Middle English that wasn’t “nonce” but led to the literary sense of that word.
Language can and does evolve. If it couldn’t, and didn’t, then we would probably our verbs in English at the end of the sentence still put.
Question: regardless of the hashing algorithm (MD5 or stronger), if someone obtains my database and PHP code, will they be able to reverse the hash no matter how strong it is?
Thanks
The hash cannot be reversed. If they get your PHP file, they will just see that you use the iteration algorithm + salt to create the final hash. In order to find a user’s password they will still need to run through all possible password combinations, attach the salt, do the 10000 iterations per each combination of password + salt per user…
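To make the reply above concrete, here is a minimal Python sketch of what "running through all possible passwords" looks like for an attacker who holds the salt, the iteration count and the stored hash. It assumes PBKDF2 with HMAC-SHA-256 as the stretching function; the salt, the word list and the function names are made up for illustration.

```python
import hashlib

ITERATIONS = 10_000  # the count mentioned in the reply above

def stretch(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Salted, iterated hash using PBKDF2-HMAC-SHA-256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

def guess_password(stored_hash: bytes, salt: bytes, candidates, iterations: int = ITERATIONS):
    """Try each candidate in turn; return the one that matches, or None."""
    for candidate in candidates:
        # Every single guess costs a full run of the stretching function.
        if stretch(candidate, salt, iterations) == stored_hash:
            return candidate
    return None

# Made-up example data: one user's salt and stored hash.
salt = bytes.fromhex("00112233445566778899aabbccddeeff")
stored = stretch("correct horse", salt)

wordlist = ["password", "123456", "letmein", "correct horse"]
found = guess_password(stored, salt, wordlist)
```

The attacker never reverses anything: the only route to the password is guessing it, and a password that is not in the word list is simply not found.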
Great article. I have a question from the other side:
I have to use a website that doesn’t follow these procedures. I can point them to this article and explain until the cows come home, they won’t change. How can I protect myself and my information on this site?
Difficult question. Assuming that this site actually retains PII (personally identifiable information) such as your date of birth, credit card number, home address…there isn’t a lot you can do, except to be aware that if they can’t be bothered to store your password correctly, it’s reasonable to assume they’ll be careless with your other information.
Do you give bogus information so they can’t leak genuine PII about you? (You shouldn’t have to, but it might be prudent in this case.) Do you get a debit card specifically for this site, and keep it at a low balance? (Again, this could be prudent.)
I guess it all depends on how much is at stake…
A friend of mine writes his passwords down on a sheet of paper and stores it in his desk. A hacker would have to find his physical address, break into his house and search it to get his passwords. That’s something most hackers are not willing to do. Sometimes simpler is better.
Randy.. We’re talking from the point of view of the website where the username and password are required. The article tells websites how they must write code to make sure a hacker can’t steal your friend’s password from the website’s database.
Were you being sarcastic?
Fantastic article, well explained.
As a side topic, and related to MickTravels’ question – given a database that upholds all of these password techniques, and stores some personally identifiable information about me (email, DOB, address, etc) – if the database is breached, then sure, it seems fairly safe that a hacker would not be able to discover my password and protects my account – but if they have the database, they can likely read all of that other information about me (I would guess in most systems these fields are stored in plaintext / datetime fields) and have the potential to commit other crimes (ie identity theft).
Is there a way of protecting this PII further? (In most cases you want this data reversible so it can be displayed in account summaries in a web application for example, but reversible is not much better than plaintext).
I also really enjoyed this article, it is very informative even now in late 2017.
I also second this question, and it’s a great point that, surprisingly, doesn’t seem to get near the attention that password security receives. Why does it matter if you securely store the password if your data that the password is protecting is stored un-encrypted. The hacker wouldn’t even have to worry about cracking the password, just steal the data.
In my mind I came up with this solution, but I may be completely wrong. You could use 2 way encryption for the PII (such as a secure variant of AES-256). You could then use the user’s own password as the key. The user’s password is securely hashed in the database using the method described in this article. Since the user’s data is encrypted, and the key is hashed (instead of just stored in a readable form somewhere) I would expect this to be secure. To retrieve the information, you have the user enter their password during login. You then verify it against the hash, and if it matches, you then use it to decrypt the PII and read it into memory. It is only stored in memory during the user’s session, and then it is removed from memory when they logout.
The problem I see with this is that the PII is exposed to anyone on the server while it is stored in memory (such as an in-memory session state engine). I’m not sure how you get around this though because it would be unfeasible to ask the user to enter their password on every single http request to the server. At the very least though, it would be better to only expose the cached data for users with an active session than the entire database of data (not that exposing even that amount of data is good).
Can anyone tell me a better way or what would be wrong with this method?
As you say, using the password directly as the decryption key for the database would mean keeping the password in memory and passing it along to the database engine. So a derived password (using another key derivation function) would be better.
Using a user-supplied decryption key like this would also stop the database being accessed when the user was not actually logged in – sometimes that is not what you want. (For example, to calculate your phone bill and send you a statement, the phone company will need to access your call records while you are not logged in. Of course, it also needs to add those call records in the first place while you aren’t logged in.)
If your goal is to store data that only the user can read (for example, a cloud backup service) then you might as well receive, store and transmit it encrypted so that it only ever exists in plaintext on the user’s computer. That way the decryption password never leaves their computer. This also means you can’t be subpoenaed to reveal the data because you simply cannot unscramble it – assuming your local laws permit you to operate such a service.
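As a rough illustration of the "derived key" idea in the reply above, the following Python sketch derives two unrelated values from one password: a verifier that is stored for login checks, and a data key that is never stored. Using PBKDF2 again with a second, independent salt is an assumption made here for simplicity, not something the reply prescribes. The encryption step itself is left out because Python's standard library has no AES implementation.

```python
import hashlib
import secrets

def derive(password: str, salt: bytes, iterations: int = 10_000) -> bytes:
    """Derive 32 bytes from a password with PBKDF2-HMAC-SHA-256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

# Two independent random salts, both stored with the user's record.
auth_salt = secrets.token_bytes(16)
key_salt = secrets.token_bytes(16)

password = "example passphrase"  # exists only in memory while the user logs in

# Stored in the authentication database and compared at login.
login_verifier = derive(password, auth_salt)

# Never stored: recomputed at login and kept in memory for the session only.
data_key = derive(password, key_salt)
```

Because the two salts differ, knowing the stored verifier does not reveal the data key; both still depend on the password being hard to guess.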
How about users making random passwords maybe 50 characters long, upper case, lower case, numbers, symbols, etc.
Put them on a spreadsheet and store it on a USB drive. Use the USB (copy and paste) when accessing your accounts and immediately remove the USB stick when you are done.
Keep the USB stick in a safe when you are not using it along with a printed copy of the spreadsheet in case of data corruption in the flash drive.
Small electronically locked pistol safes are available that could be bolted on top of (or under) a workstation to keep the USB stick within easy reach. I’ve seen them on sale for under $80 US.
This isn’t about how you store your own passwords for logging in to multiple services. It’s about how operators of web services store a representation of all their users’ passwords so they can check them in real time when anyone logs in.
For advice on handling your own passwords, you might like to check this video we made:
https://nakedsecurity.sophos.com/2014/10/01/how-to-pick-a-proper-password/
Thanks for the article. I’ve read about safe storage of passwords, but never really understood the last two attempts (salt and iterated hash). This did a great job of helping me not only to understand them, but also to know which algorithms to search for.
What happens if a bad (rogue) administrator/DBA notes down the salt and count for any user?
With the salt+count+hash you can mount an offline dictionary attack. A rogue sysadmin could indeed abuse the data for that purpose.
However:
1. The count makes each dictionary attempt take longer.
2. The salt makes each hash different, even for the same password.
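The two points above can be sketched in a few lines of Python. This is an illustration only: the user names are hypothetical, PBKDF2-HMAC-SHA-256 is assumed as the salted, iterated hash, and the cost function is a back-of-envelope count of PBKDF2 iterations rather than a timing measurement.

```python
import hashlib
import secrets

COUNT = 10_000  # iteration count stored alongside each salt and hash

def salted_hash(password: str, salt: bytes, count: int) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, count)

# Two hypothetical users who happen to pick the same password.
records = {}
for user in ("alice", "bob"):
    salt = secrets.token_bytes(16)  # fresh random salt per user
    records[user] = (salt, COUNT, salted_hash("same password", salt, COUNT))

def attack_cost(num_guesses: int, num_users: int, count: int) -> int:
    """PBKDF2 iterations needed to try every guess against every user.

    Nothing can be shared between users, because each salt is different.
    """
    return num_guesses * num_users * count
```

Even though both users chose the same password, their stored hashes differ, so a rogue administrator has to attack each record separately.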
Can anyone comment on their approach to storing the TYPE of algorithm used? Think about this, say our primary application is built entirely on a Java 1.6 framework and there’s no option to upgrade immediately. With Java 1.6, we only have access to PBKDF2WithHmacSHA1. The plan for the next 3 months includes a move to Java 7 then in 9-12 months to Java 8.
With Java 8, we’re presented with even more secure hashing options including PBKDF2WithHmacSHA512 and for whatever reason we’ve decided to move to this new option. We now have a need to somehow know the TYPE of algorithm(s) used for each user login.
Today there are users with salted, hashed passwords constructed using PBKDF2WithHmacSHA1. Today they log in, we do the match using that algorithm type, and access is permitted if matched. Consider what happens to that same user who has gone dormant for a while, but shows back up after 9 months, when we’re on Java 8 and have moved to this new algorithm.
There are a few options obviously, but is there an approach that would be recommended? Perhaps new logic that enables today’s login match but also, once matched successfully, uses the cleartext to construct a new PBKDF2WithHmacSHA512-based password that’s then stored along with its updated algorithm TYPE? I’m not sure how I feel about that.
Thoughts?
Linux supports a number of different password salt-and-hash schemes. The scheme that was used is stored along with the hash itself, denoted by a special substring at the start of the password hash field, e.g. $1$ means MD5, $6$ means SHA-512. You could do something similar.
Or, in your case, because the only choices are SHA-1 and SHA-512, you could differentiate between the two sorts simply from the length of the hash.
And, yes, you could run both mechanisms in parallel for a while, and transparently update users’ hash types (and hash values) when they next login. That’s a clean way to leave the past behind. The only potential SNAFU I can see with that, from an audit or change control point of view, is that you will probably need to give write access to the password file to a part of the system that didn’t need it before.
After a reasonable time, you can identify all the accounts that have old hashes, and invalidate them for inactivity…
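The approach discussed in this exchange (tag each record with its algorithm, then upgrade at the next successful login) might look roughly like the Python sketch below. The record layout and the tag names `pbkdf2-sha1` and `pbkdf2-sha512` are invented for this example; they only imitate the spirit of the Linux `$1$` / `$6$` prefixes and are not a standard format.

```python
import hashlib
import hmac
import secrets

CURRENT_ALGO = "pbkdf2-sha512"  # hypothetical tag for the preferred scheme
ALGOS = {"pbkdf2-sha1": "sha1", "pbkdf2-sha512": "sha512"}
ITERATIONS = 10_000

def make_record(password: str, algo: str = CURRENT_ALGO) -> str:
    """Build a record of the form $algo$count$salt$hash (hex-encoded)."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(ALGOS[algo], password.encode("utf-8"), salt, ITERATIONS)
    return f"${algo}${ITERATIONS}${salt.hex()}${digest.hex()}"

def verify_and_upgrade(password: str, record: str):
    """Return (ok, record_to_store). The record changes only on an upgrade."""
    _, algo, count, salt_hex, digest_hex = record.split("$")
    digest = hashlib.pbkdf2_hmac(
        ALGOS[algo], password.encode("utf-8"), bytes.fromhex(salt_hex), int(count)
    )
    if not hmac.compare_digest(digest, bytes.fromhex(digest_hex)):
        return False, record
    if algo != CURRENT_ALGO:
        # The plaintext is in memory right now, so re-hash with the new scheme.
        return True, make_record(password)
    return True, record

old_record = make_record("hunter2", "pbkdf2-sha1")
ok, new_record = verify_and_upgrade("hunter2", old_record)
```

Storing the tag and the count in the record itself means the verification code never has to guess which scheme a dormant account was created with.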
BTW, really enjoyed the detail in this article — unparalleled explanation of a difficult subject! Thanks so much Paul…
Excellent article. Can I suggest updating it to recommend the Argon2 algorithm?
I’ll just post this here instead, where Argon2 gets explained:
https://password-hashing.net/
Argon2 is a fairly new algorithm that was the outcome of a competition to pick a new-look password hashing function. Unfortunately, the above site, which is apparently the official one, is surprisingly terse and gives you very little to go on. That makes it hard to know why you ought to switch from, say, PBKDF2 with SHA-256.
Here’s a great idea to never forget your passwords. First make your own Excel sheet with all the numbers, letters, and also those weird symbols like @#().
Keep this Excel sheet in the cloud somewhere, preferably Google.
Then instead of writing your passwords plainly on a piece of paper, write the corresponding cell numbers from that Excel sheet on a piece of paper. So your password ‘IamCool’ would be written as something like A13,F10,H23,D11,K12. You get the point, right? Just never keep both pieces in the cloud. Oh yes, and you have to remember the cloud password only.
So if you forget your password, simply translate the cell entries (from the piece of paper in your wallet) using the Excel sheet (from the cloud).
Pretty cool if you ask me. Forget all that rubbish software. Do it yourself.
Substitution ciphers are an even worse idea than MD5, generally speaking, if you care about secrets. People solve these as puzzles for fun; the value as a practical tool for anything other than semi-secret notes in elementary school is minimal.
In other words, your bar for “pretty cool” is rather low.
Do you use the meta headings on web pages then? A friend told me to not bother with them a few months ago as they help competitors
Added a link on Facebook, hope you don’t mind
Isn’t it a bad idea to store both the iterations and the salts? With that info, any attacker could just hash with the provided salts and iterations the most commonly used passwords and probably succeed with a few users just because we helped them!!
Also, if we store that salt in the table, at least I would concatenate it with some non-modifiable user fields, so the actual salt used is not exactly the leaked one. For example, UserID + Username (unless it’s modifiable) + SSN or any other ID + some fixed random characters (not stored in the DB).
Finally, I’d rather use a not round number of iterations, as that also simplifies things for the intruders, who would obviously only try 1k, 5k, 10k, 20k, etc. iterations. and not, let’s say, 18739.
The iterations and salts are not there to perform as a cryptographic secret. They effectively become part of the hash, which is why they are stored with the hash.
You could also choose to encrypt the authentication database on the server side, using a cryptographic key stored somewhere else, and to decrypt each record as needed, but that’s an extra layer of security that’s beyond the scope of this article.
Sorry for the late reply. I know the key is hashing the password, but I insist: if an attacker obtains this table, isn’t placing the salt and the iterations there helping them? I think that if the salt was something like the username + some predefined constant, the password would be more secured than having the salt just lying there. From what I’ve found, it seems storing the encrypted HMAC salt is the recommended option.
What I mean is that the attacker needs password_guess, salt and number of iterations (supposing he guesses the PBKDF2 algorithm) in order to discover a password, and here we are giving him two of them already.
You are right that the only thing the attacker will be missing is the user’s plaintext password. However, this flaw is not specific to hash stretching (repeated hashing), but also to single hashing. The benefit of hash stretching over single hashing is simply that it takes the attacker longer to run the hashing computations with any possible password.
Of course, but I think that’s beyond what I’m pointing out here. It doesn’t matter if it’s bcrypt or MD5, giving away the salt definitely helps the attacker.
Giving away the username and the hash helps the attacker, too :-) Storing the salt separately from the hash is feasible, but because the authentication process requires both to be accessed at the same time, you need to assume that any hack that gets the crooks access to the hash data will probably get them access to the salts as well. Storing them in separate databases won’t make things *easier* but it’s unlikely to make things much *harder*, if indeed it makes any difference at all.
After all, if you can find a way to store the salts more securely than the hashes…
…why not use that same additional security on the hashes as well?
This might be a dumb question, but if I have a hash of a password created with 10k iterations and then I choose to go to 20k, isn’t it somehow possible to just “add” 10k iterations to the 10k-iterated hashes? I mean, when you hash a plain text password with 20k iterations, will the intermediary result you reach after 10k iterations be different?
With PBKDF2, you have to feed the password in at each iteration, so you can only calculate (or extend) the salted-and-stretched hash when you have the password in memory, in other words, when the user logs in.
So you may as well choose a new salt and a new iteration count, and redo the whole calculation
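To see why the password is needed at every step, it may help to write out PBKDF2's inner loop by hand. The Python sketch below computes the first 32-byte output block of PBKDF2-HMAC-SHA-256 as defined in RFC 8018; the function name is made up for this example, and the result is checked against the standard library's `hashlib.pbkdf2_hmac`.

```python
import hashlib
import hmac

def pbkdf2_one_block(password: bytes, salt: bytes, iterations: int) -> bytes:
    """First 32-byte output block of PBKDF2-HMAC-SHA-256, written out by hand."""
    # U1 = HMAC(password, salt || INT(1))
    u = hmac.new(password, salt + (1).to_bytes(4, "big"), hashlib.sha256).digest()
    result = int.from_bytes(u, "big")
    for _ in range(iterations - 1):
        # Ui = HMAC(password, U(i-1)): the password is the HMAC key every time.
        u = hmac.new(password, u, hashlib.sha256).digest()
        result ^= int.from_bytes(u, "big")  # output is U1 xor U2 xor ... xor Un
    return result.to_bytes(32, "big")

by_hand = pbkdf2_one_block(b"my password", b"my salt", 2000)
library = hashlib.pbkdf2_hmac("sha256", b"my password", b"my salt", 2000)
```

Continuing from 10k to 20k iterations would need both the password (the HMAC key) and the last intermediate value. The stored hash is the XOR of all the intermediate values, so it supplies neither.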
Hi, I have an inquiry…
If the system administrator of a portal has stored all the users’ passwords in plain text, without mentioning it in any terms and conditions, is there any cyber law against that administrator for not masking the users’ passwords and keeping them in an easily readable tabular format?
I don’t think any countries have a law about it. But I think that in some countries the privacy watchdog might use it as evidence of poor practice if there were some kind of data breach investigation.
could use it as evidence
Can you comment on the security of the password_hash and password_verify functions in PHP? Are these more or less cryptographically secure than this PBKDF2 algorithm?
According to the PHP website, versions from 5.5 on have used bcrypt.
I’m struggling to understand the reasoning behind this: “The point is that neither you, nor any of your fellow system administrators, should be able to look up a user’s password.”
Exactly what is the danger or fear there? That an admin will access a resource intended for the user? As an admin of a small shop, I already have access to all systems anyway. I don’t need their password to access what is on their computer or in their email account. But, what having that password does do is it helps me support them better.
So I’m curious if you could elaborate on this with a real example of why you consider it bad.
The explanation is set out in “Attempt One” of the article. Passwords are meant to be verifiable secrets known only to each user, and there is simply no need to treat them any other way.
If your motivation is that you can “help” your customers by reading back their passwords when they forget them, then IMO you are teaching them bad habits (that passwords are OK even if not secret) and softening up the company for social engineering (e.g. when you are not around).
As in the article:
“The point is that neither you, nor any of your fellow system administrators, should be able to look up a user’s password.
It’s not about trust, it’s about definition: a password ought to be like a PIN, treated as a personal identification detail that is no-one else’s business.”
That’s my 2c, and I’m sticking to it 🙂
(sorry about a late-ish additional reply, but there’s an additional consideration I think needs adding) There’s an additional future-proofing issue around password support like originally commented on. Sure, you can read back users’ passwords and that works when you’re a single person running a small site with only a few dozen users.
What about in 50 years’ time, when your site is admin’ed by a bunch of people who aren’t as tech-savvy as they perhaps should be, and you’ve got a userbase of several thousand. Or more? That kind of support becomes much, much harder.
And this protection system’s harder to reverse-engineer into your site once it gets to that tipping point. It’s far, far better to build it into your site from the beginning and let it run… and, perhaps more importantly, get your users used to *not* having the expectation of that level of support.
Oh, and one other thing about that kind of support around passwords. Identity theft. You might have a user ring up and say “I’m so-and-so with such-and-such login but I can’t remember my password.” Can you guarantee absolutely 100% the person on the other end of the phone line/e-mail message is who they say they are? You might have been hacked, had the username pulled out of the database, but the hacker then attempts a phone call rather than a password crack… and then you’re broken into anyway with what might be a lot less effort! I get there are ways to protect against that (such as SMS a one-time-code for them to then read out… but even that’s not entirely secure as phone calls can be hacked and intercepted too), but best not to rely on them since all it takes is one failure to follow procedure and you’re back to the original scenario.
So, from the perspective of potential identity theft, you’re helping your users less than you might think by offering that kind of support around passwords. And you’re helping them more than you think, by *not* offering such support.
Excellent, excellent article. I made it a “must read” for all of our staff programmers.
I absolutely love the article’s layout:
1. Here’s 1 method.
2. That’s not a good idea.
3. Here’s why that’s a bad idea.
4. Here’s another method that solves the above shortcoming.
Unfortunately, as the article progresses, step #3 started being skipped more and more.
(Probably for space/time concerns.)
Example: The salt should be random, never predictable. But the salt is only used for two reasons. (Avoid making the same password have the same hash. Avoid precalculated lookup table cracks.) But you *GIVE* them the salt right in the database.
I don’t want a hacker predicting my salt is xyz…. but I’ll tell you ahead of time that it IS xyz.
Why can’t the salt be something 100% unique in our database? (Like each user’s email address or sequential number.) Predictable, yes…. but 100% unique.
The salt should *not* be predictable, or else a crook could start calculating a password cracking dictionary for selected users in advance, and leave that going while he started trying to get hold of the database of hashes.
By all means keep salts and hashes in separate databases if you think that will make it harder for a crook to steal both at the same time. (Of course, this is likely to make your password verification software more complex, and could therefore introduce errors you would otherwise avoid.)
The reason why the salt should be properly random and never predictable is explained in the ATTEMPT FOUR section.
Could you please update the number of iterations you advise developers / designers to use? Currently it is 20,000, as of June ’16.
NIST 800-63B, 5.1.1.2 is still recommending 10k iterations. Can you elaborate?
Very nice article, thanks. A small nit, I don’t see that Adobe tried to invent anything, they just went with something basic and fragile – it gave up all the passwords at once. And of course the passwords themselves still need to be strong, or at least strong-ish, to help make them hard to break either cryptographically or by guessing.
I am new to the security world and this article made some basics clear
Thanks Paul 🙂
This is a very useful article – thank you.
Is the bulk of the advice still current in 2017? I know that SHA1 is now considered insecure and the article advises against it because SHA256 is better. Are there any other items that might benefit from an update?
“A salt is also known as a nonce, which is short for ‘number used once.'”
It should be noted that a salt and a nonce are not the same thing. A salt is in fact reused (as the article states) every time the hash is computed for a particular user’s password. A nonce, however, is used once and only once–often as part of an authentication protocol when establishing a connection.
Nonce is also NOT short for “number used once.” It’s named after “nonce word” which is a word used “for the nonce” or “for the time being.”
“Nonce” in cryptographic jargon is well-established through usage (including in the literature) as being short for “number used once”. Whether this is a back-formation (i.e. the explanation was invented later because it sounded good) I don’t know, but that is irrelevant to current meaning and usage.
As for “nonce word” (see the works of Charles Dodgson, who wrote under the nom de plume Lewis Carroll), I think you will find that stands for “nonsense”, meaning that it was made up, not for transience. Nonsense words aren’t necessarily transient – indeed, decent ones may end up in the dictionary. For example, jabberwocky appears in my New Oxford American Dictionary, defined as “invented or meaningless language, nonsense.” (Note that final word!)
Great article, thanks! I don’t understand how the server can recover the user password in the clear. I understand that if they use CHAP, for example, the server will need to be able to compute the hash of a challenge plus a secret (being the user’s password). With only the hash of the user’s password I am confused as to how that can work. Can you please help me out?
In the system described above, the client does need to send the password to the server during the login process (via HTTPS, of course!), but the password only ever needs to be in memory temporarily – long enough to perform the salt-hash-stretch process – after which it can be discarded and never needs to be stored.
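The login flow described in this reply can be sketched in Python as follows. It is a minimal illustration: the dictionary stands in for a real authentication database, the function names are hypothetical, and PBKDF2-HMAC-SHA-256 is assumed as the salt-hash-stretch step.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 10_000
database = {}  # hypothetical stand-in for the authentication database

def register(username: str, password: str) -> None:
    salt = secrets.token_bytes(16)  # random, unpredictable salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Only the salt, the count and the hash are kept; the password is not.
    database[username] = (salt, ITERATIONS, digest)

def login(username: str, password: str) -> bool:
    entry = database.get(username)
    if entry is None:
        return False
    salt, count, stored = entry
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, count)
    # Constant-time comparison; the submitted password is discarded on return.
    return hmac.compare_digest(candidate, stored)

register("alfred", "a long and unusual passphrase")
```

The server sees the plaintext password only for the duration of each call, which is why it has to travel over HTTPS but never needs to be written anywhere.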
Hi Paul,
Great to see this article started in 2013 is still active.
I understand the article and I think it is great, however I don’t like one part, why do you say “Whatever you do, don’t try to knit your own password storage algorithm.”?
Seriously, I can do what the above article says, and adding in more “reasoning” would only make it more secure. Example: Caesar cipher. Why would you say not to do anything but the standards? To me it smells like the GPU can crack this easily, so don’t change anything that would stop the GPU from cracking it.
Paul’s at Infosec right now but I managed to extract this from him by way of response:
“You’d think that if you added home-made crypto on top of secure crypto that the result would at worst be the same (i.e. that the security might go up but could never go down).
But that is not true – crypto is not only about the algorithm but how you use it. After all, when you try to code up a variant of a trusted algorithm you might introduce a mistake – and your variant will not have had the scrutiny and expert attention that the original had, and you will not have any trusted reference implementations to refer back to in testing.”
Perhaps more effective than simply doubling the iteration count annually would be to select an iteration count between 10-11K for each user? That restricts a rainbow table for 10,NNN iterations to solving at most 0.1% of the passwords.
The salt means you already need one rainbow table per user, because each user has a unique (salt+password), even if two users do choose the same password. The iterations are there to extend the time needed to test each (salt+password).
I thought I’d place this here as it’s more visible to others who may have the same question:
So if I have a salt generated with PHP’s cryptographically secure random_bytes function, which is then hashed using HMAC-SHA-256 with the password twice, then hashed again using PBKDF2 over 50k iterations, is it still worth using PHP’s password_hash rather than knitting my own? (I’m sorry, I am still new to this whole thing.)
If so, is there anything worth doing with PHP’s (or any language’s) other hash functions, or should I just stick entirely with password_hash?
If you are going to use PBKDF2, then there is no point in adding a couple of additional, home-made HMAC-SHA-256 “magic iterations” at the start – just use PBKDF2.
As far as I know, the password_hash() function in PHP now uses bcrypt by default, and generates a salt for you by default using a half-decent random generator. So you might as well just password_hash() directly with a suitable iteration factor (which IIRC in bcrypt is the logarithm-base-2 of the iteration count, not the actual count itself). But please verify for yourself that the password_hash() function in the PHP version you are using works as you expect, and isn’t some older, less suitable sort of password hashing function.
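The remark above about bcrypt's cost being a base-2 logarithm can be illustrated with a little arithmetic. The Python sketch below assumes the usual `$2y$<cost>$...` layout of a bcrypt hash string; the example string is a placeholder with the right shape, not a real hash, and the helper names are invented.

```python
def bcrypt_iterations(cost: int) -> int:
    """bcrypt's cost is log2 of the iteration count, so the count is 2**cost."""
    return 2 ** cost

def cost_from_hash(bcrypt_hash: str) -> int:
    """Read the cost field from a string such as '$2y$12$...'."""
    # Splitting on '$' gives ['', '2y', '12', '<salt and hash>'].
    return int(bcrypt_hash.split("$")[2])

# Placeholder only: 7 prefix characters plus 53 filler characters.
example = "$2y$12$" + "." * 53

cost = cost_from_hash(example)
iterations = bcrypt_iterations(cost)
```

Raising the cost by one therefore doubles the work, which is a much bigger step than adding one to a PBKDF2 iteration count.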
Very nice article. One question. Would it be practical to separate users ID data from users other data in the database by using hashed ID keys, to link the user and the user’s data in the database. For example in order to separate users name and address from user credit card data?
I wouldn’t put anything but the authentication data (e.g. username, hash, salt, iteration count) in the authentication database. (A. Why? B. This makes it possible to move the authentication part to its own server.)
Hi, I want to make a personal password store, where I would store my passwords with salt and hash. I also want to decrypt the passwords and view them when required. E.g. if I want to see my Outlook password, I should be able to decrypt it and view it.
Mostly what I have read so far is that passwords are matched to a salted hash and verified. But in my case I want to store my passwords securely so that only I can view them, as there are so many passwords nowadays. I don’t want any unknown person to view them.
Is it possible to have this ability with salted hash password storage?
Nope. The hash is derived from the password and its principal virtue is that you can’t decrypt it. For personal password storage, use a password manager…
https://nakedsecurity.sophos.com/2016/07/19/why-you-should-use-a-password-manager/
I’m coming at this from a web app perspective. This article is fantastic from a server perspective, but I’m wondering what to do when adding a client to the mix. Even over HTTPS, the client’s password should be hashed before sending. Is it sufficient to hash the user’s password without salt on the client, send the hashed password to the server, and then perform salting and hash stretching (as described in the article) before saving the result to the database?
The alternative would be to add salt before hashing the user’s password on the client. However, we can’t request the user’s salt from the database, or else the attacker would be able to figure out which usernames do or do not exist. We also can’t use a random salt, or the hashed password will not end up matching what’s in the database. The only other options seem to be using a fixed salt for all clients, which seems a bit pointless, since this code will be on the client and available for all to see.
So is it correct that singly hashing the user’s password on the client is sufficient before transmission to the server?
If you’re going to do salting and stretching on the server, what advantage is there in hashing the password on the client? TLS obviously protects the password/hash in transit, so are you trying to prevent the server ever seeing the plaintext password?
As to salting on the client, you could generate a random salt on the client and pass it back to the server along with the hash.
If you want to avoid supplying the actual password to the server (even though it’s going over TLS), then you probably want to check out SRP (Secure Remote Password Protocol).
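The scheme asked about in this thread (one unsalted hash on the client, then salting and stretching on the server) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, both halves are shown in Python for brevity even though a real client would run in the browser, and the iteration count is arbitrary.

```python
import hashlib
import hmac
import os

def client_prehash(password: str) -> bytes:
    """Client side: a single unsalted SHA-256 of the password."""
    return hashlib.sha256(password.encode("utf-8")).digest()

def server_enroll(prehash: bytes, iterations: int = 100_000) -> tuple:
    """Server side: salt and stretch whatever the client sent."""
    salt = os.urandom(16)
    stored = hashlib.pbkdf2_hmac("sha256", prehash, salt, iterations)
    return salt, iterations, stored

def server_check(prehash: bytes, salt: bytes, iterations: int,
                 stored: bytes) -> bool:
    """Server side: repeat the stretch and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", prehash, salt, iterations)
    return hmac.compare_digest(candidate, stored)
```

The sketch makes the point of the reply visible: the server never sees the plaintext, but the pre-hash is now the value that logs you in, so anyone who captures it in transit can replay it. The server-side salting and stretching are what protect the stored database.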
From my point of view, in 2019 Argon2 must be used as the KDF.
If the whole Argon2 selection process were a bit better documented and its own website a bit more forthcoming I might be inclined to consider it. Unfortunately, Argon2 is a bit of a hammer looking for a nail – and for that reason it still hasn’t really attracted any public cryptanalysis (by admission of its own website) even though it has been around since 2015.
Excellent text!
Use a remote salt – pass the user ID to a remote server behind a firewall, which returns the salt. Rate limit the salt requests. Even if the bad guys get your entire database it’s essentially uncrackable.
If “they get your entire database” then you have to assume that they have both the password hashes and the salts. If they can hack in and get the hashes, you should work on the pessimistic assumption that they could get the salts, too. (After all, access to the usernames and hashes ought to be no less secure than the salts, with both a firewall and a rate limiter – if you are willing to go to so much extra trouble for the salts then why would you scrimp on protecting the rest of the authentication data?)
Remember that the crooks aren’t supposed to be able to get any part of your authentication database in bulk. So splitting the salts and the hashes (and, for that matter, the usernames and iteration counts) and securing them on separate servers in different ways might help – but that still doesn’t add anything against an *offline* attack, where your firewall and rate limiter are no longer in the equation.
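To see why a firewall and a rate limiter drop out of the picture in an offline attack, consider this small sketch. It assumes the attacker already holds one user's salt, iteration count and stored hash; the function names and the word list are hypothetical.

```python
import hashlib

def stretch(password: str, salt: bytes, iterations: int) -> bytes:
    """The same salted, stretched hash the server would compute."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )

def offline_guess(salt: bytes, iterations: int, stored: bytes, wordlist):
    """Try each candidate locally; no server is contacted at any point."""
    for guess in wordlist:
        if stretch(guess, salt, iterations) == stored:
            return guess
    return None
```

Nothing in the loop talks to your network, so the only things slowing the attacker down are the cost of each stretched hash and the fact that a unique salt forces the work to be repeated for every user.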
@author
Is there a typo in the order shown in the HMAC-SHA-256 image?
According to Wikipedia
https://en.wikipedia.org/wiki/HMAC
inner padding FIRST (repeated bytes valued 0x5c)
and THEN outer padding (repeated bytes valued 0x36)
Is that right?
The flow of calculations in the diagram starts at the very top of the image.
The first hash (the ‘inner hash’) uses the key XORed with the ‘inner pad’, which is 0x5C repeated.
The output of the first hash calculation feeds into the second hash (the ‘outer hash’) where the key is XORed with the ‘outer pad’ of 0x36 repeated.
The names inner and outer come from the functional notation:
H = F(k XOR opad,F(k XOR ipad,data))
Note that the inner F has to be calculated first, even though it appears second in the function above. So I placed it at the top of the image to make that clear.
@Paul Ducklin
I’m just confused about the order of the hashes.
Which one comes first?
By https://en.wikipedia.org/wiki/File:SHAhmac.svg
and https://tools.ietf.org/html/rfc2104
[comment shortened to save space]
IPAD FIRST (0x36) […]
and then
OPAD SECOND (0x5C)
Tks
Aaaaaaargh. The diagram had ipad and opad the wrong way round!
The ipad string 36 36 36… should be at the top of my image, feeding into H1 (the inner hash). The opad string 5C 5C 5C… should be lower down, feeding into H2 (the outer hash).
I have edited the diagram [2020-02-05T12:00Z] – please see if it looks right now!
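For anyone who wants to check the corrected ordering for themselves, here is a minimal sketch of HMAC-SHA-256 written out by hand, following RFC 2104: the key XORed with ipad (0x36 repeated) feeds the inner hash, and the key XORed with opad (0x5C repeated) feeds the outer hash. The function name is just for illustration; in real code you would use the standard `hmac` module.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes input in 64-byte blocks

def hmac_sha256(key: bytes, data: bytes) -> bytes:
    # Keys longer than one block are hashed first, then zero-padded.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")

    ipad_key = bytes(b ^ 0x36 for b in key)  # inner pad: 36 36 36...
    opad_key = bytes(b ^ 0x5C for b in key)  # outer pad: 5C 5C 5C...

    # The inner hash is computed first...
    inner = hashlib.sha256(ipad_key + data).digest()
    # ...and its output feeds into the outer hash.
    return hashlib.sha256(opad_key + inner).digest()
```

Comparing the result with `hmac.new(key, data, hashlib.sha256).digest()` for the same inputs is a quick way to confirm that the pads are the right way round: swapping 0x36 and 0x5C gives a different answer.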
yes, just a typo in image
Hello,
I understand that salts have to be unique, but if they are randomly generated, is there a possibility of generating a salt that has already been generated before? If so, will this cause any problems?
Sorry if this may seem obvious but I’m very new to this.
Thank you!
If you use a decent random generator and a long enough salt (e.g. 16 bytes or 128 bits) then the chance of a duplicate in your database is so small that it can basically be ignored.
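As a rough back-of-the-envelope check on that claim, the sketch below uses the standard birthday approximation to estimate the chance that any two users in a database end up with the same random salt. The function names are hypothetical, and the formula is an approximation that assumes the salts come from a good random source such as `os.urandom`.

```python
import math
import os

def new_salt(num_bytes: int = 16) -> bytes:
    """Generate a random salt from the operating system's CSPRNG."""
    return os.urandom(num_bytes)

def duplicate_salt_probability(num_users: int, salt_bits: int = 128) -> float:
    """Birthday approximation: p is about 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -num_users * (num_users - 1) / 2 ** (salt_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)
```

For example, `duplicate_salt_probability(150_000_000)` (a database the size of the Adobe breach, with 128-bit salts) comes out at around 3 in 10^23. And if a duplicate did occur, the only consequence would be that two users with the same password would also share a hash, which is exactly what would have happened with no salt at all.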
Having worked as a Linux System Administrator in a Public Health England IT department where IT security was super-paranoid (due to patient data protection concerns), maybe the model for storing ‘passwords’ needs to be looked at?
Instead of storing a hackable password, what’s wrong with using a public/private key pair like an SSH login does?
So instead of a website storing a potentially hackable password, it stores the public part of your SSH key pair. If the public part of your SSH key pair gets stolen, it’s no use to the hacker that does not have the private part of the key pair.
Public & Private SSH Keys
These SSH keys are based on asymmetric cryptography, which uses two different, but mathematically related keys (known as a key pair), to authorize and encrypt connections. In the case of SSH, private and public key pairs are generated to authenticate users for remote access. The public key sits on the remote system and provides access to any user or device who has the corresponding private key, which serves as a method for authentication.
The above taken from:
https://info.keyfactor.com/what-is-ssh-key-management#how_ssh_works_understanding_ssh_keys
Lots of online services do use so-called public key authentication, which works as you say… it’s like web certificate checking (where you use a signed public key sent by the server to validate that the server must have access to the corresponding private key), but the other way around, where the server validates that you have the private key to match the public key you uploaded earlier.
I guess the reason that only some sites (or some types of site, e.g. those related to coding) use this process is that it’s slightly more complex to set up… and if you want to login from multiple devices it typically means duplicating your private key, which increases the chance for it to get stolen or lost. Also, you ideally need a password to protect your private key, too, in case it does get stolen (or to prevent automatic authentication where you didn’t intend it), so for the average user the workflow is similar – you still need a good password for your private key – but with the extra step of generating the keys, securely storing the private one and uploading the public one.
FWIW, my own ssh server is protected with a username+password pair AND by public-private authentication, with a second password for my private key. That’s because I only ever use it for interactive shell logins, so I don’t need to pare it down to an unencrypted private key only, as many coders do when automatically synching to a source code server or similar.
“FWIW, my own ssh server is protected with a username+password pair AND by public-private authentication, with a second password for my private key.”
It seems like you are not taking any chances there Paul whatsoever 🙂
Well, it’s a sort-of 2FA… and it limits where I can login from, to discourage me from thinking, “I know, I haven’t got my own laptop with me so I’ll just login from someone else’s computer so I can do some quick work, and then rush home and change my password :-)” It also stops me logging in inadvertently from inside one of the many VMs I may have at any time, e.g. to get at test source code. I’m forced to get data into my research use (read: possibly insecure) VMs using a less sniffable method, e.g. get it into a read-only share on the host OS first, or copy it *into* the VM from outside via a local-only TCP port listening on the host and terminating inside the VM.
Great article! This really cleanly explained how hashing works and how companies can securely validate login credentials. Building a program as a programmer with little experience on this side of things, I was having trouble finding something that concisely communicated what I needed to make happen in my database; this was the perfect answer! Thanks!
Thanks. Glad you found it useful.