As you may know, Philips recently suffered a data breach, when a hacking group exfiltrated a bunch of small databases and dumped them on a public drop site.
One of the databases included about 400 password hashes – a handily compact but real-world sample set for demonstrating some important points about password choice, use and storage.
To get a feel for the sort of passwords Philips customers had chosen, I decided to have a crack at them, using the popular open source software John the Ripper.
I wrote yesterday about some of the egregiously bad passwords I found, such as 123456, 12345678, 999999, and (several times) the rather obvious philips, but the actual passwords I recovered weren’t as interesting as the rate at which I recovered them.
Let me show you what I mean, using one of the trendiest media instruments of 2012: an infographic! Or, in this case, a mini-infographic:
The image above covers a two hour period during which I set a single CPU core of my not-very-fast laptop at the Philips password hashes. The graph traces out a cumulative total of how well I (or rather my laptop) was doing.
(In the interests of science, please don’t read too much into the look of the graph above. The sample size is small; we can’t be certain that the hashes are genuine, since we only have the word of cybercrooks to go on; and cumulative graphs tend to have visually appealing shapes anyway because they only ever go upwards. It’s the thought that counts.)
There’s a huge and rather obvious lesson to learn here: don’t be at the left hand side of the graph.
A significant number of users chose passwords that as good as guessed themselves – I’d cracked close to 20% of the hashes in the first second of John the Ripper’s run.
However, after I’d cracked about half of the passwords, which took about 50 minutes, the law of diminishing returns kicked in. So I repeated the cracking experiment.
This time I didn’t rely on John the Ripper’s password generation algorithms, but used a collection of dictionaries, including lists of Dutch words. Philips is a Dutch company and, judging by the names in the database and the passwords recovered at my first attempt, so were many of the users.
With about 20 million potential dictionary passwords in my list, downloaded from free and easily available public sources, I got much the same result, with an intriguingly similar shape to the graph:
There’s an important difference, though: the purple-tinted image above covers not a two hour period but just two minutes. Remember, this is using a single core on a laptop that’s several years old.
The dictionary-driven attack also recovered about 50% of the passwords before running out of puff; combining the two sets of results revealed 66% of the passwords in the list.
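To make the idea concrete, here is a minimal sketch of a dictionary attack in Python. It is not how John the Ripper works internally, and it assumes plain unsalted SHA-256 hashes purely to keep the example short. The names `sha256_hex`, `dictionary_attack`, `leaked` and `wordlist` are invented for the illustration, as is the sample data.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Hash a candidate the same way the (hypothetical) site stored passwords."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def dictionary_attack(leaked_hashes, wordlist):
    """Return {hash: password} for every leaked hash matched by a wordlist entry."""
    targets = set(leaked_hashes)
    recovered = {}
    for word in wordlist:
        digest = sha256_hex(word)
        if digest in targets:
            recovered[digest] = word
    return recovered


# Made-up sample data: three "leaked" hashes and a tiny wordlist.
leaked = [sha256_hex(p) for p in ("123456", "philips", "Tr0ub4dor&3xyz")]
wordlist = ["password", "123456", "philips", "welkom"]

cracked = dictionary_attack(leaked, wordlist)
print(f"recovered {len(cracked)} of {len(leaked)} passwords")
```

The two passwords that appear in the wordlist fall at once, and the third survives only because nobody thought to put it in a list.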
Let me say it again: don’t be at the left hand side of the graph.
Hint: try not to use passwords or accounts in the first place. If data is of value then split it up – if your service provider refuses then get another who will take your concerns seriously.
Try to have a last-login stamp, but if not, ask for IP address range restrictions to be placed on your account access.
Last, don’t ever trust so-called security experts – if it’s high value then keep it away from a computer on any connected network! Also I wouldn’t trust a single box sitting in the corner unless it’s in an RF cage, but that’s just me.
You don't really have a choice nowadays. Take three guesses how much you can do without an e-mail account.
Now take three guesses how much someone *else* can do with *your* e-mail account.
Strong passwords are definitely a necessity, and while there may be other, better options, they're not always available for what you need to do.
This raises some interesting questions:
– The law of diminishing returns notwithstanding, how long would it have taken to crack all of them?
– The 34ish% that were not revealed: what complexity might they have had?
– What additional protection might be offered by using encryption?
– Why not simply use encryption when storing sensitive data?
– Is 2 factor authentication an obvious and clear conclusion here?
Errrr, here we go. You did ask 🙂
1. How long to crack _all_ of them is something of a "how long is a piece of string" question – I might have stumbled on another whole raft of hashes if only I'd waited another few minutes; more likely I'd still be grinding away yet.
2. See 1. Since I don't know what the missing passwords were, I can only guess how complex they might have been. My John the Ripper "generating passwords" attack didn't get as far as trying anything longer than 8 chars in the first two hours, so the missing passwords needn't have been terribly complex to have escaped my lightweight attention. The bottom line is that the passwords I did recover were woefully weak.
3. The hashing of passwords is already a sort of password encryption system (though in this case they ought to have been salted and iteratively hashed, too). If you mean "what if the whole database file were encrypted as well as hashing the password fields," then that probably wouldn't have stopped this attack, which was most likely SQL injection. So the SQL server would have been authorised to "see through" the encryption in real time. But having the database file or the whole server encrypted anyway wouldn't do any harm – especially when the time comes to retire the server disks! It would also help boost security against attacks by means other than command injection.
4. See 3.
5. 2FA can be a big help, depending on how you implement it. Of course, if you lose the server-side secrets which let you verify the 2FA responses, then you're back where you started – 2FA protects more against poor user-side security hygiene (for example, keylogging and shoulder surfing) than against shabby server-side security. And 2FA costs extra money/effort, so we're unlikely to see it used for the sort of microsites which seem to have been hacked here.
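As a rough illustration of the "salted and iteratively hashed" storage mentioned in answer 3, here is a sketch using PBKDF2 from Python's standard library. PBKDF2 is my choice for the example, not something the breached site is known to have used, and the iteration count and the function names `hash_password` and `verify_password` are assumptions.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor; tune it to your own hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using a fresh random salt for every password."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


# Two users who both chose "philips" end up with different stored hashes.
salt_a, hash_a = hash_password("philips")
salt_b, hash_b = hash_password("philips")
print(hash_a != hash_b)                            # different salts, different hashes
print(verify_password("philips", salt_a, hash_a))  # correct password verifies
```

The salt means an attacker can't crack identical passwords in one go, and the iterations make every single guess cost more. Neither rescues a password like 123456, though.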
Well, as for your "how long would it take to crack the rest" question: something to take into account is that this test was done using a single core of a slow processor. Modern GPUs are MUCH more efficient at this kind of work and you can even link multiple GPUs to work in parallel. So, if someone serious attacked this selection of hashes, I am willing to bet most would have fallen quickly.
This is not to say you would get them "all". That all depends on how secure the passwords are. I can say though that brute forcing passwords these days has become trivial for anyone with resources to spare. I would never trust a password if I learned the hash had made it into the wild. Best bet is to always quickly change passwords after a hack. You never know who might be bored and have a spare computer to dedicate to cracking. Even a secure password will fall if someone is willing to let a computer hack away on it for months.
I'm afraid I can't agree with your claim that "brute forcing passwords these days has become trivial for anyone".
Much easier, much faster, more likely to yield impressive results, yes.
Trivial for anyone is putting it too strongly. If it were trivial there would already be no bitcoins left.
I've just got one question.
Does John The Ripper have precomputed hash values (like rainbow tables) for cracking passwords or does it generate its hashes on the fly?
The latter.
It really doesn’t matter Mark if JTR does or doesn’t. He used a laptop CPU. If he was really serious about cracking passwords he would have used a graphics card or better still several graphics cards.
I'm not going to show my math, but I wasted some time this morning estimating oclHashcat-plus on a Radeon hd7850 is about 600x faster at MD5crypt than single-threaded JTR on an Intel E8400 Wolfdale-based processor (which is probably beefier than the laptop Paul tested with). So… if JTR takes 50 minutes to brute 50% of the hashes, Hashcat is reaching the same point in about 5 seconds.
I keep waiting to hear some evil genius has deployed a botnet-based GPU cracker. At that point, the "left side" of the infographics is going to get a lot more crowded.
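For anyone who wants to check the arithmetic behind that estimate, it works out like this. The 600x figure is the commenter's own estimate rather than a measurement made here, and the variable names are just for the example.

```python
jtr_seconds = 50 * 60   # about 50 minutes for single-threaded John the Ripper
speedup = 600           # estimated GPU speed-up quoted above

gpu_seconds = jtr_seconds / speedup
print(gpu_seconds)  # 5.0
```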
Thanks for doing the measurements – I didn't post corresponding figures for hashcat as the graphics-card-enabled versions don't support OS X, which I'm using, and the licensing expressly forbids business/commercial use (presumably for legalistic reasons). It also forbids use on hashes which aren't your own (again, presumably for legalistic reasons).
It kind of does matter mate. If it's using a library of precomputed hash values the password cracking process is going to be A LOT faster than if the computer has to calculate hashes for each possible password on the fly (ESPECIALLY if you are using a slow computer like Paul was ;)).
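Here is a small sketch of why precomputation is attractive, and why salting spoils it. A plain dictionary stands in for the precomputed table (a real rainbow table uses hash chains to save space, which this does not attempt). The unsalted SHA-256 format, the salt value and the helper names `unsalted` and `salted` are all assumptions made for the example.

```python
import hashlib


def unsalted(password: str) -> str:
    """Hash with no salt: the same password always gives the same hash."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted(password: str, salt: str) -> str:
    """Hash with a per-user salt mixed in before the password."""
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()


candidates = ["123456", "12345678", "999999", "philips"]

# Precompute once, reuse against every stolen database: hash -> password.
lookup = {unsalted(p): p for p in candidates}

stolen_unsalted = unsalted("philips")
stolen_salted = salted("philips", salt="x9!k")

hit = lookup.get(stolen_unsalted)   # found instantly, no hashing needed
miss = lookup.get(stolen_salted)    # the table is useless against a salted hash
print(hit, miss)
```

With salted hashes the attacker has to go back to computing guesses on the fly, once per salt.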
Interesting information, thx for sharing.
But try explaining to a user in a production company that he has to choose a 13-character password…
As Graham's video shows, it's not *that* hard to think up long passwords which probably won't ever be guessed but which you can remember…
You could try the approach of telling him he has to have a 26-character password and then pretending to be nice by pretending to change your mind and halving the minimum to "only" 13 characters 🙂
The longer passwords (such as correcthorsestaplebattery) are harder to use on the internet, as many sites now limit the length of passwords.
I have recently been through the process of changing all of my passwords and I discovered that 8 characters, starting with a letter but including at least one number, was the favourite format.
I wrote a blog some time ago (see goo.gl/occvS) about creating multiple, complex passwords which are easy to remember.
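To put rough numbers on length versus format, the sketch below computes the size of the search space in bits. It assumes every character or word is chosen uniformly at random, which human-chosen passwords are not, so treat the results as upper bounds. The 62-symbol alphabet and the 2048-word list are assumptions for the example.

```python
import math


def entropy_bits(choices: int, length: int) -> float:
    """Bits of entropy for `length` independent picks from `choices` options."""
    return length * math.log2(choices)


eight_chars = entropy_bits(62, 8)       # 8 random letters and digits
thirteen_chars = entropy_bits(62, 13)   # 13 random letters and digits
four_words = entropy_bits(2048, 4)      # 4 random words from a 2048-word list

print(f"8 chars:  {eight_chars:.1f} bits")
print(f"13 chars: {thirteen_chars:.1f} bits")
print(f"4 words:  {four_words:.1f} bits")
```

Four random words come out close to eight random characters, but most people find the words far easier to remember than a truly random string.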
you said about 'F+Wsdfadoe&H' and 'g4STHGs2wi'veDh': "They have the sort of length, complexity and weirdness you need, but they appear online."
Could you please tell me how to check whether a password is online without taking a risk? Thanks a lot!
This is actually a very good question. Every time you type a password into a search engine to see if it appears online, you're putting it online.
I guess you could search for "password is" and similar phrases and then search through the results locally.
That said, it's perfectly fine to *include* passwords found online in your password. For instance, until I posted this, "swordfish is.a.*very*insecure password ~cupcakes" was probably an extremely secure password — depending on how it was stored, and assuming that the salted hash doesn't collide with something easy to guess.
In practical terms, it often comes down to the old joke about how fast you have to run to escape a charging bear — the answer being "faster than the guy beside you." If you choose a password that is harder than 75% of the others in the list, the likelihood is your password will never be cracked unless a) your account is being specifically targeted or b) 75% harder is still trivially simple for the systems tasked with cracking the list.
Perhaps I overstated my case a little.
As @Andrew Ludgate points out, strictly speaking you "put X online" by searching for X online. Worse still – at least as far as the search engine provider is concerned (and anyone else in the internet cafe if you aren't using HTTPS :-), X is also incontrovertibly and intimately connected to you.
I suppose you could search for just the first half of the password you had in mind and check that you don't get anything back. If you don't get any hits for correcthorse, then you should be OK with, say, correcthorseegalitarianplastic.
Or you could assume that your search term won't become public knowledge any time soon. (AOL revealed anonymised search terms "for research purposes" a few years ago and it ended badly – as a result it's no longer the done thing.)
For really important stuff, like banking, consider picking a service which protects each transaction with a one-time password using something like a 2FA token (see question from @Tony above).
This year my 76-year-old grandma got her first computer since the 1980s. She had a couple of concussions a while back, so her memory isn't what it used to be. I sat down with her and talked to her about security, and was able to convince her to use the correcthorsebatterystaple method for making passwords. I got her a password manager too, for backup, but she usually doesn't need it.
You're telling me that people who have access to business secrets can't be bothered to memorize something an old lady with brain damage can remember? That's just sad.
I suspect @Matthias was being ironic, and that his question was more about convincing people who don't want to bother than teaching those who can't be bothered.
"As long as my banking's secure, I don't really care about the rest" is a common and understandable reaction by many people. Why _should_ you care about the apparently less important stuff?
(The answer, of course, is that the less important stuff may be a stepping stone – e.g. your birthday revealed in one place helps a crook acquire your address and SSN stored by someone else, which lets them get credit in your name, which means someone else is riding around on that Harley you paid for. I am glibly oversimplifying for effect…but you get the idea.)
The *big* problem, though, is the web sites that insist on 'good' passwords for access to trivial stuff.
I know of one major consulting company that insists on the whole upper / lower / digit / punctuation thing merely to create an account to read their published content.
Those are the kinds of sites that give us all the irits!
I have no problem re-using the same trivial password for sites that need no personal information (or are happy with false data!). As soon as a site owns some kind of unique / personal data, the 'proper' rules kick in.
I'm fairly sure this attitude is the reason many of the hacks reveal such a plethora of easy-to-guess passwords, passwords that will work on other sites. Despite all the warnings about re-use, I think people are generally more pragmatic than the experts give them credit for. And I think (hope!) that this explains the number of 'easy' passwords Paul discovered in this instance.
I know we have seen reports of a breached email address / password list being used to authenticate on a different site, but how often has this been proven to occur where the second location contains personal information? In fact (wondering out loud) have any of the researchers tried the password against the actual email account?
You're saying this test cracked all those passwords so quickly. Does that mean your system runs passwords through at warp speed? What if a computer was set up to receive password attempts only once every five seconds? Would that put a stop to your system?
The problem here (and in cases of this sort, where password databases have been stolen) is that *I* get to choose how fast to process the password attempts, because I have the password hashes locally on my own computer.
Rate limiting password attempts is an excellent idea (like banks do with ATM cards, for example – three tries and the card gets swallowed) but it only works *if I have to connect to your service* to do the password checking.
If I can do my checking offline, you don't get to tell me how fast to go or when to stop…and that's why you shouldn't let your password database get stolen in the first place, no matter how funky the hashing that you've used to store the password data.
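The difference is easy to quantify. The sketch below compares trying the article's 20 million dictionary words against one account at one guess every five seconds with the roughly two minutes the offline run took. It assumes the offline run got through the whole list, and the variable names are just for the example.

```python
guesses = 20_000_000            # size of the dictionary used in the article
seconds_per_online_guess = 5    # the throttle proposed in the question

online_seconds = guesses * seconds_per_online_guess
online_years = online_seconds / (365 * 24 * 3600)

offline_seconds = 2 * 60        # the offline dictionary run took about two minutes
slowdown = online_seconds / offline_seconds

print(f"online:  about {online_years:.1f} years for one account")
print(f"offline: {offline_seconds} seconds, roughly {slowdown:,.0f} times faster")
```

And the offline figure covers every hash in the stolen list at once, not just one account.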
My password policy has been to store a randomly generated 20-character alphanumeric string for each site in a master database stored on trusted hardware.
This database must be encrypted and the key protected by a rotating password.
I like the KeePass platform as it is open source and has many implementations for most platforms with a CPU.
I don't believe in cloud password storage providers or anything too closely connected to a browser — too many vectors of attack. It also opens (international) legal issues on lawful disclosure.
Another advantage is that if you can deny yourself physical access to the database, it is impossible to extract passwords from one who does not (and could not possibly) know them. Mnemonic methods such as xkcd's and Cluley's are vulnerable to rubber-hose cryptanalysis.
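For anyone wanting to try the per-site random password approach without a dedicated tool, here is a minimal sketch using Python's `secrets` module. It only generates the password; storing it safely is left to the password manager. The name `generate_password` and the 62-symbol alphabet are choices made for the example.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def generate_password(length: int = 20) -> str:
    """Pick each character independently from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


site_password = generate_password()
print(site_password)
```

The `secrets` module is used rather than `random` because `random` is not designed for security-sensitive values.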
Your comparison, 2hr vs 2min, was that done against the whole list both times?
Either way it is an interesting example of just how effective password cracking can be with even minimal effort.
As others have pointed out, GPUs are much better suited to hacking.
In the last couple of days I read a blog about a guy who used “free time” on his AWS account to test some cracking.
“Amazon are offering free 24/7 ‘cloud’ services for free for a year to new customers.”
“that’s 140 simultaneous instances for over 5 hours! (per month)”
http://www.leighbicknell.com/free-wpa-crack-amazon-clustering/
And here is a 2-year-old blog on how to set up AWS GPU instances for password cracking:
http://stacksmashing.net/2010/11/15/cracking-in-the-cloud-amazons-new-ec2-gpu-instances/
That is a legitimate resource. Illegal botnets represent much larger networks of computers at apparently lower prices (since the hacker is stealing the time…)