As you may know, Philips recently suffered a data breach, when a hacking group exfiltrated a bunch of small databases and dumped them on a public drop site.
To get a feel for the sort of passwords Philips customers had chosen, I decided to have a crack at them, using the popular open source software John the Ripper.
I wrote yesterday about some of the egregiously bad passwords I found, such as 123456, 12345678, 999999, and (several times) the rather obvious philips, but the actual passwords I recovered weren't as interesting as the rate at which I recovered them.
Let me show you what I mean, using one of the trendiest media instruments of 2012: an infographic! Or, in this case, a mini-infographic:
The image above covers a two hour period during which I set a single CPU core of my not-very-fast laptop at the Philips password hashes. The graph traces out a cumulative total of how well I (or rather my laptop) was doing.
(In the interests of science, please don't read too much into the look of the graph above. The sample size is small; we can't be certain that the hashes are genuine, since we only have the word of cybercrooks to go on; and cumulative graphs tend to have visually appealing shapes anyway because they only ever go upwards. It's the thought that counts.)
There's a huge and rather obvious lesson to learn here: don't be at the left hand side of the graph.
A significant number of users chose passwords that as good as guessed themselves - I'd cracked close to 20% of the hashes in the first second of John the Ripper's run.
However, after I'd cracked about half of the passwords, which took about 50 minutes, the law of diminishing returns kicked in. So I repeated the cracking experiment.
This time I didn't rely on John the Ripper's password generation algorithms, but used a collection of dictionaries, including lists of Dutch words. Philips is a Dutch company and, judging by the names in the database and the passwords recovered at my first attempt, many of its users were Dutch too.
With about 20 million potential dictionary passwords in my list, downloaded from free and easily available public sources, I got much the same result, with an intriguingly similar shape to the graph:
There's an important difference, though: the purple-tinted image above covers not a two hour period but just two minutes. Remember, this is using a single core on a laptop that's several years old.
The dictionary-driven attack also recovered about 50% of the passwords before running out of puff; combining the two sets of results revealed 66% of the passwords in the list.
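To make the idea concrete, here is a minimal sketch of what a dictionary-driven attack boils down to: hash each candidate word and see whether the result appears in the leaked list. This is not how John the Ripper is implemented, and the hashing scheme is an assumption (unsalted MD5, chosen purely for illustration, since the scheme isn't named above). The function names and the toy data are hypothetical.

```python
import hashlib


def hash_password(candidate: str) -> str:
    # ASSUMPTION: unsalted MD5, hex-encoded. The real scheme is not
    # stated in the article; swap in whatever the target system uses.
    return hashlib.md5(candidate.encode("utf-8")).hexdigest()


def dictionary_attack(target_hashes, wordlist):
    """Return {hash: plaintext} for every target hash matched by a word."""
    remaining = set(target_hashes)
    recovered = {}
    for word in wordlist:
        digest = hash_password(word)
        if digest in remaining:
            recovered[digest] = word
            remaining.discard(digest)  # no need to look for it again
    return recovered


# Toy data standing in for a leaked hash list and a downloaded dictionary.
leaked = [hash_password(p) for p in ["123456", "philips", "fiets", "Tr0ub4dor&3x!"]]
words = ["123456", "12345678", "philips", "fiets", "huis"]

cracked = dictionary_attack(leaked, words)
print(f"recovered {len(cracked)} of {len(leaked)} hashes")
```

Merging the results of two different runs is then just a matter of combining the two dictionaries. Any password found by both runs counts only once, which is how two runs that each recover about half the list can end up revealing about two thirds of it between them.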
Let me say it again: don't be at the left hand side of the graph.
I'm not going to argue here whether you should use the correcthorsebatterystaple approach favoured by the XKCD comic, or the Fred and Wilma Sat Down for a Dinner of Eggs and Ham approach favoured by Naked Security's own Graham Cluley.
XKCD's approach combines a small number of dictionary words in a bizarre and deliberately meaningless way to make an unusual and lengthy combination of characters; Graham's approach combines a larger number of dictionary words into an unusual but memorable phrase to achieve a similar result.
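If you'd rather let a computer do the choosing, the XKCD-style approach can be sketched in a few lines of Python. This is only an illustration: the word list below is a tiny hypothetical placeholder (a real one should contain thousands of words), and the function names are my own. The important detail is using the `secrets` module, which draws from a cryptographically strong random source, rather than `random`.

```python
import math
import secrets

# HYPOTHETICAL word list, far too small for real use; shown only so the
# example is self-contained. Substitute a list of several thousand words.
WORDS = [
    "anchor", "biscuit", "cactus", "dolphin", "ember", "falcon",
    "glacier", "hammock", "igloo", "jigsaw", "kettle", "lantern",
]


def make_passphrase(words, count=4, separator=""):
    """Join `count` independently and randomly chosen words."""
    return separator.join(secrets.choice(words) for _ in range(count))


def entropy_bits(list_size, count):
    """Bits of entropy when each word is picked uniformly at random."""
    return count * math.log2(list_size)


print(make_passphrase(WORDS, count=4, separator="-"))
print(entropy_bits(2048, 4))  # four words from a 2048-word list: 44.0 bits
```

The strength comes from the random selection and the size of the list, not from the words being obscure, so resist the temptation to re-roll until you get a phrase you "like".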
Use whichever method produces password strings that are easier for you to remember, but try to pick something that is both lengthy (e.g. 13 or more characters) and that neither appears online nor can be derived by simple algorithmic substitution from something that might appear online.
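The two tests in that advice (long enough, and not derivable from something already public by simple substitution) can be roughed out in code. The sketch below is deliberately simplistic and rests on assumptions: the substitution table is a small hypothetical sample of common "leet" swaps, and `known` stands in for whatever list of published passwords and dictionary words you have to hand. Passing this check does not make a password good; failing it is a strong hint that it's bad.

```python
# ASSUMPTION: a small, illustrative table of common character swaps.
# Real cracking tools apply far more rules than this.
LEET = str.maketrans({
    "0": "o", "1": "l", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s", "!": "i",
})


def normalise(password):
    """Lower-case the password and undo the simple substitutions."""
    return password.lower().translate(LEET)


def looks_weak(password, known, min_length=13):
    """True if the password is too short or reduces to a known string."""
    if len(password) < min_length:
        return True
    known_normalised = {normalise(k) for k in known}
    return normalise(password) in known_normalised


known = {"correcthorsebatterystaple", "philips"}
for candidate in ["philips", "c0rr3cth0rs3b4tt3ryst4pl3",
                  "quiet-lantern-over-mossy-bridge"]:
    print(candidate, looks_weak(candidate, known))
```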
By the way, don't choose F+Wsdfadoe&H, and don't choose correcthorsebatterystaple. They have the sort of length, complexity and weirdness you need, but they appear online.
And as much as you might want to g4STHGs2wi'veDh [*], remember that's online too.
[*] As far as g4STHGs2wi'veDh is concerned: go for something similar to what I've done here.