Think that a passphrase of multiple, random dictionary words is as unguessable as long strings of gibberish, but easier to remember?
Research from the Computer Laboratory at the University of Cambridge suggests that this might not be so.
While passphrases using dictionary words may not be as vulnerable as individual passwords, they may still be cracked by dictionary attacks, the research found.
Security researcher Joseph Bonneau reports, in a recent paper written with Ekaterina Shutova, that his team studied the problem by turning not to the theoretical space of choices but rather the real-life passphrases that people actually string together.
To find such a selection of passphrases, his team used data crawled from the now-defunct Amazon PayPhrase system, introduced last year for US users only.
The goal wasn’t to evaluate the security of the scheme as deployed by Amazon, Bonneau says, but rather to learn more about how people choose passphrases in general.
Amazon’s was “a relatively limited data source”, he writes, but the research results do “suggest some caution on this approach”.
In the original version of the Amazon site, passphrases had to be at least two words long. Error messages indicated when a passphrase was already in use.
The first experiment was a dictionary attack using lists of movie titles, sports team names, and dozens of other types of proper nouns crawled from Wikipedia, along with idiomatic phrases crawled from sources including Urban Dictionary.
Here’s what the researchers said:
We found about 8,000 phrases using a 20,000 phrase dictionary. Using a very rough estimate for the total number of phrases and some probability calculations, this produced an estimate that passphrase distribution provides only about 20 bits of security against an attacker trying to compromise 1% of available accounts. This is far better than passwords, which are usually under 10 bits by this same metric, but not high enough to make online guessing impractical without proper rate-limiting.
The debate about how easily dictionary attacks can break passphrases is interesting. I am not adept at the mathematics involved, but random word passphrases certainly do have their proponents.
Take, for example, the Slashdot discussion on this issue.
A random selection of commenters’ thoughts on the entropy (i.e., the password strength/resistance to brute-force searching) of common-word passphrases:
- »IMHO, you CANNOT use straight dictionary words (regardless of language, and yes, I do mean Klingon and Sindarin!) in your passwords without some sort of numeric or symbolic character replacement pattern.
- »Of course you can. If they're selected randomly, an attacker has to use the complete source space for the random selection in a brute force attack.
- »diceware.com gives you 12.9 bits of entropy per word. Brute forcing that is already more trouble than it's worth at three words, and five would require nation-state resources to crack.
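The 12.9-bit figure quoted in the last comment can be checked with a few lines of arithmetic. This is a minimal sketch that assumes each word is picked uniformly and independently from the 7,776-word Diceware list (five dice rolls per word); the function name is illustrative, not part of any library.

```python
import math

# Assumption: a Diceware-style list indexed by five six-sided dice rolls.
DICEWARE_LIST_SIZE = 6 ** 5  # 7776 words

def passphrase_entropy_bits(num_words: int, list_size: int = DICEWARE_LIST_SIZE) -> float:
    """Entropy in bits of num_words drawn uniformly and independently from the list."""
    return num_words * math.log2(list_size)

for n in (1, 3, 5):
    print(f"{n} word(s): {passphrase_entropy_bits(n):.1f} bits")
```

The estimate only holds when the words really are chosen at random; phrases that people make up themselves, like those in the Cambridge study, will usually fall well short of it.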
These issues are delightful and productive to ponder for those with a love for password generation nuance, but most laypeople just want to know how to choose a safe password.
We don’t want to have to remember crazy combinations of uppercase and lowercase and random words with letters swapped out Leetspeak-ishly, plus of course the added special character &$!! or two and some digits glued to the bottom. (See xkcd for the graphic representation of the insanity this causes.)
The research takeaway is that while passphrases are safer than passwords, they’re not all that safe, depending, of course, on length.
Length is another matter entirely. Paul Ducklin and Chester Wisniewski discuss passwords and complexity in detail in a recent Sophos Techknow podcast:
(11 March 2012, duration 14’35”, size 10.5MBytes)
“[The password myth] that annoys me the most [concerns] Leetspeak,” Chester said in the password podcast. “They pick a nice word, and they say, ‘Well, it’s not a dictionary word. I added 0 instead of o.’ But most password-cracking apps try that right off the bat, because they know how much people rely on this false sense of security from complicating their password.”
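To see why Leetspeak adds so little, it helps to enumerate the spellings a cracking tool might try for a single dictionary word. The sketch below is illustrative only: the substitution table is a hypothetical, deliberately small one, and real tools ship their own, larger rule sets.

```python
from itertools import product
import math

# Hypothetical substitution table for illustration; not taken from any real tool.
LEET = {"a": "a4@", "e": "e3", "i": "i1!", "o": "o0", "s": "s5$", "t": "t7"}

def leet_variants(word: str):
    """Yield every spelling of word allowed by the substitution table."""
    choices = [LEET.get(ch, ch) for ch in word.lower()]
    for combo in product(*choices):
        yield "".join(combo)

variants = list(leet_variants("password"))
print(len(variants), "variants")
print("p4ssw0rd" in variants)
print(f"extra work for the attacker: about {math.log2(len(variants)):.1f} bits")
```

With this table, "password" expands to only a few dozen candidates, so swapping characters costs an attacker a handful of extra bits at most.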
But pairing passphrase abbreviation with Leetspeak offers the best of both worlds: the apparent randomness of mixed characters and the implicit, coherent meaningfulness of a phrase.
The debate over whether passphrases are guessable seems moot in the face of this user-friendly approach.
I’m not saying that because I write for Naked Security; I’m saying it because I’ve found it actually works.
Using this hybrid approach, I can call to mind random strings of characters reaching a dozen or more characters which, when I decipher them, form phrases that are simple for me to associate with important sites: for example, that of my neighborhood bank.
If you’re not convinced that this is the best approach, either for you or your end users if you set organizational password policy, I’m curious to hear your thoughts on how you approach password generation. So please, comment away.
32 comments on “Multi-word passphrases not all that secure, says Cambridge University”
Thanks for the comprehensive look at password security. While the passphrase method has been praised for adding length, the fact is that it is rarely criticized for shrinking the domain from which its elements are taken. Against a dictionary attacker, a passphrase is much less secure than a non-dictionary password of the same length.
Fortunately, you only need to make a trivial change to the words in order to take them out of the dictionary. l33t-speak is, as you note, too widely known to be useful. I like using other, non-substitution symbols and punctuation to simply *break* the words on non-word borders, thus taking the fragments out of the dictionary. The extra characters do the double-duty of adding entropy (chosen randomly versus l33t) as well as adding characters which are required by security-conscious sites like banks.
However, I think when we talk about passwords, we make two mistakes: we expect far too much of the average user, and at the same time we make far too little requirement of the password scheme. Not only should it be easier to remember than any scheme I've seen so far, but it should also allow the user to *remember* different passwords for every site they visit. And be relatively secure, at least, more secure than their old password method.
The approach that works for me – and what I think is the closest we can get to the 'expect less from users and more from the password scheme' – is to use a password manager loaded with unique, randomly generated > 16 character passwords for each application. It has an obvious and fairly critical weakness in that everything is protected by one umbrella password but that's the weakness I feel I can most easily protect.
Posterous link is invalid.
Thanks – I removed the now-dud link from the comment above.
That's how I've been doing it for years.
Hah! As soon as I started reading this, I was thinking of XKCD's "Correct Horse Battery Staple", even before I saw it was referenced in the article!
More on topic though, what are the concerns then in trying to choose a secure word or phrase in a company that also requires regular changes? Generally I have found when talking to people in one of my last jobs that they ended up choosing incremental passwords, using the same word or phrase and just substituting the current month or incrementing a number, essentially using most of the same password as previously.
Paul and Chester cover exactly this scenario in their recent podcast. It's well worth a listen. http://nakedsecurity.sophos.com/2012/03/11/bustin…
Sow, Peepel hoo spel badlee hav gud parsswerdz?
Interesting and helpful video. A couple of questions: Are you suggesting that we use a different sentence "Fred and Wilma…." with the attendant character substitutions for EACH password, or is it sufficient to vary the substitutions and re-use the sentence at least for a couple of passwords? Second, how does one manage all these passwords with multiple machines – say, a home computer, a laptop, an iPhone, a Nook?
All the linked article tells us is that users often pick bad passphrases, just as they often pick bad passwords.
On the other hand, if a sysadmin enforces a policy of completely random passwords then most users will struggle to remember them, and will write them down, or use a lot of help desk time on password resets etc, but if the sysadmin has a policy of random passphrases (eg diceware), then it is much more likely that users will remember them.
In other words the solution to weak passwords is user education (as always), and to allow and encourage users to use long passphrases.
I like the hybrid approach, but there is a big detail not taken into account: restrictions placed by the site itself. Most places will require a slew of different requisites for their passwords, from a minimum to a maximum length (or both), whether you can have caps, symbols and how many and which symbols are permitted.
It is impossible to conceive of a sane, logical approach to password management while limited by these arbitrary parameters.
So it’s “pA$$w0rdoo1” for all my sites for me.
Ted, thanks so much for the input. Guy, thank you for bringing up the regular-password-changing requirement. My advice on how to deal with that is to suggest you have the security people listen to the Techknow podcast by Chester and Paul (mentioned in my article). The idea, Chet tells us, that regular password changes introduce more security is a myth that dates back to the days when passwords were stored in plain text files on Unix systems. Regular password changes actually decrease security, for a few reasons: 1) the poor users are going to start using sucky passwords because they're easy to remember and to increment (password12, password13, etc.–is it any wonder people opt for these easily predictable passwords?), and 2) doing something security-related on a regular, predictable schedule (quarterly? monthly?) is a gift to hackers. Plus it distracts the IT department for a predictable chunk of time on a predictable schedule.
I highly suggest you listen to the podcast, as P&C have other great password myth debunking tips, and I am just feebly rehashing what I remember (no pun intended, heh. heh.).
Has anyone studied the effect of inserting numbers and special characters in the middle of pass phrases? See spot run see spot jump = ssR7$@ssJ8@$
I'd like to know how long it would take to crack that with a dictionary attack.
Suggesting to the BOFH that he has configured the system wrong isn't always going to end well 😉
Unfortunately most of the problem is not the people who worry about the level of entropy in their passwords. The problem is everyone else. Most of the responsibility here is with the systems designers that permit their users to use bad passwords. Hopefully the outcome of research like this is that trending phrases are added to filters for password selection so that when a user puts "AngelinaJoliesleg" in as a password thinking that they are brilliant and random and unique the system will let them know it is a bad password and they should try a different method of deciding a password.
I think the first step is to change the terminology from password to pass phrase, that way people are more likely to use longer passwords. As pass phrases get longer it seems reasonable that they would become more divergent.
No-one seems to be looking at this from the perspective of the average user. I'm an ordinary soul who uses the Internet for shopping, email and banking. I recently worked out that I have over 50 online and telephone banking accounts – all of which require a password.
Honestly, in the real world, what is an ordinary person supposed to do to remember 50 passwords? They will do what everyone does – have two or three passwords and swap them round. That's all you can honestly expect someone to do.
What you folks do works brilliantly in one user application. But no-one ever seems to discuss what this means in a multi-account world. And we have engineered ourselves into a world driven by passwords that no ordinary human has any hope of remembering.
Password management software is your friend. Watch the video; it deals with this very question.
This discussion seems pointless to me. Password management software is free and secure. I use KeePass2, and I used AnyPassword before that, going back a decade. I have over 100 passwords, they are all different, I didn't have to think of them (pseudo-random password creation is provided as well), they all have as high a level of security as the application permits, and I don't remember any of them. Just the one to open the password store, which is quite tough – and unless my PC is stolen there's no opportunity for that to be attacked.
So I think it's irresponsible to recommend any other approach.
Or, you know, your hard drive crashes. Or are you storing that password store unencrypted in the cloud somewhere?
I think it's irresponsible to suggest that someone keep their passwords in one and only one place. If that is compromised or lost, then someone is up the creek without a paddle.
What we need is a solution that is cross platform (Windows, Mac, Linux, iOS, and Android), web accessible (or at least syncable across devices), and easily maintained.
Er … no … why would I do that?
I have backups of the file, still encrypted, in more than one location, none of them in the cloud. Of course. I have had several hard disks crash and survived all of them. But this is computing 101. I would get a robust backup process in place on the same day I get a new PC out of the box if I were you.
Here's the problem I've had with that video, and the similar instructions that were on the now-defunct Microsoft page on how to make a strong password: That approach works fine in, say, a situation where you only have to log in to something once a week, or once a month. If you have a need to log in to, say, 30 – 40 servers a day, or you have to unlock your workstation every time the screensaver goes on, trying to *remember* something that esoteric, let alone type it quickly and efficiently, is a daunting prospect.
Better to have a sufficiently strong, easily remembered passphrase with some entropy in it. "Getthelittlegirl aUnicornPapoy!" (courtesy of Despicable Me) is going to be much stronger, but more importantly, easier to remember and type than trying to remember the first letter of each word in a paragraph, and one that has also been pseudo-1337ed.
The debates over how to choose a good password will go on and on for years. At the end of the day there’s more than one way to do it.
One thing I will say though is that the research in the paper seemed to be built around how secure passphrases are as part of a system when that system is being attacked, and when the attacker only needs to break 1% of the passphrases in order to compromise it (I’m basing that off this article; I haven’t personally read the paper yet). If that is indeed the case then that’s very different from how likely someone is to crack any one individual’s password or passphrase.
In other words, think of it like this. If I know an organization promotes the use of passphrases, and they have 10,000 users, and I’ve been hired to do a penetration test against it, then all I need to do is crack 1 password to penetrate the system. But if I have been hired to do a penetration test against a company and I have no idea what type of passwords they promote (i.e., what the minimum length is, whether they promote passphrases, whether they provide password vaults like KeePass and encourage people to use randomly generated passwords, etc.) then my job just got exponentially harder.
By the same token, if I’m attacking an individual user (hypothetically of course) and have no idea what type of password or passphrase they’ve chosen then life just got really, really hard.
With that said, my method of choosing passphrases is similar to Ted’s comment. I like choosing 4 or 5 words then throwing some random punctuation into it. The phrase “80thvinyl” will be cracked in a very short time by someone with the right tools and with the knowledge that they’re attacking a passphrase; the phrase “80’th=vin”yl” will not.
Are longer passwords still better? Is having a long password of say 20 characters, comprised of plain dictionary words, better than an eight character password comprised of variety?
Which is safer: H9*g4aw or moopapertractorcandy
It’s not necessarily about which one is better. I could work the math behind the choices you just gave but right now I don’t have the time. Suffice it to say that both options you gave are going to take a very long time to crack.
The other question that you have to consider, though, is which option is easier to remember? The discussions about passphrases aren’t usually about what’s right for you or me but about what’s easier for a system as a whole. Most people find it much easier to remember passphrases. Personally I love randomly generated passwords that I store in a password manager, but I don’t always use those. Sometimes I use passphrases (usually if it’s a password I’ll be typing a lot). For example, I love passphrases for laptop encryption. If I’m having to type it a few times a day then I’d much rather type father44todaynow!will than GHAJweroui&!*#%$0asd
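For the question of a short random string versus a run of plain dictionary words, the math can be roughed out as follows. This is a sketch under stated assumptions, not a verdict on the two example passwords: it assumes the short password is drawn uniformly from the 94 printable non-space ASCII characters and that the words come uniformly from a Diceware-sized list of 7,776. Both constants and both function names are illustrative.

```python
import math

PRINTABLE = 94    # assumption: printable ASCII characters, excluding the space
WORD_LIST = 7776  # assumption: a Diceware-sized word list

def random_string_bits(length: int, alphabet_size: int = PRINTABLE) -> float:
    """Bits of entropy in a string of uniformly random characters."""
    return length * math.log2(alphabet_size)

def random_words_bits(num_words: int, list_size: int = WORD_LIST) -> float:
    """Bits of entropy in a phrase of uniformly random words."""
    return num_words * math.log2(list_size)

print(f"7 random characters: {random_string_bits(7):.1f} bits")
print(f"8 random characters: {random_string_bits(8):.1f} bits")
print(f"4 random words:      {random_words_bits(4):.1f} bits")
```

Under these assumptions a four-word phrase lands between a 7-character and an 8-character random string. If the words were chosen by a person rather than by dice or software, the word figure is an upper bound at best.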
One thing a lot of people forget about passphrases, is that they are susceptible to shoulder attacks. That is, someone looking over your shoulder can very easily crack the password if they can identify half (or maybe less) of your keystrokes.
For example, below is a passphrase generated at random, with half of the characters hidden. I'll leave it to you to crack:
I know, I know- a day late and a dollar short, but I have to put my two cents in. The reason why random dictionary words are not secure is the lack of entropy. For a passphrase to be secure it must have maximum entropy. In security that means that there should be as much randomness as possible. Very little or no order – as much disorder as you can put in that sucker is best. The more entropy you put into the password the longer it will take to crack using brute force. A 12 character random password using case sensitive alphanumeric symbols gives you around 80 bits of entropy. That means it would take, on average, .5 * 2^80 tries. That's a lot of tries. A 32 character password will give you about 256 bits of entropy which is uncrackable by today's computers.
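The figures in this comment can be sanity-checked with a short calculation. The sketch assumes an alphabet of the 95 printable ASCII characters (letters, digits, punctuation and space), which the comment does not state; the helper names are illustrative.

```python
import math

PRINTABLE_ASCII = 95  # assumption: the comment does not name its alphabet

def bits(length: int, alphabet: int = PRINTABLE_ASCII) -> float:
    """Entropy in bits of a uniformly random string of the given length."""
    return length * math.log2(alphabet)

def implied_alphabet(total_bits: float, length: int) -> float:
    """Alphabet size needed for `length` random characters to reach total_bits."""
    return 2 ** (total_bits / length)

print(f"12 characters: {bits(12):.0f} bits")
print(f"32 characters: {bits(32):.0f} bits")
print(f"alphabet needed for 256 bits in 32 characters: {implied_alphabet(256, 32):.0f}")
print(f"average guesses at 80 bits: {0.5 * 2 ** 80:.2e}")
```

With 95 symbols, 12 characters come to roughly 79 bits, close to the 80 quoted, while 32 characters come to about 210 bits. Reaching 256 bits in 32 characters would take 8 bits per character, that is, fully random bytes rather than typeable symbols. Either figure is far beyond brute force.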
I am a bit confused by this article. Are you saying that the problem is that real passphrases used can be hacked because one can make a dictionary with common passphrases ie “patrickswayze” ? So what if you just use 2 or 3 random phrases. ie elephantghosttelephone. would that be ok?
I think the researchers are saying that many people combine two words assuming they square the security of their password but choose badly. So the deal is whether your three “random” words really are randomly combined or if there is a bias in your choice (like “tuskelephantgrey”).
Indeed, the crux of the matter is: “Are your passwords generated by a truly random process, or by a human brain?”
If you’re relying on a human brain, then of course passphrases are not going to fare much better than passwords, because human brains are terribly adapted to that job. We instinctively seek for patterns and structure, so when we’re asked to do something random – we’re really bad at it.
The benefit of passphrases is that it becomes feasible to generate something randomly and then actually remember it. That pattern-seeking instinct means that we can take a string of four or five completely random words and invent a structure that puts them together into something memorable. That’s how XKCD’s “correct horse battery staple” password method works.
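Letting software rather than a human brain pick the words takes only a few lines. This is a minimal sketch using Python's `secrets` module; the eight-word list is a tiny stand-in for illustration, and a real list such as Diceware's 7,776 words would be needed for meaningful strength.

```python
import secrets

# Stand-in word list for illustration only; far too short for real use.
WORDS = ["correct", "horse", "battery", "staple",
         "tractor", "candy", "paper", "ghost"]

def random_passphrase(num_words: int = 4, words=WORDS, sep: str = " ") -> str:
    """Join num_words picked with a cryptographically secure random source."""
    return sep.join(secrets.choice(words) for _ in range(num_words))

print(random_passphrase())
```

`secrets.choice` is used rather than `random.choice` because the `random` module is not designed for security-sensitive work.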
type those long passwords on mobile
I’m moving to a password manager so I’m just starting to learn about password entropy. I found a site where you can enter passwords and it calculates the entropy, then rates them as weak, reasonable, strong, very strong, etc. After learning how entropy is calculated, I experimented with this tool by entering a simple string of repeating characters (all t’s, for example) to see how many would be needed to get to a strong or very strong rating. I found that entering 30 t’s, for example, gets you to a very high entropy (128 bits) and a “very strong (“often overkill”)” rating. Starting the same string with one single upper case T only takes a string of 25 t’s to get the same level. I realize this type of password would be very vulnerable to ‘over the shoulder’ attacks, but guarding against that, it seems like a very simple way to generate a strong password (strictly in terms of entropy), would be easy to remember and easy to type on any machine/device. But I’m sure this type of password would have other weaknesses, so I’m curious what more expert people would say about this kind of password. I wouldn’t use this type of password myself because my intuition tells me it’s not a smart kind of a password to have, but it did make me want to ask the experts about it for the sake of learning more.
Entropy is not a measure of password strength, it’s a convenient and flawed approximation of password strength. I wrote about it here in my article “Why you STILL can’t trust password strength meters”
The ‘strength’ of a password only guards against brute force attacks, which are but a small part of security nowadays. By far the majority of compromises today are the result of phishing, or hacked databases, at which point the actual contents of the password and how it’s made up become irrelevant.
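The gap between a meter's score and real guessability can be shown with a toy example. The formula below is one common naive estimate (length times the log of the character pool) and is an assumption: it is not necessarily what the site mentioned above uses, and the 64-character maximum length in the attacker's rule is also hypothetical.

```python
import math
import string

def naive_meter_bits(password: str) -> float:
    """Naive estimate: length * log2(size of the character classes used)."""
    pool = 0
    if any(c in string.ascii_lowercase for c in password):
        pool += 26
    if any(c in string.ascii_uppercase for c in password):
        pool += 26
    if any(c in string.digits for c in password):
        pool += 10
    if any(c in string.punctuation for c in password):
        pool += len(string.punctuation)
    return len(password) * math.log2(pool) if pool else 0.0

def repeated_char_rule_bits(max_length: int = 64, alphabet: int = 95) -> float:
    """Bits needed to try every 'one character repeated n times' password."""
    return math.log2(alphabet * max_length)

print(f"naive score for 30 t's: {naive_meter_bits('t' * 30):.0f} bits")
print(f"attacker's 'repeated character' rule: {repeated_char_rule_bits():.1f} bits")
```

The naive score is well over 100 bits, yet an attacker who tries repeated-character passwords covers the whole family in a few thousand guesses.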