One month ago today, we wrote about Adobe’s giant data breach.
As far as anyone knew, including Adobe, it affected about 3,000,000 customer records, which made it sound pretty bad right from the start.
But worse was to come, as recent updates to the story bumped the number of affected customers to a whopping 38,000,000.
We took Adobe to task for a lack of clarity in its breach notification.
Our complaint
One of our complaints was that Adobe said that it had lost encrypted passwords, when we thought the company ought to have said that it had lost hashed and salted passwords.
As we explained at the time:
[T]he passwords probably weren't encrypted, which would imply that Adobe could decrypt them and thus learn what password you had chosen.
Today's norms for password storage use a one-way mathematical function called a hash that [...] uniquely depends on the password. [...] This means that you never actually store the password at all, encrypted or not.
[...And] you also usually add some salt: a random string that you store with the user's ID and mix into the password when you compute the hash. Even if two users choose the same password, their salts will be different, so they'll end up with different hashes, which makes things much harder for an attacker.
It seems we got it all wrong, in more than one way.
Here’s how, and why.
The breach data
A huge dump of the offending customer database was recently published online, weighing in at 4GB compressed, or just a shade under 10GB uncompressed, listing not just 38,000,000 breached records, but 150,000,000 of them.
As breaches go, you may very well see this one in the book of Guinness World Records next year, which would make it astonishing enough on its own.
But there’s more.
We used a sample of 1,000,000 items from the published dump to help you understand just how much more.
→ Our sample wasn’t selected strictly randomly. We took every tenth record from the first 300MB of the compressed dump until we reached 1,000,000 records. We think this provided a representative sample without requiring us to fetch all 150 million records.
The dump looks like this:
By inspection, the fields are as follows:
Fewer than one in 10,000 of the entries have a username – those that do are almost exclusively limited to accounts at adobe.com and stream.com (a web analytics company).
The user IDs, the email addresses and the usernames were unnecessary for our purpose, so we ignored them, simplifying the data as shown below.
We kept the password hints, because they were very handy indeed, and converted the password data from base64 encoding to straight hexadecimal, making the length of each entry more obvious, like this:
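A minimal sketch of that conversion in Python follows. The sample value is rebuilt from a ciphertext that is quoted in hex in the discussion of this article, so no dump data is invented; the function name `b64_to_hex` is ours.

```python
import base64

def b64_to_hex(blob_b64: str) -> str:
    """Decode a base64 password field and return it as lowercase hex."""
    return base64.b64decode(blob_b64).hex()

# Rebuild a base64 field from a ciphertext quoted in hex in the discussion,
# so the round trip can be checked without making up any dump data.
sample_hex = "2fca9b003de39778e2a311ba09ab4707"
sample_b64 = base64.b64encode(bytes.fromhex(sample_hex)).decode("ascii")

hex_form = b64_to_hex(sample_b64)
# Two hex digits per byte, so the byte length is easy to read off.
print(hex_form, len(hex_form) // 2, "bytes")
```

In hex, every byte is exactly two characters, which is why the entry lengths become obvious at a glance.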
Encryption versus hashing
The first question is, “Was Adobe telling the truth, after all, calling the passwords encrypted and not hashed?”
Remember that hashes produce a fixed amount of output, regardless of how long the input is, so a table of the password data lengths strongly suggests that they aren’t hashed:
The password data certainly looks pseudorandom, as though it has been scrambled in some way, and since Adobe officially said it was encrypted, not hashed, we shall now take that claim at face value.
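The fixed-output property behind this argument is easy to check for yourself. The sketch below uses SHA-256 purely as an illustrative hash (an assumption on our part, not a claim about anything Adobe used), and the long passphrase is a made-up input.

```python
import hashlib

def digest_length(password: str) -> int:
    """Return the SHA-256 digest length in bytes for the given input."""
    return len(hashlib.sha256(password.encode("utf-8")).digest())

# Inputs of very different lengths; the last one is a hypothetical passphrase.
inputs = ["12345678", "password", "a much longer passphrase than most people pick"]
digest_lengths = {digest_length(p) for p in inputs}

for p in inputs:
    print(f"{len(p):2d} input bytes -> {digest_length(p)} digest bytes")
```

Every digest comes out the same size, so a column of hashed passwords would not vary in length the way the dumped password data does.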
The encryption algorithm
The next question is, “What encryption algorithm?”
We can rule out a stream cipher such as RC4 or Salsa-20, where encrypted strings are the same length as the plaintext.
Stream ciphers are commonly used in network protocols so you can encrypt one byte at a time, without having to keep padding your input length to a multiple of a fixed number of bytes.
With all data lengths a multiple of eight, we’re almost certainly looking at a block cipher that works eight bytes (64 bits) at a time.
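One way to sketch this inference is to take the greatest common divisor of all the observed lengths. The length lists below are hypothetical stand-ins for the real tally, chosen only to show the contrast between block-cipher and stream-cipher output.

```python
from functools import reduce
from math import gcd

def likely_block_size(lengths):
    """Greatest common divisor of all observed ciphertext lengths, in bytes."""
    return reduce(gcd, lengths)

# Hypothetical lengths: block-cipher output lands on multiples of the block size.
block_like = [8, 16, 24, 32]
# Hypothetical lengths: stream-cipher output mirrors the plaintext lengths.
stream_like = [6, 8, 9, 11]

print(likely_block_size(block_like))
print(likely_block_size(stream_like))
```

A result of 8 is consistent with a 64-bit block cipher, while a result of 1 is what you would expect if the ciphertext simply tracked the password lengths.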
That, in turn, suggests that we’re looking at DES, or its more resilient modern derivative, Triple DES, usually abbreviated to 3DES.
→ Other 64-bit block ciphers, such as IDEA, were once common, and the ineptitude we are about to reveal certainly doesn’t rule out a home-made cipher of Adobe’s own devising. But DES or 3DES are the most likely suspects.
The use of a symmetric cipher here, assuming we’re right, is an astonishing blunder, not least because it is both unnecessary and dangerous.
Anyone who computes, guesses or acquires the decryption key immediately gets access to all the passwords in the database.
On the other hand, a cryptographic hash would protect each password individually, with no “one size fits all” master key that could unscramble every password in one go – which is why UNIX systems have been storing passwords that way for about 40 years already.
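As a rough sketch of what per-password protection looks like, the example below stores a random salt and a PBKDF2 digest for each user. PBKDF2 is used as one example of a function designed for password storage; the iteration count, salt size and function names are illustrative assumptions, not a recommendation of specific parameters.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, an assumption for this sketch

def hash_password(password: str):
    """Return (salt, digest); only these two values would be stored."""
    salt = os.urandom(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)

# Two users pick the same password but end up with different stored values.
salt_a, digest_a = hash_password("password")
salt_b, digest_b = hash_password("password")
print(digest_a.hex() != digest_b.hex())
print(verify_password("password", salt_a, digest_a))
```

There is no key in this scheme that unlocks every record at once: an attacker has to attack each salted digest separately.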
The encryption mode
Now we need to ask ourselves, “What cipher mode was used?”
There are two modes we’re interested in: the fundamental ‘raw block cipher mode’ known as Electronic Code Book (ECB), where patterns in the plaintext are revealed in the ciphertext; and all the others, which mask input patterns even when the same input data is encrypted by the same key.
The reason that ECB is never used other than as the basis for the more complex encryption modes is that the same input block encrypted with the same key always gives the same output.
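The effect can be sketched without any real cipher library. The example below uses a toy 64-bit Feistel construction as a stand-in for DES (it is NOT DES and not secure; the key and the helper names are hypothetical) and encrypts each 8-byte block independently, which is all that ECB mode does.

```python
import hashlib

BLOCK = 8  # bytes, i.e. a 64-bit block

def _round(half: bytes, key: bytes, i: int) -> bytes:
    """Toy round function: 4 bytes derived from key, round number and half-block."""
    return hashlib.sha256(key + bytes([i]) + half).digest()[:4]

def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Four-round Feistel network; a stand-in for a real 64-bit block cipher."""
    left, right = block[:4], block[4:]
    for i in range(4):
        f = _round(right, key, i)
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right

def ecb_encrypt(data: bytes, key: bytes) -> bytes:
    """ECB mode: every block is encrypted on its own with the same key."""
    assert len(data) % BLOCK == 0
    return b"".join(toy_encrypt_block(data[i:i + BLOCK], key)
                    for i in range(0, len(data), BLOCK))

key = b"hypothetical key"
plaintext = b"password" + b"\x00" * 8 + b"password"
ciphertext = ecb_encrypt(plaintext, key)
blocks = [ciphertext[i:i + BLOCK].hex() for i in range(0, len(ciphertext), BLOCK)]
print(blocks)
```

The first and third ciphertext blocks come out identical because their plaintext blocks are identical, which is exactly the pattern leak described above.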
Even repetitions that aren’t aligned with the blocksize retain astonishingly recognisable patterns, as the following images show.
We took an RGB image of the Sophos logo, where each pixel (most of which are some sort of white or some sort of blue) takes three bytes, divided it into 8-byte blocks, and encrypted each one using DES in ECB mode.
Treating the resulting output file as another RGB image delivers almost no disguise:
Cipher modes that disguise plaintext patterns require more than just a key to get them started – they need a unique initialisation vector, or nonce (number used once), for each encrypted item.
The nonce is combined with the key and the plaintext in some way, so that the same input leads to a different output every time.
If the shortest password data length above had been, say, 16 bytes, a good guess would have been that each password data item contained an 8-byte nonce and then at least one block’s worth – another eight bytes – of encrypted data.
Since the shortest password data blob is exactly one block length, leaving no room for a nonce, that clearly isn’t how it works.
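To see why a nonce would force a minimum of 16 bytes, here is a sketch of CBC-style chaining with a random 8-byte initialisation vector stored in front of the data. The block cipher is the same kind of toy Feistel stand-in (not DES, not secure), and the key and helper names are hypothetical.

```python
import hashlib
import os

BLOCK = 8  # bytes

def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Four-round toy Feistel network standing in for a real 64-bit block cipher."""
    left, right = block[:4], block[4:]
    for i in range(4):
        f = hashlib.sha256(key + bytes([i]) + right).digest()[:4]
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right

def cbc_encrypt(data: bytes, key: bytes) -> bytes:
    """Chain each block with the previous ciphertext block, starting from a random IV."""
    assert len(data) % BLOCK == 0
    prev = os.urandom(BLOCK)      # the nonce, different for every encryption
    out = [prev]                  # stored in front so decryption can use it
    for i in range(0, len(data), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(data[i:i + BLOCK], prev))
        prev = toy_encrypt_block(mixed, key)
        out.append(prev)
    return b"".join(out)

key = b"hypothetical key"
first = cbc_encrypt(b"password", key)
second = cbc_encrypt(b"password", key)
print(len(first), first.hex() != second.hex())
```

Even a single-block password produces 16 bytes of output here, and encrypting the same password twice gives two unrelated-looking results, neither of which matches what the dump shows.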
Perhaps the encryption used the User ID of each entry, which we can assume is unique, as a counter-type nonce?
But we can quickly tell that Adobe didn’t do that by looking for plaintext patterns that are repeated in the encrypted blobs.
Because there are 2^64 – close to 20 million million million – possible 64-bit values for each ciphertext block, we should expect no repeated blocks anywhere in the 1,000,000 records of our sample set.
That’s not what we find, as the following repetition counts reveal:
Remember that if ECB mode were not used, each block would be expected to appear just once every 2^64 times, for a minuscule prevalence of about 5 x 10^-18%.
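The counting itself is straightforward. The sketch below splits each hex-encoded entry into 8-byte blocks and tallies them, then works out the expected prevalence of any one block value if the output were effectively random. The two sample entries are ciphertexts quoted in the discussion of this article, not a representative sample.

```python
from collections import Counter

HEX_PER_BLOCK = 16  # 8 bytes = 16 hex digits

def block_counts(hex_blobs):
    """Count how often each 8-byte ciphertext block appears across all entries."""
    counts = Counter()
    for blob in hex_blobs:
        counts.update(blob[i:i + HEX_PER_BLOCK]
                      for i in range(0, len(blob), HEX_PER_BLOCK))
    return counts

sample = [
    "2fca9b003de39778e2a311ba09ab4707",
    "2fca9b003de39778d23e6fe47a8c787c",
]
counts = block_counts(sample)

# Expected share of any single block value if blocks were uniformly random.
prevalence_percent = 100 / 2**64

print(counts.most_common(1))
print(f"{prevalence_percent:.2e} %")
```

Any block that shows up more than a handful of times in a million records is therefore overwhelming evidence of repeated plaintext under the same key.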
Password recovery
Now let’s work out, “What is the password that encrypts as 110edf2294fb8bf4 and the other common repeats?”
If the past, all other things being equal, is the best indicator of the present, we might as well start with some statistics from a previous breach.
When Gawker Media got hacked three years ago, for example, the top passwords that were extracted from the stolen hashes came out like this:
(The word lifehack is a special case here – Lifehacker being one of Gawker’s brands – but the others are easily-typed and commonly chosen, if very poor, passwords.)
This previous data combined with the password hints leaked by Adobe makes building a crib sheet pretty easy:
Note that the 8-character passwords 12345678 and password are actually encrypted into 16 bytes, denoting that the plaintext was at least 9 bytes long.
A highly likely explanation for this is that the input text consisted of: the password, followed by a zero byte (ASCII NUL), used to denote the end of a string in C, followed by seven NUL bytes to pad the input out to a multiple of 8 bytes to match the encryption’s block size.
In other words, we are on safe ground if we infer that e2a311ba09ab4707 is the ciphertext that signals an input block of eight zero bytes.
That data shows up in the second ciphertext block in a whopping 27% of all passwords, which, if our assumption is correct, immediately leaks to us that all those 27% are exactly eight characters long.
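The inference can be sketched as follows, assuming the padding scheme described above (password, one terminating NUL, then NULs up to a multiple of eight). That scheme is our working assumption rather than a confirmed fact, and the function names are ours.

```python
ZERO_BLOCK_CT = "e2a311ba09ab4707"  # ciphertext inferred to encode eight zero bytes
HEX_PER_BLOCK = 16

def pad_plaintext(password: str, block: int = 8) -> bytes:
    """Assumed scheme: password + NUL terminator, then NUL padding to a full block."""
    data = password.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % block)

def inferred_length(cipher_hex: str):
    """If the final block is the all-zero block, the password fills the earlier blocks exactly."""
    blocks = [cipher_hex[i:i + HEX_PER_BLOCK]
              for i in range(0, len(cipher_hex), HEX_PER_BLOCK)]
    if len(blocks) >= 2 and blocks[-1] == ZERO_BLOCK_CT:
        return 8 * (len(blocks) - 1)
    return None  # length not revealed by this trick

padded = pad_plaintext("password")
print(len(padded), padded[8:] == b"\x00" * 8)
print(inferred_length("2fca9b003de39778e2a311ba09ab4707"))
```

Under this assumption, an 8-character password always spills into a second block of pure padding, so spotting that block's ciphertext gives away the length without any decryption.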
The scale of the blunder
With very little effort, we have already recovered an awful lot of information about the breached passwords, including: identifying the top five passwords precisely, plus the 2.75% of users who chose them; and determining the exact password length of nearly one third of the database.
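The pooling step behind a crib sheet can be sketched like this: because identical passwords give identical ciphertext, every hint attached to the same ciphertext is a hint for the same password. The records below are illustrative; one hint is quoted in the discussion of this article and the others are hypothetical.

```python
from collections import defaultdict

def pool_hints(records):
    """Group the non-empty hints of all accounts that share the same ciphertext."""
    pooled = defaultdict(list)
    for cipher_hex, hint in records:
        if hint:
            pooled[cipher_hex].append(hint)
    return pooled

# (ciphertext, hint) pairs; the second and third hints are made up for illustration.
records = [
    ("2fca9b003de39778e2a311ba09ab4707", "the password is password"),
    ("2fca9b003de39778e2a311ba09ab4707", ""),
    ("2fca9b003de39778e2a311ba09ab4707", "hypothetical hint: the usual one"),
    ("2fca9b003de39778d23e6fe47a8c787c", "hypothetical hint: usual plus one"),
]
pooled = pool_hints(records)

# Ciphertexts with the most hints are the easiest ones to guess.
for cipher_hex, hints in sorted(pooled.items(), key=lambda kv: -len(kv[1])):
    print(cipher_hex, hints)
```

One careless hint is enough to expose everyone who chose the same password, including the users who left their own hint blank.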
So, now we’ve shown you how to get started in a case like this, you can probably imagine how much more is waiting to be squeezed out of “the greatest crossword puzzle in the history of the world,” as satirical IT cartoon site XKCD dubbed it.
Bear in mind that salted hashes, the recommended programmatic approach here, wouldn’t have yielded up any such information, and you can appreciate the magnitude of Adobe’s blunder.
There’s more to concern yourself with.
Adobe also described the customer credit card data and other PII (Personally Identifiable Information) that was stolen in the same attack as “encrypted.”
And, as fellow Naked Security writer Mark Stockley asked, “Was that data encrypted with similar care and expertise, do you think?”
If you were on Adobe’s breach list (and the silver lining is that all passwords have now been reset, forcing you to pick a new one), why not get in touch and ask for clarification?
Poor old Adobe, hiding behind the door when security smarts were being given out…
As Randall points out:
There’s only one group that comes out of this looking smart: Everyone who pirated Photoshop.
Photoshop is TOO expensive, of course we had pirated it, lol…
(Might be a double post, I don’t know if my first one went through at all.)
I have a couple of web dev friends who tell me that they take the user’s password and encrypt it with AES-256, using the password as the key. When they want to authenticate the user, they decrypt the scrambled password with the password offered and see if it matches.
Is there any reason to suggest that Adobe might have done this, and why is this usually not a recommended technique?
Well for starters they will be storing the actual password (even in an encrypted form) so any compromise of the db will yield the encrypted passwords, then through use of rainbow tables (lookup tables containing known passwords and encrypted representations) it would be relatively easy to extract the commonly used passwords.
The point is that storing the actual password in a recoverable manner means that there is inherent weakness in the system as the password can be recovered so the loss of the data means that any one with enough time and resources could decrypt every password.
The preferred way of storing passwords is to hash and salt the value that is stored. You can think of the process as fingerprinting the password, so that all that is ever stored is a pattern which matches what the password is. The salting aspect is needed to ensure that no two hashes are the same with the salt value being a randomly generated unguessable value that can be recalculated/regenerated to create a hash to compare against the stored hash.
You can infer the level of security a website places on its passwords by its password reset mechanism. If a site will email you or display your unencrypted password, then personally I avoid it, and use sites that only offer password reset functionality.
This is not recommended because – as you see above – it doesn't disguise password patterns. Everyone with the same password gets the same encrypted data, which is very bad…especially when you have 150,000,000 records to choose from, greatly increasing the likelihood of two people choosing the same password.
Anyway (I ran out of space in the article to cover this – perhaps I should have), it's not the method used here because of data like this in the dump:
2fca9b003de39778 e2a311ba09ab4707 = "password" (from hints, 0.3%)
2fca9b003de39778 d23e6fe47a8c787c = "password1" (from hints, 0.02%)
…and many more combinations where the hint makes it obvious that the first 8 bytes of the password are "password", and the first 8 bytes of the ciphertext stay the same.
Similar reasoning with the ciphertext e2a311ba09ab4707 (the encryption of 8 zeros), which appears in byte positions 9-16 27% of the time, but also occasionally in ciphertext bytes 17-24, 25-32 and beyond, denoting passwords exactly 16, 24, etc. bytes long.
That wouldn't happen if the plaintext were also the key.
So you can infer that the same encryption key must be used every time.
Your web dev friends are being silly, by the way. Don't try to knit your own crypto. (And don't use block ciphers directly as hashes. Their design goals are different. Use a function designed to secure passwords. Try PBKDF2, bcrypt or scrypt.)
This is a bad idea because if a hacker knows that is what you do they can use a fast computer to pre-encrypt the most popular passwords (a fast computer can encrypt several million passwords a second these days) and then compare the encrypted passwords with the pre-computed ones to find out what the passwords are.
With hashing, you can use a separate salt for each password (which means that even if several people use the same password the hashes will be radically different) and use a hashing algorithm that can repeat the hashing a number of times, which makes it orders of magnitude harder to find passwords.
Agreed. However it is worth noting that one could potentially add a salt to the above algorithm and effectively negate attacks that rely on pre-computing certain values in a similar manner that one does with hashing algorithms. With this implemented, hashing algorithms do not necessarily have an advantage in that regard.
I do think that hashing algorithms would be better though. They are more tried and tested, and as others have noted they are less likely to reveal patterns in the password data/plaintext.
What would be the benefit of hash and salt if the whole database is available to you for inspection? Wouldn't a rainbow table or exhaustive scan have the desired result?
The salt renders a rainbow table useless, as you'd need an individual rainbow table for each unique salt (which is of course completely impractical).
GPU computing passed rainbow tables in password cracking efficiency within the past couple of years. A salted hash created by a hash algorithm designed for throughput (most of them) is negligibly stronger than an unsalted one. The best cracking setups can test tens of BILLIONS of potential passwords in a few seconds, making the salt almost irrelevant if you use an efficient hash algorithm. Always use a salt in your hash, but also use an algorithm like bcrypt that hasn’t been efficiently implemented on GPU rigs yet.
(I don’t intend this to be contrary or patronizing. I’m sure you already know the above, but the person you’re responding to probably didn’t. I generally feel that the illusion of security is worse than knowing you’ve got no security. You can’t properly mitigate risks if you don’t know what they are.)
GPU computing being faster than rainbow tables? Where did you get that from? Sounds wrong to me.
Hashing is used so you don't need to store anything from which the actual password can be recovered.
Salting is used so that the same password doesn't produce the same hash every time.
So you have to use an exhaustive search, and with salting you need a new rainbow table for each salt.
However…never forget that *the whole database isn't supposed to be available*. Hashing and salting is only an extra layer of defence to soften the blow of being hacked…it doesn't exonerate the leak!
This analysis also shows that password hints are a huge leak, given the flat-out stupid hints that people use.
Oh, yes, indeed.
I cherry picked the ones above to make a point, but still. One bloke was obviously worried that if he put the password hint as "password", he might forget that was actually the password itself, so he carefully explained himself in full: "the password is password" 🙂
If Adobe was going to all the trouble of encrypting the passwords reversibly, you have to wonder why the coders didn't encrypt the hints as well.
For fun, out of my 1,000,000-strong sample, the following words showed up as follows in the hints:
dog = 11,000
cat = 5000
rabbit/bunny = 260
hamster = 73
guinea pig = 20
parrot = 10
I have the feeling that you stopped your article before the end of the story – and that you have reverse-engineered the (single) encryption key.
There's more to the story. Like the fact that lots of the adobe.com addresses – the ones with usernames – have identical passwords, presumably autogenerated when the accounts were created. Like the fact that, hashed or not, Adobe still managed to put 150,000,000 email addresses directly into spammer's hands, with a hint at many of their interests…
…but I do not have the decryption key.
Don't really need it. I'm guessing at least 10% of the passwords are recoverable from the hints that go with someone else who chose the same password. (I did nearly 3% without actually trying 🙂
Other people are saying "it's 3DES", presumably because they've already got through an exhaustive DES search (just look for the key that encrypts 0000000000000000 to e2a311ba09ab4707) without success…so you know where to start if you want to have a try yourself.
I don't know where the 3DES assumption comes from except that the block size looks like 64 bits, and that sort of rules DES and friends in…
FYI it is easiest to search based on the 12345678 plaintext because you don’t know if they use NUL or space or dash padding. And you don’t know if they case-fold passwords with [a-z]
Blowfish is also a 64-bit block cipher, but I presume that big companies are more toward standards/gov type of algorithms, so DES family is indeed a better bet.
Given the level of access the hackers seemed to have to the Adobe servers (all the source code they managed to get, etc), isn't it likely they may also have managed to take a look at how user logins are validated? They may very well already have the information needed to decrypt the encrypted passwords, ie knowledge of the encryption algorithm and encryption key.
“It should come as no surprise to discover that this is because the input text consisted of: the password, followed by a zero byte (ASCII NUL), used to denote the end of a string in C; followed by seven NUL bytes to pad the input out to a multiple of 8 bytes to match the encryption’s block size.”
How do you know this? There could be other explanations (such as they messed up working out how much padding was needed and added a block of padding when the password length is divisible by 8).
I assume you are assuming this. Or do you, in fact, know more than you are saying?
Hmmm. Poor choice of words.
I'll tweak the sentence to make it clearer that when I said "It should come as no surprise to discover," I meant, "You shouldn't be surprised to hear me treat it as if it were a proven fact that…"
Or I'll leave out the bit about the NUL. It might be "8 zero bytes added by mistake", rather than "they added one zero byte and had to add 7 more."
Considering that the length of the password needs to be indicated somehow, I am assuming that it was a trailing NUL that bumped 8 password characters to 9.
Ensuring that every decrypted password includes a terminating byte, even if it's a multiple of 8 bytes long, is probably a better coding practice…for what that's worth here. (If you are going to produce schlock code, it may as well be carefully written schlock code 🙂
In short: I reckon I am right but you are right that I cannot claim that as a fact.
Actually you are more than likely wrong. The most common padding is https://en.wikipedia.org/wiki/Padding_%28cryptogr… In this case it would be 8 bytes of byte value 8 (0808080808080808). Other common ways are a single bit of 1 followed by zeros (8000000000000000) and NULL padding but normally you don't encrypt a block of NULLs like you guessed (0000000000000000). I should mention that PHP messed this up awhile ago and did NULL padding exactly how you guessed. If I had to guess it would be 0808080808080808, 0000000000000000, and then 8000000000000000.
Also Adobe said it was 3DES.
PKCS7 is for digital signature padding (it's to avoid ambiguity in extended messages), and I'm guessing it wasn't used here…just guessing. Nothing else was done in any sort of compliance with any sort of standard, so…I'll guess I am more than likely right 🙂 Thing is, if we find out for sure, it'll probably be because the password got recovered.
(I'll laugh if it turns out they weren't even using DES 🙂
If this is padding it should be possible to distinguish all the 8-char passwords by looking at the 9-16 block.
Try searching the article for the text string "e2a311ba09ab4707".
Paul: huh? PKCS7/5 padding is defined only for symmetric encryption — and widely used. I know it’s the default for OpenSSL “high level” (EVP) and Java JCE, and I’d bet others. That doesn’t prove it was used here of course, but it’s certainly a reasonable possibility.
RSA signatures almost always use one of the two paddings defined in PKCS#1: “v1_5” (original) or PSS. DSA and ECDSA signatures use no padding. I don’t know what you mean by “extended messages” — no known public-key signature scheme handles values of more than a small fixed limit. Digital signatures handle arbitrary length data by hashing it first, and that hash generally uses a Merkle-Damgard padding: single 1 bit, 0 bits as needed, and the length of the (valid) data. HMAC, which can be considered a form of signature, uses that padding of the underlying hash, plus two fixed constants.
And a smaller point: where you say “one or more people at Adobe must know” the masterkey, that’s not strictly true. You can use a hardware security module where the key is generated inside trusted hardware and never exported, but instead data is sent to the module to be encrypted and decrypted. I think it’s unlikely that someone who knew how to use an HSM correctly would make the other mistakes visible here, but it’s conceivable.
With all this pattern knitting, have you thought of producing a 'Fair Isle' sweater?
But a truly brilliant article, illustrating a predisposition by the target company for crassness & complacency of mammoth proportions.
I'm not sure your conclusion is totally fair about encryption vs hashing.
From what I've read, *none* of the Adobe passwords have been cracked.
Certainly many have been *guessed* from the password hints, but not *cracked*.
Whereas if a known password hash algorithm had been used, you would expect a significant proportion of the password hashes to have been cracked by now.
So this demonstrates that Triple DES (if that is the case) is still strong enough to protect such a database, for a significant length of time. Maybe millennia – if you can crack a single DES key in a second, triple DES would take 2^56 times longer, namely 2 billion years. Other estimates say it might take 30-40 years of development and billions of dollars instead.
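The arithmetic behind that estimate can be checked in a few lines. This sketch simply takes the stated premise at face value (one second per single-DES key search, and a factor of 2^56 more work for Triple DES); whether that factor is the right one is a separate question.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Premise from the paragraph above: one second per DES crack, times 2^56.
total_seconds = 2**56 * 1
years = total_seconds / SECONDS_PER_YEAR

print(f"{years:.2e} years")
```

The result is a little over two billion years, in line with the figure quoted.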
Yes, it would have been better if salt had been used so identical passwords could not be detected. It would have been much better if password hints had also been encrypted. And it would have been far better still if the system had been architected so that the database could not have been stolen, for example using hardware and physical protection.
I think you're splitting hairs a little…when you "crack" hashed passwords, you succeed quickly by choosing the most likely passwords as early as possible.
In other words, there's an element of guesswork whenever you do anything other than a brute force ("try every one") attack.
Also, the deal with encrypting the passwords is that the master password has to be stored somewhere. So one or more people at Adobe must know it. Or if Adobe had a breach, the crooks could have found it…
….er, hang on…Adobe did just have a breach. A big one.
This breach is a bit different to most, so we ought to take the opportunity to learn the proper lessons.
In this case, the failure to encrypt the password *hints* is the key blunder. In fact, no-one is sure the true passwords have been found, unless they tried actively (and illegally) to use them. Whereas in the other examples of password hash breaches, we know for sure that the cracked passwords are correct.
This breach also demonstrates the serious weakening of the process caused by using password hints, in that people just give away the answer. Password reset questions ("what is your pet's name") are likely to be a bit better (not that I think they are very secure either), as the answer can go through a one-way hash.
Perhaps this case study indicates that it might be a good idea to encrypt your password hashes, so long as you can protect that secret key (i.e. salt the passwords, hash them, then encrypt the result). That way it won't be possible to simply crack the backup tapes, or wherever the Adobe data was taken from.
use salt and hmac (with a key) for your hash.
Great article.
With my reinforced scepticism about Adobe’s security competence, I couldn’t help noticing a smaller — but nonetheless stupid and dangerous — bungle on their instructions for installing Adobe Reader on Mac. They tell you to turn on JavaScript, and illustrate it with an image telling you to tick the check box that enables Java.
Another little chip off my confidence in them, if one were needed.
As a developer, I read your posts with both amusement and horror. On one side I can read this breakdown and think to myself that even I know you shouldn't store information in whatever poorly chosen scheme is being dissected in one of your posts. But on the other hand, as a developer, I'm not up to speed on what the best practices are. It is hard enough to keep up to date on app development and language trends let alone something complicated like security that we don't deal with very often. I'm constantly worried that I could find myself easily making an assumption that seems logical to me but which introduces a critical weakness into my solution. That kind of stuff keeps me awake at night and makes me shy away from working on web apps. (Plus, I'm in health care software development which makes it a $10K per user mistake if PII and PHI is leaked!)
You guys have a wealth of knowledge on these subjects. I would love to see you take what you know and publish some developer guidance articles. Something like "Username and password storage 101". "Federated Security and what developers need to know", etc. It wouldn't need to be language specific… discussion of best practices, steps we would need to take to implement those practices, and some pseudo code demonstrating the flow. A repository of security articles for developers would be a huge resource that I know I would visit often. I understand it wouldn't be a replacement for good design and stringent testing, but if it elevates my understanding of the subject, I can do a better job earlier in the project and bake-in the features we need.
Check out www.owasp.org. They're all about secure coding. They have lots of documentation, videos, and good information. They can get very specific about how to securely implement things like passwords, session management, etc. Their focus is web apps/services; but much could apply to fat client as well.
Adobe is apparently a member of the owasp.org group, along with many of the other internet companies and organizations people routinely trust.
I’ve believed for several years now that if large tech companies such as Adobe and Sony, and major banking institutions like Citigroup cannot keep customer data secure, then all data security is questionable and the risks cannot be known by customers. Therefore, it would seem like common sense for companies to NEVER store credit card numbers longer than needed to complete a transaction. But seems most companies are storing them indefinitely nowadays regardless of whether the transaction is online, over the phone, or in person! (Mostly because they provide a unique tracking number).
In my estimation, using a credit card at all is a security risk.
Among software vendors only Microsoft is more arrogant than Adobe. This story simply smacks of the corporate arrogance and indifference that seems to have become the pillars of their corporate culture. Remember that a significant number among the 150,000,000 people compromised in this way were paying a subscription for this 'service' and may have invested their work in it. It's galling.
Microsoft spend a fortune on updating and improving security for their services and products, more than any company on the planet. Why do MS haters seem so compelled to spread FUD! Apple on the other hand are arrogance personified!
Turns out the encrypted credit card data might be using similar encryption. My card that is on file with Adobe was just used to pay for a flight with Qatar Airlines. I wonder where I am going?
I looked up my various emails in the dump. The passwords attributed to my accounts are several years old at the very least (I keep a history log of when I used a password and when it was changed).
I don't think this is a live dump. IMHO this is an old backup.
My password hint was related to a years old password and Adobe don’t appear to offer password hints any more so you may well be right.
The LastPass checker gives a hit for an account that was made in Nov 2012 so either that’s a false hit or it’s not *too* old a backup.
It isn't that old.
Accounts I created only a few months ago are in there.
Any bets they picked a really bad 3des password, probably in ascii and not a random 112bit string.
There's a chance that they didn't "pick" it so much as just generate it programmatically, which would mean better entropy. (If they already had a make-encrypted-password-database code module, for example, this could be explained because it was *easier*, not because it was more secure 🙂
And, of course, I don't think we actually know it's 3DES. Lots of people are reporting that as a fact…but I don't think I have seen anything from Adobe to confirm it.
It seems pretty likely it's a 64-bit block cipher, as I argued above, which probably means a key no more than 64-bits, *unless* it's 3DES with its twice-56-bit key. But it doesn't have to be. Could be a CRC-64 of some sort, for instance. Who knows? (I wouldn't bet on anyone being 100% sure any more, not even Adobe 🙂
A while ago I changed my poor Adobe password to a much stronger one. I don't remember changing any password hint at that time. Now I used the Lastpass service to check if my account was hacked. The answer was YES and I also got an email with a "Adobe Password Hint Breach Notification". It states that my leaked password hint was that for my OLD password and it also suggests that 935 other people share my password and their hints suggest that the password should be my OLD one. So, it makes me think: Was the leaked database an old copy?
Interesting – the email I got from the LastPass service didn’t mention anything about how many other accounts shared my password.
I wonder if that means my password was unique? Maybe they have stopped providing that information?
The XKCD link should not be "nofollow". This is poor etiquette.
Else, nice article.
Putting "nofollow" on everything means never getting sucked into bunfights about why you did it on one link and not on another…
This sounds like alibism to me… Yuck.
Three purchases at Walmart.com made on my card (that was used with Adobe) a week ago. Thanks Adobe!
Snap!
Adobe and Security – what a great combination – does anyone else here remember the great idea that they had for securing PDF files a few years back – XORing each byte of the data EIGHT times for added strength, without realising that 8 x XOR == 1 x XOR ?
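The XOR point is easy to demonstrate in general terms. The sketch below says nothing about how any real PDF scheme worked; the key bytes and sample data are hypothetical. It shows that eight XOR passes with different key bytes collapse into one pass with a combined key, and that eight passes with the same key byte cancel out entirely.

```python
from functools import reduce

def xor_pass(data: bytes, key_byte: int) -> bytes:
    """XOR every byte of the data with a single key byte."""
    return bytes(b ^ key_byte for b in data)

keys = [0x13, 0x37, 0xC0, 0xDE, 0x42, 0x99, 0x0F, 0xA5]  # hypothetical key bytes
data = b"%PDF-1.4"                                        # hypothetical sample data

# Eight passes, one per key byte.
eight_passes = data
for k in keys:
    eight_passes = xor_pass(eight_passes, k)

# One pass with the XOR of all eight key bytes gives the same result.
combined_key = reduce(lambda a, b: a ^ b, keys)
one_pass = xor_pass(data, combined_key)

# Eight passes with the SAME key byte undo each other completely.
same_key = data
for _ in range(8):
    same_key = xor_pass(same_key, 0x5A)

print(eight_passes == one_pass, same_key == data)
```

Either way, repeating the operation adds no strength over a single XOR.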
I think the disaster here is not only the bad encryption (or, to put it better, the use of encryption instead of salted hashing with 1,000 iterations). The bad thing is that the hints give away users’ passwords in a sea of similar data. And encryption, or at least badly done block encryption, lets us find identical encrypted blocks with different hints.
Encryption makes the guessing easier!
If you need to store encrypted passwords (password recovery? single sign-on?), at least separate the encrypted data from the associated metadata (encrypted password and username, hints, user ID). The separation could be vertical and horizontal. At least the attacker won’t have all the treasure in one place!
Some obfuscation techniques could probably have avoided the huge data leak… and the false sense of security induced by the encryption probably allowed the disaster.
I was just taking a look at C++11 when this Adobe thing hit and it seemed like it would provide an interesting sample problem. Plowing through the whole 9GB file only took about three minutes on a Xeon E3-1230. Apart from one 7-byte entry, all the password fields were a multiple of 8 bytes long. Furthermore, there are only about 48 million distinct 8-byte blocks (out of 210099808) in the whole file.
[Comment edited for length]
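The block count described in the comment above could be sketched roughly as follows in Python. This is not the commenter's C++ program. It assumes the password fields have already been extracted from the dump and are base64-encoded, and the function name and sample values are invented for illustration.

```python
import base64
from collections import Counter

BLOCK = 8  # block size in bytes, as the comment above suggests


def block_counts(encoded_fields):
    """Count 8-byte blocks across base64-encoded password fields.

    Returns a Counter of blocks and the number of fields whose length
    is not a multiple of the block size (these are skipped).
    """
    counts = Counter()
    odd_lengths = 0
    for field in encoded_fields:
        raw = base64.b64decode(field)
        if len(raw) % BLOCK:
            odd_lengths += 1
            continue
        for i in range(0, len(raw), BLOCK):
            counts[raw[i:i + BLOCK]] += 1
    return counts, odd_lengths


# Made-up sample data standing in for extracted password fields.
sample = [
    base64.b64encode(b"AAAAAAAA"),
    base64.b64encode(b"AAAAAAAABBBBBBBB"),
    base64.b64encode(b"CCCCCCC"),  # 7 bytes: not a whole block
]
counts, odd = block_counts(sample)
print(sum(counts.values()), "blocks,", len(counts), "distinct,", odd, "odd-length")
```

A low ratio of distinct blocks to total blocks is what you would expect if identical plaintext blocks always produce identical output blocks.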
One of the best articles that I have read on this subject. Simply describing a purely technical topic in words that the common man can understand.
I believe this is Adobe’s top management’s fault. Usually, in such big companies, the Officers are big cheap ******** who don’t want to spend a penny on security. I wish they get sued or something.
This doesn’t really surprise me. I’ve noticed similar crypto-stupidity with adobe’s content server (used to sell e-books by just about everyone). The key used to download one of these books is the hash of known information about the book in question, concatenated with the admin password of the content server. Work out this password (which adobe has provided you the hash of) and you can not only get any book for free, but you can control their server as well.
After checking the file, I conjecture that there may be collisions (i.e., different plaintext passwords are mapped to the same encrypted/hashed password). This would imply that the encryption function is not one-to-one (injective). This however would imply that hashing would have been used (instead of pure encryption). Possibly in combination with encryption?
What made you think that there are collisions?
(I noticed identical password fields for users with different hints, e.g. the same password data for “dog” and “cat”, but that just made me smile. Nothing to stop you having a dog called “Thomas” or a cat called “Rover.” Or using an ironic or misleading hint. Or updating your password and not the hint.)
The only explanation I can come up with is to consider users who updated their passwords but not the hints. However, I found several occurrences of the form (a) “name of my pet/dog/cat” and (b) “process of producing a metal from its ore” (the actual cases were different).
Interestingly, for both variants (a) and (b), there was far more than a single user. So the conjecture that passwords were updated w/o updating the hints would be very unlikely. Furthermore, no pet is named after the process described above. I found more such cases in which I do believe that the variants (a) and (b) were pretty different from each other.
Even if Adobe has used more than a single encryption key, the probability of encrypting different plaintext passwords to the same encrypted password (under different keys) should be negligible.
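The kind of check discussed in this thread, looking for identical password fields that carry different hints, could be sketched like this in Python. The record layout (a password field paired with a hint) and the sample values are assumptions for illustration, not the real dump format.

```python
from collections import defaultdict


def hints_by_password_field(records):
    """Group hints by password field; keep fields with more than one distinct hint."""
    groups = defaultdict(set)
    for field, hint in records:
        if hint:  # ignore empty hints
            groups[field].add(hint)
    return {field: hints for field, hints in groups.items() if len(hints) > 1}


# Made-up records: (password field, hint)
records = [
    ("abc=", "dog"),
    ("abc=", "cat"),
    ("xyz=", "dog"),
    ("abc=", ""),
]
shared = hints_by_password_field(records)
for field, hints in shared.items():
    print(field, sorted(hints))
```

Differing hints on the same field do not prove a collision by themselves, for the reasons given in the replies above; the sketch only finds the candidates worth a closer look.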
I think “lanman” did something like this: Hash first 8 bytes, if longer than 8 hash the second 8 bytes and store that hash.
This results in some very common second-block hashes: All 9-character passwords have just one letter hashed to the second block. Precomputing some 18000 hashes means you get one, two or three last characters of the password. That allows you to eliminate many, many possible passwords for the first 8 characters.
I’m not ruling out “hashing” just yet.
It was slightly worse in LANMAN’s case: the password was padded to 14 characters (who would ever type a full 14 characters!) and then hashed as two 7-byte halves.
The “hashes” in LANMAN’s case were actually just two DES encryptions – the 7 bytes of each half gave the 56 bits needed for a DES key, which was used to encrypt a fixed string.
(As a general rule: never use a block cipher directly as a hash. Use a hash, since it was designed for that purpose. There are reliable ways to convert block ciphers to hashes, but then you may consider them to be hashes.)
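The splitting step described above can be sketched in Python. This shows only how LANMAN prepares its two 7-byte halves (uppercase, pad to 14 bytes with zero bytes, split); the DES step that turns each half into a stored value is left out, and the ASCII encoding and function name are simplifications chosen for this illustration.

```python
def lanman_halves(password: str):
    """Return the two 7-byte halves LANMAN would use as DES key material."""
    # LANMAN uppercases the password, then truncates or zero-pads it to 14 bytes.
    padded = password.upper().encode("ascii")[:14].ljust(14, b"\x00")
    return padded[:7], padded[7:]


# A password of 7 characters or fewer leaves the second half all zero bytes,
# so every such password ends up with the same second value.
print(lanman_halves("secret"))

# An 8-character password puts a single character into the second half,
# which can be found by trying every possible character.
print(lanman_halves("password"))
```

Because each half is processed independently, an attacker can crack the two halves separately instead of attacking the whole password at once.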
Is there a way to find out if a domain was compromised?
I’m a mail admin and I need to check if I have users who were on the list but don’t want to try one by one 🙂
Woo hoo, I made the hacked list! Now I am mad. What an utter security failure. Adobe has lost me as a customer. Perhaps if they had taken more care to keep my information safe that would not be the case.
This is why I never let any website keep my credit card “on file.”
90% of LinkedIn salted hashes were converted back into passwords within 1 week. Yeah, Adobe could have done better, but they already *did* do 89% better than a salted hash would have fared in this particular situation.
Don’t be so fast to believe “recommendations” – because typically they are RUMOURS rather than ADVICE – try to think for yourself. PBKDF* exist for a reason, as do TPMs, and even common sense tells you never to leave your key in your front door – so why on earth would you ever put a salt with a hash? Everyone still does though… Except not Adobe.
The LinkedIn passwords were UNSALTED SHA1 passwords. It’s no surprise they were quick to be cracked. I think you don’t know what a salt is or its purpose. You mention PBKDF as an alternative, but that also uses a salt.
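As a rough illustration of the last point, here is a minimal Python sketch of PBKDF2 with a per-user salt, using the standard library's `hashlib.pbkdf2_hmac`. The function names, the iteration count and the sample password are assumptions chosen for this example, not a recommendation or anyone's production code.

```python
import hashlib
import hmac
import os
from typing import Optional

ITERATIONS = 100_000  # illustrative work factor only


def hash_password(password: str, salt: Optional[bytes] = None):
    """Derive a PBKDF2-HMAC-SHA256 digest; generate a random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Two users pick the same password but get different salts.
salt1, digest1 = hash_password("hunter2")
salt2, digest2 = hash_password("hunter2")
print(digest1 != digest2)                           # stored values differ
print(verify_password("hunter2", salt1, digest1))   # correct password verifies
print(verify_password("hunter3", salt1, digest1))   # wrong password does not
```

The salt is stored next to the digest and is not a secret; its job is to make identical passwords produce different stored values, so one precomputed table cannot be reused across users.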