There’s yet another new groovy exploit name on the “this is an interesting security problem” block.
We’ve already had bugs with names like Heartbleed, POODLE, Shellshock and FREAK; now here comes row hammering.
The name row hammer is reasonably new, and seems to originate with a range of Intel patent applications filed in 2012.
The problem stems from the way Dynamic Random Access Memory (DRAM, or what we usually just call RAM for short) works.
It’s hard to explain row hammering without getting bogged down in detail, but what matters is that data in DRAM chips is stored as an electrical charge in rows (and rows, and rows) of tiny capacitors.
A DRAM chip might have, say, 256K rows of 64K capacitors, each storing one bit, for a total capacity of 16Gbits, or 2GB.
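As a quick sanity check on that arithmetic, here is a minimal Python sketch. The row and column counts are simply the illustrative figures from the paragraph above, not the geometry of any particular chip, and the variable names are made up for this example.

```python
# Illustrative geometry from the text, not a real chip's datasheet values.
K = 1024
ROWS = 256 * K            # 256K rows
BITS_PER_ROW = 64 * K     # 64K capacitors per row, one bit each

total_bits = ROWS * BITS_PER_ROW
total_gbits = total_bits // (1024 ** 3)        # bits -> gigabits
total_gbytes = total_bits // 8 // (1024 ** 3)  # bits -> bytes -> gigabytes

print(f"{total_gbits} Gbit = {total_gbytes} GB")   # 16 Gbit = 2 GB
```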
For performance reasons, you can’t read an individual bit out of DRAM, but you can energise an entire row of capacitors at the same time, and read out all their values.
The thing is, when you read out a row of bits, they discharge, thus losing their values; so once you read them, you need to write that data back immediately by recharging the capacitors in the row.
So a DRAM read cycle is actually implemented in hardware as a read-and-refresh cycle.
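To make the read-and-refresh idea concrete, here is a toy Python model of a single row. It is only a sketch: the class name `ToyDramRow` and its methods are invented for illustration, and real DRAM does all of this in analogue circuitry rather than in software.

```python
class ToyDramRow:
    """A toy row of capacitors; True means charged (bit 1)."""

    def __init__(self, bits):
        self.cells = list(bits)
        self.activations = 0  # how many read-and-refresh cycles so far

    def _sense_and_drain(self):
        # Reading senses every cell in the row and discharges it.
        values = self.cells[:]
        self.cells = [False] * len(self.cells)
        return values

    def read(self):
        # Hardware-style read: sense the whole row, then write it straight back.
        values = self._sense_and_drain()
        self.cells = values[:]      # the "refresh" half of the cycle
        self.activations += 1
        return values


row = ToyDramRow([True, False, True, True])
first = row.read()
second = row.read()
print(first == second, row.activations)   # True 2
```

Without the write-back step in `read()`, the second read would return all zeros, which is why the hardware never exposes a read without the refresh.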
As it happens, DRAM capacitors discharge steadily of their own accord, whether you read them or not.
So the DRAM hardware does a read-and-refresh on every row automatically anyway, on a regular basis, in order to keep the memory topped up.
That’s why it’s known as “dynamic RAM” rather than “static.”
But DRAM specifications only insist that storage bits must retain their value correctly, without being refreshed, for at least 64 milliseconds, which is ages by computer-clock standards, so the routine refresh of any given row can afford to be comparatively infrequent.
So, if you explicitly read a memory address over and over again, you can force much more frequent reading-and-recharging of a row. (Remember, you can’t read without immediately refreshing the memory contents.)
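To get a feel for how much more frequent, here is a back-of-the-envelope sketch in Python. The 64 millisecond figure comes from the text above; the 50 nanoseconds per row activation is an assumed round number chosen for illustration, not a value from any datasheet.

```python
REFRESH_WINDOW_NS = 64 * 1_000_000   # 64 ms, the retention figure from the text
ROW_CYCLE_NS = 50                    # ASSUMPTION: rough time per row activation

def max_activations(window_ns=REFRESH_WINDOW_NS, cycle_ns=ROW_CYCLE_NS):
    """Upper bound on read-and-refresh cycles of one row inside one window."""
    return window_ns // cycle_ns

print(max_activations())   # 1280000
```

Under those assumptions, a row that would normally be refreshed once in the window could instead be activated more than a million times.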
Now rewind 30 years to cassette tapes.
Remember how your favourite tapes developed “echoes” if you didn’t listen to them for a while?
In the quiet bits between tracks, you’d hear a ghostly echo or two of the end of the song before; the echoes would be spaced apart by the time it took for the reel to rotate once at that point in the tape.
The reason was that the magnetism from one coil of the tape would influence the magnetic coating of the tape pressed up against it, one tape-thickness away. (That’s why longer cassettes, which used thinner tape, were more likely to have echoes.)
You wouldn’t get a perfect copy, but you would get bleed-through, causing audible corruption.
It turns out that the same sort of one-guy-affects-his-neighbour problem applies to DRAM, too.
And, in the same way that high-capacity C120 (two-hour) cassette tapes corrupted faster than their C60 (one-hour) counterparts, the problem gets worse as the capacitors in DRAM are squeezed ever closer to increase memory storage sizes.
In other words, by repeatedly and frequently reading from the same memory address, and thus triggering a read-and-refresh of the same row in DRAM, you may inadvertently introduce flipped bits – data corruption – in close-by capacitors in neighbouring DRAM rows.
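The neighbour effect can be sketched as a toy simulation in Python. This is a deliberately crude model: the function name `hammer` and the fixed `FLIP_THRESHOLD` are assumptions made up for this example, and real bit flips are probabilistic and vary from chip to chip rather than arriving at a neat threshold.

```python
FLIP_THRESHOLD = 1000   # ASSUMPTION: activations of a neighbour needed to flip a cell

def hammer(num_rows, target_row, activations, threshold=FLIP_THRESHOLD):
    """Return the set of rows whose toy 'disturbance' reaches the threshold."""
    disturbance = [0] * num_rows
    for _ in range(activations):
        # Activating a row refreshes that row itself...
        disturbance[target_row] = 0
        # ...but nudges the physically adjacent rows a little.
        for neighbour in (target_row - 1, target_row + 1):
            if 0 <= neighbour < num_rows:
                disturbance[neighbour] += 1
    return {r for r, d in enumerate(disturbance) if d >= threshold}

print(sorted(hammer(num_rows=8, target_row=3, activations=999)))    # []
print(sorted(hammer(num_rows=8, target_row=3, activations=1000)))   # [2, 4]
```

Note that the hammered row itself never suffers in this model, because every activation tops it up; it is the rows on either side that take the damage.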
Saved by the cache
In normal programming, DRAM read-and-refresh is limited by caching, which is where memory data you accessed recently is temporarily held in storage that’s even faster than DRAM, to save time if you need the same data again soon, which you often do.
So reading the same byte over and over doesn’t usually cause “row hammering,” and thus bleed-through is avoided.
But, as researchers at Carnegie Mellon University (CMU) wrote in 2014, a malicious programmer can deliberately sidestep the cache by repeatedly using a handy machine code instruction called CLFLUSH, which stands for Cache Line Flush.
CLFLUSH makes the processor forget that it already knows what’s in DRAM address X, so that your next access to X really does read from DRAM, and thus triggers a read-and-refresh.
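Python can’t issue CLFLUSH itself, but the effect on DRAM traffic can be sketched with a toy cache model. Everything here (the `ToyMemory` class, the `flush` method, the address `0x1000`) is a made-up stand-in for illustration, not how a real cache hierarchy is organised.

```python
class ToyMemory:
    """Counts how often a read actually reaches DRAM."""

    def __init__(self):
        self.cache = set()        # addresses currently cached
        self.dram_reads = 0

    def read(self, addr):
        if addr not in self.cache:
            self.dram_reads += 1  # cache miss: a real DRAM row activation
            self.cache.add(addr)

    def flush(self, addr):
        # Stand-in for CLFLUSH: forget the cached copy of this address.
        self.cache.discard(addr)


def count_dram_reads(reads, flush_between):
    """Read the same address `reads` times, optionally flushing after each read."""
    mem = ToyMemory()
    for _ in range(reads):
        mem.read(0x1000)
        if flush_between:
            mem.flush(0x1000)
    return mem.dram_reads

print(count_dram_reads(100_000, flush_between=False))  # 1
print(count_dram_reads(100_000, flush_between=True))   # 100000
```

In the first case only the very first read touches DRAM; in the second, every single read does.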
So you can deliberately hammer a row, and as the CMU researchers found, provoke DRAM bit errors in surprising quantity.
In the CMU paper, with millions of repeated reads at carefully-chosen intervals, the researchers observed induced error rates as high as 100,000 incorrect bits per gigabit.
Enter Project Zero
More recently, Google’s Project Zero team wondered if you could actually exploit row-hammering bit errors to run unauthorised code.
In other words, instead of simply corrupting data and causing a program to crash or to produce dud results, could you predictably make a program or an operating system misbehave through unpredictable random changes in unpredictable memory locations?
To cut a long story short, the answer turned out to be a qualified “Yes.”
One method involved deliberately targeting the memory cells where the operating system was storing its page tables, which are the map of which memory belongs to which process.
With careful advance preparation, the researchers found that it was indeed sometimes possible to trick the operating system into mapping in their own, malicious memory content instead of what was supposed to be there.
The details are rather complicated, but the overall result is that even a modest number of random-seeming changes in DRAM, if orchestrated cleverly, can produce very non-random failures.
In particular, the Googlers figured out how to redirect the CPU to run code from the wrong memory addresses, by shaking up the operating system’s map of which memory went where.
In other words, they didn’t need to hammer the DRAM so that it suffered a specific and predictable sequence of errors, and thus to write out malicious code bit-by-bit.
They just needed to hammer it so that it caused the operating system to look in the wrong place, and by guessing where those wrong addresses might be, they could fill lots of candidate locations in advance (a trick known as spraying) with the malicious code they hoped to run.
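Why spraying improves the odds can be shown with a toy Python sketch. It assumes a drastically simplified page table entry that is nothing but an 8-bit physical frame number; real entries carry permission bits and much wider addresses, and the real exploit involves a good deal more preparation than this. The names `flipped_frames` and `lucky_flips` are invented for the example.

```python
FRAME_BITS = 8   # ASSUMPTION: tiny 256-frame machine to keep the numbers readable

def flipped_frames(frame, bits=FRAME_BITS):
    """All frame numbers reachable from `frame` by flipping exactly one bit."""
    return [frame ^ (1 << b) for b in range(bits)]

def lucky_flips(victim_frame, sprayed_frames):
    """Single-bit flips that would redirect the entry to attacker-filled memory."""
    return [f for f in flipped_frames(victim_frame) if f in sprayed_frames]

victim = 0b0001_0000                      # frame the entry is supposed to point at
few = {0b0001_0001}                       # attacker controls one frame
many = set(range(256)) - {victim}         # attacker has sprayed everything else

print(len(lucky_flips(victim, few)), "of", FRAME_BITS)    # 1 of 8
print(len(lucky_flips(victim, many)), "of", FRAME_BITS)   # 8 of 8
```

The attacker doesn’t choose which bit flips, but the more memory they have filled in advance, the more of the possible flips land somewhere useful to them.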
What to do?
The saving grace here is that the Googlers relied on a very lightly loaded computer, meaning that they were able to hammer more predictably and aggressively than on a computer where their hammering code was just one of many processes competing for the CPU’s attention.
Also, apparently, speeding up the rate at which the routine DRAM refresh happens (the automatic “read for the purpose of refreshing the capacitors” operation that DRAM must perform regularly in order to work at all) will significantly reduce the row hammering effect.
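A rough Python sketch shows why a faster refresh helps. Both constants are assumptions picked purely for illustration: 50 nanoseconds per row activation, and a hypothetical one million activations needed before a neighbouring cell flips. Real thresholds vary widely between chips.

```python
ROW_CYCLE_NS = 50            # ASSUMPTION: rough time per row activation
FLIP_THRESHOLD = 1_000_000   # ASSUMPTION: activations needed before a neighbour flips

def can_reach_threshold(refresh_ms, cycle_ns=ROW_CYCLE_NS, threshold=FLIP_THRESHOLD):
    """True if an attacker can fit `threshold` activations between two refreshes."""
    window_ns = refresh_ms * 1_000_000
    return window_ns // cycle_ns >= threshold

for refresh_ms in (64, 32, 16):
    print(refresh_ms, "ms:", can_reach_threshold(refresh_ms))
# 64 ms: True
# 32 ms: False
# 16 ms: False
```

With these made-up numbers, halving the refresh interval is enough to stop the hammering from getting there in time; on a more fragile chip it would merely reduce the odds.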
So for most of us, this research almost certainly doesn’t represent a clear and present danger.
But if you’re a DRAM designer, this is a clear future danger that needs addressing: perhaps minimum refresh rates should be increased to reduce the effectiveness of row hammering?
Similarly for CPU designers: for example, perhaps it’s time to make CLFLUSH into a privileged instruction to limit a crook’s ability to perform deliberate row hammering?
17 comments on ““Row hammering” – how to exploit a computer by overworking its memory”
Don’t really understand how to fix it
There’s really nothing to fix. It’s more of a “look at this weird thing on the horizon” type article.
If Naked Security were a book, I’d read the s**t out of it. You guys come up with the most interesting articles and you explain it in words that anyone could understand. Kudos to you Paul and your team!
Kudos on an excellent piece of technical reporting both explaining this proof-of-concept exercise and putting it in historic and technical/policy context. It belongs in the exemplar file of everyone who teaches technical or science writing.
Is this an equal issue on all dram in general? Or is it less likely on server (ECC) memory? And are server processors (XEON) less likely to be fooled into running malicious code?
With ECC (error-correcting code) memory you are a bit safer (hahahahaha!), precisely because of the ECC. Some bitflips will be detected and fixed in time. But not all…
Seems that the hammerability depends on many factors, some of which were not covered in the papers. I’ll guess temperature comes into it, maybe humidity, and – one can only hope – cosmic rays 🙂
Never say never, but IIRC the results in the papers above were measured with non-ECC memory.
As far as I understand, DDR2 and older RAM types are not affected. The issue affects mostly notebooks and their RAM. The appropriate counter-measure seems to be replacing your RAM modules with ones that are row hammer safe. Apparently only certain vendors, models or batches are affected, albeit quite a lot of them. I could reproduce the row hammer effect on a Samsung RV720 notebook within 10 minutes but not on any desktop machines.
I would not agree at all that this is “nothing to fix”. In the past, many exploit methods were deemed to be too complicated to be used in the wild. However, once you secure a system as much as possible these complicated exploits become more worthwhile because they may be the only (known) way left – for the time being. Also many exploits are being improved upon and even end up in toolkits. Google has demonstrated a very real exploit which doesn’t just work in a lab. Computers that run fairly idle for long times are not something rare in businesses, so that isn’t a condition that would make this exploit unlikely.
This is certainly something I would expect to find in a targeted attack. Well, of course, if your security standards are fairly low, you won’t have to worry about this one at all.
Row hammer is not random, nor single bit upset, so ECC does not handle it. That’s why it can be exploited and why this is a big deal. Row hammer was well understood in the 1980’s because DRAMs were susceptible to it back then. It has nothing to do with process geometry; quite simply, they cut the corner too close on the bitcell/array design. In the ’80s the CPU had direct access to the DRAM (no cache), but as caches were introduced the statistical likelihood of row hammer went away. In a normal system it is not an issue, but if you bypass the cache, or deliberately write code that can hammer rows, you are going to have problems if it is a crappy DRAM design.
Row hammer can be designed out in the memory array, but that doesn’t help the installed base.
ECC helps, because there is likely to be *some* warning if things go wrong, just as increasing the refresh rate above the minimum seems to help, too. (There are some stats in the paper.)
Define what you mean by ECC ‘helps’. It won’t warn you any more than running the latest version of memtest warns you. Either the memory is susceptible or not. The system I’m typing this on is susceptible and it has Elpida DRAM manufactured in 2011. How would having ECC help me more?
Increasing refresh also reduces the probability that a row hammer happens by chance (or as you describe ‘helps’), but it will do nothing to prevent an exploit of a susceptible system.
Most things in security come down to “reducing the probability” in the end…
Without having read the paper I’m assuming they are overwriting a JMP opcode… if that’s the case, I’m wondering why its location isn’t protected by something like ASLR. Shouldn’t UEFI be able to provide that kind of protection to the Windows bootloader?
You probably need to read the paper. ASLR doesn’t help when the memory “write” wasn’t supposed to happen in the first place. (From the CPU’s memory management point of view, there *is* no write. The memory changes underneath the CPU’s feet due to electrical interference.)
Paul, I would like your final word on this. With memtest I’m able to get a note stating that my memory may be susceptible to row hammer bit flips. Do I need to change my memory, or does row hammer not happen with the normal use of a computer? I think that artificial tests obviously make such errors appear easily, but I’m not sure if everyday usage is the same.
I’m not worrying about my own laptop, even though it almost certainly has hammerable RAM. (I’m running OS X and IIRC, Apple did some firmware tweaks to increase DRAM refresh rates between then and now, which reduces the probability of successful hammerage.)
Great article Paul, kudos!
Tsutomu Shimomura (famous for the Mitnick “takedown”) worked at San Diego Nat Lab on similar issues a long time ago… which means USCYBERCOMMAND must have some powerful cyber weapons. You might want to track down the specifics of Shimomura’s work, especially from the viewpoint of an adversary being unable to digitally counter-strike.