This is the first in an occasional series looking at some of the techniques we use in SophosLabs to help us take malware apart.
We hope you enjoy this article – if there are any topics you’d like us to cover in future articles, please let us know!
Many thanks to Mike Wood of SophosLabs in Vancouver for his behind-the-scenes effort that made this article possible.
On the trail of rootkits and other malware
When an interesting new piece of malware makes the news, the first questions people ask are usually, “How does it work? What does it do?”
In the old days, back when there were no more than a few hundred new viruses each year, almost all of them written in assembly language, we’d often start with a static, analytical approach by disassembling or decompiling the machine code itself.
Once we knew what sequence of operations the malware performed – for example, that it scanned through the directories on the C: drive and appended itself to every .COM file – we would then run the malware on a freshly-prepared computer and confirm our analysis using a dynamic, deductive approach.
But these days there are hundreds of thousands of new malware samples every day, written in a variety of programming languages, and delivered in a variety of ways.
The vast majority of the samples we get aren’t truly new, of course.
They’re unique only in the strictly technical sense that they consist of a sequence of bytes that we haven’t encountered before, in the same way that Good morning and GOOD MORNING are not literally the same.
Indeed, most of the new samples that show up each day are merely minor variants that we already detect, or known malware that has been encrypted or packaged differently.
Nevertheless, that still leaves plenty of samples worth looking at.
So, these days we usually start dynamically and deductively, using automated systems that run the malware in a controlled environment, instead of first trying to deconstruct each new sample by hand, like we did in the 1980s.
And that leaves us with the questions behind the questions that we asked at the start, namely, “How do you tell how it works? How do you keep track of what it does?”
On the trail
Common monitoring techniques when you are following the scent of a suspicious program include:
- Snapshotting. Take a “before snapshot” that records the state of the system, for example including the names of all the files (and their checksums), and the contents of the registry, and store it somewhere safe. Run the malware. Take an “after snapshot” and compare it with the first.
- System call tracing. Keep track of system calls, such as the self-explanatory CreateFile(), CreateProcess() or URLDownloadToFile(), and record the parameters that were used.
The snapshotting technique tells us how things ended up, and the tracing technique tells us how we got there.
For example, the snapshot can pinpoint files downloaded by the malware, and the trace can identify where they were downloaded from.
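To make the snapshotting idea concrete, here is a minimal sketch in Python. It is not the tooling we use in the Labs: the function names take_snapshot and compare_snapshots are made up for illustration, it covers files only (no registry), and it uses SHA-256 as the checksum purely as an assumption.

```python
import hashlib
import os


def take_snapshot(root):
    """Map each file path under root (relative to root) to a SHA-256 digest."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    digest = hashlib.sha256(handle.read()).hexdigest()
            except OSError:
                continue  # unreadable files are simply skipped in this sketch
            snapshot[os.path.relpath(path, root)] = digest
    return snapshot


def compare_snapshots(before, after):
    """Return sorted lists of (added, removed, modified) paths."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(
        path for path in before.keys() & after.keys()
        if before[path] != after[path]
    )
    return added, removed, modified
```

You would call take_snapshot() once before running the sample and once afterwards, then hand both results to compare_snapshots(). Note that this tells you what ended up on disk, not how it got there, which is exactly the gap that tracing fills.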
But relying only on snapshotting and tracing can leave gaps in our understanding of a malware sample.
Potential problems include:
- Noise. A new file that shows up as a single item in a snapshot might be created by hundreds of thousands of one-byte-at-a-time calls to WriteFile().
- Timing. How long should we wait between snapshots? Too long, and the malware might have been and gone; too soon and it might still be waiting for a malicious download to start.
- Certainty. Because we are taking our measurements inside the operating system, we run the risk that the malware might deliberately feed us incorrect or diversionary results.
Most importantly, how do we tell if malware does really sneaky things, such as installing a rootkit, writing to unused parts of the disk via system calls that we aren’t monitoring, or using undocumented features or exploits?
Using virtualisation
Virtualisation can help here.
Unless we are dealing with malware that deliberately behaves differently when we run it inside a virtualised environment (e.g. VMware, Xen, VirtualBox), we rarely use “bare metal”, where the malware runs directly on a real computer.
Virtual machines, which are effectively software computers, have many advantages, notably:
- One physical computer can contain many different starting images for trying out malware.
- Multiple malware samples can be analysed simultaneously by running multiple virtual machines.
- Virtual disk images are stored as regular files and can easily be backed up and restored.
The last item turns out to be especially useful in looking out for changes, because we can compare the state of a disk image before and after the malware is run.
The comparison happens from the host computer itself, when the virtual machine is frozen or stopped, so it can’t be tricked by the malware hiding itself by feeding us bogus results. (This behaviour is jocularly known as stealth or anti-anti-virus.)
Effectively, we end up with a sector-level snapshot of everything that changed in the virtual disk image, including changes that might not show up in a conventional snapshot.
That includes data written to temporary files, the swap file, the disk’s boot and partition sectors and even to officially-unused parts of the disk.
That’s a trick that some rootkits use to great effect: they implement a proprietary filing system, hidden in empty sectors on the disk, in which they can store programs, data, and configuration files that are as good as invisible to the operating system.
So a sector-level record of what changed on the disk, and where, is a good way of counter-attacking the malware, because changes outside the remit of the operating system show up clearly, and can immediately be flagged as suspicious.
Speeding things up
But one problem with sector-level snapshotting is that looking for changes between the “before” and the “after” images can be time-consuming.
A virtual disk image of a basic Windows 8.1 install, for example, weighs in at 8GB or more, so checking each sector in the “after” file against the corresponding sector in the “before” file means reading at least 2 × 8GB of raw data, even if only a handful of sectors have changed.
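As a rough illustration of why this is slow, here is a sketch of the brute-force comparison in Python. It assumes both images are flat, raw files of 512-byte sectors (real virtual disk formats wrap the data in their own containers), and the name changed_sectors is invented for this example.

```python
SECTOR_SIZE = 512  # assumed sector size for this sketch


def changed_sectors(before_path, after_path, sector_size=SECTOR_SIZE):
    """Return the indices of sectors that differ between two raw images."""
    changed = []
    with open(before_path, "rb") as before, open(after_path, "rb") as after:
        index = 0
        while True:
            old = before.read(sector_size)
            new = after.read(sector_size)
            if not old and not new:
                break  # both images are exhausted
            if old != new:
                changed.append(index)
            index += 1
    return changed
```

Every byte of both files gets read, no matter how few sectors actually changed.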
However, there is a handy shortcut that we can use.
Most virtualisation systems include a snapshotting feature of their own, also known as disk differencing, to make it easy to undo any changes after running a virtual machine for a while.
This is very handy when you are testing new software, or analysing malware.
Instead of writing changes back to the master disk image, a separate “difference image” is used to store changes.
When reading back in from the disk, the virtualisation software checks to see if the needed sector is in the difference image first, only reading from the master image if it is not.
In other words, if we run our malware inside a virtual machine that is in differencing mode, we automatically end up with a list of what changed, and where; and when we examine the differences, we can’t be tricked by any self-protection or stealth features built into the malware.
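The read and write logic described above can be modelled in a few lines of Python. This is a toy, in-memory sketch, not how any particular virtualisation product stores its difference images; the class and method names are hypothetical.

```python
class DifferencingDisk:
    """Toy copy-on-write disk: writes go to an overlay, never to the master."""

    def __init__(self, master_sectors):
        self._master = master_sectors  # list of bytes objects, never modified
        self._diff = {}                # sector index -> new contents

    def write_sector(self, index, data):
        # Changes are recorded in the difference image only.
        self._diff[index] = data

    def read_sector(self, index):
        # Check the difference image first; fall back to the master image.
        if index in self._diff:
            return self._diff[index]
        return self._master[index]

    def changed_sectors(self):
        # The overlay doubles as a ready-made list of what changed, and where.
        return sorted(self._diff)
```

The point to notice is that changed_sectors() costs nothing extra: the list of modified sectors is a by-product of the way the disk works.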
Of course, the difference image itself only tells us which sectors have changed, so we still have to work out for ourselves which files those sectors belong to, but changes that don’t belong to files (for example because they are part of a rootkit that works outside the operating system), stand out at once.
What the changes tell us
The most obvious benefit of tracking malware-related disk changes with difference images is the ease and reliability of spotting what we might call unauthorised disk modifications, such as those made by low-level rootkits.
But we can use difference images to track file-level changes, too.
By working backwards from the difference image, through the NTFS file allocation table (called the MFT, or Master File Table), we can quickly work out which file “owns” each modified chunk of the disk.
That gives us a rapid list of what changed without processing the entire virtualised master disk image.
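Here is a simplified sketch of that ownership lookup in Python. Decoding a real MFT is well beyond a short example, so the extents table below is a hypothetical stand-in: each entry is assumed to be a (name, first_sector, sector_count) tuple that a real tool would derive from the MFT.

```python
def owners_of(changed, extents):
    """Map each changed sector to the file that owns it, or None if unowned.

    extents is a hypothetical, simplified table of
    (name, first_sector, sector_count) tuples.
    """
    result = {}
    for sector in changed:
        result[sector] = None
        for name, first, count in extents:
            if first <= sector < first + count:
                result[sector] = name
                break
    return result


def unowned(changed, extents):
    """Changed sectors that belong to no file: worth a closer look."""
    owners = owners_of(changed, extents)
    return [sector for sector, owner in owners.items() if owner is None]
```

Sectors that come back from unowned() are the ones that changed outside the filing system, which is where a rootkit with its own hidden storage would show up.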
If any of the changed objects look to be of interest, we can then extract them directly from the virtual disk image files for further analysis.
This may sound like a lot of work compared to simply mounting the virtual disk images and scanning through their directory listings, as we would in a regular “before” and “after” snapshot system.
Indeed, at worst, on a computer where every file changed while the malware was running, the differencing image might end up as big as the master image.
But, in practice, if we time our snapshots carefully to minimise the amount of change while the malware is running (remember that the difference image records all changes, including uninteresting and unimportant ones), analysing the changes this way can be significantly faster than traditional techniques.
In SophosLabs, the speed improvement we have measured is around 60-fold, so that what used to take a minute now takes a second.
So this is a nice example of how we can work smarter and faster at the same time!
Find and kill rootkits with the free Sophos Virus Removal Tool
This is a simple and straightforward tool for Windows users. It works alongside your existing anti-virus to find and get rid of any threats lurking on your computer, including rootkits and other stealthy malware.
It does its job without requiring you to uninstall your incumbent product first. (Removing your main anti-virus just when you are concerned about infection is risky in its own right.)
Download and run it, wait for it to grab the very latest updates from Sophos, and then let it scan through memory and your hard disk. If it finds any threats, you can click a button to clean them up.
The new formatting on this site makes it very difficult to read from an Android phone. The text will not auto-scale to fit the screen.
Thanks for letting us know. We’ll look into it asap.
Which browser are you using?
I use Firefox for Android and the usual Android-style “double tap” works fine for me. For example, if you double tap inside one of the paragraphs of the article, it expands to the full width of the screen. Double tap again and it contracts to how it was.
With an 800-pixel wide screen, it’s pretty easy to read when the article part (which is usually only about 540 pixels wide) is scaled up and the sidebars are out of the way.
Nice article,
How do you find rootkits and malware designed not to run, or to run differently, in a VM?
Thanks.
The answer, I’m afraid, is “it depends.”
In fact, the problem of getting malware that seems perfectly vivacious in the wild to behave realistically in the lab is not just down to VM avoidance.
There is a range of factors that can make your synthetic environment give misleading results – sometimes it’s deliberate trickiness on the part of the malware authors, and sometimes it’s down to assumptions that just “worked well enough” for the crooks.
Obvious examples are – which VM, which .NET Framework, which OS version, which bitness, which other apps installed, which country, what browser, which plugins, how much memory, and (an important issue for malware that calls home using a DGA or Domain Generation Algorithm) even what date or time you choose.
Sometimes, you can win through by trying a representative sample of different combinations (which is one reason why we ask for pertinent information about your environment when you submit a sample).
Sometimes, the answer is obvious from the system trace, based on the system call at which the malware stopped or seemed to deviate from what you expected.
Failing that, you might have to try more than one virtual machine; try throwing the malware onto real hardware; try automatically decompiling it looking for evidence of anti-anti-virus code; or, if all else fails, you can party like it’s 1989, and do it all by hand.
All malware processing is beset by the Halting Problem, which says that no program can ever reliably determine how another program will behave, so all anti-malware work is effectively “heuristic”, i.e. relies on some inspired guesswork and luck:
http://nakedsecurity.sophos.com/2012/06/23/in-memoriam-alan-turings-100th-birthday/
However, as veteran golfer Gary Player is supposed to have said, “The harder I practise, the luckier I get” :-)
Thanks for the thorough answer. All of it makes interesting reading.
I’m glad I have lucky people writing signatures for my AV (-:
Cheers,
Guy
Back in the 1980s, the crooks had a bunch of tricks based on the fact that different CPUs (e.g. 8088, 8086, 80286, 80386) were, when running DOS, as good as 100% identical. But they did have certain unusual instructions that worked very slightly differently on each chip. So if they figured you’d just invested in a bunch of faster computers for your lab, say 386es, they could write their code so it behaved innocently on a 386 but was virulent on a 286 or below, which still put a goodly percentage of users at risk. You had to watch out for that. (And keep some legacy PCs, just in case :-)
In those days, it still felt like an arms race, though the crooks were almost always into virus writing as a crime in itself, not as a means to something worse. Losing your data was no less painful just because the virus writer didn’t intend to make money out of you, of course…
Hello Paul,
Have you ever had malware break out of the virtual machine and infect the host? I’m curious as there have been plenty of sand box attacks in the past and I do run virtual machines that have the ability to copy files / clipboard between the host and guest. I’ve always thought that was a suspect (but also incredibly useful) ability. With applications like Docker making cross platform a breeze I can’t help thinking more people will start doing this purely for convenience and speed of development, with the downside being unexpected problems.
Alan
I haven’t personally had that happen, but it is a risk you should never ignore. Most importantly, don’t treat virtual machines as a replacement for a properly-segregated Lab facility with its own controlled network, but merely as a tool to give you more versatility inside your Lab.
Thanks – I appreciate your advice!
Alan
Where’s the Mac version?
The nice thing about the difference imaging “change log” technique is that it works independently of the operating system.
Of course, if you want to trace individual changes back into the filing system, then you have to tailor your difference-processing software to suit the filing system (e.g. HFS+, ext4). I stuck to NTFS and $MFT above because it’s the combination we meet by far the most frequently.
But even if you don’t know (or don’t care, or don’t yet have a processing module for) the filing system, you can still get useful information out of the differenced disk sectors – new file content is in the difference list *somewhere*, so if you can’t look for it cleverly and with the same sort of context you’d get from the operating system, you can nevertheless look for it, even if all you are doing is some kind of binary grep.
If you suspect that the malware rewrites the /etc/hosts file, for instance, blocklisting a bunch of domain names by redirecting them to 127.0.0.1, *that list will be in the differences somewhere*, and [a] you don’t have to look through the whole master image to find it [b] if it’s in the differences, you can be pretty sure it’s new, i.e. was a side-effect of the malware.
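A binary grep of that sort might look like the following Python sketch. The layout is an assumption made for illustration: diff_sectors is a dictionary mapping each changed sector number to its raw bytes, however you extracted them from the difference image.

```python
def grep_sectors(diff_sectors, needle):
    """Return the indices of changed sectors whose contents contain needle.

    diff_sectors is assumed to map sector index -> bytes. A match that
    straddles two sectors is missed by this simple version.
    """
    return sorted(
        index for index, data in diff_sectors.items() if needle in data
    )
```

For the hosts file example, you could search for b"127.0.0.1" and then look at the sectors that match.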
As for the “inside the OS” monitoring techniques mentioned above, you might like to try experimenting with some of OS X’s built-in utilities, e.g. Time Machine, which makes a handy snapshotting tool because that’s pretty much what it is :-), and dapptrace (which is a quick way of starting to use the lower-level DTrace system monitoring tools).
PS. I didn’t actually mention any free software for Windows snapshotting and tracing, but if you are interested, you might want to look at RegShot and various of the Sysinternals tools.
You guys work on some cool stuff. You should open source some of your automation for provisioning, tracing and snapshotting the system, so QA people don’t have to write crappy versions of this for their dayjobs.