Anatomy of a malicious email: Crooks exploiting recent Word hole

Thanks to Gabor Szappanos of SophosLabs for his behind-the-scenes work on this article.

SophosLabs has drawn our attention to a new wave of malware attacks using a recent security bug in Microsoft Word.

The bug, known as CVE-2015-1641, was patched by Microsoft back in April 2015 in security bulletin MS15-033.

The vulnerability was declared to be “publicly disclosed,” meaning that its use wasn’t limited only to the sort of crooks who hang out in underground exploit forums.

Of course, turning a potential Remote Code Execution (RCE) vulnerability into a reliably working exploit isn’t always as easy as it sounds, but that is what has happened here.

Here’s how the new attacks go down.


Word-based attacks are usually delivered via email rather than over the web.

Email-based attacks are generally self-contained: once you have received the email, you can end up infected even if you are offline when you open the attachment.

Also, booby-trapped emails can easily be wrapped in believable stories, as we see here.

Example attachment names in our malware collection include:

First seen    Attachment filename
2015-08-04    WUPOS_update.doc
2015-08-04    ammendment.doc
2015-08-24    Anti-Money Laudering & Suspicious cases.doc
2015-08-26    Application form USD duplicate payment.doc
2015-08-27    AML USD & Suspicious cases.doc
2015-09-01    Amendment inquiry ( reference TF1518869100.doc
2015-09-01    Information 2.doc

You can imagine why a potential victim might open an innocent-sounding document that arrives in an email wrapped in a believable story.

Documents are supposed to be data, not programs, so they ought to be safe to open.

Anyway, how else to see whether the document is relevant or not without opening it?


Infected documents are actually in Rich Text Format (.RTF), not Word Document format (.DOC), despite the attachment names.

RTFs are handy for attackers because they make it easy to package and deliver multiple components in one file.
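The mismatch between filename and file format is easy to check for. Here is a minimal Python sketch, assuming you already have the attachment's bytes in hand; the function names are our own invention for illustration, not part of any Sophos tool. Note that Word will happily open a legitimate RTF file that has been named .doc, so a mismatch is a reason for a closer look, not proof of an attack.

```python
RTF_MAGIC = b"{\\rtf"
# Leading bytes of a legacy Word .doc file (an OLE2 compound file).
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def sniff_format(data: bytes) -> str:
    """Guess the real format from the leading bytes, ignoring the filename."""
    if data.startswith(RTF_MAGIC):
        return "rtf"
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"


def extension_mismatch(filename: str, data: bytes) -> bool:
    """True when a file named .doc actually starts with an RTF header."""
    return filename.lower().endswith(".doc") and sniff_format(data) == "rtf"
```

For example, `extension_mismatch("WUPOS_update.doc", data)` would be true for any of the samples listed above, because they all start with the RTF header.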

Attack files have five parts, four of them official and one unofficial, like this:

{\rtf                                       ---Official header 
{\object\objocx{\*\objdata                  ---Part 1
{\object\objemb..{\*\objclass Word.Doc..}}  ---Part 2
{\object\objemb..{\*\objclass Word.Doc..}}  ---Part 3
{\object\objemb..{\*\objclass Word.Doc..}}  ---Part 4
}                                           ---Official end
1$çk4àjöd~».Né*&fñõ>ãëxCW...                ---Part 5 

The last part, not normally found in RTF files, is a lump of raw binary data known punningly as a BLOB (Binary Large Object).

The BLOB in Part 5 is simply tacked onto the file where it is ignored by Word but can be read in by the exploit later on.
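To see how data can hide after the official end of an RTF file, here is a rough Python sketch that walks the brace nesting and returns whatever follows the closing brace of the outermost group. It is a simplification under stated assumptions: the function name is hypothetical, escaped braces are skipped, and raw-binary runs inside the document (the \bin control word) are not handled, so treat the result as a hint rather than a verdict.

```python
def split_trailing_data(data: bytes) -> tuple[bytes, bytes]:
    """Split data into (rtf_document, trailing_bytes).

    Stops at the closing brace that balances the first opening brace.
    A backslash escapes the character after it, so escaped braces do
    not count towards the nesting depth.
    """
    depth = 0
    i = 0
    while i < len(data):
        c = data[i]
        if c == 0x5C:            # backslash: skip the escaped character
            i += 2
            continue
        if c == 0x7B:            # opening brace
            depth += 1
        elif c == 0x7D:          # closing brace
            depth -= 1
            if depth == 0:
                return data[: i + 1], data[i + 1 :]
        i += 1
    return data, b""             # never balanced: no trailing part found
```

A trailing newline or two after the last brace is normal in legitimate files; a large lump of high-entropy binary data is not.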

Part 5 contains a mixture of:

  • Directly-executable machine code, known as shellcode, that will be used as part of the attack that injects malware onto your computer.
  • A hidden malware program that is installed by the above shellcode. Because the malware travels along with the RTF file, the attack is self-contained.
  • A hidden document displayed during the infection process. Word often crashes when a security vulnerability is exploited, something this decoy document helps to disguise.

The other parts perform the following functions:

  • Part 1. Instructs Word to load an ActiveX component called OTKLOADR.DLL. This, in turn, loads a Microsoft runtime DLL called MSVCR71.DLL. This DLL does not support Address Space Layout Randomisation (ASLR), so an attacker can predict where it will be in memory.
  • Part 2. This contains a large BLOB of embedded data that includes the executable shellcode that the attackers want to run. By filling a large area of memory with shellcode, the attacker has a larger target to aim his exploit at.
  • Part 3. This contains data in XML that triggers the CVE-2015-1641 vulnerability. This allows the attacker to make unauthorised adjustments to memory that will send control off to the shellcode in Part 2.
  • Part 4. This seems to be an experimental section. It contains a second vulnerability that, if successful, would run CALC.EXE, the Windows calculator. We couldn’t get it to work, and we assume the attackers couldn’t either, because of the still-experimental payload.
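The object-related control words in the outline above can be counted with a few lines of Python. This is only a sketch, with a hypothetical function name, and it assumes the control words appear in plain form; we make no claim that it catches samples that disguise their RTF markup.

```python
import re
from collections import Counter

# RTF control words related to embedded objects, as seen in the outline.
OBJECT_WORDS = re.compile(rb"\\(objocx|objemb|objdata|objclass)(?![a-z])")


def count_object_words(data: bytes) -> Counter:
    """Count how often each object-related control word appears."""
    return Counter(
        match.group(1).decode("ascii") for match in OBJECT_WORDS.finditer(data)
    )
```

An ActiveX object (objocx) followed by several embedded Word documents (objemb) is not malicious in itself, but it is unusual enough in an emailed "document" to deserve attention.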


We’ll try to keep this simple and reasonably non-technical.

Basically, the attacker aims to run shellcode that is stored inside Part 2:

005c 31c9      xor   ecx, ecx         
005e 648b7130  mov   esi, [fs:ecx+0x30] ; Find PEB
0062 8b760c    mov   esi, [esi+0x0c]    ; Find PEB_LDR_DATA
0065 8b760c    mov   esi, [esi+0x0c]    ; Get module list
0068 ad        lodsd                   
. . . 

We’re not going to analyse this here; it’s enough to say, if you’re wondering, that this sort of code is commonly found in Windows exploits.

→ Shellcode usually locates the Process Environment Block (PEB), and from there the list of DLLs from the PEB_LDR_DATA block. This allows the shellcode to build up a list of useful system functions that are already loaded. For example, this shellcode finds and uses VirtualAlloc, GetFileSize, CreateFileMappingA and MapViewOfFile.

The shellcode in Part 2 deals with finding the data in the BLOB from Part 5 and transferring control to it.

In other words, Part 2 is the “dispatcher” part of the exploit – it sets things up to hand control cleanly to the attacker, and can remain the same, or substantially similar, across every sample.

(Part 5, tacked onto the end of the RTF file, is the “warhead” part that actually contains the malware and the decoy document that the attacker wants to deliver, and can easily be changed to vary each sample’s payload.)

The problem the crooks have is that Part 2 is loaded by Word as data, so the shellcode is deliberately stored in memory blocks that are labelled NX, short for NO EXECUTE.

If you simply jump to the shellcode, the CPU will tell the operating system, “That’s data and can’t be executed,” and the operating system will block the attack.

This is what’s known on Windows as DEP, or Data Execution Prevention, introduced as a very useful security measure to make things harder for the crooks.


In this attack, the crooks get around DEP by implementing the first part of their shellcode as data, not as code.

This trick is called Return Oriented Programming, or ROP.

ROP involves stringing together a list of code fragments that are loaded in a block of memory marked as executable, such as a system DLL, and sending control off to them one-by-one.

ROP shellcode therefore consists of a list of memory addresses where the CPU will find the code fragments you want to run, known as gadgets, instead of the code fragments themselves.
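The idea is easier to see in a toy model. The Python sketch below is not exploit code and does not touch real memory: the gadget table, the addresses 0x1000 and 0x2000, and the function names are all made up for illustration. The "chain" is nothing but a list of numbers, yet it decides which pre-existing functions run, and in what order.

```python
def pop_argument(state, stack):
    # Models a "POP register; RET" gadget: consumes one data word.
    state["arg"] = stack.pop(0)


def make_executable(state, stack):
    # Stands in for a call that removes the NO EXECUTE setting.
    state["executable"] = True


# Hypothetical addresses of code that is already loaded and executable.
GADGETS = {0x1000: pop_argument, 0x2000: make_executable}


def run_chain(gadgets, chain):
    """Follow a list of addresses the way repeated RET instructions would."""
    state = {"executable": False}
    stack = list(chain)
    while stack:
        address = stack.pop(0)        # RET: take the next address off the stack
        gadgets[address](state, stack)  # only existing code can be reached
    return state


# Pure data: two gadget addresses and one argument between them.
CHAIN = [0x1000, 0x40, 0x2000]
```

Running `run_chain(GADGETS, CHAIN)` leaves the state marked executable with the argument 0x40 recorded, even though the chain itself contains no code at all.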

Part 2 of the malicious RTF contains a “gadget list” of exactly this sort.

In theory, gadget lists of this sort should be impossible to determine in advance, so that attackers who want to bypass DEP in this way have to guess blindly, and will almost certainly fail.

That’s the idea behind Address Space Layout Randomisation (ASLR), whereby DLLs are loaded at a slightly different address every time you reboot Windows or restart an application.

Simply put, DEP and ASLR are designed to work together, because:

  • DEP prevents an exploit from jumping directly to its own shellcode, even if it knows where to find it in memory. The shellcode simply won’t run.
  • ASLR prevents an exploit from jumping indirectly to shellcode, because the exploit doesn’t know where to find it in memory.

Unfortunately, just as one rotten apple is said to spoil the barrel, so one non-ASLR DLL can spoil the full-court-press that DEP and ASLR together are supposed to create.

And, as we mentioned above, Part 1 of the malicious RTF forces a fixed-location DLL to load.

MSVCR71.DLL always loads at the same, predictable memory address, so an attacker can hard-wire the gadget list into his malicious file in advance.
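Whether a DLL opts in to ASLR is recorded in its file header, in the DllCharacteristics field of the PE optional header. The Python sketch below reads that field from a DLL's raw bytes. The helper names are ours, and the parsing is deliberately minimal, so take it as an illustration of where the flag lives rather than as a robust PE parser.

```python
import struct

DYNAMIC_BASE = 0x0040   # IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (ASLR opt-in)
NX_COMPAT = 0x0100      # IMAGE_DLLCHARACTERISTICS_NX_COMPAT (DEP opt-in)


def dll_characteristics(image: bytes) -> int:
    """Return the DllCharacteristics flags from a PE file's headers."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ header")
    # The DOS header stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    optional_header = pe_offset + 4 + 20
    # DllCharacteristics sits 70 bytes into the optional header
    # in both the 32-bit and 64-bit layouts.
    (flags,) = struct.unpack_from("<H", image, optional_header + 70)
    return flags


def supports_aslr(image: bytes) -> bool:
    """True if the file is marked as safe to load at a randomised address."""
    return bool(dll_characteristics(image) & DYNAMIC_BASE)
```

You could feed it the contents of a DLL read with `open(path, "rb").read()`; a result of False means the DLL can be counted on to load at a predictable address.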

The gadget list

Breaking up your entire shellcode into tiny machine-code fragments and creating a gadget list for the entire exploit is, in theory, possible, but rarely necessary.

Indeed, the gadget list in this exploit only has about 12 steps; all it really does is to call the system function VirtualProtect to turn off the NO EXECUTE setting on the rest of the shellcode.

Then, the ROP chain jumps to the now-executable shellcode in Part 2.

This works because the data memory containing the shellcode was allocated by Word.

As the owner of the memory block, Word has sufficient privilege to change its access control settings; and because the ROP gadget fragments are running inside Word, the exploit has sufficient privilege, too.

Firing the gadgets

The most important step in the attack, of course, is to trick Word into running the ROP gadget chain in the first place.

Typically, that means corrupting an area of memory where Windows has already stored the address of a program component or subroutine that it will later execute.

That is achieved in Part 3, where Word’s processing of XML smart tags can be tricked into saving data at memory addresses it shouldn’t.

Again, we aren’t going to analyse this here, but the important part is the first fragment of the XML, where the value 150997000 (0x09000808 in hex) is buggily written at the address 0x7C38BD80.
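If you want to check the decimal-to-hex conversion for yourself, a couple of lines of Python will do it. The helper names are ours; the second function shows how a 32-bit value of this sort is laid out in memory on a little-endian CPU such as x86, with the least significant byte first.

```python
import struct


def as_hex32(value: int) -> str:
    """Format a number as an 8-digit hexadecimal address."""
    return f"0x{value:08X}"


def little_endian_bytes(value: int) -> bytes:
    """Return the four bytes of a 32-bit value in x86 memory order."""
    return struct.pack("<I", value)


print(as_hex32(150997000))                   # 0x09000808
print(little_endian_bytes(150997000).hex())  # 08080009
```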

The ultimate outcome of that memory corruption is that Word will start using address 0x09000808 as its stack, and the stack determines where the machine code instruction RET, or return from subroutine, will go next.

(Sneakily taking over where the RET instruction will go is how ROP got its name.)

And, as the attackers know, probably by conducting repeated experiments, 0x09000808 just happens to be where Part 2 of the malicious RTF gets loaded in memory, so it contains the ROP gadget list and the shellcode.

Malware delivery

We have seen two different malware families packaged into Part 5, as follows:

Attachment filename                             Malware type
WUPOS_update.doc                                UWarrior
ammendment.doc                                  UWarrior
Anti-Money Laudering & Suspicious cases.doc     Toshliph
Application form USD duplicate payment.doc      Toshliph
AML USD & Suspicious cases.doc                  Toshliph
Amendment inquiry ( reference TF1518869100.doc  Toshliph
Information 2.doc                               Toshliph

Toshliph is what’s known as a downloader: its primary function is to fetch more malware, so that the attackers can keep changing what gets delivered.

UWarrior is what’s known as a backdoor: it allows crooks to control your computer remotely.

UWarrior’s main functions are to acquire a list of all software installed on your PC, to find and steal files on command, and to download additional programs and run them.

What to do?

  • Keep your patches current. The bug that allows Part 3 of this attack to corrupt memory was patched back in April 2015. Once you are patched, the shellcode in these files is just harmless data.
  • Keep your anti-virus current. In the case of the Toshliph samples mentioned above, three malicious files will be opened on your computer: the malicious RTF, the Toshliph downloader dropped by the shellcode, and the malware finally fetched by Toshliph. That gives you a triple chance of heading off infection!
  • Avoid opening unexpected or unsolicited attachments. Unless an attacker hacks one of your customers or suppliers and sends the email straight from there, there will almost always be telltale signs that the email is not actually part of your usual business workflow.

Sophos Anti-Virus detects and blocks the various components of this malware as: Troj/DocDrop-FK (malicious RTFs, regardless of embedded malware); Troj/Uwarrat-A (UWarrior backdoors); Troj/Agent-AOOB, Troj/Toshliph-A, Troj/Toshliph-B (Toshliph downloaders).