From the Labs: New developments in Microsoft Office malware

In September 2014, we wrote about a resurgence in VBA malware.

VBA stands for Visual Basic for Applications: it is a powerful and very widely used programming tool that can be used right inside applications such as Microsoft Office.

That makes it common, and indeed perfectly usual, in legitimate files.

But, as we wrote last time:

Visual Basic code is easy to write, flexible and easy to refactor. Similar functionality can often be expressed in many different ways which gives malware authors more options for producing distinct, workable versions of their software than they have with exploits.

In short, what is good for the gander is equally good for the goose.

Indeed, over the past six months, malware that arrives as a VBA program inside an innocent-looking document has become an all-too-common occurrence in the threat landscape, and an essential weapon in spam campaigns.

Backward compatibility

Obviously, attackers who use VBA rely on their victims having some version of Office installed.

As you can see, SophosLabs statistics show that malware writers prefer Word and Excel to PowerPoint.

This is probably because malware delivered in spam very commonly pretends to be a courier delivery notice, an invoice or something similar, and these are typically stored as Word documents or Excel spreadsheets.

But the crooks also greatly prefer the older “1997-2003” Office file format.

Files in the 1997-2003 format are stored in what Microsoft calls the Object Linking and Embedding (OLE) Compound File format, often just called OLE2 for short.

OLE2 uses a FAT-like structure to define various streams (which you can think of as files in a disk image) consisting of fixed-size blocks; these streams declare the structure and content of the document.

The rest of the VBA malware we see is in the more recent “2007 and later” format.

These files are denoted with an -X appended to the file extension (e.g. DOCX instead of DOC, XLSX instead of XLS).

Dash-X files are stored in a file specification known as Office Open XML (OOXML).

Files of this type take the form of a ZIP archive containing a series of XML files that define the document’s content and presentation.
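As a rough illustration of how the two containers can be told apart, here is a minimal Python sketch that looks only at the first few bytes of a file. The function name sniff_container is hypothetical; the signatures used are the well-known OLE2 compound file header and the ZIP local file header. A real scanner would go on to parse the container rather than trust the header alone.

```python
# Well-known file signatures ("magic bytes").
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file header
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP local file header


def sniff_container(data: bytes) -> str:
    """Guess the container type from the leading bytes (hypothetical helper)."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"   # "1997-2003" style document
    if data.startswith(ZIP_MAGIC):
        return "zip"    # possibly an OOXML document
    return "unknown"
```

Note that a ZIP signature only says that the file is an archive; whether it is really an Office document depends on what is inside.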

We can only guess why malware writers have been reluctant to commit to the 2007 format, but a good bet is that sticking with the older format increases the likelihood of a successful infection.

Newer versions of Office can open both new and old file types, thanks to backward compatibility, but the old Office versions were never patched to let them handle the new formats.

Office XML

Interestingly, there is another, little-used file format that was introduced way back in Office 2003.

Files in this format consist of a standalone XML file, and they are sufficiently unusual that they don’t appear at all in the pie chart above.

To our surprise, however, we have recently seen a surge in brand new VBA malware packaged in this old and unusual format.

Once again, we have to guess why the crooks have decided to revive this format; it might simply be because it is little used, and thus not commonly associated with attacks.

Perhaps, also, malware authors hope that the rarity of XML-type files means that some security products are unable to deconstruct them properly.

→ Sophos products can decompose OLE2, XML and OOXML type files and extract their contents in a similar way. In other words, the same malware saved in three different formats will be detected identically.

Opening the container

The process of extracting a VBA program from an Office file depends on the container format that is used.

In “1997-2003” files, VBA code is stored in a number of streams which are enclosed within the same OLE2 container as the other document streams, such as the WordDocument stream which contains the document’s text.

Office 2007 files also store their VBA code as streams in an OLE2 file, but the other document data is detached into separate XML files in the main container file, which is in the ZIP format.

So the OLE2 container that holds the VBA code is simply a file named vbaProject.bin in amongst the XML files in the outer ZIP file.

And the Office 2003 XML format also uses a dedicated OLE2 container to store VBA code, with the structural difference that the data is compressed into MSO format (a proprietary Microsoft format also used for email attachments) and then text-encoded into Base64.

If we extract the Base64 data and decode it, we obtain the MSO file, indicated by the text “ActiveMime” at the start.

Unpacking the MSO file leaves us with an OLE2 container with the VBA program.
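The unwrapping steps can be sketched in Python as follows. This is an approximation under stated assumptions: it assumes the MSO payload is a zlib stream somewhere after the "ActiveMime" marker, and instead of parsing the MSO header it simply tries each candidate zlib start byte until the output begins with the OLE2 signature. The function name unpack_mso is hypothetical.

```python
import base64
import zlib

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file header


def unpack_mso(base64_text: str) -> bytes:
    """Decode Base64 text, check the ActiveMime marker, return the OLE2 data."""
    mso = base64.b64decode(base64_text)
    if not mso.startswith(b"ActiveMime"):
        raise ValueError("not an ActiveMime (MSO) blob")
    # Assumption: the payload is a zlib stream at an offset we do not know,
    # so try every position after the marker where a zlib header could begin.
    start = mso.find(b"\x78", len(b"ActiveMime"))
    while start != -1:
        try:
            payload = zlib.decompressobj().decompress(mso[start:])
        except zlib.error:
            payload = b""
        if payload.startswith(OLE2_MAGIC):
            return payload
        start = mso.find(b"\x78", start + 1)
    raise ValueError("no OLE2 container found in MSO data")
```

Brute-forcing the offset is slower than reading it from the header, but it keeps the sketch independent of header details that are not covered here.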

What next?

Using a recent malware example, we extracted the VBA code from its XML wrapper.

Here’s what we found:

At first glance the code might appear complex but it is actually very simple code that has been deliberately padded out in an attempt to disguise its true intentions.

This subroutine is the entry point of the VBA and the first points of interest are the seemingly nonsense strings declared at the start of the file and what appears to be the same four lines of code repeated in groups of three.

We will look at the strings in depth later but first let’s look at the duplicated code (highlighted in red).

These four lines appear to have no effect on the final outcome of the subroutine.

The code declares a variable that is never usefully referenced, a for loop whose termination condition ensures that its body is never executed, and an if statement whose condition is always false.

Code like this is often referred to as dead code, and in this case it was probably created automatically by a code generation engine.
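For readers who have not met the pattern before, here is a hypothetical Python analogue of that kind of padding. It is not the malware's VBA code, just an illustration of the three tricks described above, next to the equivalent function with the padding removed.

```python
def with_dead_code(value: int) -> int:
    """Illustrative only: padded with statements that cannot affect the result."""
    unused = 1234                 # variable that is never usefully referenced
    for _ in range(10, 1):        # empty range, so the body never runs
        value += unused
    if 1 > 2:                     # condition is always false
        value = 0
    return value * 2


def cleaned(value: int) -> int:
    """The same function once the dead code is stripped out."""
    return value * 2
```

Both functions return the same result for every input, which is exactly why the padding can be deleted during analysis.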

Removing this dead code leaves us with a much smaller, more readable subroutine, although it is still not clear what the code actually does:

A noticeable trait is the repeated function calls to “ho3NnG”.

Each call is accompanied by one of a number of hardcoded string constants declared at the top of the file.

Jumping to the “ho3NnG” function, contained in a separate code module, once again seems to plunge us into complexity.

But notice that there are numerous GoTo statements scattered amongst the function’s body:

Since these jumps are unconditional, and there are no labels between each jump and its destination, the code sandwiched between them can never be executed.

Code like this is known as unreachable code, and we can simply remove it from consideration.
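That rule is mechanical enough to automate. The following Python sketch applies it to VBA-like source text: after a GoTo, if the first label encountered is the jump's own destination, everything in between is dropped. It is deliberately simplified. It assumes one statement per line, labels on lines of their own, and no numeric line labels; the function name strip_unreachable is hypothetical.

```python
import re

GOTO = re.compile(r"^\s*GoTo\s+(\w+)\s*$", re.IGNORECASE)
LABEL = re.compile(r"^\s*(\w+):\s*$")


def strip_unreachable(source: str) -> str:
    """Drop lines between an unconditional GoTo and its label (simplified)."""
    lines = source.splitlines()
    kept = []
    i = 0
    while i < len(lines):
        kept.append(lines[i])
        jump = GOTO.match(lines[i])
        if jump:
            # Find the first label after the jump.
            j = i + 1
            while j < len(lines) and not LABEL.match(lines[j]):
                j += 1
            # Skip ahead only if that label is the jump's own destination,
            # i.e. nothing else can enter the code in between.
            if j < len(lines):
                target = LABEL.match(lines[j]).group(1)
                if target.lower() == jump.group(1).lower():
                    i = j
                    continue
        i += 1
    return "\n".join(kept)
```

For example, given the lines "GoTo skip", "y = 2", "skip:", the middle line is removed, while the jump and the label are kept so that the remaining code behaves as before.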

Without the unreachable code noise, and with a little bit of re-arrangement, we are left with a much simpler function:

The code above loops through the passed-in string and XORs each character with the decimal value 255. (This has the effect of flipping each bit in each byte.)

The result of each XOR is appended to a new string which is returned to the caller.
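In Python, the same logic might look like the sketch below. The name unscramble is hypothetical, and the sketch assumes every character in the input has a code point in the range 0 to 255, as single-byte VBA strings would.

```python
def unscramble(text: str) -> str:
    """XOR each character with 255 and return the resulting string."""
    result = ""
    for ch in text:
        # XOR with 255 (0xFF) flips all eight bits of the character code.
        result += chr(ord(ch) ^ 255)
    return result
```

Because XOR with a fixed value is its own inverse, calling the function twice returns the original text, so the same routine can both scramble and unscramble.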

This sort of text-unscrambling function is very common in malware, because it is a simple way of disguising data such as filenames, messages and URLs that would otherwise be both obvious and suspicious.

We can now simply replace the original calls to “ho3NnG” with the unscrambled data that comes back each time.

Now it looks more like malware:

With the formatting cleaned up a little and the variables renamed, the true intentions of this file are clear.

Simply put, this code:

  • Makes an HTTP connection to port 8080 on the server 173.xx.xx.xx.
  • Downloads a file called abs5ajsu.exe from the server.
  • Saves it in the TEMP folder as fdgffdgdfga.exe.
  • Runs it.

Why use a downloader?

The crooks could simply have embedded the content of abs5ajsu.exe as scrambled data in the VBA code, so that the malware would work even when offline.

But by using a downloader, they delay showing their hand until the last moment.

Only when the Office file is opened (rather than when it is received) do they reveal what malware they are actually using in the attack.

That gives them extra flexibility: they can change the malware at any time; adapt it depending on the geolocation of the victim; or even download clean files as decoys.

In this example, the malware that was downloaded next was a variant of Dridex, a banking Trojan derived from Cridex.

This particular sort of VBA downloader is commonly associated with Dridex payloads, accounting for around 70% of all VBA-based malware in the past three months.

What’s old is new again!

→ Sophos detects and blocks the malware described above as Troj/DocDl-GO (VBA downloader part) and Troj/Dridex-AZ (dropped malware part).