Wolf in sheep’s clothing: a SophosLabs investigation into delivering malware via VBA

Thanks to Graham Chantry of SophosLabs for the behind-the-scenes work on this article.

The document threat landscape has in recent years been dominated by Microsoft Word and Excel spreadsheet malware. This is thanks, in no small part, to the drastic resurgence of Visual Basic for Applications (VBA) being used as a delivery method for malicious payloads.

It’s a topic we’ve delved into before, most notably in this article from senior Sophos technologist Paul Ducklin. As he did back then, researcher Graham Chantry recently dug into the data and mechanics of the trend as seen from the SophosLabs’ perspective for an updated picture of the problem. What follows are his updated findings for the last six months.

By the numbers

First, some statistics that show the current state of affairs. In the pie chart on the left, we see that 68% of the file types used to deliver malware in the last six months were Word. Excel spreadsheets accounted for 15% and PDFs accounted for 13%. When it comes to the threat type, we see in the right-hand chart that 81% is VBA based, while embedded droppers account for 10% and phishing is 6%.

VBA droppers first started to surface in July 2014 and became synonymous with the banking Trojan Dridex when its operators started to use them in aggressive spam campaigns. Since then, we have seen VBA droppers used with a variety of other payloads, and the droppers themselves have evolved from simple 10-line scripts to verbose, complex and heavily obfuscated code.

And it’s not just the code that the bad guys have experimented with. In the same time period, Chantry said SophosLabs has seen attackers utilize a variety of file formats, such as the short-lived Office 2003 Standalone XML format, the MHTML Web Archive format and, in much rarer cases, embedding Office files within other document formats such as RTF and PDF.

The Matryoshka doll approach

The latter of these file formats has actually become far less rare. In just the last few weeks SophosLabs noticed a significant increase in the number of ransomware campaigns housing VBA droppers in PDF documents.

SophosLabs discovered one spam campaign where ransomware was downloaded and run by a macro hidden inside a Word document that was in turn nested within a PDF, like a Russian matryoshka doll. The ransomware in this case appeared to be a variant of Locky.

Most antivirus filters know how to recognize suspicious macros in documents, but hiding those documents inside a PDF could be a successful way to sidestep them.

These attachments arrived in spam emails where the body was entirely empty, but the subjects started with either “Document”, “File” or “Copy” followed by a series of random numbers (File_78564545). The distinct lack of social engineering suggested the crooks are relying on curiosity alone for victims to open the enclosed PDF.

The PDF attachments themselves appear to always have a nonsensical filename such as “nm.pdf” (as shown in the screenshot above). If the recipient is naïve enough to open this attachment it will trigger the infection.

But before we replicate that, let’s have a look at what’s actually inside this PDF file.

Anatomy of a malicious PDF

SophosLabs started by opening nm.pdf in a text editor, and with little effort an immediate red flag appeared: the file contains an OpenAction event (see screenshot below). An OpenAction event defines what will happen when the user first opens the document. In this case, the PDF reader will execute a JavaScript function called submarine. So what does this submarine function actually do? In order to find its definition, SophosLabs had to parse the remainder of the PDF.

PDF files consist of objects that define all aspects of the document’s content, such as images, fonts and of course the actual text. The OpenAction screenshot (above) can also help illustrate the format of a simple PDF object. Each object starts with a unique index number (in this case decimal 14) and a version number, which the PDF specification calls the generation number (in this case 0). The actual contents of the object are housed between the header obj and the footer endobj.

PDF objects can also indirectly reference each other and they do so via these unique index numbers.
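To make the object layout concrete, here is a minimal Python sketch that pulls the index number, version number and outgoing references out of raw PDF text. The sample bytes are a made-up fragment shaped like the objects described here, not the contents of nm.pdf, and the regular expressions are simplifying assumptions that ignore most of the real PDF syntax.

```python
import re

# Hypothetical fragment shaped like the objects described above; not from nm.pdf.
SAMPLE = b"""14 0 obj
<< /Type /Catalog /Names 13 0 R /OpenAction << /S /JavaScript /JS (submarine();) >> >>
endobj
13 0 obj
<< /JavaScript 11 0 R /EmbeddedFiles 12 0 R >>
endobj
"""

# "<index> <version> obj ... endobj" wraps every object.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.S)
# "<index> <version> R" is an indirect reference to another object.
REF_RE = re.compile(rb"(\d+)\s+\d+\s+R\b")


def list_objects(data):
    """Map each object index to (version, [indexes of referenced objects])."""
    objects = {}
    for num, ver, body in OBJ_RE.findall(data):
        refs = [int(r) for r in REF_RE.findall(body)]
        objects[int(num)] = (int(ver), refs)
    return objects


objects = list_objects(SAMPLE)
for index, (version, refs) in objects.items():
    print(index, version, refs)
```

Following the reference lists from one object to the next is how an analyst can walk a chain of objects by hand.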

In the screenshot above we can see that Object 14 (which holds our OpenAction event) references object 13, which itself references object 11, which references object 7, which finally references object 6. Object 6 is what is known as a “stream object” and the format tells us that it is 380 bytes in length and that its content is Flate encoded. This content is illegible when shown in a text editor, so SophosLabs decompressed (inflated) it.
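Flate encoding is zlib compression, so a stream like this can be decoded with a few lines of Python. The sketch below is only illustrative: the real bytes of Object 6 are not reproduced in this article, so a placeholder script is compressed first to stand in for the data found between the stream and endstream keywords.

```python
import zlib

# Stand-in for the raw bytes between "stream" and "endstream"; the real
# Object 6 data is not available here, so we compress a placeholder script.
placeholder_js = b"function submarine() { /* ... */ }"
raw_stream = zlib.compress(placeholder_js)


def decode_flate(stream_bytes):
    """Reverse PDF Flate encoding (zlib compression) on a stream body."""
    return zlib.decompress(stream_bytes)


decoded = decode_flate(raw_stream)
print(decoded.decode("ascii"))
```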

The screenshot above is Object 6’s decompressed stream and right at the bottom is that submarine function for which Labs was searching. Unlike most modern JavaScript malware, this code was very straightforward, with little to no obfuscation.

Submarine consists of a single call to abc, which is a pointer to the inbuilt exportDataObject API. This API extracts an embedded file (in this case HGG4X.docm) and saves it to disk. If the nLaunch argument is non-zero the application will also open the extracted file in the default application. In this case the value of nLaunch is set to 2 which will result in the embedded file being saved to a temporary directory and then opened.

The next question was: where is HGG4X.docm? By tracking back to the root object (14), SophosLabs saw that Object 13 not only references JavaScript in object 11 but it also references “Embedded Files” in object 12.

Chantry said:

Unlike most document malware these days, the social engineering effort leaves a lot to the imagination: it simply asks you politely to open the embedded document. But if our user is naive enough to open an attachment from an unknown sender, there is a good chance they’ll be naive enough to follow these instructions. We click “OK”; the JavaScript completes its mission and HGG4X.docm is dropped and opened in Microsoft Word.

As we anticipated, the second the user opens the attachment, the JavaScript kicks in and attempts to open the embedded VBA document. It’s not plain sailing, however, as Adobe Reader identifies that this might be something malicious and suspends the action. In order for the infection to continue, the user will have to explicitly approve it.

Only two lines into the VBA program, the Labs found the first indicator of something malicious: an unconditional jump. As there is no label between the Goto statement and where it’s jumping to, the code wedged between them is unreachable (aka dead code). This is not very common in clean files, as developers will often remove unused code. This sort of trick is very common among VBA downloaders and aims to confuse analysts trying to reverse engineer them. Unlike most samples that utilize this trick, however, the dead code in this file appears to be clean code snippets, likely taken from MSDN or other online resources.

Ingenious methods

Jumping over the junk code we see that Synomati starts by creating an Object (of type Cooper) and immediately calls one of its methods. Strangely, it doesn’t reference the method directly though, instead using the VBA function CallByName. This technique of calling an Object’s method allows the caller to specify the name of the method as a string argument rather than hardcoding it. In this code that name is stored in a TextBox component located on a VB form (called Window1).

Above is the Window1 VB Form as it appears in the Visual Basic editor. The red boxes indicate the names of its components. Various attributes of these components are referenced throughout the program’s code.

Storing strings within form components is an ingenious method of concealing the true intentions of malicious code, as it’s often the strings that give the game away, e.g. suspicious IP addresses or calls to processes such as powershell.exe. We first started to see samples using this method in early 2016, but the majority of VBA droppers still prefer to obfuscate their strings, usually with some variety of Xor, Base64 or RC4 encoding.

The CallByName function call from the previous screenshot was referring to the Text field of the TextBox T2. As seen in the bottom right corner of the form, that is the string “ratatu”. By searching for that expression in the Cooper class’s implementation, Labs found the method.
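The same late-binding idea can be illustrated outside VBA. The Python sketch below is an analogue, not the macro’s code: getattr plays the role of CallByName, the Cooper class is a hypothetical stand-in with a harmless method body, and the form_fields dictionary is an assumed substitute for the Text field of T2.

```python
class Cooper:
    """Hypothetical stand-in for the macro's Cooper class."""

    def ratatu(self):
        return "ratatu called"


# Assumed substitute for the form: the method name lives in data, not in code.
form_fields = {"T2.Text": "ratatu"}


def call_by_name(obj, method_name, *args):
    """Look a method up by its string name and invoke it (cf. VBA CallByName)."""
    return getattr(obj, method_name)(*args)


result = call_by_name(Cooper(), form_fields["T2.Text"])
print(result)
```

Because the method name never appears next to the call site, a scanner that only reads the calling code cannot see which method will run.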

Just like its caller, ratatu also references strings stored in form components. This time though it’s in the Tag field of the ComboBox imaginatively named ffrrggbb.

The attribute isn’t visible from the Form Designer View so Labs needed to look at the properties tab for ffrrggbb. As you can see at the bottom of the screenshot below, the Tag field contains a long jumbled string.

Ratatu uses the VBA split function to divide this string into an array of smaller strings using the delimiter “FSUKE.”

The resulting array is a veritable who’s who of VBA dropper strings and, based on this information alone, Labs confidently predicted that the code was likely to download and run something. The array is stored in the variable AsStringName which is global in scope. This means it will be accessible from every other subroutine or function.

Another global variable is Vaucher, which is assigned on the following line. It is set to the value at offset 0 of this newly created array: “Microsoft.XMLHTTP”. The offset is 0 because FreshID is a constant set to 0, so (0+0 * 2 / 13) is just a deliberately verbose method of declaring 0.
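The split-and-index step can be sketched in Python as follows. Only the FSUKE delimiter and the “Microsoft.XMLHTTP” entry at offset 0 come from the analysis above. The rest of the tag string, including the order of the other entries, is an assumption made up for illustration, as is reading the delimiter without a trailing period.

```python
# Illustrative tag value: the real string is not reproduced in the article.
# Only the first entry and the delimiter are taken from the analysis.
tag_value = (
    "Microsoft.XMLHTTP" "FSUKE"
    "Adodb.Stream" "FSUKE"
    "Shell.Application" "FSUKE"
    "WScript.Shell"
)

DELIMITER = "FSUKE"  # assumed to exclude the sentence-ending period
FRESH_ID = 0         # mirrors the constant FreshID described above

# Equivalent of the VBA Split call that fills the global array.
as_string_name = tag_value.split(DELIMITER)

# The deliberately verbose index expression still evaluates to 0.
index = int(FRESH_ID + 0 * 2 / 13)
vaucher = as_string_name[index]
print(index, vaucher)
```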

The function then proceeds to call SubMui. The IF statement at the start of the subroutine is always true for this file (the ActiveDocument.Kind property is 0) so it proceeds to create 4 ActiveX objects using the strings from the global array we populated earlier.

Crucially, SubMui also generates another string array using the exact same Split method. This time, however, the delimiter is stored in the Label component named Command (string value “V”). This array is stored in the global variable MovedPermanently and contains four URIs that all point to the payload.

So we now have four ActiveX objects and an array of URIs to download, but SubMui isn’t finished. It also generates a path to the user’s “temp” directory by calling the Environment method of the recently created WScript.Shell ActiveX object. This method returns a dictionary of environment variables for the current process. Using this object, it looks up the value of the environment variable “Temp” and assigns it to the variable PUKALA_LAKOPPC.

At this point the code passes the baton to the MoveSheets subroutine, whose name is wholly unrepresentative of its functionality. This subroutine actually loops through the MovedPermanently array (which contains all those dodgy URIs) and calls SaveDataCSVToolStripMenuItem_Click for each one.

Although we haven’t yet analyzed the SaveDataCSVToolStripMenuItem_Click subroutine at this point, we can predict that it’s likely downloading something, as the Status field of the Microsoft.XMLHTTP object (stored in CuPro) is checked immediately after the call.

The HTTP status code 200 signifies that a request has been successful, so the IF condition here will raise a runtime error if a download was unsuccessful. In VBA, runtime errors can be caught and processed by error handlers defined using an On Error statement. MoveSheets defines an error handler at the label d13. All this label does though is to call Next which jumps back to the start of the For loop. Essentially if the download failed for any reason we get the next URI in the array and carry on.

By implementing this functionality, the bad guys ensure that if one of their domains is taken down before the victim opens the document there are still three others in the queue waiting to serve up the payload.
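The fallback logic amounts to “try each URI in turn and stop at the first HTTP 200”. The Python sketch below shows that control flow under stated assumptions: fetch is a hypothetical stub that performs no network access, the example.invalid addresses are placeholders, and the exception handler stands in for the d13 label that calls Next.

```python
def fetch(uri):
    """Hypothetical downloader stub: returns (status, body), no network access."""
    fake_responses = {"http://example.invalid/c": (200, b"payload")}
    return fake_responses.get(uri, (404, b""))


def first_successful(uris):
    """Try each URI in order; return (uri, body) for the first HTTP 200."""
    for uri in uris:
        try:
            status, body = fetch(uri)
            if status != 200:
                raise RuntimeError("download failed")
        except RuntimeError:
            continue  # plays the role of the d13 handler calling Next
        return uri, body
    return None  # every URI failed


uris = [
    "http://example.invalid/a",
    "http://example.invalid/b",
    "http://example.invalid/c",
    "http://example.invalid/d",
]
winner = first_successful(uris)
print(winner)
```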

Cooper’s Challenge

So, on to another misleadingly named subroutine, SaveDataCSVToolStripMenuItem_Click. We can see it starts by creating a full path to the current URI using the “http” string hidden in the ZK component. As in the Synomati function, the code also creates a Cooper object, this time to call its Challenge method. Note that the IF condition is redundant, as the parameter e is always less than 488.

Cooper’s Challenge method has a pretty basic implementation. It consists of two calls to the same subroutine: Vgux. The first of these calls has the parameter value set to 1 and the second has it set to 8. If we navigate to Vgux’s code we can see that its behavior is in fact dependent on these values.

If the parameter is set to 1 it calls the Open method (of our “Microsoft.XMLHTTP” object) to initialize it as a GET request and to set the URI to download. The second time round, when the parameter is 8, it will call the setRequestHeader method to initialize the User-Agent field.

So when we return to SaveDataCSVToolStripMenuItem_Click we now know our “Microsoft.XMLHTTP” object is initialized with all the right values and is ready to go. Predictably the next operation is to call Send on the object which will initiate the download of the payload.

Regardless of whether the download succeeds or fails, the code flow returns to the MoveSheets subroutine. As we touched on before, if any failure occurs we simply retrieve the next URI in the array and repeat the process until one succeeds or we run out of URIs, whichever happens first.

In the case of the Labs’ investigation, the first HTTP download was successful, so the code proceeded to call the function Assimptota6, which immediately calls PUKALA_ProjectSpeed. Again the bad guys have attempted to complicate analysis by embedding dead code but, if we disregard that, it’s clear that PUKALA_ProjectSpeed is just responsible for creating file paths for the dropped payload.

The function makes use of the temporary directory path stored in PUKALA_LAKOPPC (which we saw assigned in SubMui) to generate two file paths stored in the global variables PUKALA_Project and PUKALA_ProjectBBB. Note that the integer value ProjectDarvin, which is included in both file paths, signifies which URI served up the payload: 20 indicates the first URI, 22 the second, 24 the third and so on. The actual file paths generated can be seen in the table below.

Returning to the caller, Assimptota6, we find more redundant code. The conditional branch highlighted in red can never be true, as the parameter NumHoja is always 22.

When we filter out this irrelevant code, we can see that this function uses the Adodb.Stream object we created earlier to write the payload to a file on disk. It does so by first opening a stream of binary data, populating it with the content of the download and writing that stream back to disk using the SaveToFile method.

You might have noticed that the path of the file being written to is PUKALA_ProjectBBB. Let’s pause the program just after the SaveToFile call and take a look at what was actually written to eewadro20. The contents don’t appear to be in a recognizable file format; they look like a series of random bytes, so we can make an educated guess that the payload is encrypted in some manner. Let’s resume the code execution and see how the program makes use of it.

Assimptota6 finishes up by calling the similarly titled Assimptota4. As the screenshot above shows, it consists of only two lines of code. Before delving into the subroutine call on the first line (to WidthA), we can look ahead to the second line to see if that gives us any clues to what it’s trying to do. This line of code uses the Shell.Application object we created earlier to run the file pitupi20.exe. Of course this file doesn’t exist yet, so we know WidthA must be responsible for creating it. Looking at the arguments passed into WidthA only strengthens this assumption, as they include:

  • the path to the file containing the encrypted payload
  • the path to the Windows executable file that will be executed
  • a string that appears to be some form of decryption key


When we jump into WidthA’s definition we can see that it reads the contents of the encrypted payload into the byte array Gbbb and later writes this array into the Windows executable file. Sandwiched between these two operations is a subroutine call to SubFunc, which ominously takes our payload byte array and the decryption key as arguments. So it’s no longer a question of whether this is a decryption routine; it’s just a question of how it decrypts the payload.

Stepping into SubFunc, Labs saw that it started by translating the decryption string “QOfPWKYMzQzNuuzBQGeax2Lkh3Y0oWEl” into an array of bytes using the VBA function StrConv. It then proceeded to perform an exclusive or (Xor) operation on each byte in the encrypted payload with the bytes in the decryption key array. Note the function Ashnorog is just a wrapper function for the expression bb Xor aa.

The diagram below shows the first 8 bytes of the encrypted payload array (at the top), the bytes in the decryption key array (in the middle) and the payload array after the Xor operation (at the bottom), which we have labeled Decrypted Payload for readability.

In the first iteration of the loop CeLaP4 (the variable that is used to index the arrays) is set to 0. So we take the byte at index 0 of the Encrypted Payload (1C) and we Xor it with the byte at index 0 of the Decryption Key (51). The result of this operation (4D) is then written back into the encrypted payload array at index 0. The next iteration in the loop will Xor the bytes at index 1 (15 and 4F) and the result (5A) is written back to index 1. This process continues until every byte in the array has been decrypted.

Note here that the Encrypted Payload Array is larger than the Decryption Key Array so we can’t Xor at the same offset in both arrays for every iteration. The code caters for this, however, by performing a Mod division of the index of the Encrypted Payload Array with the length of the Decryption Key Array. This means when the index reaches the last byte of the Decryption Key Array the next iteration will use the first byte in the array.
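The loop described above is a repeating-key Xor, which can be sketched in a few lines of Python. This is a reconstruction from the description rather than the macro’s own code: the function name xor_decrypt is an assumption, while the key string and the two sample bytes (1C and 15) are the values quoted in this article.

```python
KEY = b"QOfPWKYMzQzNuuzBQGeax2Lkh3Y0oWEl"  # key string quoted in the article


def xor_decrypt(data, key):
    """Xor each payload byte with the key byte at (index Mod key length)."""
    out = bytearray(data)
    for i in range(len(out)):
        out[i] ^= key[i % len(key)]  # wraps to the start of the key when it runs out
    return bytes(out)


# The first two encrypted bytes from the worked example: 1C and 15.
first_two = xor_decrypt(bytes([0x1C, 0x15]), KEY)
print(first_two)  # b'MZ', the magic bytes that open a Windows executable
```

Because Xor is its own inverse, running the same function over the output with the same key restores the original bytes, so one routine serves for both encryption and decryption.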

At this point, Labs let the decryption loop complete in the debugger and paused it just after WidthA had written the decrypted payload to “pitupi20.exe”. Opening this file in a binary editor, Labs finally had a Windows executable payload.

Resuming the program, Assimptota4 proceeded to launch the newly decrypted payload using the Shell.Application object.

With that, the payload had been delivered and executed. This Windows executable now runs hidden in the background, looking for files of interest and encrypting them. After a very short period of time, the inevitable ransom note and wallpaper change follow.

Enter Jaff

This ransomware calls itself Jaff and the bad news for the user is those treasured family pictures and tomorrow’s big presentation have all been renamed with .jaff extensions and their contents replaced with encrypted blobs. Chantry said:

The code analyzed in this paper is a far cry from those simple VBA downloader templates we saw at the start of the VBA boom back in September 2014. These samples conceal their strings in Form components, pollute useful code with redundant code and encrypt their payloads until the very last minute. All of this is no doubt a bid to bypass AV detection that looks for specific strings or functions. The fact that the functionality is split between so many procedures, however, and that it intermixes clean code with malicious, suggests that it is also trying to prevent analysts from building a narrative when reverse engineering it.

Now what?

Just why the bad guys have decided to start hiding VBA Downloaders in PDF documents we can only speculate, but a good argument could be the tarnished reputation of Office documents as email attachments and perhaps a misguided interpretation of PDFs being somehow safer. IT administrators might by now have decided to automatically block VBA documents from entering their network, but it’s less likely they will have done so for PDFs. For want of a better analogy, it’s very much the wolf in sheep’s clothing.

Any AV vendor worth its salt can easily extract these embedded files, and this sort of attack requires the victim to have both PDF and Office software. That, paired with the need for another level of social engineering, means there are plenty of reasons the trend might not continue. In fact, this isn’t even the first time Labs has seen Office document malware being paired with PDF. The notorious CVE-2012-0158 vulnerability was exploited using PDF as a parent file but did so only briefly. Could VBA PDF files follow the same fate?

SophosLabs certainly isn’t betting against it.

Sophos detects the PDF and embedded Office Document as Troj/DocDl-IYE and the dropped Jaff payload as Mal/Ransom-FD. Our customers are protected.