Security experts are playing a game of cat-and-mouse with malware authors, who are continually looking for ways to bypass detection by anti-malware products.
As regular readers of Naked Security will know, one commonly seen method of distributing malware is to embed an attack inside a malformed PDF. And one way to hide code inside a malicious PDF is to use filters.
Filters are used by PDFs to compress or store data to either make the file smaller (Flate, CCITTFax) or allow it to be read as text (ASCIIHex).
By combining the filters in weird ways the malware author hopes to bypass detection by malware scanners and deliver a malicious payload to the victim.
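To make the idea of stacked filters concrete, here is a minimal sketch that layers Flate and ASCIIHex on a harmless placeholder and then undoes them in the order a /Filter array would list them. The function names and the payload are invented for illustration; this is not code taken from the sample.

```python
import binascii
import zlib


def encode_layers(payload: bytes) -> bytes:
    """Encode for /Filter [/ASCIIHexDecode /FlateDecode]: Flate first, then ASCIIHex."""
    compressed = zlib.compress(payload)
    # '>' is the ASCIIHex end-of-data marker
    return binascii.hexlify(compressed) + b">"


def decode_layers(stream: bytes) -> bytes:
    """Undo the filters in the order they are listed: ASCIIHex, then Flate."""
    # Drop the end-of-data marker and any whitespace before hex-decoding
    hex_digits = b"".join(stream.split(b">", 1)[0].split())
    return zlib.decompress(binascii.unhexlify(hex_digits))


payload = b"harmless example payload " * 4   # hypothetical stand-in data
stream = encode_layers(payload)
print(stream[:16])                       # the stream is now plain hex text
print(decode_layers(stream) == payload)  # round trip check
```

A scanner that only looks at the raw stream sees hex text rather than the payload, which is why each extra layer is attractive to an attacker.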
Last April, we saw some PDF malware using /DecodeParms filter parameters to obfuscate malicious code.
When I saw it, I knew we would see more PDF malware using image filters to obfuscate malicious payloads.
Sadly, that prediction appears to have come true.
As you can see below, the stream embedded in this sample is encoded. You can see the use of /Filter, which indicates that the data in that stream is encoded by one or more filters (shown in square brackets).
In the case of this sample, the filters in use are ASCIIHex, CCITTFax, ASCIIHex (again) and Flate:
/Filter [/ASCIIHexDecode /CCITTFaxDecode /ASCIIHexDecode /FlateDecode] /DecodeParms [ null << /Columns 28176 /Rows 1 >> ]
According to Adobe documentation (PDF 32000-1:2008):
“The ASCIIHexDecode filter decodes data that has been encoded in ASCII hexadecimal form.”
“The CCITTFaxDecode filter decodes image data that has been encoded using either Group 3 or Group 4 CCITT facsimile (fax) encoding.”
“The Flate method is based on the public-domain zlib/deflate compression method, …”
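Of the three filters used in this sample, only CCITT is given parameters that control it. In this case:
- The null entry is the placeholder for the first ASCIIHex filter, which takes no parameters, and the dictionary applies to CCITT. With no /K entry the default applies, which means that this is a Group 3 1-D encoding, and the image is on one row with 28176 elements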
- CCITT Group 3 1-D encoding is a variation on a Huffman encoding scheme where the image is split into 1-bit white and black pixel runs (white run length and black run length written below as wrl and brl respectively)
- each run length has a tailored Huffman encoding
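One way to read the parameters described above is to pair each filter with the /DecodeParms entry in the same position and fall back to the defaults from the PDF specification for anything that is missing. The sketch below assumes that a reader treats the two missing trailing entries as null, which is an assumption about lenient parsing rather than something the sample confirms. The helper name describe_k is hypothetical.

```python
from itertools import zip_longest

filters = ["ASCIIHexDecode", "CCITTFaxDecode", "ASCIIHexDecode", "FlateDecode"]
decode_parms = [None, {"Columns": 28176, "Rows": 1}]   # null becomes None

# A subset of the CCITTFaxDecode defaults listed in PDF 32000-1:2008
CCITT_DEFAULTS = {"K": 0, "Columns": 1728, "Rows": 0, "BlackIs1": False}


def describe_k(k: int) -> str:
    """Map the /K parameter to the encoding scheme it selects."""
    if k < 0:
        return "Group 4 (pure 2-D)"
    if k == 0:
        return "Group 3 1-D"
    return "Group 3 2-D (mixed)"


# Pair filters with parameters by position; missing entries become None
pairs = list(zip_longest(filters, decode_parms))

ccitt = dict(CCITT_DEFAULTS)
for name, parms in pairs:
    if name == "CCITTFaxDecode" and parms:
        ccitt.update(parms)

for name, parms in pairs:
    print(name, parms)
print(describe_k(ccitt["K"]), ccitt["Columns"], ccitt["Rows"])
print(ccitt["Columns"] // 8, "bytes of decoded pixel data in the single row")
```

Because /K is absent, the default of 0 selects Group 3 1-D, and BlackIs1 keeps its default of false.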
To illustrate how to decode the encoded stream, I am going to use just the start of the encoded stream:
This would be converted to hex values and then decoded by the CCITT decoder. As CCITT is a bit-based encoding stream (rather than byte-based), we must convert the above string to binary:
0000 0000 0001 0011 0101 1101 1101 0100 0111 0000 0110 0011 1000 1110 0011 0001 1101 0011 1100 1111
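The conversion from hex to bits can be sketched in a few lines. The hex text used here is worked backwards from the binary shown above, so its exact formatting (case, spacing) in the original stream is an assumption.

```python
# Hex text derived from the binary groups above; original formatting unknown
hex_text = "00135DD470638E31D3CF"

raw = bytes.fromhex(hex_text)                   # the first ASCIIHex step
bits = "".join(f"{byte:08b}" for byte in raw)   # CCITT works on bits, not bytes
groups = " ".join(bits[i:i + 4] for i in range(0, len(bits), 4))

print(groups)
```

Grouping in fours is only for readability. The CCITT codes that follow have variable lengths and do not respect nibble or byte boundaries.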
We can break this down into specific segments:
0000 0000 0001: the EOL marker, which should start the encoded data.
0011 0101: the code for a wrl of 0.
11: the code for a brl of 2.
0111: the code for a wrl of 2.
010: the code for a brl of 1.
1000: the code for a wrl of 3.
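The segmentation above can be reproduced with a small prefix-code decoder. This is a minimal sketch, not a full CCITT implementation: it only knows the ITU-T T.4 terminating codes for run lengths 0 to 7, which happens to be enough for this fragment, and it omits the make-up codes needed for runs longer than 63. The function name decode_runs is hypothetical.

```python
EOL = "000000000001"

# T.4 terminating codes for run lengths 0-7 only (a deliberately small subset)
WHITE = {"00110101": 0, "000111": 1, "0111": 2, "1000": 3,
         "1011": 4, "1100": 5, "1110": 6, "1111": 7}
BLACK = {"0000110111": 0, "010": 1, "11": 2, "10": 3,
         "011": 4, "0011": 5, "0010": 6, "00011": 7}


def decode_runs(bits: str):
    """Return (colour, run_length) pairs; each row starts with a white run."""
    if not bits.startswith(EOL):
        raise ValueError("expected an EOL marker at the start")
    pos = len(EOL)
    runs = []
    colour, table = "w", WHITE
    while True:
        # The codes are prefix-free, so at most one can match at this position
        code = next((c for c in table if bits.startswith(c, pos)), None)
        if code is None:
            break  # trailing bits, or a code outside this small table
        runs.append((colour, table[code]))
        pos += len(code)
        colour, table = ("b", BLACK) if colour == "w" else ("w", WHITE)
    return runs


bits = ("0000 0000 0001 0011 0101 1101 1101 0100 0111 0000 0110 0011 1000 "
        "1110 0011 0001 1101 0011 1100 1111").replace(" ", "")
runs = decode_runs(bits)
print(" ".join(f"{colour}{n}" for colour, n in runs))
```

The first five runs it reports are w0, b2, w2, b1 and w3, matching the breakdown above.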
So, we can write these bytes as:
We can write this in two ways, depending upon whether we consider b as 1 or 0:
- b as 1 :
- b as 0 :
Needless to say I chose the wrong one when I first implemented it. The correct version is the second :)
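The difference between the two interpretations is easy to see in code. The run list below was obtained by decoding the 80 bits shown earlier against the T.4 code tables, so treat it as a derived illustration rather than data copied from the sample. The function name runs_to_bytes is hypothetical.

```python
# Runs decoded from the stream fragment: (colour, run length)
runs = [("w", 0), ("b", 2), ("w", 2), ("b", 1), ("w", 3), ("b", 2),
        ("w", 3), ("b", 5), ("w", 1), ("b", 7), ("w", 3), ("b", 2),
        ("w", 1), ("b", 1), ("w", 2), ("b", 3), ("w", 2)]


def runs_to_bytes(runs, black_bit: str) -> bytes:
    """Expand runs into pixels, then pack every 8 pixels into one byte."""
    white_bit = "1" if black_bit == "0" else "0"
    pixels = "".join((black_bit if colour == "b" else white_bit) * n
                     for colour, n in runs)
    usable = len(pixels) - len(pixels) % 8   # ignore an incomplete final byte
    return bytes(int(pixels[i:i + 8], 2) for i in range(0, usable, 8))


black_is_1 = runs_to_bytes(runs, "1")
black_is_0 = runs_to_bytes(runs, "0")
print(black_is_1.hex())   # not printable text
print(black_is_0)         # ASCII text
```

With black as 0 the five bytes are the ASCII characters "78 9c", which is consistent with the PDF default of BlackIs1 being false.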
So, the CCITT stream decodes to:
78 9c ed d3 f9 ...
Which, when run through an ASCII hex decoder (which ignores spaces), produces:
Those familiar with PDF files will recognise that this looks like the start of a Flate-encoded stream.
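A quick way to check that impression is to hex-decode the visible characters and test the first two bytes against the zlib header rules from RFC 1950. This sketch uses only the five bytes quoted above. The rest of the stream is elided, so nothing here attempts to decompress it.

```python
ascii_hex = b"78 9c ed d3 f9"   # only the bytes quoted above

# bytes.fromhex skips the spaces, much as the ASCIIHex filter ignores whitespace
raw = bytes.fromhex(ascii_hex.decode("ascii"))

cmf, flg = raw[0], raw[1]
looks_like_zlib = (
    (cmf & 0x0F) == 8                  # compression method 8 = deflate
    and (cmf >> 4) <= 7                # window size field within range
    and (cmf * 256 + flg) % 31 == 0    # header check value
)

print(raw.hex(), looks_like_zlib)
```

The pair 78 9c is what zlib writes at its default compression level, which is why it is such a familiar sight at the start of Flate streams.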
As ever, SophosLabs recommends that you make sure you are on the latest version of Adobe software.