PDF security under the microscope: A review of OMG-WTF-PDF

At the end of last year, while preparing the presentation I gave at the Virus Bulletin conference, I intentionally avoided reading other researchers' papers about PDF security because I felt that they would confuse my talk.

Of course, this didn’t stop me being made aware – by SophosLabs colleagues and sometimes the authors themselves – of the PDF research being done by others.

If I were to recommend papers on the subject, the list would include the following:

  1. Didier Stevens: “Free Malicious PDF Analysis E-book”
  2. Rodrigo Montoro: “Scoring PDF structure to detect malicious files”
  3. Selvaraj and Gutierrez: “The Rise of PDF Malware”
  4. Sebastian Porst: “How to really obfuscate your PDF malware”
  5. Julia Wolf: “OMG-WTF-PDF”

Here’s a YouTube video of Julia Wolf’s presentation:

Many of the points in Julia’s presentation reinforce points in mine.

For instance, “Case study 5: Troj/PDFJs-MJ”, which I linked to “Heuristic 2: If the objects or streams are mismatched look more closely”, is explained at approximately 21:00 minutes into the presentation, under the slides “Stream Length and Stream Termination”.
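One possible reading of "mismatched" is that the opening and closing keywords do not pair up. The Python sketch below assumes that reading and simply counts `obj`/`endobj` and `stream`/`endstream` tokens. The names `keyword_counts` and `looks_mismatched` are hypothetical, and this is an illustration rather than the heuristic as used in any product.

```python
import re

# Each opening keyword should be balanced by its closing keyword.
# \b stops "obj" matching inside "endobj" and "stream" inside "endstream".
PAIRS = [
    (rb"\bobj\b", rb"\bendobj\b"),
    (rb"\bstream\b", rb"\bendstream\b"),
]


def keyword_counts(data):
    """Return [(opens, closes), ...] for obj/endobj and stream/endstream."""
    return [
        (len(re.findall(opener, data)), len(re.findall(closer, data)))
        for opener, closer in PAIRS
    ]


def looks_mismatched(data):
    """True if any opening keyword count differs from its closing count."""
    return any(opens != closes for opens, closes in keyword_counts(data))
```

Compressed stream data can contain these byte sequences by chance, so a mismatch is only a reason to look more closely, not a verdict.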

One of the parts that piqued my interest was the set of slides entitled “More Than Necessary” (at 32:30 minutes in), which talk about having duplicate object names and how one object would win.
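Spotting duplicate object names in a file is straightforward. This is a minimal Python sketch, assuming object headers have the plain `N G obj` form. `duplicate_objects` is a hypothetical helper name, and the code only reports duplicates; it makes no claim about which copy a given reader would pick.

```python
import re
from collections import defaultdict

# Matches object headers such as "7 0 obj".
OBJ_HEADER = re.compile(rb"(\d{1,5}) (\d) obj")


def duplicate_objects(data):
    """Map (object number, generation) to offsets, for ids defined more than once."""
    seen = defaultdict(list)
    for m in OBJ_HEADER.finditer(data):
        seen[(int(m.group(1)), int(m.group(2)))].append(m.start())
    return {key: offsets for key, offsets in seen.items() if len(offsets) > 1}
```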

When one of our developers and I tried to write a test to demonstrate this behaviour, we failed unless the xref table was invalid. So late last week I asked Julia for her test files. When I first looked at them, the test files looked good and appeared to have a valid xref table.

When I passed the file along to the developer, however, he pointed out that the xref was actually invalid because the startxref was wrong.

In this case the command:

grep -bP "\d{1,5} \d obj" test.pdf

gave the following results:

10:1 0 obj
98:2 0 obj
147:3 0 obj
208:4 0 obj
400:5 0 obj
507:6 0 obj
621:7 0 obj
773:7 0 obj

and this matches the xref table:

0000000000 65535 f
0000000010 00000 n
0000000098 00000 n
0000000147 00000 n
0000000208 00000 n
0000000400 00000 n
0000000507 00000 n
0000000621 00000 n

with the proviso that the extra object (the second 7 0 obj, at offset 773) isn’t referenced.
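The manual comparison above can also be scripted. The following Python sketch assumes a classic xref table with a single subsection starting at object 0, as in the listing above. `xref_offsets` and `unreferenced_objects` are hypothetical helper names, not part of any PDF library. Given the offsets listed above, the only header it should flag is the second 7 0 obj at 773.

```python
import re

OBJ_HEADER = re.compile(rb"(\d{1,5}) (\d) obj")
XREF_ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])")


def xref_offsets(data):
    """Map object number to offset for in-use ('n') entries of the first xref table.

    Assumption: one subsection that starts at object 0.
    """
    start = re.search(rb"\bxref\b", data)  # \b skips the "xref" inside "startxref"
    if start is None:
        return {}
    stop = data.find(b"trailer", start.end())
    if stop == -1:
        stop = len(data)
    table = data[start.end():stop]
    return {
        num: int(m.group(1))
        for num, m in enumerate(XREF_ENTRY.finditer(table))
        if m.group(3) == b"n"
    }


def unreferenced_objects(data):
    """Return (offset, object number) for headers the xref table does not point at."""
    table = xref_offsets(data)
    # m.start() is the offset of the header itself; grep -b reports the offset
    # of the line, which is the same when each header starts its own line.
    return [
        (m.start(), int(m.group(1)))
        for m in OBJ_HEADER.finditer(data)
        if table.get(int(m.group(1))) != m.start()
    ]
```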

However, the startxref does not point to the start of the xref table: the offset given after the startxref keyword in the file does not match the result of the following command:

grep -bP "\bxref\b" test.pdf
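The same check can be sketched in Python. This is a minimal illustration that assumes a classic xref table (not a cross-reference stream, where startxref points at an object instead). The function names are hypothetical.

```python
import re


def declared_startxref(data):
    """Offset written after the last 'startxref' keyword, or None if absent."""
    matches = re.findall(rb"startxref\s+(\d+)", data)
    return int(matches[-1]) if matches else None


def actual_xref_offsets(data):
    """Byte offsets of every standalone 'xref' keyword (not 'startxref')."""
    return [m.start() for m in re.finditer(rb"\bxref\b", data)]


def startxref_is_consistent(data):
    """True if the declared startxref offset lands on an 'xref' keyword."""
    declared = declared_startxref(data)
    return declared is not None and declared in actual_xref_offsets(data)
```

For a file such as test.pdf, reading it with `open("test.pdf", "rb").read()` and passing the bytes to `startxref_is_consistent` would return `False` whenever the two offsets disagree, as they do here.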



Julia also highlighted some other issues with the PDF format that I may talk about later.