At the end of last year, while preparing the presentation I gave at the Virus Bulletin conference, I intentionally avoided reading papers on PDF security by other researchers because I felt they would muddle my talk.
Of course, this didn’t stop me being made aware – by SophosLabs colleagues and sometimes the authors themselves – of the PDF research being done by others.
If I were to recommend papers on the subject, the list would include the following:
- Didier Stevens: “Free Malicious PDF Analysis E-book”
- Rodrigo Montoro: “Scoring PDF structure to detect malicious files”
- Selvaraj and Gutierrez: “The Rise of PDF Malware”
- Sebastian Porst: “How to really obfuscate your PDF malware”
- Julia Wolf: “OMG-WTF-PDF”
Here’s a YouTube video of Julia Wolf’s presentation:
Many of the points in Julia’s presentation reinforce points in mine.
For instance, my “Case study 5: Troj/PDFJs-MJ”, which was linked to “Heuristic 2: If the objects or streams are mismatched look more closely”, is covered at approximately 21:00 into the presentation, under the slides “Stream Length and Stream Termination”.
One part that piqued my interest was the set of slides entitled “More Than Necessary” (at 32:30), which talk about having duplicate object names and how one of the objects would win.
When one of our developers and I tried to write a test to demonstrate this behaviour, we failed unless the xref table was invalid. So late last week I asked Julia for her test files. When I first looked at them, the test files looked good and appeared to have a valid xref table.
When I passed the file along to the developer, however, he pointed out that the xref was actually invalid because the startxref was wrong.
In this case the command:
grep -bP "\d{1,5} \d obj" test.pdf
gave the following results:
10:1 0 obj
98:2 0 obj
147:3 0 obj
208:4 0 obj
400:5 0 obj
507:6 0 obj
621:7 0 obj
773:7 0 obj
and this matches the xref table:
0000000000 65535 f
0000000010 00000 n
0000000098 00000 n
0000000147 00000 n
0000000208 00000 n
0000000400 00000 n
0000000507 00000 n
0000000621 00000 n
with the proviso that the extra object isn’t referenced.
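The comparison above can be sketched in Python. This is a minimal illustration, not the tooling we used: the function names are hypothetical, the regular expressions simply mirror the grep pattern, and it assumes a classic (uncompressed) xref table and no object-like text hidden inside streams.

```python
import re

# "N G obj" header such as b"7 0 obj"; mirrors the grep pattern above.
OBJ_RE = re.compile(rb"(?<!\d)(\d{1,5}) (\d) obj")
# Classic xref entry: 10-digit offset, 5-digit generation, n (in use) or f (free).
XREF_ENTRY_RE = re.compile(rb"(\d{10}) (\d{5}) ([nf])")


def object_headers(data: bytes) -> list:
    """Return (byte offset, object number) for each object header found."""
    return [(m.start(), int(m.group(1))) for m in OBJ_RE.finditer(data)]


def unreferenced_objects(data: bytes) -> list:
    """Return object headers whose offset has no in-use xref entry."""
    in_use = {
        int(m.group(1))
        for m in XREF_ENTRY_RE.finditer(data)
        if m.group(3) == b"n"
    }
    return [(off, num) for off, num in object_headers(data) if off not in in_use]


def report(path: str) -> None:
    """Print every object header in the file that the xref table does not reference."""
    with open(path, "rb") as handle:
        data = handle.read()
    for offset, number in unreferenced_objects(data):
        print(f"object {number} at offset {offset} is not in the xref table")
```

Given the grep output and xref table shown above, a check like this would be expected to flag only the second 7 0 obj at offset 773.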
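However, the startxref does not point to the start of the xref table. The value at the end of the file:
startxref
773
%%EOF
does not match the result of the following command:
grep -bP "\bxref\b" test.pdf
which gives:
914:xref
Instead, 773 is the offset of the second 7 0 obj, the duplicate object that the xref table does not reference.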
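The same startxref check can be sketched in Python. This is only an illustration under simple assumptions: the helper names are hypothetical, only the last startxref in the file is considered, and a file with incremental updates or cross-reference streams would need more care.

```python
import re
from typing import List, Optional


def startxref_value(data: bytes) -> Optional[int]:
    """Return the offset declared after the last 'startxref' keyword, if any."""
    matches = re.findall(rb"startxref\s+(\d+)", data)
    return int(matches[-1]) if matches else None


def xref_keyword_offsets(data: bytes) -> List[int]:
    """Return byte offsets of standalone 'xref' keywords (not 'startxref')."""
    return [m.start() for m in re.finditer(rb"(?<![A-Za-z])xref\b", data)]


def startxref_is_valid(data: bytes) -> bool:
    """True when the declared startxref offset lands on an 'xref' keyword."""
    declared = startxref_value(data)
    return declared is not None and declared in xref_keyword_offsets(data)
```

For the file discussed here, the declared value is 773 while the only xref keyword is at offset 914, so a check along these lines would report the startxref as invalid.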
Julia also highlighted some other issues with the PDF format that I may talk about later.