Here’s a heads-up about some malicious PDF samples that are deliberately malformed to avoid detection.
The most important case is the missing endobj keyword:
Adobe Reader will happily parse a PDF where the objects are not terminated with endobj, but my pdf-parser won’t. I’ll have to update the parser to deal with this case.
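One way a parser could tolerate a missing endobj is to treat the next object header (or the xref, trailer or startxref keyword) as the end of the current object. The sketch below only illustrates that idea; it is not the actual pdf-parser code, and split_objects and its regular expressions are hypothetical names. It also assumes that stream data does not happen to contain these keywords, which a real parser cannot take for granted.

```python
import re

# Matches an indirect object header such as "12 0 obj".
OBJ_HEADER = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b")
# Anything that can close an object body: endobj itself, or, when endobj
# is missing, the next object header or a file-structure keyword.
OBJ_END = re.compile(
    rb"\bendobj\b|(?<![0-9])\d+\s+\d+\s+obj\b|\bxref\b|\btrailer\b|\bstartxref\b"
)


def split_objects(data):
    """Yield (object number, generation, body) for each indirect object."""
    pos = 0
    while True:
        header = OBJ_HEADER.search(data, pos)
        if header is None:
            return
        end = OBJ_END.search(data, header.end())
        body_end = end.start() if end else len(data)
        body = data[header.end():body_end].strip()
        yield int(header.group(1)), int(header.group(2)), body
        # Skip past endobj when present; otherwise resume at the marker we
        # stopped on, so that a following object header is still found.
        if end and end.group(0) == b"endobj":
            pos = end.end()
        else:
            pos = body_end


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\n"  # no endobj here
        b"2 0 obj\n<< /Type /Pages /Count 0 >>\nendobj\n"
        b"trailer\n<< /Root 1 0 R >>\n"
    )
    for number, generation, body in split_objects(sample):
        print(number, generation, body)
```

Because the scan looks for object headers directly, it does not depend on a cross-reference table being present.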
The cross-reference table can also be omitted:
This is not an issue for my parser.
And then I also received a sample with a stream object where the case of the endstream keyword was wrong: Endstream. At first we assumed Adobe Reader was not case-sensitive for the endstream keyword, but I found out it can actually parse a stream object where the endstream keyword is missing altogether:
This is an issue for my parser.
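A tolerant stream extractor could handle both variants: accept endstream in any casing, and if the keyword is absent, fall back on a direct /Length value or, failing that, on endobj or the end of the object. This is a minimal sketch under those assumptions. The function extract_stream and the order of the fallbacks are my own illustration, not how Adobe Reader or pdf-parser is known to work, and the sketch assumes the stream data itself does not contain the keywords.

```python
import re

# The stream keyword followed by an end-of-line; the lookbehind keeps
# "endstream" from being mistaken for the start of a stream.
STREAM_START = re.compile(rb"(?<![A-Za-z])stream\r?\n")
# endstream in any casing (Endstream, ENDSTREAM, ...), with one optional
# end-of-line in front of it that is not part of the stream data.
ENDSTREAM = re.compile(rb"(?:\r?\n)?endstream", re.IGNORECASE)
ENDOBJ = re.compile(rb"(?:\r?\n)?endobj")
# A direct /Length value such as "/Length 42", but not an indirect
# reference such as "/Length 42 0 R".
DIRECT_LENGTH = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)")


def extract_stream(obj_body):
    """Return the raw stream bytes of one object, or None if it has no stream."""
    start = STREAM_START.search(obj_body)
    if start is None:
        return None
    begin = start.end()

    # Case 1: the keyword is present, possibly with the wrong casing.
    end = ENDSTREAM.search(obj_body, begin)
    if end:
        return obj_body[begin:end.start()]

    # Case 2: no endstream at all; trust a direct /Length if it fits.
    length = DIRECT_LENGTH.search(obj_body, 0, start.start())
    if length and begin + int(length.group(1)) <= len(obj_body):
        return obj_body[begin:begin + int(length.group(1))]

    # Case 3: stop at endobj, or at the end of the data as a last resort.
    end = ENDOBJ.search(obj_body, begin)
    return obj_body[begin:end.start()] if end else obj_body[begin:]


if __name__ == "__main__":
    print(extract_stream(b"<< /Length 5 >>\nstream\nhello\nEndstream\nendobj"))
    print(extract_stream(b"<< /Length 5 >>\nstream\nhello\nendobj"))
```

Since /Length is often wrong in malicious samples, a real implementation might want to cross-check the value against the keywords rather than rely on either alone.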
In the case where endobj is missing, what defines the end of the object?
Comment by Bryan — Friday 21 May 2010 @ 15:13
@Bryan As Adobe Reader can’t render a PDF where both endobj and the XREF table are omitted, I assume it uses the XREF table to calculate the size of the objects.
Comment by Didier Stevens — Monday 24 May 2010 @ 8:12
[…] you used my pdf-parser, you’ve also encountered a problem. The objects lack the endobj keyword. A simple solution: add the missing keyword and extract the stream with my parser. The stream is […]
Pingback by Solving the Win7 Puzzle « Didier Stevens — Friday 25 June 2010 @ 9:39