Didier Stevens

Wednesday 7 May 2008

Solving a Little PDF Puzzle

Filed under: Forensics,Malware,PDF — Didier Stevens @ 8:22

I’m quite pleased with the feedback I received for my Little PDF Puzzle, thanks all.

As promised, I’m posting the solution now, but first be sure you understand the basic structure of a PDF file.

The PDF file format supports Incremental Updates: changes to an existing PDF document can be appended to the end of the file, leaving the original content intact. When the PDF file is rendered by a PDF reader, it will display the latest version, not the original content. Remember that the basic structure of a PDF file (one without incremental updates) consists of 4 parts:

  • header
  • objects
  • cross reference table
  • trailer

A PDF file with one incremental update has the following structure:

  • header
  • objects (original content)
  • cross reference table (original content)
  • trailer (original content)
  • objects (updated content)
  • cross reference table (updated content)
  • trailer (updated content)

Every object that has been modified can be found twice in the PDF file. The unmodified object is still present in the original content, and the edited version of the same object can be found in the updated content.

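The cross reference table of the updated content indexes only the updated objects. The trailer of the updated content gives access to both cross reference tables: the startxref value that follows it holds the offset of the new table, and its /Prev entry holds the offset of the original one.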
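To make the layout of an update section concrete, here is a minimal Python sketch that appends a new version of one object to an existing file. It is an illustration only, not the tool used for the puzzle: the function name append_update is made up, and it assumes a classic cross reference table (no cross reference streams), generation number 0, and a trailer that contains /Size and /Root.

```python
import re

def append_update(pdf: bytes, obj_num: int, obj_body: bytes) -> bytes:
    """Append a new version of one object as an incremental update (sketch)."""
    # Offset of the previous cross reference table, taken from the last startxref.
    prev_xref = int(re.findall(rb"startxref\s+(\d+)", pdf)[-1])
    # Reuse /Size and /Root from the last trailer dictionary.
    trailer = pdf[pdf.rfind(b"trailer"):]
    size = int(re.search(rb"/Size\s+(\d+)", trailer).group(1))
    root = re.search(rb"/Root\s+(\d+\s+\d+\s+R)", trailer).group(1)
    size = max(size, obj_num + 1)  # /Size must cover the highest object number

    out = pdf if pdf.endswith(b"\n") else pdf + b"\n"
    obj_offset = len(out)
    out += b"%d 0 obj\n" % obj_num + obj_body + b"\nendobj\n"
    xref_offset = len(out)
    # The new table indexes only the updated object.
    out += b"xref\n%d 1\n" % obj_num
    out += b"%010d 00000 n \n" % obj_offset  # every entry is exactly 20 bytes
    # /Prev links back to the original cross reference table.
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (size, root, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return out
```

The original bytes are left untouched at the start of the result, which is exactly why the earlier version stays recoverable.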

When a PDF reader renders a PDF document, it starts from the end of the file. It reads the last trailer and follows the links to the root object and the cross reference tables to build the logical structure of the document it is about to render. When the reader encounters updated objects, it ignores the original versions of the same objects.
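The backwards walk described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular reader is implemented: xref_chain is a hypothetical helper, it only understands classic cross reference tables followed by a trailer keyword, and it does no real parsing of the trailer dictionary.

```python
import re

def xref_chain(pdf: bytes) -> list[int]:
    """Return the offsets of the cross reference tables, newest first."""
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", pdf)
    if not matches:
        return []
    offsets = []
    offset = int(matches[-1])  # the last startxref is the entry point
    while offset not in offsets:  # guard against /Prev loops in malformed files
        offsets.append(offset)
        # The trailer dictionary follows the table found at this offset.
        trailer_pos = pdf.find(b"trailer", offset)
        if trailer_pos == -1:
            break
        trailer = pdf[trailer_pos:].split(b"startxref", 1)[0]
        prev = re.search(rb"/Prev\s+(\d+)", trailer)
        if prev is None:
            break  # no /Prev entry: this is the original table
        offset = int(prev.group(1))
    return offsets
```

A file without incremental updates yields a single offset; every update adds one more entry at the front of the list.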

Let’s open our PDF Puzzle with a PDF reader:

And let’s also open it with Notepad:

With Notepad, it becomes clear that I’ve created a PDF document with an incremental update (original document in red, update in blue). If you delete the updated content (the blue part, or everything after the first occurrence of %%EOF), you’ve actually recovered the original version. Save it and open it with your PDF reader:
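If you prefer a script over Notepad, the same cut can be sketched in Python. The helper name revisions is made up for this example. It treats every %%EOF marker as the end of a version, which is a heuristic: linearized files and binary streams that happen to contain the marker will produce false boundaries.

```python
def revisions(pdf: bytes) -> list[bytes]:
    """Return one prefix of the file per %%EOF marker, oldest first."""
    marker = b"%%EOF"
    result = []
    pos = pdf.find(marker)
    while pos != -1:
        end = pos + len(marker)
        # Keep the line ending that follows the marker, if there is one.
        if pdf[end:end + 2] == b"\r\n":
            end += 2
        elif pdf[end:end + 1] in (b"\r", b"\n"):
            end += 1
        result.append(pdf[:end])
        pos = pdf.find(marker, end)
    return result
```

The first element of the list is the original version. Read the file with open(path, "rb") and write the result with open(path, "wb") so that no byte is altered on the way.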

In the original PDF document, I stored the sentence “The passphrase is Incremental Updates” in indirect object 5 (to make the puzzle a bit more challenging, I used an ASCII85 encoded stream, otherwise you could just read the solution with Notepad). Next, I updated the sentence to “The passphrase is XXXXXXXXXXXXXXXXXXX” by creating a new version of object 5 and appending this at the end of the original PDF document. To finalize the updated document, I added a new cross reference table (just indexing the new version of object 5) and a new trailer (referencing the new and the old cross reference tables).
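Decoding an ASCII85 stream does not take much code either. Here is a small Python sketch using the standard base64 module; decode_ascii85_stream is a hypothetical helper, and the round trip below encodes just the sentence for illustration (the stream in a real page also contains the drawing operators around the text).

```python
import base64

def decode_ascii85_stream(data: bytes) -> bytes:
    """Decode the body of a stream that uses the ASCII85Decode filter."""
    data = data.strip()
    if data.startswith(b"<~"):  # optional start marker, not used inside PDF
        data = data[2:]
    if data.endswith(b"~>"):    # end-of-data marker used by PDF
        data = data[:-2]
    # a85decode ignores whitespace and line breaks by default.
    return base64.a85decode(data)

# Round trip with the sentence from the puzzle.
encoded = base64.a85encode(b"The passphrase is Incremental Updates") + b"~>"
print(encoded.decode("ascii"))
print(decode_ascii85_stream(encoded).decode("ascii"))
```

The first line printed is unreadable in Notepad, the second one is the sentence again.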

If you produce PDF documents with a PDF editor that supports incremental updates, be aware that previous versions of your document could be included in the final document, and that this could lead to information disclosure. Most office applications that support export to PDF do not use incremental updates (because they save the document in their own native format, not PDF).

If you conduct forensic investigations or do malware research, don’t limit your analysis to the final version of a PDF document. You can easily identify incrementally updated PDF documents by looking for multiple instances of cross reference tables and trailers. But don’t get confused by linearized PDF documents: they too have more than one cross reference table and trailer (linearized PDF documents start with an indirect object sporting a /Linearized name).
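A quick triage script for this check could look like the Python sketch below. The function name and the returned keys are invented for the example. It rests on two assumptions: a linearized file normally carries two %%EOF markers on its own, and its /Linearized dictionary sits within the first 1024 bytes. Files that use cross reference streams have no xref or trailer keywords, so the %%EOF count is the more robust indicator.

```python
import re

def update_indicators(pdf: bytes) -> dict:
    """Count the structural markers that hint at incremental updates."""
    linearized = b"/Linearized" in pdf[:1024]
    eof_markers = pdf.count(b"%%EOF")
    expected = 2 if linearized else 1  # markers present without any update
    return {
        # Count "xref" keywords, but not the ones inside "startxref".
        "xref_tables": len(re.findall(rb"(?<!start)xref", pdf)),
        "trailers": pdf.count(b"trailer"),
        "eof_markers": eof_markers,
        "linearized": linearized,
        "probably_updated": eof_markers > expected,
    }
```

Treat the result as a hint for closer inspection, not as proof: malformed or deliberately crafted files can fool simple counting.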

You can find interesting information in the different versions included in an incremental PDF file. For example, I have a malicious PDF sample that was created in February 2008 and updated in March 2008 to add the malicious payload (it took the author about 20 minutes). Not surprisingly, this was done on a machine with the timezone set to GMT+08.
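One place where such timestamps can show up is in /CreationDate and /ModDate entries, which use the PDF date format D:YYYYMMDDHHmmSS followed by an optional timezone offset. The Python sketch below collects them from the raw bytes. It is an assumption that the dates are stored as plain literal strings (not encrypted or inside compressed streams), pdf_dates is a made-up name, and shortened date forms are not handled.

```python
import re
from datetime import datetime, timedelta, timezone

DATE_RE = re.compile(
    rb"/(CreationDate|ModDate)\s*"
    rb"\(D:(\d{14})(?:([+\-Z])(?:(\d{2})'?(\d{2})?'?)?)?\)"
)

def pdf_dates(pdf: bytes) -> list[tuple[str, datetime]]:
    """Return (key, timestamp) pairs in the order they appear in the file."""
    found = []
    for key, stamp, sign, hours, minutes in DATE_RE.findall(pdf):
        when = datetime.strptime(stamp.decode("ascii"), "%Y%m%d%H%M%S")
        if sign == b"Z":
            when = when.replace(tzinfo=timezone.utc)
        elif sign:
            delta = timedelta(hours=int(hours or 0), minutes=int(minutes or 0))
            when = when.replace(tzinfo=timezone(-delta if sign == b"-" else delta))
        found.append((key.decode("ascii"), when))
    return found
```

Run on each version of an incrementally updated file, the differences between the timestamps show when each update was made, and the offset shows the timezone setting of the machine.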

A final detail: to allow you to edit the PDF puzzle with Notepad, I produced an ASCII-only PDF file (that’s one of the reasons I used ASCII85 encoding for the stream of indirect object 5). But most PDF documents contain non-ASCII characters, so be sure to use an editor that will support this (and that won’t convert 0x0A or 0x0D to 0x0D0A).
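Why the line ending conversion matters can be shown with a few lines of Python. The tiny sample file and the helper startxref_is_valid are made up for this demonstration. The point is that the offsets stored in the file are byte counts, so every inserted 0x0D shifts everything behind it.

```python
def startxref_is_valid(pdf: bytes) -> bool:
    """True if the last startxref offset lands on the xref keyword."""
    pos = pdf.rfind(b"startxref")
    if pos == -1:
        return False
    fields = pdf[pos + len(b"startxref"):].split()
    if not fields or not fields[0].isdigit():
        return False
    offset = int(fields[0])
    return pdf[offset:offset + 4] == b"xref"

# A minimal, made-up file body with one object and a correct startxref.
body = b"%PDF-1.1\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
sample = body + (
    b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
    b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    b"startxref\n%d\n%%%%EOF\n" % len(body)
)

# Simulate an editor that converts every 0x0A to 0x0D0A on save.
damaged = sample.replace(b"\n", b"\r\n")

print(startxref_is_valid(sample))   # True
print(startxref_is_valid(damaged))  # False
```

Some PDF readers repair such a file by rebuilding the cross reference table, but you should not count on it.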


