Didier Stevens

Wednesday 9 April 2008

Quickpost: About the Physical and Logical Structure of PDF Files

Filed under: PDF, Quickpost — Didier Stevens @ 6:57

This post explains in detail the PDF polymorphism I mentioned in my BH post.

This is a simple “Hello World” PDF, viewed in a text editor:

It is composed of:

  • a header
  • a list of objects
  • a cross reference table
  • a trailer

What I describe here is the physical structure of a PDF file. The header identifies the file as a PDF file and specifies the PDF format version. The trailer points to the cross reference table, which starts at byte offset 642 in the file. The cross reference table in turn points to each object (1 to 7) in the file, at byte offsets 12 through 518. The objects appear in the file in order: 1, 2, 3, 4, 5, 6 and 7.
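To make this chain of pointers concrete, here is a minimal Python sketch. It is an illustration, not the tool used to create the files in this post, and it assumes a hypothetical reconstruction of the seven “Hello World” objects, so its byte offsets will not match the 12 to 518 and 642 of the file above. `build_pdf` writes the four parts in physical order and records each object's byte offset in the cross reference table (using the classic fixed-width 20-byte xref entries); `read_layout` goes the other way, from the `startxref` value at the end of the file to the xref table, and from there to each object. Both function names are made up for this example.

```python
import re

# Hypothetical reconstruction of seven "Hello World" objects; NOT the exact
# bytes of the file shown above, so the offsets differ from 12..518 and 642.
CONTENT = b"BT /F1 24 Tf 100 700 Td (Hello World) Tj ET"
OBJECTS = {
    1: b"<< /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >>",
    2: b"<< /Type /Outlines /Count 0 >>",
    3: b"<< /Type /Pages /Kids [4 0 R] /Count 1 >>",
    4: b"<< /Type /Page /Parent 3 0 R /MediaBox [0 0 612 792] /Contents 5 0 R"
       b" /Resources << /ProcSet 6 0 R /Font << /F1 7 0 R >> >> >>",
    5: b"<< /Length %d >>\nstream\n%s\nendstream" % (len(CONTENT), CONTENT),
    6: b"[/PDF /Text]",
    7: b"<< /Type /Font /Subtype /Type1 /Name /F1 /BaseFont /Helvetica >>",
}

def build_pdf(objects, order):
    """Write header, objects (in `order`), xref table and trailer."""
    out = bytearray(b"%PDF-1.1\n")                    # header with format version
    offsets = {}
    for num in order:
        offsets[num] = len(out)                        # byte offset of "N 0 obj"
        out += b"%d 0 obj\n%s\nendobj\n" % (num, objects[num])
    xref_pos = len(out)
    size = len(objects) + 1                            # entries 0..N; entry 0 is always free
    out += b"xref\n0 %d\n0000000000 65535 f \n" % size
    for num in range(1, size):
        out += b"%010d 00000 n \n" % offsets[num]     # fixed 20-byte entry
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos     # points back at the xref table
    return bytes(out)

def read_layout(pdf):
    """Follow startxref -> xref table -> objects; return (version, xref_pos, offsets)."""
    version = pdf[5:pdf.index(b"\n")].decode()         # "%PDF-1.1" -> "1.1"
    xref_pos = int(re.search(rb"startxref\s+(\d+)\s+%%EOF", pdf).group(1))
    lines = pdf[xref_pos:].split(b"\n")
    first, count = (int(x) for x in lines[1].split())
    offsets = {}
    for i, entry in enumerate(lines[2:2 + count]):
        offset, _gen, kind = entry.split()
        if kind == b"n":                               # in-use entry
            offsets[first + i] = int(offset)
    return version, xref_pos, offsets

pdf = build_pdf(OBJECTS, [1, 2, 3, 4, 5, 6, 7])
version, xref_pos, offsets = read_layout(pdf)
print("version", version, "xref table at", xref_pos)
for num, off in sorted(offsets.items()):
    print(num, off, pdf[off:pdf.index(b"\n", off)].decode())
```

Each printed line shows an object number, its byte offset and the `N 0 obj` text found at that offset. Object 1 starts right after the 9-byte header, at offset 9, and the offsets increase because the objects were written in the order 1 to 7.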

The logical structure of a PDF file is a hierarchical structure whose root object is identified in the trailer (by its /Root entry). Object 1 is the root, objects 2 and 3 are children of object 1, and so on, giving this logical structure:
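The logical structure can be recovered by following indirect references (`N 0 R`), starting from the object named by `/Root` in the trailer. The sketch below is a simplified, regex-based walker written only for a hypothetical “Hello World” sample; it is not a general PDF parser (it ignores the xref table, object streams, and strings that happen to look like references). Because a page points back to its parent with `/Parent`, the references actually form a graph, so the walk visits each object once and keeps the resulting tree. The names `logical_tree` and `print_tree` are made up for this illustration.

```python
import re

# Hypothetical "Hello World" sample: objects plus trailer only. The xref table
# is omitted because this walk follows references, not byte offsets.
SAMPLE = b"""%PDF-1.1
1 0 obj << /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >> endobj
2 0 obj << /Type /Outlines /Count 0 >> endobj
3 0 obj << /Type /Pages /Kids [4 0 R] /Count 1 >> endobj
4 0 obj << /Type /Page /Parent 3 0 R /Contents 5 0 R
           /Resources << /ProcSet 6 0 R /Font << /F1 7 0 R >> >> >> endobj
5 0 obj << /Length 43 >> stream
BT /F1 24 Tf 100 700 Td (Hello World) Tj ET
endstream endobj
6 0 obj [/PDF /Text] endobj
7 0 obj << /Type /Font /Subtype /Type1 /BaseFont /Helvetica >> endobj
trailer << /Size 8 /Root 1 0 R >>
"""

OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.S)  # N G obj ... endobj
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")                         # indirect reference N G R
ROOT_RE = re.compile(rb"/Root\s+(\d+)\s+\d+\s+R")                    # trailer's /Root entry

def logical_tree(pdf):
    """Depth-first walk of references from /Root; returns {object: [children]}."""
    objects = {int(m.group(1)): m.group(3) for m in OBJ_RE.finditer(pdf)}
    root = int(ROOT_RE.search(pdf).group(1))
    tree, seen = {}, {root}

    def visit(num):
        tree[num] = []
        for ref in REF_RE.finditer(objects[num]):
            child = int(ref.group(1))
            if child in objects and child not in seen:  # skips back-references like /Parent
                seen.add(child)
                tree[num].append(child)
                visit(child)

    visit(root)
    return tree

def print_tree(tree, num, depth=0):
    print("  " * depth + f"object {num}")
    for child in tree[num]:
        print_tree(tree, child, depth + 1)

print_tree(logical_tree(SAMPLE), 1)
```

For this sample the walk yields object 1 with children 2 and 3, object 4 under object 3, and objects 5, 6 and 7 under object 4; the `/Parent 3 0 R` back-reference in object 4 is skipped because object 3 was already visited.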

The physical structure of a PDF file can be transformed into another physical structure without changing the logical structure. Here is the same file, but now the objects are ordered from 7 to 1 (I reversed the order in which the objects appear in the file):

I also had to update the cross reference table, because each object is now located at a different position. Apart from that, nothing has changed: the root is still object 1, and the tree is the same. In other words, the logical structure of the file is unchanged, so both PDF files render identically. Objects can appear in any order in a PDF file without affecting the logical structure (and thus the rendering). For this simple file with 7 objects, reordering alone gives 7! = 5040 possible physical structures. And reordering objects is just one way to mutate the physical structure of a PDF file.
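The reordering experiment can be replayed in a few lines of Python. This sketch assumes hypothetical stand-ins for the seven objects, writes them in two orders while regenerating the cross reference table each time, and checks that every xref entry lands on its object. It then enumerates every ordering with `itertools.permutations` to confirm the 7! = 5040 count. The helper names `serialize`, `xref_ok` and `objects_of` are made up for this illustration.

```python
import itertools
import math
import re

# Hypothetical stand-ins for the seven "Hello World" objects; reordering never
# touches the object bodies, only where they sit in the file.
OBJECTS = {
    1: b"<< /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >>",
    2: b"<< /Type /Outlines /Count 0 >>",
    3: b"<< /Type /Pages /Kids [4 0 R] /Count 1 >>",
    4: b"<< /Type /Page /Parent 3 0 R /Contents 5 0 R /Resources << /ProcSet 6 0 R /Font << /F1 7 0 R >> >> >>",
    5: b"<< /Length 43 >>\nstream\nBT /F1 24 Tf 100 700 Td (Hello World) Tj ET\nendstream",
    6: b"[/PDF /Text]",
    7: b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
}

def serialize(objects, order):
    """Write the objects in `order`, then rebuild the xref table for that layout."""
    out = bytearray(b"%PDF-1.1\n")
    offsets = {}
    for num in order:
        offsets[num] = len(out)
        out += b"%d 0 obj\n%s\nendobj\n" % (num, objects[num])
    xref_pos = len(out)
    size = len(objects) + 1
    out += b"xref\n0 %d\n0000000000 65535 f \n" % size
    for num in range(1, size):
        out += b"%010d 00000 n \n" % offsets[num]
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (size, xref_pos)
    return bytes(out)

def xref_ok(pdf):
    """True if startxref and every in-use xref entry point at the right bytes."""
    xref_pos = int(re.search(rb"startxref\s+(\d+)", pdf).group(1))
    if not pdf.startswith(b"xref", xref_pos):
        return False
    lines = pdf[xref_pos:].split(b"\n")
    first, count = (int(x) for x in lines[1].split())
    for i, entry in enumerate(lines[2:2 + count]):
        offset, gen, kind = entry.split()
        header = b"%d %d obj" % (first + i, int(gen))
        if kind == b"n" and not pdf.startswith(header, int(offset)):
            return False
    return True

def objects_of(pdf):
    """Object number -> body, regardless of the object's position in the file."""
    return {int(m.group(1)): m.group(2)
            for m in re.finditer(rb"(\d+) 0 obj\n(.*?)\nendobj", pdf, re.S)}

forward = serialize(OBJECTS, [1, 2, 3, 4, 5, 6, 7])
backward = serialize(OBJECTS, [7, 6, 5, 4, 3, 2, 1])
print(forward != backward)                          # True: different physical layout
print(xref_ok(forward), xref_ok(backward))          # True True
print(objects_of(forward) == objects_of(backward))  # True: same objects, same references

# Every ordering of the 7 objects gives a distinct file with a valid xref table.
variants = {serialize(OBJECTS, list(p)) for p in itertools.permutations(OBJECTS)}
print(len(variants), math.factorial(len(OBJECTS)))  # 5040 5040
print(all(xref_ok(v) for v in variants))            # True
```

Since both files contain byte-identical objects, every reference, and hence the logical tree, is the same; only the offsets in the cross reference table had to be recomputed.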

You can download both PDF files here.



10 Comments »

  1. [...] add the URI action object and the OpenAction event to the hello world PDF file I used in a previous post, to build our test PDF. You can download all examples here. Opening the test PDF document launches [...]

    Pingback by PDF, Let Me Count the Ways… « Didier Stevens — Tuesday 29 April 2008 @ 6:22

  2. [...] indirect object is all I have to include in my basic PDF document to get a PoC PDF document to crash Adobe Acrobat Reader [...]

    Pingback by Quickpost: /JBIG2Decode Essentials « Didier Stevens — Monday 2 March 2009 @ 23:12

  3. Hello,

    thanks for the nice description of the PDF format. One question: how do I insert text positioned at an angle relative to the horizontal? For example, the entire text box should be at 45 degrees…

    Comment by iovanalex — Tuesday 31 March 2009 @ 10:04

  4. I have no idea, you’ll have to look that up in the PDF reference document. I don’t have PDF expertise, only malicious PDF expertise ;-)

    Comment by Didier Stevens — Tuesday 31 March 2009 @ 10:44

  5. Thanks,
    what do you mean by “PDF reference document”? Do you have some links?

    Comment by iovanalex — Tuesday 31 March 2009 @ 17:10

  6. http://tinyurl.com/c2c7sy ;-)

    Comment by Didier Stevens — Tuesday 31 March 2009 @ 17:21

  7. [...] Malformed PDF Documents Filed under: Malware, My Software, PDF — Didier Stevens @ 7:55 For the sake of this post, I consider a PDF document malformed when it doesn’t observe the basic structure of a PDF document. [...]

    Pingback by Malformed PDF Documents « Didier Stevens — Thursday 14 May 2009 @ 7:55

  8. I am designing a tool that extracts all comment-related information from a PDF file, such as the creator of the comment, the date and the note.
    Can anyone help me with how to extract the comments from a PDF file?

    Comment by saurav — Thursday 14 May 2009 @ 17:35

  9. I guess you mean meta-data, the thing you see in the properties dialog of a PDF document? And not the comments reviewers add to a PDF document?

    Comment by Didier Stevens — Thursday 14 May 2009 @ 19:15

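  10. [Security] Investigating the vulnerabilities exploited by the October 2009 Gumblar variant (tentative name)…

    Four vulnerabilities have been confirmed as exploited by the Gumblar variant (tentative name) observed in late October 2009: • Adobe Reader vulnerability • Adobe Flash Player vulnerability • Microsoft Office Web Components vulnerability (MS09-043) • Internet Explorer 7 vulnerability (MS09-002) …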

    Trackback by 思い立ったら書く日記 — Sunday 25 October 2009 @ 2:34

