Didier Stevens

Thursday 28 February 2019

Update: pdf-parser.py Version 0.7.0

Filed under: My Software, PDF, Update — Didier Stevens @ 0:00

This new version of pdf-parser adds support for analyzing stream objects (/ObjStm). Use the new option -O to enable this mode.

Stream objects (/ObjStm, called object streams in the PDF specification) are objects whose stream contains other objects. These contained objects cannot have a stream themselves.
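To make this layout concrete, here is a minimal Python sketch of how objects are packed into an /ObjStm stream and recovered from it. It assumes the layout described in the PDF specification: the decoded stream starts with N pairs of integers (object number and byte offset), and the dictionary entry /First gives the position where the first contained object begins. The function names build_object_stream and extract_objects are hypothetical, only zlib (FlateDecode) compression is assumed, and this is not how pdf-parser itself is implemented.

```python
import zlib


def build_object_stream(objects):
    """Pack {object_number: body_bytes} into a compressed /ObjStm-style stream.

    Returns (n, first, compressed): the values that would go into the /N and
    /First dictionary entries, plus the zlib-compressed stream data.
    """
    bodies = b""
    pairs = []
    for number, body in objects.items():
        # Each offset is relative to the first contained object.
        pairs.append(f"{number} {len(bodies)}")
        bodies += body + b"\n"
    header = (" ".join(pairs) + "\n").encode("ascii")
    return len(objects), len(header), zlib.compress(header + bodies)


def extract_objects(n, first, compressed):
    """Recover {object_number: body_bytes} from a compressed /ObjStm-style stream."""
    data = zlib.decompress(compressed)
    fields = data[:first].split()
    numbers = [int(field) for field in fields[0:2 * n:2]]
    offsets = [int(field) for field in fields[1:2 * n:2]]
    # Offsets are in increasing order, so each object ends where the next begins.
    ends = offsets[1:] + [len(data) - first]
    return {
        number: data[first + start:first + end].strip()
        for number, start, end in zip(numbers, offsets, ends)
    }


# Two made-up objects: note that they have no "obj"/"endobj" keywords and no stream.
sample = {
    2: b"<< /Type /Catalog /Pages 3 0 R >>",
    7: b"<< /S /JavaScript /JS (app.alert(1)) >>",
}
n, first, compressed = build_object_stream(sample)
recovered = extract_objects(n, first, compressed)
```

The contained objects only become visible after the stream is decompressed and the header of number/offset pairs is read. A tool that treats the stream as opaque data never sees them.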

pdfid.py detects the presence of stream objects:

But pdfid cannot look inside a stream to figure out which objects it contains. That’s why I always recommend using pdf-parser to select and decompress stream objects, and then piping the output through pdfid.
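The reason a keyword scan misses these objects can be shown with a few lines of Python. This is a simplified stand-in for a keyword scanner, not pdfid's actual code: count_keyword is a hypothetical helper, and the sample object and its "7 0" header are made up for illustration. The stream data is compressed with zlib, as it would be with /FlateDecode.

```python
import zlib


def count_keyword(data, keyword=b"/JavaScript"):
    """Count literal occurrences of a PDF keyword in raw bytes."""
    return data.count(keyword)


# A made-up object 7, stored inside a compressed stream.
hidden_object = b"<< /S /JavaScript /JS (app.alert(1)) >>"
raw_stream = zlib.compress(b"7 0\n" + hidden_object)

# Scanning the bytes as they appear in the file finds nothing ...
raw_hits = count_keyword(raw_stream)

# ... while scanning the decompressed stream finds the keyword.
decoded_hits = count_keyword(zlib.decompress(raw_stream))
```

Decompressing first and scanning afterwards is what piping pdf-parser's output through pdfid achieves.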

When pdf-parser parses a stream object, it does not parse the content of its stream:

This changes with this new version of pdf-parser. When option -O is used, pdf-parser extracts objects from /ObjStm streams and handles them like normal objects. In the following example, object 2 is contained in object 1:

pdf-parser provides statistics for a PDF’s content with option -a:

Combining option -a with option -O includes objects present inside stream objects (this is an alternative to combining both tools: pdf-parser -s objstm -f a.pdf | pdfid -f):

This output shows that /JavaScript can be found in object 7. We need to use option -O to find object 7 “hiding” in object 1:

If we forget to use option -O, object 7 is not found:

Here is a video showing this new feature:

pdf-parser_V0_7_0.zip (https)
SHA256: 219FF0BB729C4478679A79163CA9942296ACF49E4EC06D128CBC53FBEE25FF05


