Didier Stevens

Thursday 30 May 2013

pdf-parser: Searching Inside Streams

Filed under: My Software,PDF — Didier Stevens @ 12:38

I’m giving a 2-day training on PDF at Brucon 2013. Early-bird price applies til June 15th.

This new version of pdf-parser comes with options to search inside streams. For example, you can select all objects with the word Linux inside a stream with this command:

pdf-parser.py --searchstream Linux manual.pdf

The search is not case sensitive. To make it case sensitive, use option –casesensitive. Filters are applied to streams (e.g. decompressed) before the search is performed. To search in the raw stream data, use option –unfiltered.

Regular expression searching is done with option –regex. This allows you, for example, to select objects with embedded Flash files. Flash files begin with FWS, CWS or ZWS:

pdf-parser.py --searchstream "^[FCZ]WS" --regex sample.pdf

Regular expression searching has another advantage. You can search for bytes: \xCA\xFE.

 pdf-parser_V0_4_2.zip (https)
MD5: B0C8F02358B386E7924DACB3059F8161
SHA256: E90620320AF6ED8E474B42BF6850E246446391878F87AE34DCDBD1D9945A6671

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply (comments are moderated)

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Blog at WordPress.com.