Didier Stevens

Thursday 30 May 2013

pdf-parser: Searching Inside Streams

Filed under: My Software,PDF — Didier Stevens @ 12:38

I’m giving a 2-day training on PDF at Brucon 2013. The early-bird price applies until June 15th.

This new version of pdf-parser comes with options to search inside streams. For example, you can select all objects with the word Linux inside a stream with this command:

pdf-parser.py --searchstream Linux manual.pdf

By default, the search is case-insensitive. To make it case-sensitive, use option --casesensitive. Stream filters are applied before the search is performed (e.g. a compressed stream is decompressed first). To search the raw stream data instead, use option --unfiltered.
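To make the order of operations concrete, here is a minimal Python sketch. It is not pdf-parser's actual code: it assumes a stream with a single FlateDecode (zlib) filter, and the function name stream_contains and its parameters are hypothetical names chosen for illustration.

```python
import zlib


def stream_contains(raw: bytes, keyword: str,
                    case_sensitive: bool = False,
                    unfiltered: bool = False) -> bool:
    """Return True if keyword occurs in the stream data.

    Assumption: the only filter on the stream is FlateDecode (zlib).
    This is an illustrative sketch, not pdf-parser's implementation.
    """
    # Apply the filter first, unless the caller asked for the raw bytes.
    data = raw if unfiltered else zlib.decompress(raw)
    needle = keyword.encode("latin-1")
    if not case_sensitive:
        # Case-insensitive comparison: lowercase both sides.
        data = data.lower()
        needle = needle.lower()
    return needle in data


compressed = zlib.compress(b"Runs on LINUX and BSD")
print(stream_contains(compressed, "Linux"))                       # default search
print(stream_contains(compressed, "Linux", case_sensitive=True))  # exact case only
```

The default call finds the keyword because the data is decompressed and compared without regard to case; the second call does not, because the stream only contains the upper-case spelling.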

Regular expression searching is done with option --regex. This allows you, for example, to select objects with embedded Flash files. Flash files begin with FWS, CWS or ZWS:

pdf-parser.py --searchstream "^[FCZ]WS" --regex sample.pdf
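The same pattern can be tried out on its own in Python to see what it matches. This is only a sketch of the regular expression, not of how pdf-parser applies it internally; the helper name looks_like_flash is hypothetical, and the match here is case-sensitive by assumption.

```python
import re

# The pattern from the command above, compiled as a bytes pattern.
# "^" anchors the match at the start of the data.
FLASH_MAGIC = re.compile(rb"^[FCZ]WS")


def looks_like_flash(data: bytes) -> bool:
    """Return True if data starts with FWS, CWS or ZWS (hypothetical helper)."""
    return FLASH_MAGIC.search(data) is not None


print(looks_like_flash(b"CWS\x0a\x00\x00"))   # starts with a Flash signature
print(looks_like_flash(b"%PDF-1.4 FWS"))      # signature not at the start
```

Because of the ^ anchor, only data that starts with one of the three signatures is matched; a signature further inside the data is ignored.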

Regular expression searching has another advantage: you can search for arbitrary bytes by writing them as hexadecimal escapes, for example \xCA\xFE.
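As a rough illustration of why this works, Python's re module interprets \xHH escapes in a bytes pattern as single byte values. The sketch below assumes the pattern arrives as plain text (as it would from a command line); the function name find_byte_pattern is hypothetical and this is not pdf-parser's own code.

```python
import re
from typing import List


def find_byte_pattern(data: bytes, pattern: str) -> List[int]:
    """Return the start offsets of every match of pattern in data.

    Assumption: pattern is text such as r"\xCA\xFE"; encoding it to
    bytes keeps the escapes intact so the regex engine decodes them.
    """
    compiled = re.compile(pattern.encode("latin-1"))
    return [m.start() for m in compiled.finditer(data)]


sample = b"\x00\xca\xfe\xba\xbe\xca\xfe"
print(find_byte_pattern(sample, r"\xCA\xFE"))  # offsets of the two-byte sequence
```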

pdf-parser_V0_4_2.zip (https)
MD5: B0C8F02358B386E7924DACB3059F8161
SHA256: E90620320AF6ED8E474B42BF6850E246446391878F87AE34DCDBD1D9945A6671
