I’m giving a 2-day training on PDF at Brucon 2013. The early-bird price applies until June 15th.
This new version of pdf-parser comes with options to search inside streams. For example, you can select all objects with the word Linux inside a stream with this command:
pdf-parser.py --searchstream Linux manual.pdf
The search is not case sensitive. To make it case sensitive, use option --casesensitive. Stream filters are applied (e.g. the stream is decompressed) before the search is performed. To search in the raw stream data instead, use option --unfiltered.
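To illustrate what these options mean, here is a minimal Python sketch. It is not pdf-parser's actual code: the function stream_contains and its parameters are hypothetical names, and it assumes the stream uses only a single zlib (FlateDecode) filter.

```python
import zlib


def stream_contains(raw, needle, case_sensitive=False, unfiltered=False):
    """Hypothetical helper: report whether needle occurs in a stream.

    raw is the stream data as stored in the file. This sketch assumes
    the only filter is zlib compression (FlateDecode).
    """
    # By default, search the filtered (decompressed) data;
    # with unfiltered=True, search the raw bytes as stored.
    data = raw if unfiltered else zlib.decompress(raw)
    if not case_sensitive:
        # Case-insensitive search: compare lowercased copies.
        data = data.lower()
        needle = needle.lower()
    return needle in data


# Sample data made up for this illustration.
raw_stream = zlib.compress(b"Written on LINUX with a text editor")

print(stream_contains(raw_stream, b"Linux"))
print(stream_contains(raw_stream, b"Linux", case_sensitive=True))
```

In this sketch the first call finds the word because case is ignored, and the second does not because the sample text only contains LINUX in upper case. With unfiltered=True the function would look at the compressed bytes, where the readable word is normally not present.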
Regular expression searching is done with option --regex. This allows you, for example, to select objects with embedded Flash files. Flash files begin with FWS, CWS or ZWS:
pdf-parser.py --searchstream "^[FCZ]WS" --regex sample.pdf
Regular expression searching has another advantage: you can search for arbitrary bytes by writing them as hexadecimal escapes, for example \xCA\xFE.
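The two patterns mentioned above can be tried out in Python on plain byte strings. This is only a sketch of how such regular expressions behave with Python's re module, not a description of pdf-parser's internals; the helper name matches and the sample data are made up for the illustration.

```python
import re

# Flash signature anchored at the start of the data: FWS, CWS or ZWS.
FLASH_MAGIC = re.compile(rb"^[FCZ]WS")
# Two arbitrary bytes written as hexadecimal escapes.
CAFE_BYTES = re.compile(rb"\xCA\xFE")


def matches(pattern, data):
    """Hypothetical helper: True if pattern is found anywhere in data."""
    return pattern.search(data) is not None


# Made-up sample data standing in for decoded stream contents.
flash_like = b"CWS\x0a" + b"\x00" * 8
binary_like = b"\xCA\xFE\xBA\xBE\x00\x00"

print(matches(FLASH_MAGIC, flash_like))
print(matches(FLASH_MAGIC, b"xxCWS"))
print(matches(CAFE_BYTES, binary_like))
```

Because of the ^ anchor, the Flash pattern matches only when the signature is at the very beginning of the data, so the second call does not match. The byte pattern has no anchor and matches wherever the two bytes occur.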