The second feature in this new version of PDFiD is selection. With this, you can select PDFs using criteria you provide.
Example:
pdfid.py -S "pdf.javascript.count > 0" *.pdf
This command selects all files with extension .pdf in the current directory that are PDFs and have a /JavaScript count greater than zero. The selection expression you provide is a Python expression. Here is a list of attributes you can use in your selection expressions:
pdf.version
pdf.filename
pdf.errorOccured
pdf.errorMessage
pdf.isPDF
pdf.header
pdf.keywords[keywordname].count
pdf.keywords[keywordname].hexcode
pdf.keywords['/AA'].count
pdf.keywords['/Root'].count # if option -a and if /Root present in PDF
pdf.obj.count
pdf.obj.hexcode
pdf.endobj.count
pdf.endobj.hexcode
pdf.stream.count
pdf.stream.hexcode
pdf.endstream.count
pdf.endstream.hexcode
pdf.xref.count
pdf.xref.hexcode
pdf.trailer.count
pdf.trailer.hexcode
pdf.startxref.count
pdf.startxref.hexcode
pdf.page.count
pdf.page.hexcode
pdf.encrypt.count
pdf.encrypt.hexcode
pdf.objstm.count
pdf.objstm.hexcode
pdf.js.count
pdf.js.hexcode
pdf.javascript.count
pdf.javascript.hexcode
pdf.aa.count
pdf.aa.hexcode
pdf.openaction.count
pdf.openaction.hexcode
pdf.acroform.count
pdf.acroform.hexcode
pdf.jbig2decode.count
pdf.jbig2decode.hexcode
pdf.richmedia.count
pdf.richmedia.hexcode
pdf.launch.count
pdf.launch.hexcode
pdf.embeddedfile.count
pdf.embeddedfile.hexcode
pdf.xfa.count
pdf.xfa.hexcode
pdf.colors_gt_2_24.count
pdf.colors_gt_2_24.hexcode
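Since a selection expression is plain Python, attributes can be combined with the usual operators (and, or, not, ==, >). The sketch below illustrates this with a hypothetical stand-in for the pdf object: make_pdf, select and the sample file names are invented for the illustration and are not part of PDFiD. Only a handful of the attributes listed above are mocked.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the object that selection expressions see as `pdf`.
# Only a few of the documented attributes are mocked; this is not PDFiD code.
def make_pdf(filename, javascript=0, openaction=0, page=0, is_pdf=True):
    return SimpleNamespace(
        filename=filename,
        isPDF=is_pdf,
        javascript=SimpleNamespace(count=javascript),
        openaction=SimpleNamespace(count=openaction),
        page=SimpleNamespace(count=page),
    )

def select(expression, pdfs):
    # Evaluate the expression once per file, with `pdf` bound to that file's data,
    # and keep the names of the files for which it is true.
    return [p.filename for p in pdfs if eval(expression, {}, {"pdf": p})]

samples = [
    make_pdf("clean.pdf", page=3),
    make_pdf("script.pdf", javascript=1, openaction=1, page=1),
    make_pdf("js-only.pdf", javascript=2, page=5),
]

print(select("pdf.javascript.count > 0", samples))
# ['script.pdf', 'js-only.pdf']
print(select("pdf.javascript.count > 0 and pdf.openaction.count > 0", samples))
# ['script.pdf']
print(select("pdf.page.count == 1", samples))
# ['script.pdf']
```

The same expressions can be passed to the -S option unchanged, for example "pdf.javascript.count > 0 and pdf.openaction.count > 0".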
Be careful if you are going to use this in an automated scenario where you don’t control the selection expression. This expression is evaluated in Python with the eval function, and there is no input validation.
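If you do have to accept selection expressions from an untrusted source, one option is to check them yourself before handing them to PDFiD. The following is a minimal sketch of such a check, not something PDFiD provides: is_plain_selection and ALLOWED_NODES are hypothetical names, and the allowlist is an assumption about what a typical selection expression needs (comparisons and boolean logic over attributes of pdf). It rejects function calls and names other than pdf, but it should be treated as a first filter rather than a complete sandbox.

```python
import ast

# Syntax node types that a plain comparison over `pdf.<attribute>` needs.
# Anything else (calls, lambdas, comprehensions, ...) is rejected.
# Subscripts such as pdf.keywords['/AA'] assume Python 3.9 or later.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.Attribute, ast.Subscript, ast.Name, ast.Load, ast.Constant,
)

def is_plain_selection(expression):
    """Return True if the expression only compares attributes of `pdf`."""
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False
        # The only variable an expression may refer to is `pdf`.
        if isinstance(node, ast.Name) and node.id != "pdf":
            return False
        # Block access to private and dunder attributes such as __class__.
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            return False
    return True

print(is_plain_selection("pdf.javascript.count > 0"))       # True
print(is_plain_selection("__import__('os').getcwd()"))      # False
print(is_plain_selection("pdf.__class__"))                  # False
```

The second expression shows why the warning matters: passed to eval directly, it would import a module and call into it, and nothing stops a less harmless call from being used instead.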