This is a bug fix version.
pdf-parser_V0_7_5.zip (https)MD5: D39E98981E6FEA48BF61CA2F78ED0B09
SHA256: 5D970AFAC501A71D4FDDEECBD63060062226BF1D587A6A74702DDA79B5C2D3FB
This is a bug fix version.
pdf-parser_V0_7_5.zip (https)This is a bug fix version
pdfid_v0_2_8.zip (https)pdftool.py is a new tool I developed. This version has only one command: iu (incremental updates).
With this command, one can check if a PDF has incremental updates, and then select different versions of this PDF with incremental updates.
A PDF with incremental updates, is a PDF that has been modified by appending changes to the document at the end of the PDF file, without modifying the original content.
Here is a video explaining incremental updates and the use of my tool.
I reference 2 blog posts in the video: “Solving a Little PDF Puzzle” and “Shoulder Surfing a Malicious PDF Author“.
pdftool_V0_0_1.zip (https)
MD5: ED2BBE886008C737CC06E22F4F0FE8A1
SHA256: 401E88FBFAEC4382A50FE59430D04FE6111F9911958AB09BA7530C26043FDA87
This is a bug fix version.
pdf-parser_V0_7_4.zip (https)
MD5: 51C6925243B91931E7FCC1E39A7209CF
SHA256: FC318841952190D51EB70DAFB0666D7D19652C8839829CC0C3871BBF7E155B6A
pdfid_v0_2_7.zip (https)
MD5: F1852F238386681C2DC40752669B455B
SHA256: FE2B59FE458ECBC1F91A40095FB1536E036BDD4B7B480907AC4E387D9ADB6E60
This is an update of my PDF tools.
There are a couple of bug fixes for pdf-parser and pdfid.
And 2 new features in pdf-parser, inspired by a private training on maldoc analysis I gave last week. I often get good ideas from my students, and sometimes, even I get a good idea in class 🙂 .
Option -o can now be used to select multiple objects: separate the indices by a comma.
There’s a new environment variable, PDFPARSER_OPTIONS, that can be used to provide extra options you want to include with each execution of pdf-parser.py. This is useful for option -O, an option to parse stream objects.
It’s actually best to always parse stream objects, i.e. always use option -O. But I decided not to make this an option that is on by default, so that the behavior of pdf-parser would remain unchanged. I consider this important for the many people that rely on a predictable behavior of pdf-parser, like teachers and students of infosec trainings where my tools are used/mentioned.
However, always including option -O is tedious and error prone. So now you can have best of both worlds, by defining an environment variable with name PDFPARSER_OPTIONS and value -O.
And finally, I started to add a man page (option -m), like I do with many of my other tools. This is a work in progress: for the moment, it points to my free PDF analysis e-book that explains the use of pdfid and pdf-parser.
pdf-parser_V0_7_3.zip (https)
MD5: 7EB1713631D255B36BC698CD2422C7EB
SHA256: D4D5AC9C26A9D8FEF65CE58A769D3F64A737860DC26606068CCDD3F04FDEA0D7
pdfid_v0_2_6.zip (https)
MD5: 9CCE332914A6C76410F04B7C35DA3155
SHA256: 95F7C91EEFB561F3F3BE9809ED339D85E7109BAA7E128EF056651EE018DBDBA0
This is a bugfix version.
pdf-parser_V0_7_2.zip (https)
MD5: 7D417F2313FF505AC96B80D80495BB78
SHA256: 3CDB98A57DAABC98382BFA361390AE3637F96852F6F078D03A7922766AE14B57
When I’m asked to perform a quick check of an online PDF document, that I expect to be benign, I will just point my PDF tools to the online document. When you provide an URL argument to pdf-parser, it will download the document and perform the analysis (without writing it to disk).
I got hold of a phishing PDF where the /URI is hiding inside a stream object (/ObjStm).
First I start the analysis with pdfid.py:
There is no /URI reported, but remark that the PDF contains 5 stream objects (/ObjStm). These can contain /URIs. In the past, I would search and decompress these stream objects with pdf-parser.py, and then pipe the result through pdfid.py, in order to detect /URIs (or other objects that require further analysis).
Since pdf-parser.py version 0.7.0, I prefer another method: using option -O to let pdf-parser.py extract and parse the objects inside stream objects.
With option -a (here combined with option -O), I can get statistics and keywords just like with pdfid:
Now I can see that there is a /URI inside the PDF (object 43).
Thus I can use option -k to get the value of /URI entries, combined with option -O to look inside stream objects:
And here I have the /URI.
Another method, is to select object 43:
From this output, we also see that object 43 is inside stream object 16.
Remark: if you use option -O on a PDF that does not contain stream objects (/ObjStm), pdf-parser will behave as if you didn’t provide this option. Hence, if you want, you can always use option -O to analyze PDFs.
This is a bug fix version for statistics (-a).
pdf-parser_V0_7_1.zip (https)
MD5: 1480D3BF602686C9E7C2FE82AC6C963B
SHA256: D2C8E0599A84127C36656AA2600F9668A3CB12EF306D28752D6D8AC436A89D1A
This new version of pdf-parser brings support for analysis of stream objects (/ObjStm). Use new option -O to enable this mode.
Stream objects (/ObjStm) are objects that contain other objects: they have a stream, containing other objects. These contained objects can not have a stream.
pdfid.py detects the presence of stream objects:
But pdfid can not look inside a stream, to figure out what objects are inside. That’s why I always say to use pdf-parser to select and decompress stream objects, and then pipe this through pdfid:
When pdf-parser parses a stream object, it does not parse the content of its stream:
This changes with this new version of pdf-parser. When option -O is used, pdf-parser extracts objects from /ObjStm streams and handles them like normal objects. In the following example, object 2 is contained in object 1:
pdf-parser provides statistics for a PDF’s content with option -a:
Combining option -a with option -O includes objects present inside stream objects (this is an alternative for combining both tools: pdf-parser -s objstm -f a.pdf | pdfid -f):
This output shows that /JavaScript can be found in object 7. We need to use option -O to find object 7 “hiding” in object 1:
If we forget to use option -O, object 7 is not found:
Here is a video showing this new feature:
pdf-parser_V0_7_0.zip (https)
MD5: CDE355BB3FCACE3C4EDBC762E632F9AB
SHA256: 219FF0BB729C4478679A79163CA9942296ACF49E4EC06D128CBC53FBEE25FF05