Didier Stevens

Sunday 31 January 2021

New Tool: pdftool.py

Filed under: My Software,PDF — Didier Stevens @ 0:00

pdftool.py is a new tool I developed. This version has only one command: iu (incremental updates).

With this command, one can check if a PDF has incremental updates, and then select different versions of this PDF with incremental updates.

A PDF with incremental updates, is a PDF that has been modified by appending changes to the document at the end of the PDF file, without modifying the original content.

Here is a video explaining incremental updates and the use of my tool.

I reference 2 blog posts in the video: “Solving a Little PDF Puzzle” and “Shoulder Surfing a Malicious PDF Author“.

pdftool_V0_0_1.zip (https)
MD5: ED2BBE886008C737CC06E22F4F0FE8A1
SHA256: 401E88FBFAEC4382A50FE59430D04FE6111F9911958AB09BA7530C26043FDA87

Sunday 29 December 2019

Update: pdf-parser.py Version 0.7.4 and pdfid.py Version 0.2.7

Filed under: My Software,PDF,Update — Didier Stevens @ 0:00

This is a bug fix version.

pdf-parser_V0_7_4.zip (https)
MD5: 51C6925243B91931E7FCC1E39A7209CF
SHA256: FC318841952190D51EB70DAFB0666D7D19652C8839829CC0C3871BBF7E155B6A

pdfid_v0_2_7.zip (https)
MD5: F1852F238386681C2DC40752669B455B
SHA256: FE2B59FE458ECBC1F91A40095FB1536E036BDD4B7B480907AC4E387D9ADB6E60

Monday 30 September 2019

Update Of My PDF Tools

Filed under: maldoc,Malware,My Software,PDF,Update — Didier Stevens @ 19:16

This is an update of my PDF tools.

There are a couple of bug fixes for pdf-parser and pdfid.

And 2 new features in pdf-parser, inspired by a private training on maldoc analysis I gave last week. I often get good ideas from my students, and sometimes, even I get a good idea in class 🙂 .

Option -o can now be used to select multiple objects: separate the indices by a comma.

There’s a new environment variable, PDFPARSER_OPTIONS, that can be used to provide extra options you want to include with each execution of pdf-parser.py. This is useful for option -O, an option to parse stream objects.

It’s actually best to always parse stream objects, i.e. always use option -O. But I decided not to make this an option that is on by default, so that the behavior of pdf-parser would remain unchanged. I consider this important for the many people that rely on a predictable behavior of pdf-parser, like teachers and students of infosec trainings where my tools are used/mentioned.

However, always including option -O is tedious and error prone. So now you can have best of both worlds, by defining an environment variable with name PDFPARSER_OPTIONS and value -O.

And finally, I started to add a man page (option -m), like I do with many of my other tools. This is a work in progress: for the moment, it points to my free PDF analysis e-book that explains the use of pdfid and pdf-parser.

pdf-parser_V0_7_3.zip (https)
MD5: 7EB1713631D255B36BC698CD2422C7EB
SHA256: D4D5AC9C26A9D8FEF65CE58A769D3F64A737860DC26606068CCDD3F04FDEA0D7

pdfid_v0_2_6.zip (https)
MD5: 9CCE332914A6C76410F04B7C35DA3155
SHA256: 95F7C91EEFB561F3F3BE9809ED339D85E7109BAA7E128EF056651EE018DBDBA0

Tuesday 6 August 2019

Update: pdf-parser.py Version 0.7.2

Filed under: My Software,PDF,Update — Didier Stevens @ 0:00

This is a bugfix version.

pdf-parser_V0_7_2.zip (https)
MD5: 7D417F2313FF505AC96B80D80495BB78
SHA256: 3CDB98A57DAABC98382BFA361390AE3637F96852F6F078D03A7922766AE14B57

Saturday 23 March 2019

Quickpost: PDF Tools Download Feature

Filed under: My Software,PDF,Quickpost — Didier Stevens @ 9:34

When I’m asked to perform a quick check of an online PDF document, that I expect to be benign, I will just point my PDF tools to the online document. When you provide an URL argument to pdf-parser, it will download the document and perform the analysis (without writing it to disk).

Quickpost info


Thursday 7 March 2019

Analyzing a Phishing PDF with /ObjStm

Filed under: maldoc,Malware,My Software,PDF — Didier Stevens @ 0:00

I got hold of a phishing PDF where the /URI is hiding inside a stream object (/ObjStm).

First I start the analysis with pdfid.py:

There is no /URI reported, but remark that the PDF contains 5 stream objects (/ObjStm). These can contain /URIs. In the past, I would search and decompress these stream objects with pdf-parser.py, and then pipe the result through pdfid.py, in order to detect /URIs (or other objects that require further analysis).

Since pdf-parser.py version 0.7.0, I prefer another method: using option -O to let pdf-parser.py extract and parse the objects inside stream objects.

With option -a (here combined with option -O), I can get statistics and keywords just like with pdfid:

Now I can see that there is a /URI inside the PDF (object 43).

Thus I can use option -k to get the value of /URI entries, combined with option -O to look inside stream objects:

And here I have the /URI.

Another method, is to select object 43:

From this output, we also see that object 43 is inside stream object 16.

Remark: if you use option -O on a PDF that does not contain stream objects (/ObjStm), pdf-parser will behave as if you didn’t provide this option. Hence, if you want, you can always use option -O to analyze PDFs.

Wednesday 6 March 2019

Update: pdf-parser.py Version 0.7.1

Filed under: My Software,PDF,Update — Didier Stevens @ 0:00

This is a bug fix version for statistics (-a).

pdf-parser_V0_7_1.zip (https)
MD5: 1480D3BF602686C9E7C2FE82AC6C963B
SHA256: D2C8E0599A84127C36656AA2600F9668A3CB12EF306D28752D6D8AC436A89D1A

Thursday 28 February 2019

Update: pdf-parser.py Version 0.7.0

Filed under: My Software,PDF,Update — Didier Stevens @ 0:00

This new version of pdf-parser brings support for analysis of stream objects (/ObjStm). Use new option -O to enable this mode.

Stream objects (/ObjStm) are objects that contain other objects: they have a stream, containing other objects. These contained objects can not have a stream.

pdfid.py detects the presence of stream objects:

But pdfid can not look inside a stream, to figure out what objects are inside. That’s why I always say to use pdf-parser to select and decompress stream objects, and then pipe this through pdfid:

When pdf-parser parses a stream object, it does not parse the content of its stream:

This changes with this new version of pdf-parser. When option -O is used, pdf-parser extracts objects from /ObjStm streams and handles them like normal objects. In the following example, object 2 is contained in object 1:

pdf-parser provides statistics for a PDF’s content with option -a:

Combining option -a with option -O includes objects present inside stream objects (this is an alternative for combining both tools: pdf-parser -s objstm -f a.pdf | pdfid -f):

This output shows that /JavaScript can be found in object 7. We need to use option -O to find object 7 “hiding” in object 1:

If we forget to use option -O, object 7 is not found:

Here is a video showing this new feature:

pdf-parser_V0_7_0.zip (https)
SHA256: 219FF0BB729C4478679A79163CA9942296ACF49E4EC06D128CBC53FBEE25FF05

Tuesday 23 October 2018

Update: pdf-parser.py Version 0.6.9

Filed under: My Software,PDF,Update — Didier Stevens @ 0:00

This new version of pdf-parser.py brings 2 new features; the idea came to me during private & public trainings I gave on malicious documents (if you are interested in a training, please get in touch).

The statistics option (-a –stats) has been enhanced with a search for keywords section:

In this section, the result of searches for particular keywords (that might indicate a malicious PDF) is displayed: you get the number of hits followed by the indices of the objects that contain this keyword.

In the example above, we see that object 11 contains JavaScript.

Remark that this section is the result of a search command (-s): search in pdf-parser is not case-senstive and partial (unlike PDFiD). That explains why /AA is found in object 37, while it’s actually /Aacute:

pdf-parser will also read file pdfid.ini (if present) so that the personal keywords you added to PDFiD are also used by pdf-parser.

–overridingfilters is a new option: it allows for the processing of streams with a different filter (or filter chain) than the one specified in the object’s dictionary. Use value raw to obtain the raw stream, without filtering.

pdf-parser_V0_6_9.zip (https)
MD5: 27D65A96FEAF157360ACBBAAB9748D27
SHA256: 3F102595B9EAE5842A1B4723EF965344AE3AB01F90D85ECA96E9678A6C7092B7

Friday 3 August 2018

Update: PDFiD.py Version 0.2.5

Filed under: My Software,PDF,Update — Didier Stevens @ 0:00

It’s the second time now that a friend reports to me that PDFiD produces no output at all when a pdf is analyzed.

In both cases, the filename was something like sample[1].pdf (a file you could find in Internet Explorer’s cache, for example).

PDFiD can process multiple files, and accepts UNIX shell-style wildcards. Not only * and ?, but also []. So with a filename like sample[1].pdf, PDFiD is actually looking for a file with filename sample1.pdf. Which it doesn’t find, and thus produces no output.

About two years ago, when first a friend reported this, I added option -l –literal. If you use this option, then PDFiD will do no wildcard expansion, and will thus find file sample[1].pdf.

Recently, another friend had the same problem. And was not aware of the existence of option -l.

This new version of PDFiD will display a warning when you use wildcard characters in filenames (without option -l) and when no files match. Like this:

I also renamed option –literal to –literalfilenames, to be consistent across my tools.

pdfid_v0_2_5.zip (https)
MD5: 9B835D9E934A7AA7E68C3649A7AA5DAF
SHA256: 4DD43D7BDA885C5A579FC1F797E93A536E1DB5A4AB52A9337759A69D3B0250E0

Next Page »

Blog at WordPress.com.