Didier Stevens

Saturday 23 March 2019

Quickpost: PDF Tools Download Feature

Filed under: My Software,PDF,Quickpost — Didier Stevens @ 9:34

When I’m asked to perform a quick check of an online PDF document, that I expect to be benign, I will just point my PDF tools to the online document. When you provide an URL argument to pdf-parser, it will download the document and perform the analysis (without writing it to disk).

Quickpost info


Thursday 7 March 2019

Analyzing a Phishing PDF with /ObjStm

Filed under: maldoc,Malware,My Software,PDF — Didier Stevens @ 0:00

I got hold of a phishing PDF where the /URI is hiding inside a stream object (/ObjStm).

First I start the analysis with pdfid.py:

There is no /URI reported, but remark that the PDF contains 5 stream objects (/ObjStm). These can contain /URIs. In the past, I would search and decompress these stream objects with pdf-parser.py, and then pipe the result through pdfid.py, in order to detect /URIs (or other objects that require further analysis).

Since pdf-parser.py version 0.7.0, I prefer another method: using option -O to let pdf-parser.py extract and parse the objects inside stream objects.

With option -a (here combined with option -O), I can get statistics and keywords just like with pdfid:

Now I can see that there is a /URI inside the PDF (object 43).

Thus I can use option -k to get the value of /URI entries, combined with option -O to look inside stream objects:

And here I have the /URI.

Another method, is to select object 43:

From this output, we also see that object 43 is inside stream object 16.

Remark: if you use option -O on a PDF that does not contain stream objects (/ObjStm), pdf-parser will behave as if you didn’t provide this option. Hence, if you want, you can always use option -O to analyze PDFs.

Wednesday 6 March 2019

Update: pdf-parser.py Version 0.7.1

Filed under: My Software,PDF,Update — Didier Stevens @ 0:00

This is a bug fix version for statistics (-a).

pdf-parser_V0_7_1.zip (https)
MD5: 1480D3BF602686C9E7C2FE82AC6C963B
SHA256: D2C8E0599A84127C36656AA2600F9668A3CB12EF306D28752D6D8AC436A89D1A

Thursday 28 February 2019

Update: pdf-parser.py Version 0.7.0

Filed under: My Software,PDF,Update — Didier Stevens @ 0:00

This new version of pdf-parser brings support for analysis of stream objects (/ObjStm). Use new option -O to enable this mode.

Stream objects (/ObjStm) are objects that contain other objects: they have a stream, containing other objects. These contained objects can not have a stream.

pdfid.py detects the presence of stream objects:

But pdfid can not look inside a stream, to figure out what objects are inside. That’s why I always say to use pdf-parser to select and decompress stream objects, and then pipe this through pdfid:

When pdf-parser parses a stream object, it does not parse the content of its stream:

This changes with this new version of pdf-parser. When option -O is used, pdf-parser extracts objects from /ObjStm streams and handles them like normal objects. In the following example, object 2 is contained in object 1:

pdf-parser provides statistics for a PDF’s content with option -a:

Combining option -a with option -O includes objects present inside stream objects (this is an alternative for combining both tools: pdf-parser -s objstm -f a.pdf | pdfid -f):

This output shows that /JavaScript can be found in object 7. We need to use option -O to find object 7 “hiding” in object 1:

If we forget to use option -O, object 7 is not found:

Here is a video showing this new feature:

pdf-parser_V0_7_0.zip (https)
SHA256: 219FF0BB729C4478679A79163CA9942296ACF49E4EC06D128CBC53FBEE25FF05

Tuesday 23 October 2018

Update: pdf-parser.py Version 0.6.9

Filed under: My Software,PDF,Update — Didier Stevens @ 0:00

This new version of pdf-parser.py brings 2 new features; the idea came to me during private & public trainings I gave on malicious documents (if you are interested in a training, please get in touch).

The statistics option (-a –stats) has been enhanced with a search for keywords section:

In this section, the result of searches for particular keywords (that might indicate a malicious PDF) is displayed: you get the number of hits followed by the indices of the objects that contain this keyword.

In the example above, we see that object 11 contains JavaScript.

Remark that this section is the result of a search command (-s): search in pdf-parser is not case-senstive and partial (unlike PDFiD). That explains why /AA is found in object 37, while it’s actually /Aacute:

pdf-parser will also read file pdfid.ini (if present) so that the personal keywords you added to PDFiD are also used by pdf-parser.

–overridingfilters is a new option: it allows for the processing of streams with a different filter (or filter chain) than the one specified in the object’s dictionary. Use value raw to obtain the raw stream, without filtering.

pdf-parser_V0_6_9.zip (https)
MD5: 27D65A96FEAF157360ACBBAAB9748D27
SHA256: 3F102595B9EAE5842A1B4723EF965344AE3AB01F90D85ECA96E9678A6C7092B7

Friday 3 August 2018

Update: PDFiD.py Version 0.2.5

Filed under: My Software,PDF,Update — Didier Stevens @ 0:00

It’s the second time now that a friend reports to me that PDFiD produces no output at all when a pdf is analyzed.

In both cases, the filename was something like sample[1].pdf (a file you could find in Internet Explorer’s cache, for example).

PDFiD can process multiple files, and accepts UNIX shell-style wildcards. Not only * and ?, but also []. So with a filename like sample[1].pdf, PDFiD is actually looking for a file with filename sample1.pdf. Which it doesn’t find, and thus produces no output.

About two years ago, when first a friend reported this, I added option -l –literal. If you use this option, then PDFiD will do no wildcard expansion, and will thus find file sample[1].pdf.

Recently, another friend had the same problem. And was not aware of the existence of option -l.

This new version of PDFiD will display a warning when you use wildcard characters in filenames (without option -l) and when no files match. Like this:

I also renamed option –literal to –literalfilenames, to be consistent across my tools.

pdfid_v0_2_5.zip (https)
MD5: 9B835D9E934A7AA7E68C3649A7AA5DAF
SHA256: 4DD43D7BDA885C5A579FC1F797E93A536E1DB5A4AB52A9337759A69D3B0250E0

Thursday 31 May 2018

PDFiD: GoToE and GoToR Detection (“NTLM Credential Theft”)

Filed under: My Software,PDF — Didier Stevens @ 0:00

The article “NTLM Credentials Theft via PDF Files” explains how PDF documents can refer to a resource via UNC paths. This is done using  PDF names /GoToE or /GoToR.

My tool pdfid.py can now be extended to report /GoToE and /GoToR usage in a PDF file, without having to change the source code. You just have to edit the pdfid.ini file (or create it) to include these names, like this:


Using pdfid configured like this on a “credential stealing PDF” gives the following result:

pdfid.ini has to be located in the same directory as pdfid.py. And remember that names in the PDF language are case-sensitive.


Friday 29 December 2017

Cracking Encrypted PDFs – Conclusion

Filed under: Encryption,Forensics,Hacking,PDF — Didier Stevens @ 0:00

TL;DR: PDFs protected with 40-bit keys can not guarantee confidentiality, even with strong passwords. When you protect your PDFs with a password, you have to encrypt your PDFs with strong passwords and use long enough keys. The PDF specification has evolved over time, and with it, the encryption options you have. There are many encryption options today, you are no longer restricted to 40-bit keys. You can use 128-bit or 256-bit keys too.

There is a trade-off too: the more advanced encryption option you use, the more recent the PDF reader must be to support the encryption option you selected. Older PDF readers are not able to handle 256-bit AES for example.

Since each application capable of creating PDFs will have different options and descriptions for encryption, I can not tell you what options to use for your particular application. There are just too many different applications and versions. But if you are not sure if you selected an encryption option that will use long enough keys, you can always check the /Encrypt dictionary of the PDF you created, for example with my pdf-parser (in this example /Length 128 tells us a 128-bit key is used):

Or you can use QPDF to encrypt an existing PDF (I’ll publish a blog post later with encryption examples for QPDF).

But don’t use 40-bit keys, unless confidentiality is not important to you:

I first showed (almost 4 years ago) how PDFs with 40-bit keys can be decrypted in minutes, using a commercial tool with rainbow tables. This video illustrates this.

Later I showed how this can be done with free, open source tools: Hashcat and John the Ripper. But although I could recover the encryption key using Hashcat, I still had to use a commercial tool to do the actual decryption with the key recovered by Hashcat.

Today, this is no longer the case: in this series of blog posts, I show how to recover the password, how to recover the key and how to decrypt with the key, all with free, open source tools.

Overview of the complete blog post series:


Thursday 28 December 2017

Cracking Encrypted PDFs – Part 3

Filed under: Encryption,Forensics,Hacking,PDF — Didier Stevens @ 0:00

I performed a brute-force attack on the password of an encrypted PDF and a brute-force attack on the key of (another) encrypted PDF, both PDFs are part of a challenge published by John August.

The encryption key is derived from the password. it’s not just based on the password only, but also on metadata. This implies that different PDFs encrypted with the same user password, will have different encryption keys.

When you recover the user password of an encrypted PDF, you can just use it with PDF readers like Adobe Reader: they will ask you for the password, you provide it and the PDF will be decrypted and rendered.

But when you recover the key of an encrypted PDF, you can not use it with PDF reader: there is no feature that will allow you to input a key in stead of a password. The only method I knew to decrypt a PDF document with its encryption key, was to use Elcomsoft’s PDF cracking tool:

Now I worked out a second method: I modified the source code of QPDF so that it will accept encryption keys too. It’s a quick and dirty hack, I did not add a new option to QPDF but I “hijacked” the –password option. If the value to the option –password starts with string “key:”, then QPDF will not derive the key from the provided password, but it will use the key provided as hexadecimal characters. Here is how I use it to decrypt the “tough” PDF:

I also made a small modification to the –show-encryption option, to display the encryption key:

Update: I had an email exchange with Jay Berkenbilt, the author of QPDF, and he will look into this patch and possibly add a new key option to QPDF.

If you are interested in my modified version of QPDF, you can find the modified source code files and Windows binaries here:

qpdf-patched.zip (https)
MD5: 57E1A5A232E12B45D0A927181A1E8C3B
SHA256: 6F17E095B38AE72F229A6662216DDCE86057D2BA1C567B07FEF78B8A93413495

Update: this is the complete blog post series:

Wednesday 27 December 2017

Cracking Encrypted PDFs – Part 2

Filed under: Encryption,Forensics,Hacking,PDF — Didier Stevens @ 0:00

After cracking the “easy” PDF of John’s challenge, I’m cracking the “tough” PDF (harder_encryption).

Using the same steps as for the “easy” PDF, I confirm the PDF is encrypted with a user password using 40-bit encryption, and I extract the hash.

Since the password is a long random password, a brute-force attack on the password like I did in the first part will take too long. That’s why I’m going to perform a brute-force attack on the key: using 40-bit encryption means that the key is just 5 bytes long, and that will take about 2 hours on my machine. The key is derived from the password.

I’m using hashcat again, but this time with hash mode 10410 in stead of 10400.
This is the command I’m using:

hashcat-4.0.0\hashcat64.exe --potfile-path=harder_encryption.pot -m 10410 -a 3 -w 3 "harder_encryption - CONFIDENTIAL.hash" ?b?b?b?b?b

I’m using the following options:

  • –potfile-path=harder_encryption.pot : I prefer using a dedicated pot file, but this is optional
  • -m 10410 : this hash mode is suitable to crack the key used for 40-bit PDF encryption
  • -a 3 : I perform a brute force attack (since it’s a key, not a password)
  • -w 3 : I’m using a workload profile that is supposed to speed up cracking on my machine
  • ?b?b?b?b?b : I’m providing a mask for 5 bytes (I want to brute-force keys that are 40 bits long, i.e. 5 bytes)

And here is the result:

The recovered key is 27ce78c81a. I was lucky, it took about 15 minutes to recover this key (again, using GPU GeForce GTX 980M, 2048/8192 MB allocatable, 12MCU). Checking the complete keyspace whould take a bit more than 2 hours.

Now, how can we decrypt a PDF with the key (in stead of the password)? I’ll explain that in the next blog post.

Want a hint? Take a look at my Tweet!

Update: this is the complete blog post series:

Next Page »

Blog at WordPress.com.