Didier Stevens

Wednesday 28 December 2016

Update: pdf-parser Version 0.6.7

Filed under: My Software,PDF,Update — Didier Stevens @ 12:03

I added option -k to search for keys in dictionaries. A usage example can be found in blog post “PDF Analysis: Back To Basics“.

pdf-parser_V0_6_7.zip (https)
MD5: D04D7DA42F3263139BC2C7E7B2621C91
SHA256: ED863DE952A5096FF4BE0825110D2726BA1BE75A7A6717AF0E6A153B843E3B78

Saturday 30 July 2016

Bugfix: pdf-parser Version 0.6.5

Filed under: My Software,PDF,Update — Didier Stevens @ 16:19

This is a bugfix for pdf-parser. Streams were not properly extracted when they started with whitespace after the normal whitespace following the stream keyword.

pdf-parser_V0_6_5.zip (https)
MD5: 7F0880EB8A954979CA0ADAB2087E1C55
SHA256: E7D2CCA12CC43D626C53873CFF0BC0CE2875330FD5DBC8FB23B07396382DCC85

Tuesday 7 June 2016

Recovering A Ransomed PDF

Filed under: PDF — Didier Stevens @ 0:00

I was contacted to help with a PDF file encrypted by ransomware. Just like another case I helped with, the file was not completely encrypted. The file had parts with low entropy, as byte-stats.py shows:


Searching for endobj, I noticed the file contained PDF objects:


So I stripped the beginning of the file that was encrypted:


This file can be parsed by pdf-parser. Now I’m going to try to rebuild this PDF. First I check if it contains an object referencing all pages:


As you can see, it doesn’t. So I will add the missing objects:


Object 2 (the missing /Pages object) needs to reference all pages still present in the document (/Kids list). I make a list of all /Page objects with the following command:


And then I update object 2 /Pages with the 87 /Page objects I found (dictionary entries /Kids and /Count):


When I open this PDF with a PDF reader, I get 87 pages. All of them are blank, except the last one:


The pages are blank because of missing fonts definitions:


I add some generic font definitions:


This gives me the following PDF:


AS you can see, not all text is readable, that’s because I did not select the right font. Some trial and error with different fonts would allow me to further recover the document.

This method can also help you with corrupt PDF documents. Of course, this is not a complete recovery. We miss the first pages that were encrypted.

Monday 21 September 2015

PDF + DOC + VBAs Videos

Filed under: Malware,PDF — Didier Stevens @ 10:46

I produced videos showing how I created my “Test File: PDF With Embedded DOC Dropping EICAR” and how to change the settings in Adobe Reader to mitigate this.

Friday 28 August 2015

Test File: PDF With Embedded DOC Dropping EICAR

Filed under: PDF — Didier Stevens @ 9:30

Over at the SANS ISC diary I wrote a diary entry on the analysis of a PDF file that contains a malicious DOC file.

For testing purposes, I created a PDF file that contains a DOC file that drops the EICAR test file.

The PDF file contains JavaScript that extracts and opens the DOC file (with user approval). The DOC file contains a VBA script that executes upon opening of the file, and writes the EICAR test file to a temporary file in the %TEMP% folder.


You can download the PDF file here. It is in a password protected ZIP file. The password is eicardropper, with eicar written in uppercase: EICAR.

This will generate an anti-virus alert. Use at your own risk, with approval.
pdf-doc-vba-eicar-dropper.zip (https)
MD5: 65928D03CDF37FEDD7C99C33240CD196
SHA256: 48258AEC3786CB9BA032CD09DB09DC66E0EC8AA19677C299678A473895E79369

Thursday 13 August 2015

Update: pdf-parser Version 0.6.4

Filed under: Malware,My Software,PDF,Update — Didier Stevens @ 0:00

In this new version of pdf-parser, option -H will now also calculate the MD5 hashes of the unfiltered and filtered stream of selected objects, and also dump the first 16 bytes. I needed this to analyze a malicious PDF that embeds a .docm file.


As you can see in this screenshot, the embedded file is a ZIP file (PK). .docm files are actually ZIP files.

pdf-parser_V0_6_4.zip (https)
MD5: 47A4C70AA281E1E80A816371249DCBD6
SHA256: EC8E64E3A74FCCDB7828B8ECC07A2C33B701052D52C43C549115DDCD6F0F02FE

Wednesday 29 April 2015

pdf-parser: A Method To Manipulate PDFs Part 2

Filed under: My Software,PDF — Didier Stevens @ 0:00

I provide 2 days of Hacking PDF training at HITB Amsterdam. This is one of the methods I teach.

Maarten Van Horenbeeck posted a diary entry (July 2008) explaining how scripts and data are stored in PDF documents (using streams), and demonstrated a Perl script to decompress streams. A couple of months before, I had started developing my pdf-parser tool, and Maarten’s diary entry motivated me to continue adding features to pdf-parser.

Extracting and decompressing a stream (for example containing a JavaScript script) is easy with pdf-parser. You select the object that contains the stream (example object 5: -o 5) and you “filter” the content of the stream (-f ). The command is:

pdf-parser.py –o 5 –f sample.pdf

In PDF jargon, streams are compressed using filters. You have all kinds of filters, for example ZLIB DEFLATE, but also lossy compressions like JPEG. pdf-parser supports a couple of filters, but not all, because the implementation of some of them (mostly the lossy ones) differs between vendors and PDF applications.


A recent article published by Virus Bulletin on JavaScript stored inside a lossy stream gave me the opportunity to implement a method I had worked out manually.

The problem: you need to decompress a stream and you have no decompression algorithm.

The solution: you use the PDF application to decompress the stream.

The method: you create a new PDF document with the stream as embedded file, and then save the embedded file using the PDF application.

The detailed method: when you need to decompress a stream for which you have no decompressor (or no decompressor identical to the target application), you create a new PDF document into which you include the object with the stream as an embedded file. PDF documents support embedded files. For example, if you have a PDF document explaining a financial method, you can include a spreadsheet in the PDF document as an embedded file. The embedded file is stored as an object with a stream, and the compression can be any method supported by the PDF application. Crafting this PDF document with embedded file manually requires many manipulations and calculations, and is thus a very good candidate for automation.

Figure: this PDF embeds a file called vbanner2.jpg

With pdf-parser, you can use this method as follows:

  1. Create a Python program that generates the PDF document with embedded file. Use pdf-parser like this (in this example, the data stream you want to decompress is in object 5 of PDF file sample.pdf): pdf-parser.py –generateembedded 5 sample.pdf > embedded.py
  2. Execute the Python program to create the PDF file: embedded.py embedded.pdf
  3. Open the created PDF file embedded.pdf with the target application (Adobe Reader for the Virus Bulletin example), and save the embedded file to disk
  4. The saved file contains the decompressed stream

You can find my PDF tools here.

Remark: the generated Python program requires my module mPDF.py, which can also be found on my PDF tools page.

Remark 2: don’t use this method when the stream contains an exploit for the decompressor.

Thursday 16 April 2015

pdf-parser: A Method To Manipulate PDFs Part 1

Filed under: My Software,PDF,Update — Didier Stevens @ 0:00

I provide 2 days of Hacking PDF training at HITB Amsterdam. This is one of the methods I teach.

Sometimes when I analyze PDF documents (benign or malicious), I want to reduce the PDF to its essential objects. But when one removes objects in a PDF, indexes need to be updated and references updated/removed. To automate this process as much as possible, I updated my pdf-parser program to generate a Python program that in turn, generates the original PDF.

Thus when I want to make changes to the PDF (like removing objects), I generate its corresponding Python program, and then I edit this Python program.

I do this simply with option -g.


Then you can edit the Python program, and when you run it, it will generate a new PDF file.

You can also use option -g together with option -f to filter the streams before they are inserted in the Python program. This gives you the decompressed streams in the Python program, opening them up to editing.

In this example, without option -f the Python statement for the stream object is:

oPDF.stream(5, 0, 'x\x9cs\nQ\xd0w3T02Q\x08IS040P0\x07\xe2\x90\x14\x05\r\x8f\xd4\x9c\x9c|\x85\xf0\xfc\xa2\x9c\x14M\x85\x90,\x05\xd7\x10\x00\xdfn\x0b!', '<<\r\n /Length %d\r\n /Filter /FlateDecode\r\n>>')

And with option -f, it becomes:

oPDF.stream2(5, 0, 'BT /F1 24 Tf 100 700 Td (Hello World) Tj ET', '', 'f')

The generated Python program relies on my mPDF library found in my PDF make tools.

pdf-parser_V0_6_2.zip (https)
MD5: D6717F1CA6B9DA2392E63F0DABF590DD
SHA256: 4DC0136062E9A5B6D84C74696005531609BD0299887B70DDFFAA19115BF2E746

Wednesday 15 April 2015

PDF Password Cracking With John The Ripper

Filed under: Encryption,PDF — Didier Stevens @ 0:00

I have a video showing how to use oclHashcat to crack PDF passwords, but I was also asked how to do this with John The Ripper on Windows.

It’s not difficult.

Download the latest jumbo edition john-the-ripper-v1.8.0-jumbo-1-win-32.7z from the custom builds page.

Decompress this version.

Download the previous jumbo edition John the Ripper 1.7.9-jumbo-5 (Windows binaries, ZIP, 3845 KB).

Extract file cyggcc_s-1.dll from the previous jumbo edition, and copy it to folder John-the-Ripper-v1.8.0-jumbo-1-Win-32\run.

Generate the hash for the password protected PDF file (I’m using my ex020.pdf exercise file) and store it in a file (pdf2john.py is a Python program, so you need to have Python installed):

John-the-Ripper-v1.8.0-jumbo-1-Win-32\run\pdf2john.py ex020.pdf > ex020.hash

Start John The Ripper:

John-the-Ripper-v1.8.0-jumbo-1-Win-32\run\john.exe ex020.hash

Loaded 1 password hash (PDF [MD5 SHA2 RC4/AES 32/32])
Will run 8 OpenMP threads
Press 'q' or Ctrl-C to abort, almost any other key for status
secret           (ex020.pdf)
1g 0:00:00:00 DONE 2/3 (2015-03-29 22:39) 10.20g/s 125071p/s 125071c/s 125071C/s
Use the "--show" option to display all of the cracked passwords reliably
Session completed

By starting John The Ripper without any options, it will first run in single crack mode and then in wordlist mode until it finds the password (secret).

But you can also provide your own wordlists (with option –wordlist) and use rules (option –rules) or work in incremental mode (–incremental).

Tuesday 31 March 2015

pdf-parser And YARA

Filed under: My Software,PDF — Didier Stevens @ 21:13

I’m teaching a PDF class at HITB Amsterdam in May. This is one of the many subjects covered in the class.

For about half a year now, I’ve been adding YARA support to several of my analysis tools. Like pdf-parser.

I’ll write some blogposts covering each tool with YARA support. I’ll start with a video for pdf-parser:

Next Page »

Blog at WordPress.com.