This tool parses a PDF document to identify the fundamental elements used in the analyzed file. It does not render a PDF document. The parser code is quick-and-dirty; I'm not recommending it as a textbook example of a PDF parser, but it gets the job done.
You can see the parser in action in this screencast.

The stats option displays statistics of the objects found in the PDF document. Use this to identify PDF documents with unusual or unexpected objects, or to classify PDF documents. For example, I generated statistics for 2 malicious PDF files, and although they were very different in content and size, the statistics were identical, proving that they used the same attack vector and shared the same origin.
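A rough sketch of this kind of fingerprinting, not pdf-parser's own code: count the indirect objects per /Type value with regular expressions. It assumes an uncompressed, well-formed file; objects hidden inside object streams are not seen, and `pdf_stats` is a hypothetical helper name.

```python
import re
from collections import Counter

# "N G obj" header of an indirect object (simplified, assumes a well-formed file).
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
# A /Type entry such as "/Type /Page"; the captured group is the type name.
TYPE_ENTRY = re.compile(rb"/Type\s*(/[A-Za-z0-9]+)")

def pdf_stats(data: bytes) -> Counter:
    """Count indirect objects per /Type value; objects without /Type are counted under None."""
    stats = Counter()
    headers = list(OBJ_HEADER.finditer(data))
    for i, header in enumerate(headers):
        # An object's text runs until the next object header (or the end of the file).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(data)
        type_match = TYPE_ENTRY.search(data[header.end():end])
        stats[type_match.group(1).decode() if type_match else None] += 1
    return stats
```

Two documents can then be compared with `pdf_stats(first) == pdf_stats(second)`, where `first` and `second` hold the raw bytes of each file.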
The search option searches for a string in indirect objects (not inside the streams of indirect objects). The search is not case-sensitive, and is susceptible to the obfuscation techniques I documented (as I've yet to encounter these obfuscation techniques in the wild, I decided not to resort to canonicalization).
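A minimal sketch of such a search, assuming well-formed `N G obj ... endobj` blocks: stream contents are cut out before matching, the comparison ignores case, and, like the behavior described above, no canonicalization is done, so a hex-obfuscated name such as /J#61vaScript will not match. `search_objects` is a hypothetical name, not a pdf-parser function.

```python
import re

# Whole indirect object: id, version, body (non-greedy up to the first endobj).
OBJ = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.DOTALL)
# Stream data between the stream and endstream keywords.
STREAM = re.compile(rb"\bstream\r?\n.*?\bendstream\b", re.DOTALL)

def search_objects(data: bytes, needle: str) -> list[tuple[int, int]]:
    """Return (id, version) of objects whose non-stream part contains needle, ignoring case."""
    hits = []
    lowered = needle.lower().encode()
    for match in OBJ.finditer(data):
        body = STREAM.sub(b"", match.group(3))  # drop stream contents before searching
        if lowered in body.lower():
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits
```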
The filter option applies the filter(s) to the stream. For the moment, only FlateDecode is supported (i.e. zlib decompression).
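Since FlateDecode is zlib compression, decoding such a stream in Python comes down to the standard zlib module. A small illustration (the helper name is ours, and the round trip uses data we compress ourselves rather than a real PDF stream):

```python
import zlib

def flate_decode(stream_data: bytes) -> bytes:
    """Undo the FlateDecode filter, i.e. zlib decompression."""
    return zlib.decompress(stream_data)

compressed = zlib.compress(b"app.alert('Hello');")
print(flate_decode(compressed))  # b"app.alert('Hello');"
```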
The raw option makes pdf-parser output raw data (i.e. not the printable Python representation).
The objects option outputs the data of the indirect object whose ID was specified. This ID is not version dependent: if more than one object has the same ID (disregarding the version), all these objects will be output.
The reference option allows you to select all objects referencing the specified indirect object. This ID is not version dependent.
The type option allows you to select all objects of a given type. The type is a Name and as such is case-sensitive and must start with a slash character (/).
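The three selection rules above (by ID regardless of version, by reference, by case-sensitive type Name) can be sketched as follows. This is a simplified illustration under the assumption of uncompressed, well-formed objects; the function names are hypothetical and do not mirror pdf-parser's internals.

```python
import re

OBJ = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.DOTALL)

def parse_objects(data: bytes) -> list[tuple[int, int, bytes]]:
    """Return (id, version, body) for every indirect object, in file order."""
    return [(int(m.group(1)), int(m.group(2)), m.group(3)) for m in OBJ.finditer(data)]

def select_by_id(objects, obj_id):
    # Version independent: every object with this ID is returned, whatever its version.
    return [o for o in objects if o[0] == obj_id]

def select_referencing(objects, obj_id):
    # An indirect reference looks like "12 0 R"; any version number is accepted.
    ref = re.compile(rb"(?<!\d)%d\s+\d+\s+R\b" % obj_id)
    return [o for o in objects if ref.search(o[2])]

def select_by_type(objects, type_name: str):
    # Names are case-sensitive and include the leading slash, e.g. "/Page" (but not "/Pages").
    pattern = re.compile(rb"/Type\s*" + re.escape(type_name.encode()) + rb"(?![A-Za-z0-9])")
    return [o for o in objects if pattern.search(o[2])]
```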
Download:
MD5: BDC0E5A82EB6D7C287E7360D8901023D
SHA256: C83D39F8938A00A3EB2BDE3134EFAF3A2BE11E72C2C8A92841D4E1E82366D7E1
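Each download comes with MD5 and SHA256 values. A small sketch for checking a downloaded file against them with Python's standard hashlib module; the helper name `file_digests` is ours, and the digests are upper-cased only to match the way they are published on this page.

```python
import hashlib

def file_digests(path: str) -> tuple[str, str]:
    """Return the (MD5, SHA256) hex digests of a file, upper-cased."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest().upper(), sha256.hexdigest().upper()
```

Compare the returned pair with the values listed for the file you downloaded, e.g. `file_digests("download.zip")`, where the file name is only a placeholder.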
make-pdf tools
make-pdf-javascript.py allows one to create a simple PDF document with embedded JavaScript that will execute upon opening of the PDF document. It’s essentially glue-code for the mPDF.py module which contains a class with methods to create headers, indirect objects, stream objects, trailers and XREFs.

If you execute it without options, it will generate a PDF document with JavaScript to display a message box (calling app.alert).
To provide your own JavaScript, use option --javascript for a script on the command line, or --javascriptfile for a script contained in a file.
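To make the description of mPDF.py more concrete, here is a rough, self-contained sketch in plain Python of the pieces such a module has to produce (header, indirect objects, XREF, trailer) for a PDF whose /OpenAction runs JavaScript. The function `make_js_pdf` is hypothetical and does not reproduce mPDF.py's classes or method names; the default script mirrors the app.alert message box mentioned above.

```python
def make_js_pdf(javascript: str = "app.alert('Hello');") -> bytes:
    """Build a minimal one-page PDF whose /OpenAction runs the given JavaScript."""
    # Escape backslashes and parentheses so the script is a valid PDF literal string.
    js = javascript.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    objects = [
        "<< /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
        f"<< /Type /Action /S /JavaScript /JS ({js}) >>",
    ]
    out = bytearray(b"%PDF-1.1\n")  # header
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed for the xref table
        out += f"{number} 0 obj\n{body}\nendobj\n".encode("latin-1")
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\nstartxref\n{xref_pos}\n%%EOF\n".encode()
    return bytes(out)
```

The xref offsets are computed while the objects are written, which is why the body is assembled in a single buffer rather than as separate strings joined at the end.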
Download:
MD5: 9AF2E343B78553021C989E8E22355531
SHA256: C604679ABEB0469C1463159E02E74F12487B2755A6096B416A8F4F638DEB8AA9
pdfid.py
This tool is not a PDF parser, but it will scan a file to look for certain PDF keywords, allowing you to identify PDF documents that contain (for example) JavaScript or execute an action when opened. PDFiD will also handle name obfuscation.
The idea is to use this tool first to triage PDF documents, and then analyze the suspicious ones with my pdf-parser.
An important design criterion for this program is simplicity. Parsing a PDF document completely requires a very complex program, and hence it is bound to contain many (security) bugs. To avoid the risk of getting exploited, I decided to keep this program very simple (it is even simpler than pdf-parser.py).

PDFiD will scan a PDF document for a given list of strings and count the occurrences (total and obfuscated) of each word:
- obj
- endobj
- stream
- endstream
- xref
- trailer
- startxref
- /Page
- /Encrypt
- /ObjStm
- /JS
- /JavaScript
- /AA
- /OpenAction
- /JBIG2Decode
- /RichMedia
- /Launch
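A simplified sketch of this kind of keyword scan, assuming names are compared literally: it splits the file into name tokens and bare words and counts exact matches, so /Pages is not counted as /Page. Hex-obfuscated names (covered further down) are deliberately not decoded here, and `count_keywords` is a hypothetical helper, not PDFiD's code.

```python
import re

KEYWORDS = ["obj", "endobj", "stream", "endstream", "xref", "trailer", "startxref",
            "/Page", "/Encrypt", "/ObjStm", "/JS", "/JavaScript", "/AA",
            "/OpenAction", "/JBIG2Decode", "/RichMedia", "/Launch"]

# A token is either a PDF name (slash followed by regular characters) or a bare word.
TOKEN = re.compile(rb"/[^\s/<>\[\]()%{}]*|[A-Za-z0-9]+")

def count_keywords(data: bytes) -> dict[str, int]:
    """Count exact-token occurrences of each keyword (no #xx decoding)."""
    counts = dict.fromkeys(KEYWORDS, 0)
    for token in TOKEN.findall(data):
        word = token.decode("latin-1")
        if word in counts:
            counts[word] += 1
    return counts
```

Like PDFiD, this is only a string scanner: a text file containing these words would produce the same counts.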
Almost every PDF document will contain the first 7 words (obj through startxref), although stream and endstream are somewhat less universal. I've found a couple of PDF documents without xref or trailer, but these are rare (BTW, this is not an indication of a malicious PDF document).
/Page gives an indication of the number of pages in the PDF document. Most malicious PDF documents have only one page.
/Encrypt indicates that the PDF document has DRM or needs a password to be read.
/ObjStm counts the number of object streams. An object stream is a stream object that can contain other objects, and can therefore be used to obfuscate objects (by using different filters).
/JS and /JavaScript indicate that the PDF document contains JavaScript. Almost all malicious PDF documents that I've found in the wild contain JavaScript (to exploit a JavaScript vulnerability and/or to execute a heap spray). Of course, you can also find JavaScript in PDF documents without malicious intent.
/AA and /OpenAction indicate an automatic action to be performed when the page/document is viewed. All malicious PDF documents with JavaScript I’ve seen in the wild had an automatic action to launch the JavaScript without user interaction.
The combination of automatic action and JavaScript makes a PDF document very suspicious.
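That rule of thumb can be written as a tiny triage check over keyword counts. A sketch only, with a hypothetical function name; it expects a dictionary mapping keywords such as "/JS" to their counts, however those counts were obtained.

```python
def is_very_suspicious(counts: dict[str, int]) -> bool:
    """Flag documents that combine JavaScript with an automatic action (heuristic only)."""
    has_js = counts.get("/JS", 0) > 0 or counts.get("/JavaScript", 0) > 0
    has_auto_action = counts.get("/AA", 0) > 0 or counts.get("/OpenAction", 0) > 0
    return has_js and has_auto_action
```

A positive result is a reason to look closer with pdf-parser, not proof of malice; legitimate documents can also combine the two.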
/JBIG2Decode indicates whether the PDF document uses JBIG2 compression. This is not necessarily an indication of a malicious PDF document, but it requires further investigation.
/RichMedia is for embedded Flash.
/Launch counts launch actions.
A number that appears between parentheses after the counter represents the number of obfuscated occurrences. For example, /JBIG2Decode 1(1) tells you that the PDF document contains the name /JBIG2Decode and that it was obfuscated (using hexcodes, e.g. /JBIG#32Decode).
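The hex-code obfuscation works because a PDF name may write any character as # followed by two hex digits, so #32 stands for "2". A minimal sketch of decoding such names and rendering a counter in the total(obfuscated) format described above; both function names are hypothetical.

```python
import re

HEX_ESCAPE = re.compile(r"#([0-9A-Fa-f]{2})")

def decode_name(name: str) -> str:
    """Replace each #xx hex escape in a PDF name with the character it encodes."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)

def format_count(names: list[str], keyword: str) -> str:
    """Render the total for keyword, plus the obfuscated occurrences in parentheses."""
    total = obfuscated = 0
    for name in names:
        if decode_name(name) == keyword:
            total += 1
            if name != keyword:  # the raw spelling differs, so it was hex-encoded
                obfuscated += 1
    return f"{keyword} {total}({obfuscated})" if obfuscated else f"{keyword} {total}"

print(decode_name("/JBIG#32Decode"))                    # /JBIG2Decode
print(format_count(["/JBIG#32Decode"], "/JBIG2Decode"))  # /JBIG2Decode 1(1)
```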
BTW, all the counters can be skewed if the PDF document is saved with incremental updates.
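With incremental updates, changed objects are appended after the original file and each update ends with its own %%EOF marker, so the same object can be counted more than once. A hedged sketch of splitting a file into its cumulative revisions so counts can be compared per revision; note that linearized files can also contain more than one %%EOF, and `revisions` is a hypothetical helper.

```python
import re

def revisions(data: bytes) -> list[bytes]:
    """Return the cumulative revisions: the prefix of the file ending at each %%EOF marker."""
    return [data[:match.end()] for match in re.finditer(rb"%%EOF", data)]

original = b"%PDF-1.1\n1 0 obj\n<< >>\nendobj\ntrailer\n<< >>\n%%EOF\n"
updated = original + b"1 0 obj\n<< /JS (x) >>\nendobj\ntrailer\n<< >>\n%%EOF\n"
for number, revision in enumerate(revisions(updated), start=1):
    print(number, revision.count(b"/JS"))
# 1 0
# 2 1
```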
Because PDFiD is just a string scanner (supporting name obfuscation), it will also generate false positives. For example, a simple text file starting with %PDF-1.1 and containing words from the list will also be identified as a PDF document.
Download:
MD5: 99BFA4916EC5E005953E3D9D8AD96C83
SHA256: C831569C8139D5CA5709600B987C929716FE58B1DD6B65F18EC84473A83B4075
PDFTemplate.bt
This is a 010 Editor template for the PDF file format.
It’s particularly useful for malformed PDF files, like this example with PDFUnknown structures:

Download:
MD5: C124200C3317ACA9C17C2AE2579FCFEB
SHA256: 24C4FEAD2CABAD82EC336DDCFD404915E164D7B48FBA7BA1295E12BBAF8EB15D
[...] PDF, Quickpost — Didier Stevens @ 11:57 Per request, a more detailed post on how I use my pdf-parser stats [...]
Pingback by Quickpost: Fingerprinting PDF Files « Didier Stevens — Saturday 1 November 2008 @ 11:57
I'd like to be able to view a scanned pdf file (with handwriting in some fields) and black out boxes on the form whose fields contain info I don't want published.
Can that kind of thing be automated in a batch so that I don’t even have to open the files ?
That would be cool …
Can you point me in the right direction ? I’m not looking for you to code, but sending me in the right direction for this would be useful, and it looks like you’re cognizant of this kind of information.
Comment by james — Friday 21 November 2008 @ 15:03
@james
I’ve no experience with such tools, but you can start to look in the forum of PDF Planet.
Comment by Didier Stevens — Saturday 22 November 2008 @ 8:46
[...] download my Python program to generate these PoC PDF documents here, it needs the mPDF module of my PDF-tools. Quickpost info Possibly related posts: (automatically generated)PDF Stream ObjectsUsing DLE In An [...]
Pingback by Quickpost: /JBIG2Decode Essentials « Didier Stevens — Monday 2 March 2009 @ 23:12
[...] PDF — Didier Stevens @ 7:08 I’ve developed a new tool to triage PDF documents, PDFiD. It helps you differentiate between PDF documents that could be malicious and those that are most [...]
Pingback by PDFiD « Didier Stevens — Tuesday 31 March 2009 @ 7:08
Thanks Didier for sharing _yet_again_ I appreciate it very much!
Comment by Mitch Impey — Wednesday 1 April 2009 @ 20:17
[...] developed a new tool to triage PDF documents, PDFiD. It helps you differentiate between PDF documents that could be malicious and those that are most [...]
Pingback by Didier Stevens posted PDFid « Betterworldforus — Friday 3 April 2009 @ 18:34
[...] will give you statistics of some very basic elements of the PDF language. This helps you decide if a PDF could be malicious or [...]
Pingback by PDFiD On VirusTotal « Didier Stevens — Tuesday 21 April 2009 @ 16:59
[...] We can go a number of ways with this now, but since the point of all this is the fun we can have with obfuscated scripts in Adobe PDFs we’ll run it through PDFiD from Didier Stevens. [...]
Pingback by L1pht Offensive Labs » From Bloodhound to Acrobat JS — Saturday 25 April 2009 @ 2:59
Hello — I am using pdf-parser and python for the first time so please excuse my ignorance.
I’m using Python 3.0.1 on Windows XP. I’ve copied the pdf-parser.py file into the C:\Python30 directory which contains the python executable. Below is the error I get when attempting to execute your utility:
C:\Python30>python.exe pdf-parser.py
File "pdf-parser.py", line 180
print 'todo 1: %s' % (self.token[1] + self.token2[1])
^
SyntaxError: invalid syntax
————
Any ideas? Thanks for your time.
Comment by Mike — Tuesday 5 May 2009 @ 19:02
@Mike No problem.
I've not tested my Python tools on Python 3.
You should remove Python 3.0.1 and install Python 2.6. This will probably fix your problems.
I've still to decide when I'll upgrade my tools to Python 3.
Comment by Didier Stevens — Tuesday 5 May 2009 @ 20:59
Hello — I’m now using Python 2.6.2. It appears to be working, however I am getting so much output from every pdf I examine, I wonder if I am doing something wrong.
My syntax is
pdf-parser.py --search javascript malware.pdf
The utility spits out hundreds maybe thousands of lines of returned information. At the very top there appears to be useful data, however there are hundreds of lines that look like:
todo 10: 3 ‘X1\x1e\x1b\x03\x12\x05X60B
Am I doing something incorrectly here or is there a way to filter the rest of this data out?
Thanks for your help.
PS, I watched your video on pdf-parser and it doesn’t have any audio.
Comment by Mike — Thursday 7 May 2009 @ 20:40
Also, just a note: I do have two hyphens in front of search (--search).
Comment by Mike — Thursday 7 May 2009 @ 20:41
No, what you’re doing is correct. These todo 10 messages indicate that your PDF document is possibly corrupt. I’ve seen one such corrupt document before, the malware authors had inserted their payload in the PDF without respecting the PDF syntax.
I’m sending you an e-mail to see if you can share your sample with me.
About the video: most of my videos have no sound, it takes me much more time to produce one with audio, and I’m never satisfied with the result.
Comment by Didier Stevens — Thursday 7 May 2009 @ 21:07
[...] updated my PDF-tools to support [...]
Pingback by PDF Filter Abbreviations « Didier Stevens — Monday 11 May 2009 @ 0:01
Didier,
Any idea whether/when Filiol, Blonce, & Frayssignes might release their ‘PDF StructAzer’ tool? My Google research doesn’t show anything from them since their presentation at Blackhat Europe last year. According to that paper, they planned to release the tool “very soon”, but maybe they’ve run into some red tape with their employer over it.
John
Comment by John McCash — Tuesday 12 May 2009 @ 13:03
Hi John,
No, still no news about a release, but I’ll ask him again.
Comment by Didier Stevens — Tuesday 12 May 2009 @ 13:51
[...] added some new features to my PDF tools to handle malformed PDF [...]
Pingback by Malformed PDF Documents « Didier Stevens — Thursday 14 May 2009 @ 7:55
PDF Structazer is available here:
http://www.esiea-recherche.eu/
The tool: http://www.esiea-recherche.eu/data/PDF%20Structazer.exe
The document (PDF): http://www.esiea-recherche.eu/data/PDF%20Structazer%20Short%20User%20Manual.pdf
Comment by Didier Stevens — Tuesday 26 May 2009 @ 13:01
[...] PDFiD – http://blog.didierstevens.com/2009/03/31/pdfid/ PDF Tools – http://blog.didierstevens.com/programs/pdf-tools/ Security Justice – http://securityjustice.com/ Exotic Liability – http://exoticliability.ning.com/ [...]
Pingback by SecuraBit Episode 32 PDF Love! | SecuraBit — Wednesday 27 May 2009 @ 14:33
[...] Finding and detecting Malicious PDF’s. Tyler’s Snort signature. Didier Steven’s fantastic PDF analysis tools. [...]
Pingback by Security Justice » Blog Archive » Security Justice - Episode 13 — Saturday 6 June 2009 @ 2:30
Didier,
Does your script pdf-parser.py work on encrypted PDF streams?
Comment by NeoIsOffline — Tuesday 14 July 2009 @ 12:28
No, I didn’t add code to decrypt PDF streams. And I’m not sure if I ever will, because then it could also be considered as a copyright infringement tool, and I don’t want to deal with that.
Comment by Didier Stevens — Tuesday 14 July 2009 @ 19:13
I'm getting errors when running the Python script:
C:\Documents and Settings\yo\Desktop\Tools\pdf>pdf-parser.py
File "C:\Documents and Settings\yo\Desktop\Tools\pdf\pdf-parser.py", line 198
print 'todo 1: %s' % (self.token[1] + self.token2[1])
^
SyntaxError: invalid syntax
I'm getting errors when trying to run the script. I'm using ActivePython 3.1 on Windows XP, launching it from the command line. Were there any recent modifications which broke the script?
Thanks
Comment by Dave — Friday 17 July 2009 @ 16:48
@Dave
That looks like the same error as in comment 10. Read comments 10 to 12 for a solution.
Comment by Didier Stevens — Friday 17 July 2009 @ 17:12
Wow, I feel ignorant, I just breezed through the comments too fast. Thanks!
Comment by Dave — Friday 17 July 2009 @ 18:55
@Dave
No problem.
Comment by Didier Stevens — Friday 17 July 2009 @ 18:58
Hello Didier,
This tool is very cool. I am wondering how to integrate it into the IronPort (mail filter), so that all attachments such as PDFs are scanned, and if OpenAction or JavaScript is found we can filter them. Maybe this could also be integrated in the proxy or firewall. Do you have a link to a topic on integrating this tool?
Thank you very much, indeed this is very helpful
Comment by Yaggi — Tuesday 28 July 2009 @ 0:12
What interface options does ironport offer? Does it support ICAP? http://en.wikipedia.org/wiki/Internet_Content_Adaptation_Protocol
Comment by Didier Stevens — Tuesday 28 July 2009 @ 18:55
Hello Didier,
I talked to our notes admin, and this can somehow be integrated in sendmail. He asked me about some technicalities of the integration, so maybe you can also provide a link. He told me IronPort will not be used.
Where could we put this script for internet filtering: in the proxy server, or in content filter boxes like those from 8e6 Technologies? I'm hoping to finish this integration so we can make the most of your script.
Comment by Yaggi — Wednesday 29 July 2009 @ 1:30
@Yaggi
How do you interface with other scanner, like AV software?
Comment by Didier Stevens — Friday 31 July 2009 @ 12:48
[...] Taking a quick look at the rest of the file it was clear that it is just a “simple” exploit using obfuscated javascript. So I extracted the scripts with the pdf-tools from Didier Stevens. [...]
Pingback by hong10.net » Blog Archive » Analyzing malicious PDF Documents — Monday 3 August 2009 @ 21:28
I was looking over your code in the pdf parser, and trying very very hard to get it to extract something from a pdf. I have tried using pdfs I’ve been given, as well as going directly into the files to try to mess things up, and I have yet to see the code go into the cPDFElementMalformed method. What kinds of data (objects?) do you expect to fall into this category?
Thank you -ld
Comment by Delucci — Tuesday 4 August 2009 @ 14:16
@Delucci
It's for something like this:
%PDF-1.4
1 0 obj
<>
endobj
unexpected
2 0 obj
<>
endobj
Comment by Didier Stevens — Tuesday 4 August 2009 @ 15:33
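In the example above, the word "unexpected" sits between two objects instead of inside one. A hypothetical sketch, not pdf-parser's actual logic, of spotting such stray data: find the non-whitespace bytes between each endobj and the next top-level token (object header, xref, trailer, startxref or comment). The test uses empty dictionaries `<< >>` as placeholders for the dictionaries in the example.

```python
import re

# Top-level tokens that may legitimately follow "endobj".
TOP_LEVEL = re.compile(rb"\d+\s+\d+\s+obj\b|\bendobj\b|\bxref\b|\btrailer\b|\bstartxref\b|%[^\r\n]*")

def unexpected_data(data: bytes) -> list[bytes]:
    """Return non-whitespace runs found after 'endobj' and before the next top-level token."""
    found = []
    matches = list(TOP_LEVEL.finditer(data))
    for i, match in enumerate(matches):
        if match.group() == b"endobj":
            next_start = matches[i + 1].start() if i + 1 < len(matches) else len(data)
            gap = data[match.end():next_start].strip()
            if gap:
                found.append(gap)
    return found
```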
[...] the PDF analysis, I used the excellent PDF-Tools from Didier Stevens that can be located here. The main python script that was used was pdf-parser seen [...]
Pingback by PDF Malware Analysis – Part 1 | isolated-threat — Tuesday 18 August 2009 @ 21:02
[...] can download PDFiD here. Leave a [...]
Pingback by Update: PDFiD Version 0.0.9 to Detect Another Adobe 0Day « Didier Stevens — Tuesday 13 October 2009 @ 21:24
[...] Here is a set of tools that can embed Javascript into a PDF. [...]
Pingback by Adobe Reader 0-day exploit FINALLY fixed. | Invariable Truth — Sunday 18 October 2009 @ 9:35
Hi,
Great set of tools. While checking PDF files for embedded links (i.e. URI tags), I have noticed that some PDFs contain hyperlinks but do not contain the URI tag. Is there an alternative method of embedding hyperlinks? Should I be searching for some other keyword?
Comment by Tye — Monday 19 October 2009 @ 15:41
Check if the URLs you see are in the metadata.
Comment by Didier Stevens — Tuesday 20 October 2009 @ 16:48
Hi,
I’ve been using your tool to decode the malicious PDF file that I have. It’s using /Filter /ASCIIHexDecode /FlateDecode.
I used the following command:
pdf-parser.py -f -w malpdf.1 > mal.1
The resulting file didn’t show any JavaScript code, instead it showed “ASCIIHexDecode decompress failed”.
Wepawet is able to decode it though (http://wepawet.cs.ucsb.edu/view.php?hash=c9aad1ecee10ddcf1985ae4961e18fbf&type=js).
Are my parameters for the tool incorrect? Or doesn’t the tool support this?
Thanks in advance.
Comment by anima — Friday 30 October 2009 @ 5:40
@anima
I’ve e-mailed you a request for the sample.
Comment by Didier Stevens — Thursday 5 November 2009 @ 17:45
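For reference, a /Filter array is decoded in the order it is listed, so /ASCIIHexDecode /FlateDecode means: hex-decode first, then zlib-decompress. A minimal sketch of that chain, independent of pdf-parser; the function names are hypothetical. ASCIIHexDecode ignores whitespace, stops at the '>' end-of-data marker, and treats an odd final digit as if followed by 0.

```python
import re
import zlib

def ascii_hex_decode(data: bytes) -> bytes:
    """ASCIIHexDecode: ignore whitespace, stop at '>', pad an odd final digit with 0."""
    hex_digits = re.sub(rb"\s", b"", data.split(b">", 1)[0])
    if len(hex_digits) % 2:
        hex_digits += b"0"
    return bytes.fromhex(hex_digits.decode("ascii"))

FILTERS = {"/ASCIIHexDecode": ascii_hex_decode, "/FlateDecode": zlib.decompress}

def apply_filters(data: bytes, filters: list[str]) -> bytes:
    """Decode a stream by applying each filter of the /Filter array in the listed order."""
    for name in filters:
        data = FILTERS[name](data)
    return data
```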
[...] my method: Use the tools from here. First of all pdfid can tell you if a pdf has Javascript included as well as autorun functionality [...]
Pingback by PDF file check question - Remote Exploit Forums — Sunday 6 December 2009 @ 22:25
[...] first tool we’ll be using is pdf-parser.py from the PDF Tools suite. This script will search through a PDF file’s sections, display raw data in the sections, [...]
Pingback by Reversing the Adobe 0-day APSA09-07 Exploit – Part 1 | Missouri S&T ACM SIG-SEC|Reversing — Wednesday 16 December 2009 @ 3:55
[...] Countermeasures __________________ Either you're part of the problem or you're part of the solution or you're just part of the landscape. [...]
Pingback by Using-an-adobe-exploit-in-a-email-attack - Remote Exploit Forums — Tuesday 22 December 2009 @ 15:46
[...] pdf-parser.py http://blog.didierstevens.com/programs/pdf-tools/ Lets decompression some of the zlib compressed code inside of the PDF and send the raw output to a [...]
Pingback by Reversing MerryChristmas.pdf - Sp8sCorp — Thursday 31 December 2009 @ 5:01
[...] pdf-parser.py or PDF Structazer to analyze PDF files [...]
Pingback by Can You Trust That File? « Aggressive Virus Defense — Thursday 31 December 2009 @ 22:40
[...] [...]
Pingback by How to encode a PDF payload in metasploit? - Remote Exploit Forums — Tuesday 5 January 2010 @ 14:10
[...] thanks to Didier Stevens for his free PDF tools and for providing some [...]
Pingback by PDF file loader to extract and analyse shellcode « c0llateral Blog — Wednesday 6 January 2010 @ 23:19
Hey Didier,
Thanks for the excellent tool and great PDF analysis blog. I enjoyed every minute and in addition I have become much more paranoid when it comes to carelessly downloading tons of PDF material. Now I run all my PDFs through your "pdfid" tool if I have downloaded anything from a suspicious site…
But I can't help thinking that this should be implemented as an automatic plug-in/add-on for Firefox. You know, when you click on PDFs, they usually open automatically in the browser, which would be nice if it were safe. But in the cyber-war era of today it is simply very bad, at its best!
Comment by E:V:A — Saturday 16 January 2010 @ 18:40
I'm looking into this, but the problem is to prevent the downloaded PDF from being opened after it's downloaded and before it's scanned. I talked to the developer of the Fireclam add-on and he has the same issue.
Comment by Didier Stevens — Tuesday 19 January 2010 @ 9:29
The only thing I get is a syntax error!
C:\pdfid_v0_0_10>pdfid.py
File "C:\pdfid_v0_0_10\pdfid.py", line 271
print '/%s -> /%s' % (HexcodeName2String(wordExact), wordExactSwapped)
Comment by sheldor — Sunday 31 January 2010 @ 22:18
Are you using Python 3? Haven’t tested PDFiD on Python 3. Use Python 2.
Comment by Didier Stevens — Sunday 31 January 2010 @ 22:20
got it!! just read the comments!
Comment by sheldor — Sunday 31 January 2010 @ 22:35
Wow, just noticed your quick response! Thank you, Didier! Great tool!
Comment by sheldor — Sunday 31 January 2010 @ 22:36
Is the file size limited? Every time I scan larger PDF files I get exceptions like this:
***Error occured***
Traceback (most recent call last):
File "C:\PDFtools\pdfid.py", line 363, in PDFiD
(bytesHeader, pdfHeader) = FindPDFHeaderRelaxed(oBinaryFile)
File "C:\PDFtools\pdfid.py", line 218, in FindPDFHeaderRelaxed
bytes = oBinaryFile.bytes(1024)
File "C:\PDFtools\pdfid.py", line 70, in bytes
inbytes = self.infile.read(size - len(self.ungetted))
IOError: [Errno 9] Bad file descriptor
Comment by sheldor — Monday 8 February 2010 @ 13:50
@sheldor: No, I didn’t code an explicit file size limit. I tried on PDF files up to 41MB without problems. How large is your PDF file?
Comment by Didier Stevens — Monday 8 February 2010 @ 14:55
I have this issue with PDFs of 20MB and up! Well... then there must be another reason! Still can't figure it out!
Anyhow, thank you!
Comment by sheldor — Monday 8 February 2010 @ 17:27
@sheldor: If you can point me to an online PDF document that causes the problem you experience, I’ll take a look at it.
Comment by Didier Stevens — Monday 8 February 2010 @ 20:24
[...] Didier Stevens has provided a fantastic resource and tools for analyzing PDF files. Some of these resources have been incorporated into VirusTotal. Didier Stevens: http://blog.didierstevens.com/programs/pdf-tools/ [...]
Pingback by PDF Exploitation & Forensic Resources « MadMark's Blog — Tuesday 16 February 2010 @ 19:00
[...] we see in Pyew? The output of PDFId (a great tool by Didier Stevens) is shown as well as the hexadecimal output of the first block (512 [...]
Pingback by Unintended Results » Blog Archive » Analyzing PDF exploits with Pyew — Sunday 21 February 2010 @ 14:50
Very cool man, I tried to use PDF tools to unwind a drive-by ZeuS pdf infection. Unfortunately, it gave me some problems because I was using a newer version of Python (and it looks like the El Fiesta Exploit kit might use some kind of different zLib encoding to compress its payloads). Good stuff though!
http://www.mdl4.com/2010/02/28/reverse-engineering-zeus/
Comment by mdl4 — Tuesday 2 March 2010 @ 11:15
[...] http://blog.didierstevens.com/programs/pdf-tools/ [...]
Pingback by PDF Malware Analysis Tools | Tahir's Security Blog — Wednesday 31 March 2010 @ 18:11
[...] For the PDF analysis, I used the excellent PDF-Tools from Didier Stevens that can be located here. The main python script that was used was pdf-parser and pdfid seen [...]
Pingback by PDF Launch Command without javascript - isolated-threat — Thursday 1 April 2010 @ 10:47
[...] the PDF with Didier Steven’s pdfid.py showed that there was an OpenAction in the PDF, but no JavaScript. Interesting. Using [...]
Pingback by /Launch Malicious PDF | Portable Digital Video Recorder — Tuesday 27 April 2010 @ 22:48
[...] @ 10:11 Now that malicious PDFs using the /Launch action become more prevalent, I release a new PDFiD version to detect (and disarm) the /Launch [...]
Pingback by Update: PDFiD Version 0.0.11 to Detect /Launch « Didier Stevens — Thursday 29 April 2010 @ 10:11
[...] researcher Didier Stevens: a new tool (pdfid.py) that helps detect whether a PDF file contains [...]
Pingback by Release of a new tool that inspects PDF files before they are opened | مجتمع الحماية العربي — Thursday 29 April 2010 @ 18:48
[...] PDFiD v0.0.11 – didierstevens.com I release a new PDFiD version to detect (and disarm) the /Launch action. [...]
Pingback by Week 17 in Review – 2010 | Portable Digital Video Recorder — Monday 3 May 2010 @ 6:41
[...] PDFiD v0.0.11 – didierstevens.com I release a new PDFiD version to detect (and disarm) the /Launch action. [...]
Pingback by Week 17 in Review – 2010 | Infosec Events — Tuesday 4 May 2010 @ 9:40
I have a large volume of PDFs coming soon from a vendor. Does pdfid.py handle compressed (gzip, bzip2, zip) files? If so, how? If not, is it something that can be worked around or accomplished with another program?
BTW
really appreciate your work, your blog and website have been a treasure trove of information.
Comment by Johnny — Tuesday 4 May 2010 @ 16:34
@Johnny No, but it has an option to scan all files in a folder. Unzip all PDFs to a folder and use that option.
Comment by Didier Stevens — Tuesday 4 May 2010 @ 20:57
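For zip archives, the unzip step can be scripted with Python's standard zipfile module before pointing PDFiD's folder-scanning option at the result. A sketch under stated assumptions: only members ending in .pdf are extracted, directory paths inside the archive are flattened (which also avoids path traversal), and files with the same name overwrite each other. `extract_pdfs` is a hypothetical helper.

```python
import zipfile
from pathlib import Path

def extract_pdfs(archive: str, destination: str) -> list[Path]:
    """Extract every .pdf member of a zip archive into destination and return the paths."""
    out_dir = Path(destination)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".pdf"):
                continue
            target = out_dir / Path(info.filename).name  # keep only the base name
            target.write_bytes(zf.read(info))
            extracted.append(target)
    return extracted
```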
[...] untrusted, it can be useful to run an automated analysis with Didier Stevens' pdfid.py tool. It is a script that works on Windows, Linux and any system that [...]
Pingback by Analyzing and "disinfecting" PDF files with pdfid — Tuesday 4 May 2010 @ 21:13
[...] suspicious functions hidden in the PDF (namely execution of JavaScript and of executables): pdfid and pdf-parser. Before discovering the features of these two tools, it is important to know the [...]
Pingback by PDF analysis tools « Elevenses blog — Monday 10 May 2010 @ 14:45
[...] It did not take long for this PoC (Proof Of Concept) to be used in malicious PDFs, making it possible to install a trojan on the target machine. Didier Stevens developed two Python scripts that analyze PDFs to uncover any suspicious functions hidden in them (among others, execution of JavaScript and of executables): pdfid and pdf-parser. [...]
Pingback by PDF analysis tools « ELEVENSES BLOG — Thursday 27 May 2010 @ 15:41
[...] PDFID.py is a tool that analyzes a PDF file and shows the features it makes use of. For [...]
Pingback by Tool: PDFID.py | L’home dibuixat — Saturday 5 June 2010 @ 20:01
[...] you used my pdf-parser, you’ve also encountered a problem. The objects lack the endobj keyword. A simple solution: [...]
Pingback by Solving the Win7 Puzzle « Didier Stevens — Friday 25 June 2010 @ 9:39
With PDFiD, I've noticed I get a lot of false positives on the /JS and /AA tags, since in most cases (that I've looked at) they seem to be simply text in a compressed image or something similar. I haven't seen a /JS used on its own for JavaScript, but it does seem that if there is a /JS then there is also a /S/JavaScript to go with it.
Is this always the case, or just in the samples I’ve looked at so far (same applies for AA)? Finding the text JavaScript is much less likely to lead to a false positive than JS.
Comment by Russell — Monday 28 June 2010 @ 23:28
@Russell Good observation, I almost always see /JavaScript together with /JS. I’ve seen some cases without /JavaScript, but it looks like these were non-functional.
Comment by Didier Stevens — Tuesday 29 June 2010 @ 9:01
This is a complimentary post. The work you have done is adorable, I liked it. How do you keep all this in mind???
But anyway, I found this great, keep going. Keep making us explore every security aspect.
thanks
Sushant
Comment by Sushant — Friday 9 July 2010 @ 10:50
[...] PDF files: Didier’s PDF tools, Origami framework, Jsunpack-n, [...]
Pingback by » REMnux — programs for malware analysis -- Niebezpiecznik.pl -- — Monday 12 July 2010 @ 9:15
[...] any known viruses, when run through a total of 32 anti-virus programs. Processing the file with PDFiD reveals that the file contains no JavaScript objects, but it does contain a single JS object. [...]
Pingback by Al-Qaeda Magazine is Cupcake Recipe Book | Public Intelligence — Monday 12 July 2010 @ 21:18
Possible bug: PDFiD sometimes fails in cPDFEOF when using the --extra option for entropy, stating that cntCharsAfterLastEOF doesn't exist. Defining it in init seems to fix the issue.
Other Notes: Is it possible to use pdf-parser to parse pdf-parser output? For example, I can see a use of this when using pdf-parser to obtain contents of object streams, but then it would be nice if it were possible to use pdf-parser on THAT output to display all Launch commands, for example (similar to piping into PDFiD, but actually seeing the contents instead of just the count). Then again, object stream structure is a bit different so perhaps that’s why it doesn’t play nice. I haven’t figured it out yet…
Comment by Russell — Thursday 15 July 2010 @ 23:21
[...] PDF analysis: Didier’s PDF tools, Origami framework, Jsunpack-n, [...]
Pingback by Malware Analysis Tools Set Up for Linux « Wikihead's Blog — Saturday 17 July 2010 @ 9:31
@Russell Thanks for the feedback. I’ve had similar reports, and defining it in the init fixes the issue, but I also would like to understand the bug. Can you share a sample?
Comment by Didier Stevens — Monday 19 July 2010 @ 11:30
[...] and Flare. Furthermore, it contains several applications for analyzing malicious PDFs, such as the Didier Steven’s analysis tools. The OS also provides a lot of tools for de-obfucating JavaScript, including Rhino [...]
Pingback by New Linux OS REMnux Designed For Reverse Engineering Malware « The FORWARD project blog — Tuesday 20 July 2010 @ 10:36
[...] It did not take long for this PoC (Proof Of Concept) to be used in malicious PDFs, making it possible to install a trojan on the target machine. Didier Stevens developed two Python scripts that analyze PDFs to uncover any suspicious functions hidden in them (among others, execution of JavaScript and of executables): pdfid and pdf-parser. [...]
Pingback by Analysis tools for « ELEVENSES BLOG — Monday 2 August 2010 @ 8:29
[...] I highly recommend any security conscious sysadmins add this tool to their toolkit, as the number of PDF exploits is likely to continue rising for the foreseeable future. PDFiD can be downloaded from Didier Stevens website at http://blog.didierstevens.com/programs/pdf-tools. [...]
Pingback by PDFiD: Analyzing suspicious PDFs « Life as a cmddot — Tuesday 3 August 2010 @ 7:03
[...] Font Format) stream that looked suspicious enough for us to decode it (thanks to the excellent pdf-parser tool from Didier Stevens). In the now clear-text stream, we could identify at least one manifest [...]
Pingback by iPhone 4 / iPad: The Keys Out Of Prison | Fortinet Security Blog — Thursday 5 August 2010 @ 8:27
[...] – Didier Stevens’ PDF tools: analyse, identify and create PDF files (includes PDFiD: pdf-parser and [...]
Pingback by Security tools « Eikonal Blog — Monday 9 August 2010 @ 14:29
[...] Didier Stevens' PDF-tools, you can analyze the structure of the PDF files, even though they all turn out [...]
Pingback by Honeynet Project: Challenge 3/2010 (part II) « Il non-blog di Mario Pascucci — Thursday 19 August 2010 @ 3:04
Is there a licensing agreement with using pdfid or pdf-parser? Can it be used as part of software that will be sold?
Comment by Jon — Thursday 2 September 2010 @ 14:59
[...] Here is a PDF template for the 010 Editor. It’s particularly useful for malformed PDF files, like this example with PDFUnknown structures: [...]
Pingback by PDFTemplate « Didier Stevens — Friday 3 September 2010 @ 10:36
[...] Didier Stevens’ PDF tools: analyse, identify and create PDF files (includes PDFiD, pdf-parser and make-pdf and mPDF) [...]
Pingback by Python tools for penetration testers | Secondary Logic – There is always a theory !!! — Saturday 4 September 2010 @ 8:08
@Jon Can’t contact you, you didn’t provide an e-mail address.
Comment by Didier Stevens — Sunday 5 September 2010 @ 21:31
[...] & pdftools – Two frameworks for analysing malicious PDF [...]
Pingback by Mercury – Live Honeypot DVD « Infosanity's Blog — Wednesday 22 September 2010 @ 14:26
[...] http://blog.didierstevens.com/programs/pdf-tools/ [...]
Pingback by BruCON 2010 : Day 0×2 | Peter Van Eeckhoutte's Blog — Saturday 25 September 2010 @ 20:54
[...] Font Format) stream that looked suspicious enough for us to decode it (thanks to the excellent pdf-parser tool from Didier Stevens). In the now clear-text stream, we could identify at least one manifest [...]
Pingback by » iPhone 4 / iPad: The Keys Out Of Prison — Saturday 25 September 2010 @ 22:53
[...] analysis of suspicious files, Stevens keeps various self-developed tools available on his website, although they are not very practical for technically inexperienced readers. Because even [...]
Pingback by Schadhafte pdf-Dateien identifizieren » Software » lesen.net — Monday 27 September 2010 @ 17:55
[...] Didier Stevens’ PDF tools. Over the weekend, I was reading Didier Stevens’ chapter on malicious PDF analysis and I have to give him credit for breaking down the technical part of a PDF into something simple and easy to understand (er … maybe I am the only one who is coming to terms with PDF for the first time). Reading the article brought me to his PDF-tools. pdfid and pdf-parser are definitely a must-try if you really want to get hands-on with PDF analysis. [...]
Pingback by Hunger 4 Knowledge #10 « David Koepi — Sunday 3 October 2010 @ 1:28
[...] and wonder where to start. Get a Linux distro, install Python, and use Didier Stevens PDF parser [Didier Stevens]. This is a script that will structure all the objects for you, making them more readable. This is [...]
Pingback by Analyzing malicious PDFs — Monday 11 October 2010 @ 19:03
[...] and dump the zipped sections of a PDF file. In my opinion, the best are Didier Steven’s PDF Tools. Unfortunately, in this case, none of them worked for me, so I had to do it manually. I selected [...]
Pingback by Reverse engineering a Facebook ZeuS infection — Monday 25 October 2010 @ 2:24
[...] was about malicious PDF analysis, given by “Mr PDF” himself, Didier Stevens. Using his toolbox, several malicious PDF files were analyzed with a growing complexity. Very interesting and this [...]
Pingback by Hack.lu Day #1 Wrap-up « /dev/random — Wednesday 27 October 2010 @ 21:51
[...] pdfid.py and pdf-parser.py. Get them from Didier Stevens PDF Tools page. [...]
Pingback by Analysing a Malicious PDF Document — Saturday 6 November 2010 @ 12:08
[...] Download: click here [...]
Pingback by Malware Analysis: Handy tools for analysing PDF files « Brainfold's blog — Tuesday 16 November 2010 @ 3:00
[...] I ran pdf-parser.py against the pdf file. The output indicated that there were 2 “interesting” objects [...]
Pingback by Malicious pdf analysis : from price.zip to flashplayer.exe | Peter Van Eeckhoutte's Blog — Thursday 18 November 2010 @ 13:50
Didier,
Is there a way to embed a .exe in a pdf and have it automatically execute when the pdf is opened? I have tried to use your .py tool but it does not run the .exe after being opened.
Thanks,
Willie
Comment by Willie — Saturday 20 November 2010 @ 6:37
@Willie That’s normal, Adobe Reader doesn’t allow you to extract executable files. I found one way to deliver executable files: http://blog.didierstevens.com/2010/03/29/escape-from-pdf/
But Adobe has updated their reader to prevent this /Launch action.
Comment by Didier Stevens — Saturday 20 November 2010 @ 8:49
[...] obvious choice were the pdftools from Didier Stevens. What [...]
Pingback by Malware PDF. Analysis of a very simple sample. | Brundle Lab — Tuesday 23 November 2010 @ 18:35
[...] Didier’s own pdf-parser.py, the PDF’s meta information for the creation date is as [...]
Pingback by Praetorian Prefect | The Anonymous PR Guy and a Greece Connection — Sunday 12 December 2010 @ 0:57
[...] It did not take long for this PoC (Proof Of Concept) to be used in malicious PDFs, making it possible to install a trojan on the target machine. Didier Stevens has developed two Python scripts for analyzing PDFs to uncover any suspicious functions hidden in them (including execution of JavaScript and of executables): pdfid and pdf-parser. [...]
Pingback by Secur-IT — Thursday 6 January 2011 @ 13:28
[...] second Didier Steven’s PDF Tools. PDF Tools includes pdf-parser.py, make-pdf-javascript.py, and pdfid.py. Pdf-parser and pdfid are [...]
Pingback by Tools — Saturday 29 January 2011 @ 18:34
[...] PDF-Parser (http://blog.didierstevens.com/programs/pdf-tools/) [...]
Pingback by Attributes of a Zero Dollar Malware Analysis Environment « SecAnalysis — Tuesday 8 February 2011 @ 3:07
Hi!
I tried using your make-pdf-javascript.py. I gave it a JavaScript file that executes Notepad, but although it got embedded (I checked with pdf-parser.py), it did not run.
When I run the JS file directly, it executes, but when I embed it, it does not run.
Comment by pret — Tuesday 15 February 2011 @ 11:41
@pret And how do you start Notepad?
Comment by Didier Stevens — Tuesday 15 February 2011 @ 17:08
I ran Notepad directly from the JS file using the ws.run command. When I run the script outside the PDF, it works; when I embed it in the PDF and open it, it gets embedded but does not run. Please tell me how I can make it run.
Comment by pret — Thursday 17 February 2011 @ 5:28
@pret You are using a Windows JavaScript feature, that’s not supported by Adobe’s JavaScript. There is no feature to run arbitrary programs.
Comment by Didier Stevens — Thursday 17 February 2011 @ 7:06
I am new to Python. I have installed Python 2.7 and have tried running pdfid.py with no success.
The command >>>pdfid.py MidtermChazaraQuestions.pdf returns an Invalid Syntax error.
What am I doing wrong? It is extremely important I analyze this file. It may be the key to the identity theft that is destroying me. Please, help!
Comment by Joseph Ainbinder — Friday 18 February 2011 @ 19:33
@Joseph You need to use Python 2.6; a module I use was deprecated in 2.7.
Comment by Didier Stevens — Friday 18 February 2011 @ 20:07
[...] pdf-parser.py – http://blog.didierstevens.com/programs/pdf-tools/ (edit the source to change the maximum accepted Python version) – pdfid.py – [...]
Pingback by escape from PDF | Linux-backtrack.com — Saturday 19 February 2011 @ 21:20
[...] – Didier Stevens’ PDF tools: analyse, identify and create PDF files (includes PDFiD: pdf-parser and [...]
Pingback by Malware analysis « Eikonal Blog — Monday 28 February 2011 @ 16:33
[...] pdf-parser.py [...]
Pingback by PDF Analysis for Humans « P4r4n0id Reversing Lab — Friday 18 March 2011 @ 15:28
I had a problem with “make-pdf-javascript”.
First run, with the original package:
C:\Documents and Settings\abdelmoumen bacetti\mpdf1>python make-pdf-javascript.py test.pdf
File "make-pdf-javascript.py", line 29
print ”
^
SyntaxError: invalid syntax
###############################################################################################
So I changed lines 29, 30, 31, 32, 33, 55 and 61 in “make-pdf-javascript.py” and line 110 in “mPDF.py”, because the print statements lack the parentheses Python 3 requires.
###############################################################################################
After fixing the print problem:
C:\Documents and Settings\abdelmoumen bacetti\mpdf>python make-pdf-javascript.py down.pdf
Traceback (most recent call last):
File "make-pdf-javascript.py", line 71, in <module>
Main()
File "make-pdf-javascript.py", line 44, in Main
oPDF.stream(5, 0, 'BT /F1 12 Tf 100 700 Td 15 TL (JavaScript example) Tj ET')
File "C:\Documents and Settings\abdelmoumen bacetti\mpdf\mPDF.py", line 69, in stream
self.appendBinary(streamdata)
File "C:\Documents and Settings\abdelmoumen bacetti\mpdf\mPDF.py", line 39, in appendBinary
fPDF.write(str)
TypeError: 'str' does not support the buffer interface
###############################################################################################
config:
Windows XP SP2
Python 3.2
Comment by bmoumen — Sunday 10 April 2011 @ 13:37
@bmoumen Yes, my Python programs are not designed for Python 3. Neither do most of my programs work on 2.7, because of a deprecated module I use to parse command lines. It’s something I hope to solve in a near future (i.e. make my Python programs compatible with Python 2.5, 2.6, 2.7 and 3.x).
Comment by Didier Stevens — Monday 11 April 2011 @ 7:05
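A minimal sketch of how the two Python 3 problems in the report above are commonly handled, assuming nothing about mPDF.py beyond what the traceback shows; the helper names are hypothetical and this is not code from mPDF.py. `from __future__ import print_function` lets print(...) calls run on Python 2.6+ and 3.x alike, and converting str to bytes before writing avoids the TypeError raised when a str is written to a file opened in binary mode. Encoding with latin-1 is an assumption: it maps code points 0 to 255 one-to-one onto bytes, which suits PDF syntax.

```python
from __future__ import print_function  # print() becomes a function on Python 2.6+ too

import sys


def to_bytes(data):
    """Return data as bytes, encoding text with latin-1 (assumed encoding)."""
    if isinstance(data, bytes):  # on Python 2, str is bytes, so it passes through
        return data
    return data.encode("latin-1")


def append_binary(path, data):
    """Append data to a file opened in binary mode.

    On Python 3, writing a str to such a file raises TypeError
    ("'str' does not support the buffer interface" on Python 3.2),
    so the data is converted to bytes first.
    """
    with open(path, "ab") as f:
        f.write(to_bytes(data))


def warn(message):
    # print() with file= works on Python 2.6+ (given the import above) and 3.x
    print(message, file=sys.stderr)
```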
[...] of python tools which can be used for analysing PDFs. I downloaded two of his tools from this page http://blog.didierstevens.com/programs/pdf-tools/, pdf-parser.py and [...]
Pingback by Solving the Security BSides London Challenge – Number 2 | 4armed — Thursday 21 April 2011 @ 14:39
[...] a look at my Analyzing Malicious Documents Cheat Sheet. From the tools perspective, Didier Steven’s pdf-parser is an all-time favorite. Another excellent tool, which sports a user-friendly GUI, is PDF Stream [...]
Pingback by How to Extract Flash Objects from Malicious PDF Files — Wednesday 4 May 2011 @ 15:18
[...] PDF Tools by Didier Stevens is the classic toolkit that established the foundation for our understanding of the PDF analysis process. It includes pdfid.py to quickly scan the PDF for risky objects and, most usefully, pdf-parser.py to examine their contents. [...]
Pingback by 6 Free Tools for Analyzing Malicious PDF Files « AfterShell.com — Wednesday 11 May 2011 @ 17:46
[...] Signatures work with a few open source tools. The first one is pdf-parser.py which is part of the PDF Tools by Didier [...]
Pingback by The Anatomy of a PDF Signature < experiment nr.: 1598 — Wednesday 11 May 2011 @ 19:39
[...] But did you notice the inclusion of my PDFiD and pdf-parser tools? [...]
Pingback by BackTrack 5 Includes PDFiD and pdf-parser « Didier Stevens — Thursday 12 May 2011 @ 21:13
[...] my PDF tools [...]
Pingback by Malicious PDF Analysis Workshop Screencasts « Didier Stevens — Wednesday 25 May 2011 @ 15:59
[...] Stevens’ PDF tools: analyse, identify and create PDF files (includes PDFiD, pdf-parser and make-pdf and [...]
Pingback by 基于python渗透测试工具 — Sunday 29 May 2011 @ 3:00
[...] here. In the past I have also used [...]
Pingback by Checking a PDF for exploits Drija — Thursday 9 June 2011 @ 4:14
[...] encodings to name like JBIG2Decode and DCTDecode. FlateDecode usually can be decoded by using pdf-parser [...]
Pingback by Analyzing malicious PDF « lab69 — Thursday 23 June 2011 @ 17:08
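FlateDecode is zlib compression, so a Flate-encoded stream can often be inflated with Python's standard zlib module. The sketch below (Python 3) is a naive illustration and not pdf-parser's implementation: it finds streams with a regular expression instead of parsing objects and /Length, ignores /DecodeParms predictors, and silently skips streams zlib rejects. flate_decode_streams is a hypothetical name.

```python
import re
import zlib

# Naive: everything between the "stream" keyword's EOL and the next "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def flate_decode_streams(data):
    """Yield the inflated content of every stream in data that zlib accepts."""
    for match in STREAM_RE.finditer(data):
        inflater = zlib.decompressobj()
        try:
            # decompressobj keeps bytes after the end of the zlib stream (such as
            # the EOL before "endstream") in unused_data instead of failing;
            # a truncated stream yields partial output
            inflated = inflater.decompress(match.group(1))
        except zlib.error:
            # not Flate-compressed, or a corrupt header: skip this stream
            continue
        yield inflated
```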
[...] within it the actual exploit. Honestly, I was unable to decompress it with Didier Stevens’ pdf-parser, with PDF Stream Dumper, or with Ghostscript as explained here. Let’s say that [...]
Pingback by Jailbreakme: ecco come funziona il jailbreak per iPad 2 — Wednesday 6 July 2011 @ 21:28
Hi Didier,
May I ask which tools you use for Python (debuggers, ...)?
Thanks
Comment by zudqg — Wednesday 20 July 2011 @ 13:59
[...] first thing we want is to determine the contents of the PDF, and for this we use the PDFtools, which let us analyze PDFs. We run the pdfid tool to see the contents of the file and [...]
Pingback by Reconstructing JavaScript Exploit « Simon Roses Femerling – Blog — Wednesday 20 July 2011 @ 20:32
@zudqg I’m going to disappoint you, for Python, I just use a text editor.
Comment by Didier Stevens — Thursday 21 July 2011 @ 6:31
[...] Didier Stevens’ PDF tools: analyse, identify and create PDF files (includes PDFiD, pdf-parser and make-pdf and mPDF) [...]
Pingback by Repost:Lista de ferramentas de segurança feitas em Python. « VSLA – Virtual Security Labs Anywhere — Monday 1 August 2011 @ 15:51
[...] Didier Stevens’ PDF tools: analyse, identify and create PDF files (includes PDFiD, pdf-parser and make-pdf and mPDF) [...]
Pingback by Attack Attack » Python tools for penetration testers — Monday 8 August 2011 @ 4:17
[...] javascript heap overflow in PDF. More info to come. I used Didier Steven’s pdfid and pdf-parser to extract the javascript. The Javascript which is called when the document is opened creates a [...]
Pingback by The Spy Hunter, Part II – Solution « wirewatcher — Sunday 14 August 2011 @ 20:55
Just a “wowie” comment – thanks for sharing these tools, they’re fantastic.
Comment by B. Oceander — Monday 26 September 2011 @ 14:55
Hi Didier,
Do you have a tool, or know of a tool, that can take an existing PDF and add JS to it? I would like the ability to add javascript to multiple existing files. It would basically have the same functionality as your current make-pdf.py script, but you’d provide it an existing PDF, as well as a JS file that it would be merged with.
Thx for your help!
Comment by Sagui — Thursday 13 October 2011 @ 12:35
@Sagui Look for pdftk; it can merge 2 PDF files.
Comment by Didier Stevens — Friday 14 October 2011 @ 20:51
Hi Didier, thanks for providing these tools, would you have any objection to me adding them to a public github repo so people can contribute any fixes/extensions they have?
Comment by Tom — Sunday 16 October 2011 @ 13:03
@Tom No problem, let me know where.
Comment by Didier Stevens — Sunday 16 October 2011 @ 13:23
All done https://github.com/thomcarver/pdf-tools
Comment by Tom — Sunday 16 October 2011 @ 15:21
Hello,
Can someone help me figure out how to use this pdfid tool? I have the Python interpreter installed, but I would like to know how to specify which file or directory I want this tool to parse.
I am new to Python.
Comment by Ishwar — Tuesday 18 October 2011 @ 12:20
@Ishwar: I assume you’re running Windows? Then you install Python 2.X (not version 3), open a command line (cmd.exe) and type pdfid.py test.pdf where test.pdf is the file you want to check.
Comment by Didier Stevens — Wednesday 19 October 2011 @ 16:52
[...] purpose, or write a custom tool ourselves. For the sake of this tutorial, I’ll stick with Didier Steven’s excellent “make-pdf” python script (which uses the mPDF [...]
Pingback by Exploit writing tutorial part 11 : Heap Spraying Demystified | Corelan Team — Saturday 31 December 2011 @ 23:32
Hello Didier,
Thank you for providing these tools.
I have scanned a PDF I suspect may be malicious with your pdfid script, and it returned 0 for everything except “/AcroForm 1”. I see above that AcroForm is not described in the pdfid summary. Could you please tell me what this means, and how to tell whether it is harmful?
Comment by Inkblots — Wednesday 11 January 2012 @ 19:40
@Inkblots Take a look at my PDF workshop; I have an exercise on AcroForm. AcroForm can contain JavaScript that is executed when a document is opened.
Comment by Didier Stevens — Wednesday 11 January 2012 @ 20:15
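A rough sketch (Python 3) of the kind of keyword count PDFiD reports, written for illustration and not taken from pdfid.py. It scans raw bytes for PDF names, decodes #xx hex escapes (so /J#61vaScript counts as /JavaScript), and tallies a few names associated with active content, including /AcroForm. Names are case-sensitive. Because it does not parse objects, names that happen to appear inside streams or strings are counted too; the keyword list and function names are assumptions.

```python
import re
from collections import Counter

# A PDF name: "/" followed by regular characters (whitespace and delimiters end it).
NAME_RE = re.compile(rb"/[^\x00\t\n\f\r /%()<>\[\]{}]*")
# Inside a name, #xx is a hexadecimal escape for one byte.
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")

# Assumed keyword list; names are case-sensitive, so /javascript does not match.
KEYWORDS = (b"/JS", b"/JavaScript", b"/AA", b"/OpenAction", b"/AcroForm", b"/Launch")


def normalize_name(raw):
    """Replace #xx escapes with the bytes they encode."""
    return HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def count_keywords(data, keywords=KEYWORDS):
    """Count how often each keyword appears as a name in the raw PDF bytes."""
    counts = Counter()
    for match in NAME_RE.finditer(data):
        name = normalize_name(match.group(0))
        if name in keywords:
            counts[name] += 1
    return {keyword.decode("ascii"): counts[keyword] for keyword in keywords}
```

A nonzero /AcroForm count alone does not prove a file is harmful; it says where to look next, for example with pdf-parser.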
[...] PDF-Parser (http://blog.didierstevens.com/programs/pdf-tools/) [...]
Pingback by Attributes of a Zero Dollar Malware Analysis System « secanalysis.com — Monday 16 January 2012 @ 17:21
[...] Related great tools: http://blog.didierstevens.com/programs/pdf-tools/ [...]
Pingback by Re: pdf attacks vectors | Net Cleaner — Saturday 21 January 2012 @ 18:29