Analyzing a Malicious PDF File

Monday 20 October 2008

Filed under: Malware,PDF — Didier Stevens @ 21:43

This starts a series of posts leading up to my PDF talk at the next Belgian ISSA and OWASP chapter event. I’ll be publishing a couple of my PDF tools.

The following video shows how I use my PDF parser to analyze a malicious PDF file and extract the shellcode.

Searching for the keyword javascript yields 2 indirect objects referencing /JavaScript objects. The JavaScript is executed through an additional-actions dictionary (/AA) when the page is opened (i.e. when the PDF document is opened, as it contains only one page). Decompressing the second /JavaScript object (34) displays the code.
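The search-and-decompress steps can be approximated in a few lines of Python. This is only a rough sketch and not how pdf-parser is implemented: the function names find_objects and inflate_stream are made up for this example, and the regular expressions assume a simple, well-formed file without object streams or incremental updates.

```python
import re
import zlib

# Matches "<num> <gen> obj ... endobj". A simplification: it ignores object
# streams, incremental updates, and "endobj" appearing inside stream data.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def find_objects(pdf_bytes, keyword):
    """Return (object number, body) pairs whose dictionary part mentions keyword."""
    hits = []
    for match in OBJ_RE.finditer(pdf_bytes):
        body = match.group(3)
        # Only inspect the part before any stream data (hypothetical shortcut).
        header = body.split(b"stream", 1)[0]
        if keyword.lower() in header.lower():
            hits.append((int(match.group(1)), body))
    return hits


def inflate_stream(body):
    """Decompress a /FlateDecode stream in an object body, or return None."""
    match = STREAM_RE.search(body)
    if match is None or b"/FlateDecode" not in body:
        return None
    # decompressobj stops at the end of the zlib data, so the end-of-line
    # bytes before "endstream" end up in unused_data instead of raising.
    return zlib.decompressobj().decompress(match.group(1))
```

With the file contents read into a bytes object, find_objects(data, b"javascript") lists the candidate objects, and inflate_stream(body) shows the script of any object that carries a compressed stream.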

collectEmailInfo is an undocumented Adobe Acrobat JavaScript method with a vulnerability (fixed in Adobe Acrobat Reader 8.1.2). My version of SpiderMonkey helps me extract the shellcode.
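Exploits of this kind commonly carry their shellcode as a string of %uXXXX escapes passed to unescape(). Assuming that encoding (an assumption for this illustration, not something stated above), the conversion to raw bytes could be sketched as follows. The helper name unescape_to_bytes is hypothetical, and only %uXXXX sequences are handled; anything else in the string is skipped.

```python
import re
import struct

# One %uXXXX escape is a 16-bit code unit.
PERCENT_U_RE = re.compile(r"%u([0-9a-fA-F]{4})")


def unescape_to_bytes(escaped):
    """Turn the %uXXXX escapes of an unescape() argument into raw bytes.

    Each 16-bit value is written little-endian, matching how the string
    would be laid out in memory on x86. Other characters are ignored.
    """
    out = bytearray()
    for hex_word in PERCENT_U_RE.findall(escaped):
        out += struct.pack("<H", int(hex_word, 16))
    return bytes(out)
```

The resulting bytes can be written to a file and loaded into a disassembler or shellcode emulator for further analysis.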

YouTube, Vimeo and hi-res Xvid.

12 Comments »

[…] — Didier Stevens @ 21:38 A malicious PDF file I analyzed a couple of months ago (the one featured in this video) had a corrupted stream object. It uses a /FlateDecode filter, but I could not find a way to […]

Pingback by The Case of the Corrupted Stream Object « Didier Stevens — Tuesday 21 October 2008 @ 21:40
[…] Software, PDF — Didier Stevens @ 17:19 I’m publishing my pdf-parser tool featured in my last video. Details and download […]

Pingback by pdf-parser.py « Didier Stevens — Thursday 30 October 2008 @ 17:19
[…] PDF Test-Files Filed under: My Software, PDF — Didier Stevens @ 12:56 As promised, I’m releasing a couple of my PDF tools as a warm-up to my ISSA Belgium and OWASP Belgium […]

Pingback by Creating PDF Test-Files « Didier Stevens — Sunday 9 November 2008 @ 12:58
[…] the file loader. If you need more information please check Didier Stevens’ site and this blog entry; also check Jon Paterson and Dennis Elser’s blog entry showing how they extracted the shellcode […]

Pingback by PDF file loader to extract and analyse shellcode « c0llateral Blog — Wednesday 6 January 2010 @ 23:19
Hello Didier Stevens

I have encountered this problem: how do I embed JavaScript inside a PDF file? Do you have a good solution?

Comment by hong chun lin — Wednesday 12 May 2010 @ 6:33
@hong chun lin Take a look at my PDF tools.

Comment by Didier Stevens — Friday 14 May 2010 @ 10:07
[…] process I used to analyse the PDF is based on Didier’s video which you can find at https://blog.didierstevens.com/2008/10/20/analyzing-a-malicious-pdf-file/. I highly recommend you go and watch it if you’re interested in learning about this stuff. […]

Pingback by Solving the Security BSides London Challenge – Number 2 | 4armed — Thursday 21 April 2011 @ 14:39
Didier, thank you for making these excellent tools. I want to clarify an aspect of how to perform the kind of analysis you show here. I have a PDF file for which pdf-id.py reports a non-zero number for /JS and /AA (although in my case /JavaScript is 0). However, pdf-parser.py --search JS --raw finds nothing, pdf-parser.py --search javascript --raw finds nothing, and pdf-parser.py --search AA --raw finds nothing. How can this be? Am I misunderstanding how these tools work? How can I track down exactly what pdf-id.py is finding that it counts as a /JS or an /AA? Thanks, Frank

Comment by Frank G. — Friday 21 December 2012 @ 15:17
@Frank Those are false positives. PDFiD does not take the PDF structure into account, so it happens that /JS and /AA are found inside object streams. Those strings are only 3 characters long, so when your PDF document is a few MB or larger, it is very likely to contain these strings, just by chance. If you open the PDF document with a hex editor and search for /JS and /AA, you’ll most likely find them inside a stream.

pdf-parser takes the structure into account, so /JS and /AA inside a stream are ignored.
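To illustrate the difference, here is a simplified sketch. It is not the actual code of PDFiD or pdf-parser, and the function names count_raw and count_outside_streams are made up for the example. The first function counts a name anywhere in the file; the second blanks out stream data before counting.

```python
import re

# Everything between "stream" and "endstream" is treated as opaque data.
STREAM_DATA_RE = re.compile(rb"stream\r?\n.*?endstream", re.DOTALL)


def count_raw(pdf_bytes, name):
    """Count a name anywhere in the file, including inside stream data."""
    return pdf_bytes.count(name)


def count_outside_streams(pdf_bytes, name):
    """Count a name only outside stream data."""
    stripped = STREAM_DATA_RE.sub(b"stream\nendstream", pdf_bytes)
    return stripped.count(name)
```

If count_raw reports hits for b"/JS" but count_outside_streams reports none, the matches are just bytes inside stream data, which is the false-positive case described above.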

Comment by Didier Stevens — Wednesday 26 December 2012 @ 19:41
[…] process I used to analyse the PDF is based on Didier’s video which you can find at https://blog.didierstevens.com/2008/10/20/analyzing-a-malicious-pdf-file/. I highly recommend you go and watch it if you’re interested in learning about this stuff. […]

Pingback by Solving the Security BSides London Challenge – Number 2 | 7 Elements — Wednesday 23 January 2013 @ 16:03
I’m having some issues analyzing a phishing PDF with pdf-parser: pdfid was able to identify the URLs, but pdf-parser wasn’t returning the results. Here is the VT link to the PDF: https://www.virustotal.com/#/file/f004fd972df400522698f48fe53958fec962a99d4d2f6ae4e0eaedf60a12adb6/detection. I was able to successfully extract the URL by running the PDF through strings and grepping for http.

Comment by Jason Killam — Tuesday 12 December 2017 @ 20:12
I was able to extract the URL with pdf-parser. Can you tell me exactly which command you used?

Comment by Didier Stevens — Tuesday 12 December 2017 @ 20:45

