Didier Stevens

Monday 27 October 2014

Update: PDFiD With Plugins Part 2

Filed under: My Software,PDF,Update — Didier Stevens @ 8:40

The second feature in this new version of PDFiD is selection. With this, you can select PDFs using criteria you provide.

Example:

pdfid.py -S “pdf.javascript.count > 0″ *.pdf

This command will select all files with extension .pdf in the current directory that are PDFs and have a /JavaScript count larger than zero. The selection expression you provide is a Python expression. Here is a list off attributes to use in your selection expressions:

pdf.version
pdf.filename
pdf.errorOccured
pdf.errorMessage
pdf.isPDF
pdf.header

pdf.keywords[keywordname].count
pdf.keywords[keywordname].hexcode

pdf.keywords['/AA'].count
pdf.keywords['/Root'].count # if option -a and if /Root present in PDF

pdf.obj.count
pdf.obj.hexcode
pdf.endobj.count
pdf.endobj.hexcode
pdf.stream.count
pdf.stream.hexcode
pdf.endstream.count
pdf.endstream.hexcode
pdf.xref.count
pdf.xref.hexcode
pdf.trailer.count
pdf.trailer.hexcode
pdf.startxref.count
pdf.startxref.hexcode
pdf.page.count
pdf.page.hexcode
pdf.encrypt.count
pdf.encrypt.hexcode
pdf.objstm.count
pdf.objstm.hexcode
pdf.js.count
pdf.js.hexcode
pdf.javascript.count
pdf.javascript.hexcode
pdf.aa.count
pdf.aa.hexcode
pdf.openaction.count
pdf.openaction.hexcode
pdf.acroform.count
pdf.acroform.hexcode
pdf.jbig2decode.count
pdf.jbig2decode.hexcode
pdf.richmedia.count
pdf.richmedia.hexcode
pdf.launch.count
pdf.launch.hexcode
pdf.embeddedfile.count
pdf.embeddedfile.hexcode
pdf.xfa.count
pdf.xfa.hexcode
pdf.colors_gt_2_24.count
pdf.colors_gt_2_24.hexcode

Be careful if you are going to use this in an automated scenario where you don’t control the selection expression. This expression is evaluated in Python with the eval function, and there is no input validation.

 

Monday 20 October 2014

Update: PDFiD With Plugins Part 1

Filed under: My Software,PDF,Update — Didier Stevens @ 8:51

Almost from the beginning when I released PDFiD, people asked me for anti-virus like feature: that PDFiD would tell you if a PDF was malicious or not. Some people even patched PDFiD with a scoring feature.

But I didn’t want to develop an “anti-virus” for PDFs; PDFiD is a triage tool.

Now you can develop your own scoring system with plugins.

Plugins are loaded with option -p, like this:

20141020-102902

I provide 3 plugins: plugin_triage.py, plugin_nameobfuscation.py and plugin_embeddedfile.py. You can run more than one plugin by separating their names with a comma: pdfid.py -p plugin_triage,plugin_embeddedfile js.pdf

Or you can use an @-file: a text file with the names of the plugins you want to run.

To output the result as CSV file, use option -c, and to write the output to a file, use option -o. With option -m, you can provide a minimum score the plugin has to produce for its output to be displayed.

Plugins are Python classes, I’ll explain how to make your own in a later post.

plugin_triage.py produces a score of 1.0 when the PDF requires further analysis, and 0.0 if not.

plugin_nameobfuscation.py produces a score of 1.0 when name obfuscation is used in the PDF.

plugin_embeddedfile.py produces a score of 0.9 when an embedded file is present, and 1.0 when name obfuscation is also used.
pdfid_v0_2_1.zip (https)
MD5: 7463412536678B321276F8720F52DE81
SHA256: F1B4728DD2CE455B863B930E12C6DEC952CB95C0BB3D6924136A6E49ACA877C2

Tuesday 30 September 2014

Announcement: PDFiD Plugins

Filed under: Announcement,My Software,PDF — Didier Stevens @ 21:30

I have a new version of PDFiD. One with plugins and selections.

Here’s a preview:

20140930-231450

20140930-231637

Monday 29 September 2014

Update: XORSearch With Shellcode Detector

Filed under: My Software,Update — Didier Stevens @ 0:00

XORSearch allows you to search for strings and embedded PE-files brute-forcing different encodings. Now I added shellcode detection.

This new version of XORSearch integrates Frank Boldewin’s shellcode detector. In his Hack.lu 2009 presentation, Frank explains how he detects shellcode in Microsoft Office documents by searching for byte sequences often used in shellcode.

I integrated Frank’s methods in XORSearch, so that you can use it for any file type, not only Microsoft Office files.

20140928-135402

Frank was kind enough to give me his source code for the detection engine. However, I did not integrated is source code as-is. I developed my own engine that uses rules to detect shellcode artifacts. These rules are not hard-coded, but can be externalized, so that you can define your own rules.

Wildcard rule syntax

A wildcard rule is composed of 3 parts: a rule name, a score and a pattern. These are separated by a : character.

Example of a rule:

Find kernel32 base method 1bis:10:64A130000000

The name of this rule is “Find kernel32 base method 1bis”, it has a score of 10, and the pattern is 64A130000000. When XORSearch finds byte pattern 64A130000000, it will report it mentioning rule name “Find kernel32 base method 1bis” and add 10 to the total score. This byte pattern is the following assembly instruction:

MOV EAX, dword [fs:0x30]

This is an instruction often found in shellcode that looks for the base of kernel32.

When assembly instructions reference a register, the register is encoded as bits in the bytes that make up the instruction. For example, pop eax is just one byte: 58. pop ecx is 59, pop edx is 5a, … If you look at the bits of this instruction, they have the following value: 01011RRR. The last 3 bits (RRR) encode the register to use for the pop instruction.

To deal with this, my rule definition language supports wildcards. This is how you encode a pop reg instruction:

(B;01011???)

The B indicates that we want to define a byte using bits and wildcards. 0 and 1 are fixed bit values, and ? is the wildcard: the bit value can be 0 or 1. Thus the pattern (B;01011???) matches bytes 58, 59, 5A, 5B, 5C, 5D, 5E and 5F.

This wildcard allows us to encode patterns for shellcode instructions that use registers. For example , here is an often used set of instructions to determine the EIP with shellcode:

	call label
label:
	pop eax

This pattern is encoded for all possible registers with the following rule:

GetEIP method 1:10:E800000000(B;01011???)

Another instruction often found in shellcode is xor reg1, reg1, like xor eax, eax.

You could represent this with the following pattern:

31(B;11??????)

But this pattern matches more instructions than you want. It matches xor eax, eax, xor ecx, ecx, … but also xor eax, ecx, xor eax, edx, … You want this pattern to match the xor instruction for the same register, and not different registers. That is why you can use the following syntax:

31(B;11A??A??)

By using a letter like A, B, …, as a wildcard, you assign a variable name to the wildcard bit pattern. ??? matches 3 bits. A?? also matches 3 bits, and assigns the variable name A to these 3 bits. When you use this bit pattern again, you make sure that the pattern will only be matched if the bit pattern is identical. Pattern ?????? matches 6 bits regardless of their value. Pattern A??A?? also matches 6 bits, but the first 3 bits must have the same value as the last 3 bits.

Here is another example:

Find kernel32 base method 3:10:6830000000(B;01011A??)648B(B;00B??A??)

This pattern matches the following set of assembly instructions:

	push 0x30
	pop reg1
	mov reg2, dword [fs:reg1]

By using bit pattern A?? for the register of the second instruction, and B??A?? for the registers of the third instruction, you make sure that the third instruction use the same register for indexing as the second instruction.

Up til now, we looked at sequential assembly instructions. But you can also have shellcode patterns with jumps, e.g. non-sequential instructions. Here is an example:

	jmp LABEL1
LABEL2:
	pop eax
	...
	...
LABEL1:
	call LABEL2

To enable to match assembly code patterns with jumps, I introduced the (J;*) pattern in my rule definitions. J stands for a jump, and * represent the numbers of bytes that make up the displacement of the jump instruction (normally 1 byte or 4 bytes). Here is the rule that encodes the above assembly code pattern:

GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)

Finally, Frank’s detector also looks for suspicious strings, like UrlDownloadToFile, WinExec, … You can define rules using a hex pattern to detect these strings, but to facilitate the encoding of these rules, I added the str= keyword, like this:

Suspicious strings:2:str=UrlDownloadToFile
Suspicious strings:2:str=WinExec

Using wildcard rules

To use these shellcode wildcard rules with XORSearch, you use options -w or -W. -w allows you to specify your own rule(s), -W uses the build-in rules.

With -w, you can specify your rule as the search argument, or together with option -f, you provide a text file with rules.

Example: XORSearch.exe -w olimpikge.xls “GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)”

With -W, you don’t have to provide the rules, XORSearch will use the build-in rules.

Example: XORSearch.exe -W olimpikge.xls

You can view the build-in rules with option -L:

Function prolog signature:10:558BEC83C4
Function prolog signature:10:558BEC81EC
Function prolog signature:10:558BECEB
Function prolog signature:10:558BECE8
Function prolog signature:10:558BECE9
Indirect function call tris:10:FFB7(B;????????)(B;????????)(B;????????)(B;????????)FF57(B;????????)
GetEIP method 4 FLDZ/FSTENV [esp-12]:10:D9EED97424F4(B;01011???)
GetEIP method 1:10:E800000000(B;01011???)
GetEIP method 2:10:EB(J;1)E8(J;4)(B;01011???)
GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)
GetEIP method 4:10:D9EE9BD97424F4(B;01011???)
Find kernel32 base method 1:10:648B(B;00???101)30000000
Find kernel32 base method 1bis:10:64A130000000
Find kernel32 base method 2:10:31(B;11A??A??)(B;10100A??)30648B(B;00B??A??)
Find kernel32 base method 3:10:6830000000(B;01011A??)648B(B;00B??A??)
Structured exception handling :10:648B(B;00???101)00000000
Structured exception handling bis:10:64A100000000
API Hashing:10:AC84C07407C1CF0D01C7EBF481FF
API Hashing bis:10:AC84C07407C1CF0701C7EBF481FF
Indirect function call:10:FF75(B;A???????)FF55(B;A???????)
Indirect function call bis:10:FFB5(B;A???????)(B;B???????)(B;C???????)(B;D???????)FF95(B;A???????)(B;B???????)(B;C???????)(B;D???????)
OLE file magic number:10:D0CF11E0
Suspicious strings:2:str=UrlDownloadToFile
Suspicious strings:2:str=GetTempPath
Suspicious strings:2:str=GetWindowsDirectory
Suspicious strings:2:str=GetSystemDirectory
Suspicious strings:2:str=WinExec
Suspicious strings:2:str=ShellExecute
Suspicious strings:2:str=IsBadReadPtr
Suspicious strings:2:str=IsBadWritePtr
Suspicious strings:2:str=CreateFile
Suspicious strings:2:str=CloseHandle
Suspicious strings:2:str=ReadFile
Suspicious strings:2:str=WriteFile
Suspicious strings:2:str=SetFilePointer
Suspicious strings:2:str=VirtualAlloc
Suspicious strings:2:str=GetProcAddr
Suspicious strings:2:str=LoadLibrary

I derived these rules from the source code Frank gave me. Testing these rules on different benign and malicious files revealed 2 things: a couple of rules generated a lot of false positives, and brute-forcing the ROT encoding also generated a lot of false positives. So I removed these rules, and I added an option to disable encodings (option -d). For example, with option -d 3 I disable the brute-forcing of the ROT encoding (1: XOR 2: ROL 3: ROT 4: SHIFT 5: ADD).

When looking for shellcode, you want several rules to trigger. If just one or two rules trigger, they are likely false positives.
XORSearch_V1_11_0.zip (https)
MD5: 7313A198033C0A1F69B79F96894462C7
SHA256: 1700D037D7A9902108F3986D75A9BA250ACBD96E38CC43C5B4BC1FB90761B320

Tuesday 23 September 2014

Video: PDF Creation – Public Tools

Filed under: My Software,PDF — Didier Stevens @ 20:27

Have you subscribed to my new video blog: videos.didierstevens.com ?

If not, you missed my new video where I show my public tools to create PDFs.

Thursday 18 September 2014

FileScanner.exe Part 4

Filed under: My Software — Didier Stevens @ 0:00

Please read part 1, part 2 and part 3 for more info.

A few remarks for people having issues running my program.

Folder Release contains a 32-bit executable that requires the Visual C++ Redistributable Packages for Visual Studio 2013.

Folder Release CRT contains a 32-bit executable with embedded C runtime, it does not require the redistributable.

Folder x64 contains 64-bit executables.

I included a rule file as example, filescanner-analysis-01.txt:

#Comment
exhaustive
PK:start:str=PK
$META:icontent:str=MANIFEST.MF
JAR:and:PK $META
CLASS:start:CAFEBABE
MZ:start:4D5A
PDF:start:str=%PDF-
OLE:start:D0CF11E0
RAR:start:526172211A07
$ATTRIBUT:content:00417474726962757400
OLE-VBA:and:OLE $ATTRIBUT
CAB:start:str=MSCF
ARJ:start:EA60
JFIF:start:FFD8FFE0

FileScanner_V0_0_0_3.zip (https)
MD5: D9A7BA5874C10B10BF380D03E49C82A6
SHA256: C89FF7DBDB71A22E2A88C16ECD65E36619BD8EA39A77036404B6F4B1049D21E5

Wednesday 17 September 2014

FileScanner.exe Part 3

Filed under: My Software — Didier Stevens @ 0:00

FileScanner.exe is a new Windows tool I developed. Read part 1 and part 2 for more info.

20140915-175358

To let you choose the files filescanner will scan, you can provide the following arguments: filename, @filename, folder and ?f:.

Filename and folder are self-descriptive. When you pass argument @filename, filename is a textfile that contains filenames to scan. ?f: stands for all fixed drives on the machine, for example: C:\ D:\.

You can provide more than one argument. To scan the subfolders of a folder you provided, use option -s.

By default, FileScanner provides the following information for scanned files:

20140902 225258

With option -f, files are completely read and the following information is provided:

20140902-225858

You can have CSV output with option -v.

To write the output to a file, use option -o and provide a filename. Option -O also writes the output to a filename, this filename is automatically generated: FileScanner-HOSTNAME-DATE-TIME.csv. Option -c lets you specify a folder to where the output file is copied when FileScanner finishes. This can be a UNC share to centralize all reports when you run FileScanner on several machines in parallel.

Option -l follows links.

Use option -r to specify a single rule and -a or -A to specify a textfile with rules.

Tuesday 16 September 2014

FileScanner.exe Part 2

Filed under: My Software — Didier Stevens @ 0:00

My new FileScanner tool allows you to use rules to scan files. Here is how you define rules.

Rule syntax

If you provide rules to FileScanner, it will only report files that match one rule or several rules (unless you instruct it to report all scanned files). A rule has a name, a type and one or more conditions. These elements are separated by the : character (colon). A name can be any string, and it is best unique if you have several rules (but this is not enforced). If a name starts with a $ character (dollar), the rule is only tested if it is referred to by another rule. Valid rule types are:

  • md5
  • sizemd5
  • start
  • content
  • icontent
  • and

The md5 rule triggers if the file has the specified md5 hash. Example:

PSEXEC2:md5:AEEE996FD3484F28E5CD85FE26B6BDCD

The sizemd5 rule triggers if the file has the specified size and md5 hash. The size is tested first, and the md5 hash is only calculated when the size matches. This speeds up the scan process if you know the size. Example:

PSEXEC:sizemd5:381816:AEEE996FD3484F28E5CD85FE26B6BDCD

The start rule triggers if the content of the file starts with the specified bytes. You can specify these bytes with a hexadecimal sequence or with a string. When using a string, prefix it with keyword str=. This test is case-sensitive. Examples:

MZ:start:4D5A

PK:start:str=PK

The content rule triggers if the file contains the specified bytes. You can specify these bytes with a hexadecimal sequence or with a string. When using a string, prefix it with keyword str=. This test is case-sensitive. Examples:

META:content:4D414E49464553542E4D46

META:content:str=MANIFEST.MF

The icontent rule is identical to the content rule, except that it is not case-sensitive.

The and rule triggers if all specified rules do trigger. The specified rules are tested from left to right, and testing stops if a rule does not trigger. If a specified rule has a name that starts with $, it will also be tested. In the following example, the JAR rule triggers if the $PK and $META rules do trigger.

$PK:start:str=PK
$META:icontent:str=MANIFEST.MF
JAR:and:$PK $META

 

Defining rules

Rules can be defined in a text file. A single rule can be defined via a command-line option or via the executable filename.

A set of rules contained in a text file is passed to the FileScanner tool via command line options -a or -A. With option -a, only files that match one or several rules are analyzed and reported. With option -A, all files are reported. A rule-file can contain comments: lines with the # character as the first character are comments (and ignored). 2 directives can be set in a rule-file:

  • selectallfiles
  • exhaustive

The selectallfiles directive instructs FileScanner to report all files (even with option -a).

The exhaustive directive instructs FileScanner to test all rules defined in the text file. If this directive is not present, rule testing stops after the first rule matches.

Example of a rule-file:

exhaustive
PK:start:str=PK
$META:icontent:str=MANIFEST.MF
JAR:and:PK $META
CLASS:start:CAFEBABE
MZ:start:4D5A
PDF:start:str=%PDF-
OLE:start:D0CF11E0

Specifying a single rule can be done via option -r. Example:

filescanner.exe -sr PSEXEC:sizemd5:381816:AEEE996FD3484F28E5CD85FE26B6BDCD c:\

Finally, if you have to ask an inexperienced user to run filescanner on his machine, you can encode a rule in the filename and send him the program. Example:

filescanner-auto-rule-PSEXEC-sizemd5-381816-AEEE996FD3484F28E5CD85FE26B6BDCD.exe

Download

20140915-175358
FileScanner_V0_0_0_2.zip (https)
MD5: 9A89333C13DBB669A94226F57E5D919A
SHA256: 5F46312B06AE865957A36B95A4C2DDC41F20113B0E51B7F083A50929B38BD0F9

 

Sunday 14 September 2014

Update: SpiderMonkey

Filed under: My Software,Update — Didier Stevens @ 15:00

During my PDF training at 44CON I got the idea for a simple modification: now with document.write(), a third file is created. The file is write.bin.log and contains the pure UNICODE data, e.g. without 0xFFFE header.

To extract shellcode now, you no longer need to edit write.uc.log to remove the 0xFFFE header.

I also included binaries for Windows and Linux (compiled on CentOS 6.0) in the ZIP file.

js-1.7.0-mod-b.zip (https)
MD5: 85B369B5650D4C041D21E8574CF09B9A
SHA256: D3827DF7B2EA81EEE91181B2DE045320E1CFEC46EED33F7CD84CA63C3A36BC38

Wednesday 3 September 2014

Introducing Filescanner.exe

Filed under: My Software — Didier Stevens @ 0:17

Filescanner is a tool I started to develop almost 2 years ago.

Back then, I needed a stand-alone, single executable tool that would allow me to search for files based on their content. Filescanner is a Windows tool.

Without any options, the tool will report some properties of the scanned file:

20140902 225258

Remark that the first 4 bytes of the scanned file are reported.

Here are the options:

20140902-225711

Option -f does a full read of the file and calculates some properties like entropy, md5, …

20140902-225858

You can also output CSV with option -v and search through subfolders with option -s.

Rules can be defined to select specific files. For example, with option -r, I can specify a single rule that will be used to select files.

Here is a rule named EXE that triggers when the content of a file starts with MZ: EXE:start:str=MZ

20140902-230520

A single rule can be passed as a command-line argument or be encoded in the executable filename. If you require more than 1 rule, put them inside a text file to define a ruleset.

Options -a and -A specify the ruleset to use. Here is an example of a ruleset:

exhaustive
PK:start:str=PK
$META:icontent:str=MANIFEST.MF
JAR:and:PK $META
CLASS:start:CAFEBABE
MZ:start:4D5A
PDF:start:str=%PDF-
OLE:start:D0CF11E0

Rules can also be defined for MD5 hashes.

In a next post, I’ll explain in detail the rule syntax.

FileScanner_V0_0_0_1.zip (https)
MD5: 9EE883A4E28A6D0649F6D7787BD76ED4
SHA256: 5AA71E6F4FED8E45A22B49FD9A0417933F7218AF9300FDEF24FEF696CF012F61

Next Page »

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 236 other followers