pecheck.py is a wrapper for pefile, and this update has a couple of new features:
- accept input from stdin (for pipes)
- load PeID userdb.txt by default from same directory as pecheck.py
- extra entry point info
Sometimes I want to check a malware sample with XORSearch, but I can’t because my AV will delete it. My solution is to work with a hexdump of the file.
Option -x allows XORSearch to work with a hexdump.
The second feature in this new version of PDFiD is selection. With this, you can select PDFs using criteria you provide.
Be careful if you are going to use this in an automated scenario where you don’t control the selection expression. This expression is evaluated in Python with the eval function, and there is no input validation.
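To illustrate the risk, here is a minimal sketch in Python. The select function and the keyword names are hypothetical stand-ins, not PDFiD's actual code; the only thing taken from the text above is that the expression goes through eval without validation.

```python
def select(counts: dict, expression: str) -> bool:
    """Hypothetical selection helper: evaluate an expression against keyword counts."""
    # eval hands the expression to the Python interpreter as-is.
    return bool(eval(expression, {}, dict(counts)))

# Hypothetical keyword counts for one PDF.
counts = {"JS": 1, "OpenAction": 0}

# The intended use: a simple comparison on the counts.
print(select(counts, "JS > 0 and OpenAction == 0"))

# But an "expression" is not limited to comparisons. This one imports a
# module and calls a function; it could just as well delete files.
print(eval("__import__('os').getcwd()", {}, counts))
```

Passing an empty globals dictionary does not help, because Python adds the builtins back automatically. If the expression comes from someone you do not trust, it needs to be validated or parsed with something other than eval.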
Almost from the moment I released PDFiD, people asked me for an anti-virus-like feature: that PDFiD would tell you if a PDF was malicious or not. Some people even patched PDFiD with a scoring feature.
But I didn’t want to develop an “anti-virus” for PDFs; PDFiD is a triage tool.
Now you can develop your own scoring system with plugins.
Plugins are loaded with option -p, like this:
I provide 3 plugins: plugin_triage.py, plugin_nameobfuscation.py and plugin_embeddedfile.py. You can run more than one plugin by separating their names with a comma: pdfid.py -p plugin_triage,plugin_embeddedfile js.pdf
Or you can use an @-file: a text file with the names of the plugins you want to run.
To output the result as a CSV file, use option -c, and to write the output to a file, use option -o. With option -m, you can provide a minimum score the plugin has to produce for its output to be displayed.
Plugins are Python classes. I’ll explain how to make your own in a later post.
plugin_triage.py produces a score of 1.0 when the PDF requires further analysis, and 0.0 if not.
plugin_nameobfuscation.py produces a score of 1.0 when name obfuscation is used in the PDF.
plugin_embeddedfile.py produces a score of 0.9 when an embedded file is present, and 1.0 when name obfuscation is also used.
I have a new version of PDFiD. One with plugins and selections.
Here’s a preview:
XORSearch allows you to search for strings and embedded PE-files brute-forcing different encodings. Now I added shellcode detection.
This new version of XORSearch integrates Frank Boldewin’s shellcode detector. In his Hack.lu 2009 presentation, Frank explains how he detects shellcode in Microsoft Office documents by searching for byte sequences often used in shellcode.
I integrated Frank’s methods in XORSearch, so that you can use it for any file type, not only Microsoft Office files.
Frank was kind enough to give me his source code for the detection engine. However, I did not integrate his source code as-is. I developed my own engine that uses rules to detect shellcode artifacts. These rules are not hard-coded, but can be externalized, so that you can define your own rules.
Wildcard rule syntax
A wildcard rule is composed of 3 parts: a rule name, a score and a pattern. These are separated by a : character.
Example of a rule:
Find kernel32 base method 1bis:10:64A130000000
The name of this rule is “Find kernel32 base method 1bis”, it has a score of 10, and the pattern is 64A130000000. When XORSearch finds byte pattern 64A130000000, it will report it mentioning rule name “Find kernel32 base method 1bis” and add 10 to the total score. This byte pattern is the following assembly instruction:
MOV EAX, dword [fs:0x30]
This is an instruction often found in shellcode that looks for the base of kernel32.
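As a small illustration of the name:score:pattern layout, here is a sketch in Python that splits a rule into its 3 parts. The WildcardRule and parse_rule names are made up for this example; this is not how XORSearch itself parses rules.

```python
from typing import NamedTuple

class WildcardRule(NamedTuple):
    name: str
    score: int
    pattern: str

def parse_rule(line: str) -> WildcardRule:
    # Split on the first two colons only, so that a colon inside the
    # pattern (for example in a str= rule) stays part of the pattern.
    name, score, pattern = line.split(":", 2)
    return WildcardRule(name, int(score), pattern)

rule = parse_rule("Find kernel32 base method 1bis:10:64A130000000")
print(rule.name)
print(rule.score)
# This pattern is plain hex, so it converts directly to the bytes to search for.
print(bytes.fromhex(rule.pattern))
```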
When assembly instructions reference a register, the register is encoded as bits in the bytes that make up the instruction. For example, pop eax is just one byte: 58. pop ecx is 59, pop edx is 5a, … If you look at the bits of this instruction, they have the following value: 01011RRR. The last 3 bits (RRR) encode the register to use for the pop instruction.
To deal with this, my rule definition language supports wildcards. This is how you encode a pop reg instruction: (B;01011???)
The B indicates that we want to define a byte using bits and wildcards. 0 and 1 are fixed bit values, and ? is the wildcard: the bit value can be 0 or 1. Thus the pattern (B;01011???) matches bytes 58, 59, 5A, 5B, 5C, 5D, 5E and 5F.
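The bit matching can be sketched in a few lines of Python. The byte_matches function is a hypothetical helper written for this example, not code from XORSearch; it takes the 8 characters that follow the B; and compares them bit by bit.

```python
def byte_matches(bits: str, value: int) -> bool:
    """Return True if the byte value fits an 8-character pattern of 0, 1 and ?."""
    if len(bits) != 8:
        raise ValueError("a byte pattern needs exactly 8 bit positions")
    actual = format(value, "08b")
    # A ? accepts either bit value; 0 and 1 must match exactly.
    return all(p == "?" or p == a for p, a in zip(bits, actual))

matching = [value for value in range(256) if byte_matches("01011???", value)]
print([format(value, "02X") for value in matching])
# → ['58', '59', '5A', '5B', '5C', '5D', '5E', '5F']
```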
This wildcard allows us to encode patterns for shellcode instructions that use registers. For example, here is an often used set of instructions to determine the EIP with shellcode:
call label
label:
pop eax
This pattern is encoded for all possible registers with the following rule:
GetEIP method 1:10:E800000000(B;01011???)
Another instruction often found in shellcode is xor reg1, reg1, like xor eax, eax.
You could represent this with the following pattern: 31(B;11??????)
But this pattern matches more instructions than you want. It matches xor eax, eax, xor ecx, ecx, … but also xor eax, ecx, xor eax, edx, … You want this pattern to match the xor instruction for the same register, and not different registers. That is why you can use the following syntax: 31(B;11A??A??)
By using a letter like A, B, …, as a wildcard, you assign a variable name to the wildcard bit pattern. ??? matches 3 bits. A?? also matches 3 bits, and assigns the variable name A to these 3 bits. When you use this bit pattern again, you make sure that the pattern will only be matched if the bit pattern is identical. Pattern ?????? matches 6 bits regardless of their value. Pattern A??A?? also matches 6 bits, but the first 3 bits must have the same value as the last 3 bits.
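Here is a sketch in Python of how such variables could be matched. It assumes that a letter opens a named group that runs over the ? characters directly after it; match_byte is a hypothetical helper, and the real engine may work differently.

```python
def match_byte(bits: str, value: int, variables: dict) -> bool:
    """Match one byte against a bit pattern with 0, 1, ? and named groups like A??."""
    if len(bits) != 8:
        raise ValueError("a byte pattern needs exactly 8 bit positions")
    actual = format(value, "08b")
    i = 0
    while i < 8:
        ch = bits[i]
        if ch in "01":
            if actual[i] != ch:
                return False
            i += 1
        elif ch == "?":
            i += 1
        else:
            # Assumption: a letter starts a group covering the following ? characters.
            j = i + 1
            while j < 8 and bits[j] == "?":
                j += 1
            captured = actual[i:j]
            # First use stores the bits; later uses must see the same bits.
            if variables.setdefault(ch, captured) != captured:
                return False
            i = j
    return True

# Second byte of xor eax, eax (31 C0), xor ecx, ecx (31 C9) and xor eax, ecx (31 C8).
print(match_byte("11A??A??", 0xC0, {}))
print(match_byte("11A??A??", 0xC9, {}))
print(match_byte("11A??A??", 0xC8, {}))
```

Passing the same variables dictionary to every byte of a rule is what lets a variable carry over from one instruction to the next.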
Here is another example:
Find kernel32 base method 3:10:6830000000(B;01011A??)648B(B;00B??A??)
This pattern matches the following set of assembly instructions:
push 0x30
pop reg1
mov reg2, dword [fs:reg1]
By using bit pattern A?? for the register of the second instruction, and B??A?? for the registers of the third instruction, you make sure that the third instruction uses the same register for indexing as the second instruction.
Up until now, we looked at sequential assembly instructions. But you can also have shellcode patterns with jumps, i.e. non-sequential instructions. Here is an example:
jmp LABEL1
LABEL2:
pop eax
...
...
LABEL1:
call LABEL2
To match assembly code patterns with jumps, I introduced the (J;*) pattern in my rule definitions. J stands for a jump, and * represents the number of bytes that make up the displacement of the jump instruction (normally 1 byte or 4 bytes). Here is the rule that encodes the above assembly code pattern:
GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)
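To make the (J;*) idea concrete, here is a sketch of a matcher in Python. It assumes the displacement is a little-endian signed number relative to the end of the jump instruction (as it is for x86 jmp and call), and it treats variable letters as plain wildcards to stay short. The match_at and find names are made up for this example; the real engine is not shown in this post.

```python
import re

# One token is either (B;bbbbbbbb), (J;n) or a two-digit hex byte.
TOKEN = re.compile(r"\(B;([^)]{8})\)|\(J;(\d)\)|([0-9A-Fa-f]{2})")

def match_at(pattern: str, data: bytes, pos: int) -> bool:
    """Return True if pattern matches data starting at offset pos."""
    for bits, jump_size, hex_byte in TOKEN.findall(pattern):
        if jump_size:
            size = int(jump_size)
            if pos < 0 or pos + size > len(data):
                return False
            displacement = int.from_bytes(data[pos:pos + size], "little", signed=True)
            # Follow the jump: continue matching at its target.
            pos += size + displacement
            continue
        if not 0 <= pos < len(data):
            return False
        if hex_byte:
            if data[pos] != int(hex_byte, 16):
                return False
        else:
            actual = format(data[pos], "08b")
            # Only 0 and 1 are checked; ? and letters accept any bit here.
            if any(p in "01" and p != a for p, a in zip(bits, actual)):
                return False
        pos += 1
    return True

def find(pattern: str, data: bytes) -> list:
    """Return every offset in data where pattern matches."""
    return [offset for offset in range(len(data)) if match_at(pattern, data, offset)]

# jmp LABEL1 / LABEL2: pop eax / nop / nop / LABEL1: call LABEL2
shellcode = bytes.fromhex("E903000000" "58" "9090" "E8F8FFFFFF")
print(find("E9(J;4)E8(J;4)(B;01011???)", shellcode))
# → [0]
```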
Finally, Frank’s detector also looks for suspicious strings, like UrlDownloadToFile, WinExec, … You can define rules using a hex pattern to detect these strings, but to facilitate the encoding of these rules, I added the str= keyword, like this:
Suspicious strings:2:str=UrlDownloadToFile
Suspicious strings:2:str=WinExec
Using wildcard rules
To use these shellcode wildcard rules with XORSearch, you use options -w or -W. -w allows you to specify your own rule(s), -W uses the built-in rules.
With -w, you can specify your rule as the search argument, or together with option -f, you provide a text file with rules.
Example: XORSearch.exe -w olimpikge.xls "GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)"
With -W, you don’t have to provide the rules; XORSearch will use the built-in rules.
Example: XORSearch.exe -W olimpikge.xls
You can view the built-in rules with option -L:
Function prolog signature:10:558BEC83C4
Function prolog signature:10:558BEC81EC
Function prolog signature:10:558BECEB
Function prolog signature:10:558BECE8
Function prolog signature:10:558BECE9
Indirect function call tris:10:FFB7(B;????????)(B;????????)(B;????????)(B;????????)FF57(B;????????)
GetEIP method 4 FLDZ/FSTENV [esp-12]:10:D9EED97424F4(B;01011???)
GetEIP method 1:10:E800000000(B;01011???)
GetEIP method 2:10:EB(J;1)E8(J;4)(B;01011???)
GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)
GetEIP method 4:10:D9EE9BD97424F4(B;01011???)
Find kernel32 base method 1:10:648B(B;00???101)30000000
Find kernel32 base method 1bis:10:64A130000000
Find kernel32 base method 2:10:31(B;11A??A??)(B;10100A??)30648B(B;00B??A??)
Find kernel32 base method 3:10:6830000000(B;01011A??)648B(B;00B??A??)
Structured exception handling :10:648B(B;00???101)00000000
Structured exception handling bis:10:64A100000000
API Hashing:10:AC84C07407C1CF0D01C7EBF481FF
API Hashing bis:10:AC84C07407C1CF0701C7EBF481FF
Indirect function call:10:FF75(B;A???????)FF55(B;A???????)
Indirect function call bis:10:FFB5(B;A???????)(B;B???????)(B;C???????)(B;D???????)FF95(B;A???????)(B;B???????)(B;C???????)(B;D???????)
OLE file magic number:10:D0CF11E0
Suspicious strings:2:str=UrlDownloadToFile
Suspicious strings:2:str=GetTempPath
Suspicious strings:2:str=GetWindowsDirectory
Suspicious strings:2:str=GetSystemDirectory
Suspicious strings:2:str=WinExec
Suspicious strings:2:str=ShellExecute
Suspicious strings:2:str=IsBadReadPtr
Suspicious strings:2:str=IsBadWritePtr
Suspicious strings:2:str=CreateFile
Suspicious strings:2:str=CloseHandle
Suspicious strings:2:str=ReadFile
Suspicious strings:2:str=WriteFile
Suspicious strings:2:str=SetFilePointer
Suspicious strings:2:str=VirtualAlloc
Suspicious strings:2:str=GetProcAddr
Suspicious strings:2:str=LoadLibrary
I derived these rules from the source code Frank gave me. Testing these rules on different benign and malicious files revealed 2 things: a couple of rules generated a lot of false positives, and brute-forcing the ROT encoding also generated a lot of false positives. So I removed these rules, and I added an option to disable encodings (option -d). For example, with option -d 3 I disable the brute-forcing of the ROT encoding (1: XOR 2: ROL 3: ROT 4: SHIFT 5: ADD).
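To show what brute-forcing an encoding means, and why switching one off reduces the work (and the false positives), here is a sketch in Python. The key ranges and the disabled parameter are assumptions made for this example, SHIFT is left out to keep it short, and none of this is XORSearch's actual code.

```python
def xor(data: bytes, key: int) -> bytes:
    return bytes(b ^ key for b in data)

def rol(data: bytes, count: int) -> bytes:
    # Rotate every byte left by count bits.
    return bytes(((b << count) | (b >> (8 - count))) & 0xFF for b in data)

def rot(data: bytes, shift: int) -> bytes:
    # Rotate letters through the alphabet (ROT13 is shift 13); other bytes are untouched.
    out = bytearray()
    for b in data:
        if 0x41 <= b <= 0x5A:
            b = (b - 0x41 + shift) % 26 + 0x41
        elif 0x61 <= b <= 0x7A:
            b = (b - 0x61 + shift) % 26 + 0x61
        out.append(b)
    return bytes(out)

def add(data: bytes, key: int) -> bytes:
    return bytes((b + key) & 0xFF for b in data)

# Assumed key ranges: every value that actually changes the data.
ENCODINGS = {
    "XOR": (xor, range(1, 256)),
    "ROL": (rol, range(1, 8)),
    "ROT": (rot, range(1, 26)),
    "ADD": (add, range(1, 256)),
}

def brute_force(data: bytes, needle: bytes, disabled=()) -> list:
    """Try every key of every enabled encoding and report where needle shows up."""
    hits = []
    for name, (decode, keys) in ENCODINGS.items():
        if name in disabled:
            continue
        for key in keys:
            offset = decode(data, key).find(needle)
            if offset != -1:
                hits.append((name, key, offset))
    return hits

sample = xor(b"xxWinExecxx", 0x41)
print(brute_force(sample, b"WinExec"))
print(brute_force(sample, b"WinExec", disabled=("XOR",)))
```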
When looking for shellcode, you want several rules to trigger. If just one or two rules trigger, they are likely false positives.
Have you subscribed to my new video blog: videos.didierstevens.com ?
If not, you missed my new video where I show my public tools to create PDFs.
A few remarks for people having issues running my program.
Folder Release contains a 32-bit executable that requires the Visual C++ Redistributable Packages for Visual Studio 2013.
Folder Release CRT contains a 32-bit executable with embedded C runtime; it does not require the redistributable.
Folder x64 contains 64-bit executables.
I included a rule file as example, filescanner-analysis-01.txt:
#Comment
exhaustive
PK:start:str=PK
$META:icontent:str=MANIFEST.MF
JAR:and:PK $META
CLASS:start:CAFEBABE
MZ:start:4D5A
PDF:start:str=%PDF-
OLE:start:D0CF11E0
RAR:start:526172211A07
$ATTRIBUT:content:00417474726962757400
OLE-VBA:and:OLE $ATTRIBUT
CAB:start:str=MSCF
ARJ:start:EA60
JFIF:start:FFD8FFE0
To let you choose the files filescanner will scan, you can provide the following arguments: filename, @filename, folder and ?f:.
Filename and folder are self-descriptive. When you pass argument @filename, filename is a textfile that contains filenames to scan. ?f: stands for all fixed drives on the machine, for example: C:\ D:\.
You can provide more than one argument. To scan the subfolders of a folder you provided, use option -s.
By default, FileScanner provides the following information for scanned files:
With option -f, files are completely read and the following information is provided:
You can have CSV output with option -v.
To write the output to a file, use option -o and provide a filename. Option -O also writes the output to a file, but the filename is generated automatically: FileScanner-HOSTNAME-DATE-TIME.csv. Option -c lets you specify a folder to which the output file is copied when FileScanner finishes. This can be a UNC share to centralize all reports when you run FileScanner on several machines in parallel.
Option -l follows links.
Use option -r to specify a single rule and -a or -A to specify a textfile with rules.
My new FileScanner tool allows you to use rules to scan files. Here is how you define rules.
If you provide rules to FileScanner, it will only report files that match one rule or several rules (unless you instruct it to report all scanned files). A rule has a name, a type and one or more conditions. These elements are separated by the : character (colon). A name can be any string, and it should preferably be unique if you have several rules (but this is not enforced). If a name starts with a $ character (dollar), the rule is only tested if it is referred to by another rule. Valid rule types are:
The md5 rule triggers if the file has the specified md5 hash. Example:
The sizemd5 rule triggers if the file has the specified size and md5 hash. The size is tested first, and the md5 hash is only calculated when the size matches. This speeds up the scan process if you know the size. Example:
The start rule triggers if the content of the file starts with the specified bytes. You can specify these bytes with a hexadecimal sequence or with a string. When using a string, prefix it with keyword str=. This test is case-sensitive. Examples:
The content rule triggers if the file contains the specified bytes. You can specify these bytes with a hexadecimal sequence or with a string. When using a string, prefix it with keyword str=. This test is case-sensitive. Examples:
The icontent rule is identical to the content rule, except that it is not case-sensitive.
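The difference between start, content and icontent can be sketched in Python as follows. The function names are made up for this example and the whole file content is assumed to be in memory; FileScanner itself is not written like this.

```python
def decode_condition(condition: str) -> bytes:
    """Turn a rule condition into bytes: str=... is a string, anything else is hex."""
    if condition.startswith("str="):
        return condition[4:].encode("latin-1")
    return bytes.fromhex(condition)

def start_matches(data: bytes, condition: str) -> bool:
    return data.startswith(decode_condition(condition))

def content_matches(data: bytes, condition: str) -> bool:
    return decode_condition(condition) in data

def icontent_matches(data: bytes, condition: str) -> bool:
    # bytes.lower() only touches ASCII letters, so binary data is left alone.
    return decode_condition(condition).lower() in data.lower()

print(start_matches(b"%PDF-1.4 ...", "str=%PDF-"))
print(start_matches(b"MZ\x90\x00", "4D5A"))
print(content_matches(b"META-INF/manifest.mf", "str=MANIFEST.MF"))
print(icontent_matches(b"META-INF/manifest.mf", "str=MANIFEST.MF"))
```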
The and rule triggers if all specified rules do trigger. The specified rules are tested from left to right, and testing stops if a rule does not trigger. If a specified rule has a name that starts with $, it will also be tested. In the following example, the JAR rule triggers if the $PK and $META rules do trigger.
$PK:start:str=PK
$META:icontent:str=MANIFEST.MF
JAR:and:$PK $META
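Here is a sketch in Python of how these 3 rules could be evaluated. It only implements the start, icontent and and types, keeps the rules in a dictionary, and uses made-up names (rule_triggers, report); it illustrates the logic described above and is not FileScanner's implementation.

```python
def to_bytes(condition: str) -> bytes:
    if condition.startswith("str="):
        return condition[4:].encode("latin-1")
    return bytes.fromhex(condition)

def rule_triggers(name: str, rules: dict, data: bytes) -> bool:
    rule_type, condition = rules[name]
    if rule_type == "start":
        return data.startswith(to_bytes(condition))
    if rule_type == "icontent":
        return to_bytes(condition).lower() in data.lower()
    if rule_type == "and":
        # all() works left to right and stops at the first rule that does not trigger.
        return all(rule_triggers(ref, rules, data) for ref in condition.split())
    raise ValueError("unsupported rule type: " + rule_type)

def report(rules: dict, data: bytes) -> list:
    # Rules whose name starts with $ are only tested when another rule refers to them.
    return [name for name in rules
            if not name.startswith("$") and rule_triggers(name, rules, data)]

rules = {
    "$PK": ("start", "str=PK"),
    "$META": ("icontent", "str=MANIFEST.MF"),
    "JAR": ("and", "$PK $META"),
}

print(report(rules, b"PK\x03\x04 ... META-INF/MANIFEST.MF ..."))
print(report(rules, b"PK\x03\x04 just a zip file"))
```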
Rules can be defined in a text file. A single rule can be defined via a command-line option or via the executable filename.
A set of rules contained in a text file is passed to the FileScanner tool via command line options -a or -A. With option -a, only files that match one or several rules are analyzed and reported. With option -A, all files are reported. A rule-file can contain comments: lines with the # character as the first character are comments (and ignored). 2 directives can be set in a rule-file:
The selectallfiles directive instructs FileScanner to report all files (even with option -a).
The exhaustive directive instructs FileScanner to test all rules defined in the text file. If this directive is not present, rule testing stops after the first rule matches.
Example of a rule-file:
exhaustive
PK:start:str=PK
$META:icontent:str=MANIFEST.MF
JAR:and:PK $META
CLASS:start:CAFEBABE
MZ:start:4D5A
PDF:start:str=%PDF-
OLE:start:D0CF11E0
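Reading such a rule-file comes down to separating comments, directives and rules. A sketch in Python, with a made-up parse_rule_file function (not FileScanner's parser):

```python
DIRECTIVES = {"selectallfiles", "exhaustive"}

def parse_rule_file(text: str):
    """Return (directives, rules) where rules is a list of (name, type, condition)."""
    directives = set()
    rules = []
    for line in text.splitlines():
        # A comment has # as its very first character.
        if line.startswith("#"):
            continue
        line = line.strip()
        if not line:
            continue
        if line in DIRECTIVES:
            directives.add(line)
            continue
        # Split on the first two colons; sizemd5 conditions contain a colon themselves.
        name, rule_type, condition = line.split(":", 2)
        rules.append((name, rule_type, condition))
    return directives, rules

example = """#Comment
exhaustive
PK:start:str=PK
$META:icontent:str=MANIFEST.MF
JAR:and:PK $META
MZ:start:4D5A
"""

directives, rules = parse_rule_file(example)
print(directives)
for rule in rules:
    print(rule)
```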
Specifying a single rule can be done via option -r. Example:
filescanner.exe -sr PSEXEC:sizemd5:381816:AEEE996FD3484F28E5CD85FE26B6BDCD c:\
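The sizemd5 rule in this command tests the size first and only hashes the file when the size matches. That order can be sketched in Python like this; sizemd5_matches is a hypothetical helper written for this example.

```python
import hashlib
import os

def sizemd5_matches(path: str, size: int, md5_hex: str) -> bool:
    """Cheap size check first; the file is only read and hashed if the size matches."""
    if os.path.getsize(path) != size:
        return False
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest().lower() == md5_hex.lower()
```

Getting the size only needs the file's metadata, while the md5 hash needs every byte of the file, so on a full drive scan almost all files are rejected without being read.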
Finally, if you have to ask an inexperienced user to run filescanner on his machine, you can encode a rule in the filename and send him the program. Example: