pecheck.py is a wrapper for pefile, and this update has a couple of new features:
- accept input from stdin (for pipes)
- load PeID userdb.txt by default from same directory as pecheck.py
- extra entry point info
Sometimes I want to check a malware sample with XORSearch, but I can’t because my AV will delete it. My solution is to work with a hexdump of the file.
Option -x allows XORSearch to work with a hexdump.
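To give an idea of what working with a hexdump means, here is a minimal Python sketch that turns a hexdump back into bytes in memory, so the sample never has to exist on disk in its original form. It assumes the hexdump is nothing but hex digit pairs separated by whitespace; the function name hexdump_to_bytes is mine, and this is not the code XORSearch uses.

```python
def hexdump_to_bytes(text):
    """Convert a plain hexdump (hex digit pairs, any whitespace) to bytes.

    Assumption: the dump contains only hex digits and whitespace,
    without offsets or an ASCII column.
    """
    # Remove all whitespace, then let bytes.fromhex do the decoding.
    return bytes.fromhex("".join(text.split()))


if __name__ == "__main__":
    print(hexdump_to_bytes("4D 5A 90 00\n03 00"))
```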
The second feature in this new version of PDFiD is selection. With this, you can select PDFs using criteria you provide.
Be careful if you are going to use this in an automated scenario where you don’t control the selection expression. This expression is evaluated in Python with the eval function, and there is no input validation.
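If you do need to accept selection expressions you don't control, one possible mitigation is to check the expression before evaluating it. The Python sketch below only lets through names, constants, comparisons and and/or/not. It is not part of PDFiD, and the names js and openaction are hypothetical examples, not necessarily the names PDFiD exposes.

```python
import ast

# Node types that a simple comparison expression needs (Python 3.8+).
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.Name, ast.Load, ast.Constant,
)


def safe_select(expression, counts):
    """Evaluate a selection expression after rejecting anything that is
    not a plain comparison (no calls, attributes, subscripts, ...)."""
    tree = ast.parse(expression, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("not allowed: " + type(node).__name__)
    code = compile(tree, "<selection>", "eval")
    # No builtins; only the provided counts are visible as names.
    return bool(eval(code, {"__builtins__": {}}, dict(counts)))


if __name__ == "__main__":
    print(safe_select("js > 0 and openaction > 0", {"js": 1, "openaction": 1}))
```

An expression like __import__('os').system('id') is rejected with a ValueError, because it contains a call.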
Almost from the moment I released PDFiD, people asked me for an anti-virus-like feature: that PDFiD would tell you if a PDF was malicious or not. Some people even patched PDFiD with a scoring feature.
But I didn’t want to develop an “anti-virus” for PDFs; PDFiD is a triage tool.
Now you can develop your own scoring system with plugins.
Plugins are loaded with option -p, like this:
I provide 3 plugins: plugin_triage.py, plugin_nameobfuscation.py and plugin_embeddedfile.py. You can run more than one plugin by separating their names with a comma: pdfid.py -p plugin_triage,plugin_embeddedfile js.pdf
Or you can use an @-file: a text file with the names of the plugins you want to run.
To output the result as a CSV file, use option -c, and to write the output to a file, use option -o. With option -m, you can provide a minimum score the plugin has to produce for its output to be displayed.
Plugins are Python classes; I’ll explain how to make your own in a later post.
plugin_triage.py produces a score of 1.0 when the PDF requires further analysis, and 0.0 if not.
plugin_nameobfuscation.py produces a score of 1.0 when name obfuscation is used in the PDF.
plugin_embeddedfile.py produces a score of 0.9 when an embedded file is present, and 1.0 when name obfuscation is also used.
XORSearch allows you to search for strings and embedded PE-files by brute-forcing different encodings. I have now added shellcode detection.
This new version of XORSearch integrates Frank Boldewin’s shellcode detector. In his Hack.lu 2009 presentation, Frank explains how he detects shellcode in Microsoft Office documents by searching for byte sequences often used in shellcode.
I integrated Frank’s methods in XORSearch, so that you can use it for any file type, not only Microsoft Office files.
Frank was kind enough to give me his source code for the detection engine. However, I did not integrate his source code as-is. I developed my own engine that uses rules to detect shellcode artifacts. These rules are not hard-coded, but can be externalized, so that you can define your own rules.
Wildcard rule syntax
A wildcard rule is composed of 3 parts: a rule name, a score and a pattern. These are separated by a : character.
Example of a rule:
Find kernel32 base method 1bis:10:64A130000000
The name of this rule is “Find kernel32 base method 1bis”, it has a score of 10, and the pattern is 64A130000000. When XORSearch finds byte pattern 64A130000000, it will report it mentioning rule name “Find kernel32 base method 1bis” and add 10 to the total score. This byte pattern is the following assembly instruction:
MOV EAX, dword [fs:0x30]
This is an instruction often found in shellcode that looks for the base of kernel32.
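To make the three-part format concrete, here is a small Python sketch that splits a rule line into name, score and pattern. The Rule type and the parse_rule function are my own names for illustration; XORSearch has its own parser.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    score: int
    pattern: str


def parse_rule(line):
    """Split 'name:score:pattern' into its 3 parts.

    Only the first 2 colons are treated as separators, so the pattern
    itself may contain a colon.
    """
    name, score, pattern = line.rstrip("\n").split(":", 2)
    return Rule(name, int(score), pattern)


if __name__ == "__main__":
    print(parse_rule("Find kernel32 base method 1bis:10:64A130000000"))
```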
When assembly instructions reference a register, the register is encoded as bits in the bytes that make up the instruction. For example, pop eax is just one byte: 58. pop ecx is 59, pop edx is 5a, … If you look at the bits of this instruction, they have the following value: 01011RRR. The last 3 bits (RRR) encode the register to use for the pop instruction.
To deal with this, my rule definition language supports wildcards. This is how you encode a pop reg instruction:
(B;01011???)
The B indicates that we want to define a byte using bits and wildcards. 0 and 1 are fixed bit values, and ? is the wildcard: the bit value can be 0 or 1. Thus the pattern (B;01011???) matches bytes 58, 59, 5A, 5B, 5C, 5D, 5E and 5F.
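You can check this with a few lines of Python. The sketch below lists all byte values that fit an 8-character bit pattern made of 0, 1 and ?. The function name bytes_matching is mine; it only illustrates the idea and is not taken from XORSearch.

```python
def bytes_matching(bits):
    """Return all byte values (0-255) that fit a pattern of 0, 1 and ?."""
    if len(bits) != 8 or set(bits) - set("01?"):
        raise ValueError("expected 8 characters out of 0, 1 and ?")
    return [
        value
        for value in range(256)
        # A position fits if the pattern has ? there or the same bit.
        if all(p in ("?", b) for p, b in zip(bits, format(value, "08b")))
    ]


if __name__ == "__main__":
    print(" ".join("%02X" % v for v in bytes_matching("01011???")))
```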
This wildcard allows us to encode patterns for shellcode instructions that use registers. For example, here is an often used set of instructions to determine the EIP with shellcode:
call label
label:
pop eax
This pattern is encoded for all possible registers with the following rule:
GetEIP method 1:10:E800000000(B;01011???)
Another instruction often found in shellcode is xor reg1, reg1, like xor eax, eax.
You could represent this with the following pattern:
But this pattern matches more instructions than you want. It matches xor eax, eax, xor ecx, ecx, … but also xor eax, ecx, xor eax, edx, … You want this pattern to match the xor instruction for the same register, and not different registers. That is why you can use the following syntax:
By using a letter like A, B, …, as a wildcard, you assign a variable name to the wildcard bit pattern. ??? matches 3 bits. A?? also matches 3 bits, and assigns the variable name A to these 3 bits. When you use this bit pattern again, you make sure that the pattern will only be matched if the bit pattern is identical. Pattern ?????? matches 6 bits regardless of their value. Pattern A??A?? also matches 6 bits, but the first 3 bits must have the same value as the last 3 bits.
Here is another example:
Find kernel32 base method 3:10:6830000000(B;01011A??)648B(B;00B??A??)
This pattern matches the following set of assembly instructions:
push 0x30
pop reg1
mov reg2, dword [fs:reg1]
By using bit pattern A?? for the register of the second instruction, and B??A?? for the registers of the third instruction, you make sure that the third instruction uses the same register for indexing as the second instruction.
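Here is how named wildcards could be matched, as a Python sketch. It follows my description above (a letter followed by ? characters captures those bits, and the same letter must capture the same bits later on), but the function match_bits and its return convention are made up for this illustration and are not XORSearch's code.

```python
def match_bits(pattern, value, bindings):
    """Match one byte against an 8-character bit pattern.

    pattern may contain 0, 1, ? and letters. A letter plus the ?
    characters that follow it form one named field.
    Returns the updated bindings on a match, or None otherwise.
    """
    bits = format(value, "08b")
    result = dict(bindings)  # do not modify the caller's dict
    i = 0
    while i < 8:
        c = pattern[i]
        if c in "01":
            if bits[i] != c:
                return None
            i += 1
        elif c == "?":
            i += 1
        else:
            # Named field: the letter and all ? directly after it.
            j = i + 1
            while j < 8 and pattern[j] == "?":
                j += 1
            captured = bits[i:j]
            if result.setdefault(c, captured) != captured:
                return None
            i = j
    return result


if __name__ == "__main__":
    print(match_bits("11A??A??", 0xC0, {}))  # second byte of xor eax, eax
    print(match_bits("11A??A??", 0xC8, {}))  # two different registers
```

The bindings are passed from byte to byte, so a field captured in the pop instruction can be compared with a field in the mov instruction that follows.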
Up until now, we looked at sequential assembly instructions. But you can also have shellcode patterns with jumps, i.e. non-sequential instructions. Here is an example:
jmp LABEL1
LABEL2:
pop eax
...
...
LABEL1:
call LABEL2
To make it possible to match assembly code patterns with jumps, I introduced the (J;*) pattern in my rule definitions. J stands for a jump, and * represents the number of bytes that make up the displacement of the jump instruction (normally 1 byte or 4 bytes). Here is the rule that encodes the above assembly code pattern:
GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)
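The Python sketch below shows what following a jump means for this one rule. It assumes the displacement is a signed little-endian value relative to the end of the jump instruction (as on x86), and that matching continues at the jump target. The function names are mine, and the code is hard-wired for this rule; it is not a general rule engine and not XORSearch's implementation.

```python
import struct


def follow_jump(data, pos, size):
    """Read a signed displacement of size bytes at pos and return the
    offset it jumps to, or None if that falls outside data."""
    end = pos + size
    if end > len(data):
        return None
    fmt = {1: "<b", 4: "<i"}[size]  # 1-byte or 4-byte displacement
    (displacement,) = struct.unpack_from(fmt, data, pos)
    target = end + displacement  # relative to the next instruction
    return target if 0 <= target < len(data) else None


def matches_geteip_method3(data, start):
    """Check E9(J;4)E8(J;4)(B;01011???) starting at offset start."""
    if data[start:start + 1] != b"\xE9":  # jmp rel32
        return False
    call_at = follow_jump(data, start + 1, 4)
    if call_at is None or data[call_at:call_at + 1] != b"\xE8":  # call rel32
        return False
    pop_at = follow_jump(data, call_at + 1, 4)
    if pop_at is None:
        return False
    return data[pop_at] & 0xF8 == 0x58  # 01011??? is pop reg


if __name__ == "__main__":
    # jmp +3 / pop eax / nop / nop / call -8
    sample = bytes.fromhex("E903000000" "58" "9090" "E8F8FFFFFF")
    print(matches_geteip_method3(sample, 0))
```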
Finally, Frank’s detector also looks for suspicious strings, like UrlDownloadToFile, WinExec, … You can define rules using a hex pattern to detect these strings, but to facilitate the encoding of these rules, I added the str= keyword, like this:
Suspicious strings:2:str=UrlDownloadToFile
Suspicious strings:2:str=WinExec
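The str= keyword is just a convenience for the hex pattern of the string. This Python sketch shows the equivalent hex pattern, assuming the string is plain ASCII; the function name expand_pattern is hypothetical.

```python
def expand_pattern(pattern):
    """Turn a str= pattern into the equivalent hex pattern.

    Assumption: the string is ASCII. Other patterns are returned as-is.
    """
    prefix = "str="
    if pattern.startswith(prefix):
        return pattern[len(prefix):].encode("ascii").hex().upper()
    return pattern


if __name__ == "__main__":
    print(expand_pattern("str=WinExec"))
```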
Using wildcard rules
To use these shellcode wildcard rules with XORSearch, you use option -w or -W. -w allows you to specify your own rule(s), -W uses the built-in rules.
With -w, you can specify your rule as the search argument, or together with option -f, you provide a text file with rules.
Example: XORSearch.exe -w olimpikge.xls "GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)"
With -W, you don’t have to provide the rules; XORSearch will use the built-in rules.
Example: XORSearch.exe -W olimpikge.xls
You can view the built-in rules with option -L:
Function prolog signature:10:558BEC83C4
Function prolog signature:10:558BEC81EC
Function prolog signature:10:558BECEB
Function prolog signature:10:558BECE8
Function prolog signature:10:558BECE9
Indirect function call tris:10:FFB7(B;????????)(B;????????)(B;????????)(B;????????)FF57(B;????????)
GetEIP method 4 FLDZ/FSTENV [esp-12]:10:D9EED97424F4(B;01011???)
GetEIP method 1:10:E800000000(B;01011???)
GetEIP method 2:10:EB(J;1)E8(J;4)(B;01011???)
GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)
GetEIP method 4:10:D9EE9BD97424F4(B;01011???)
Find kernel32 base method 1:10:648B(B;00???101)30000000
Find kernel32 base method 1bis:10:64A130000000
Find kernel32 base method 2:10:31(B;11A??A??)(B;10100A??)30648B(B;00B??A??)
Find kernel32 base method 3:10:6830000000(B;01011A??)648B(B;00B??A??)
Structured exception handling :10:648B(B;00???101)00000000
Structured exception handling bis:10:64A100000000
API Hashing:10:AC84C07407C1CF0D01C7EBF481FF
API Hashing bis:10:AC84C07407C1CF0701C7EBF481FF
Indirect function call:10:FF75(B;A???????)FF55(B;A???????)
Indirect function call bis:10:FFB5(B;A???????)(B;B???????)(B;C???????)(B;D???????)FF95(B;A???????)(B;B???????)(B;C???????)(B;D???????)
OLE file magic number:10:D0CF11E0
Suspicious strings:2:str=UrlDownloadToFile
Suspicious strings:2:str=GetTempPath
Suspicious strings:2:str=GetWindowsDirectory
Suspicious strings:2:str=GetSystemDirectory
Suspicious strings:2:str=WinExec
Suspicious strings:2:str=ShellExecute
Suspicious strings:2:str=IsBadReadPtr
Suspicious strings:2:str=IsBadWritePtr
Suspicious strings:2:str=CreateFile
Suspicious strings:2:str=CloseHandle
Suspicious strings:2:str=ReadFile
Suspicious strings:2:str=WriteFile
Suspicious strings:2:str=SetFilePointer
Suspicious strings:2:str=VirtualAlloc
Suspicious strings:2:str=GetProcAddr
Suspicious strings:2:str=LoadLibrary
I derived these rules from the source code Frank gave me. Testing these rules on different benign and malicious files revealed 2 things: a couple of rules generated a lot of false positives, and brute-forcing the ROT encoding also generated a lot of false positives. So I removed these rules, and I added an option to disable encodings (option -d). For example, with option -d 3 I disable the brute-forcing of the ROT encoding (1: XOR 2: ROL 3: ROT 4: SHIFT 5: ADD).
When looking for shellcode, you want several rules to trigger. If just one or two rules trigger, they are likely false positives.
During my PDF training at 44CON I got the idea for a simple modification: now with document.write(), a third file is created. The file is write.bin.log and contains the pure UNICODE data, i.e. without the 0xFFFE header.
To extract shellcode now, you no longer need to edit write.uc.log to remove the 0xFFFE header.
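If you only have a write.uc.log file from an older version, dropping the header comes down to removing the first 2 bytes. A small Python sketch, assuming the header is the 2-byte byte order mark (FF FE, or FE FF for the other byte order); the function name strip_bom is mine.

```python
def strip_bom(data):
    """Remove a leading 2-byte byte order mark, if present."""
    for bom in (b"\xff\xfe", b"\xfe\xff"):
        if data.startswith(bom):
            return data[len(bom):]
    return data


if __name__ == "__main__":
    print(strip_bom(b"\xff\xfe\x90\x90\xeb\xfe"))
```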
I also included binaries for Windows and Linux (compiled on CentOS 6.0) in the ZIP file.
I think there’s more interest in my program to calculate the SSH fingerprint for Cisco IOS since Snowden started with his revelations.
I fixed a bug with 2048 bit (and more) keys.
Looking at this program from 2007, I thought: my Python coding style has changed since then, I need to rewrite this.
So here is the new version. It’s backward compatible with the old version (same arguments), but it offers more flexibility, like input/output redirection, allowing it to be used in pipes.
And from now on, I’m going to try to add a man page to all new Python program releases. It’s embedded in the source code, and you view it like this: translate.py --man
kurt wismer pointed me to this post on pastebin after he read my Stoned Bitcoin blogpost. The author of this pastebin post works out a method to spam the Bitcoin blockchain to cause anti-virus (false) positives.
I scanned through all the Bitcoin transactions (until 24/06/2014) for the addresses listed in this pastebin post (the addresses represent antivirus signatures for 400+ malwares).
All these “malicious” Bitcoin addresses, designed to generate anti-virus false positives, have been exclusively used in the 8 Bitcoin transactions I mentioned in my previous post.
The pastebin entry was posted on 2014/04/02 19:01:08 UTC.
And here are the 8 transactions with the UTC timestamp of the block in which they appear:
Block: 2014/04/03 23:12:48
Block: 2014/04/04 01:10:45
Block: 2014/04/04 01:43:25
Block: 2014/04/04 02:58:13
Block: 2014/04/04 04:32:24
Block: 2014/04/04 04:32:24
Block: 2014/04/04 09:36:29
Block: 2014/04/04 09:36:29
So it took a bit more than 24 hours before someone spammed the Bitcoin blockchain with these transactions designed to trigger false positives.
Someone mentioned on a forum that he found a picture with an embedded, XORed executable. You can easily identify such embedded executables by xorsearching for the string “This program must be run under Win32”. But if the author or compiler modifies this DOS-stub string, you will not find it.
That’s how I got the idea to add an option to search for PE-files: search for string MZ, read the offset to the IMAGE_NT_HEADERS structure (e_lfanew), and check if that structure starts with string PE.
Example: XORSearch.exe -p test.jpg
Found XOR A2 position 00005D1D: 000000E8 ........!..L.!This program cannot be r
Found XOR A2 position 0001221D: 00000108 ........!..L.!This program cannot be r
We found 2 embedded executables in test.jpg (XOR key A2). Note that we didn’t provide a search string, only option -p.
XORSearch also reports the value of e_lfanew and the string found in the DOS-stub. This allows you to inspect the results for false positives.
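The MZ / e_lfanew / PE check described above can be sketched in Python like this. The e_lfanew field is a 4-byte little-endian value at offset 0x3C of the DOS header. The function names are mine, only the XOR encoding is brute-forced here, and this is an illustration of the idea, not XORSearch's code.

```python
import struct


def find_pe_files(data):
    """Return (position, e_lfanew) for each MZ header whose e_lfanew
    points at the string PE."""
    hits = []
    pos = data.find(b"MZ")
    while pos != -1:
        if pos + 0x40 <= len(data):  # room for a complete DOS header
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            pe_at = pos + e_lfanew
            if data[pe_at:pe_at + 2] == b"PE":
                hits.append((pos, e_lfanew))
        pos = data.find(b"MZ", pos + 1)
    return hits


def xor_search_pe(data):
    """Try all 256 single-byte XOR keys and yield (key, position, e_lfanew)."""
    for key in range(256):
        decoded = bytes(b ^ key for b in data)
        for pos, e_lfanew in find_pe_files(decoded):
            yield key, pos, e_lfanew


if __name__ == "__main__":
    # Build a fake PE-file, XOR it with A2 and put 10 bytes in front.
    header = b"MZ" + b"\x00" * 58 + struct.pack("<I", 0x80)
    fake = header.ljust(0x80, b"\x00") + b"PE\x00\x00"
    blob = b"\x11" * 10 + bytes(b ^ 0xA2 for b in fake)
    for key, pos, e_lfanew in xor_search_pe(blob):
        print("Found XOR %02X position %08X: %08X" % (key, pos, e_lfanew))
```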
This can also be used on unencoded files, like this installation file:
XORSearch.exe -p c8400.msi
Found XOR 00 position 00236400: 000000E8 ........!..L.!This program cannot be r
Found XOR 00 position 00286000: 00000100 ........!..L.!This program cannot be r
Found XOR 00 position 00346800: 000000F8 ........!..L.!This program cannot be r
Found XOR 00 position 003A7200: 00000080 ........!..L.!This program cannot be r
Found XOR 00 position 003AD200: 00000080 ........!..L.!This program cannot be r
Found XOR 00 position 004B4800: 00000108 ........!..L.!This program cannot be r
Found XOR 00 position 004DE600: 000000F8 ........!..L.!This program cannot be r
Found XOR 00 position 004FE200: 000000E0 ........!..L.!This program cannot be r
Found XOR 00 position 00520C00: 000000E0 ........!..L.!This program cannot be r
Found XOR 00 position 00542000: 000000E0 ........!..L.!This program cannot be r
Found XOR 00 position 00562400: 00000100 ........!..L.!This program cannot be r
Found XOR 00 position 0058F800: 000000E0 ........!..L.!This program cannot be r
Finally, I added option -e (exclude). This excludes a particular byte-value from encoding. If you suspect a file is XOR encoded, but that byte 0x00 is not encoded, you use option -e 0x00.
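As an illustration of the idea, here is a Python sketch of XOR decoding with one excluded byte value. I assume here that a byte with the excluded value is simply left unchanged; the function name and the exact behavior are assumptions for this example, and XORSearch's -e option may handle details differently.

```python
def xor_decode(data, key, exclude=None):
    """XOR every byte with key, except bytes equal to exclude.

    Assumption: excluded bytes are copied unchanged.
    """
    return bytes(b if b == exclude else b ^ key for b in data)


if __name__ == "__main__":
    # The encoder left the 0x00 bytes alone and XORed the rest with A2.
    encoded = bytes([0x4D ^ 0xA2, 0x5A ^ 0xA2, 0x00, 0x00])
    print(xor_decode(encoded, 0xA2, exclude=0x00))
```

With this scheme, a plaintext byte equal to the key would be encoded as 0x00 and cannot be told apart from an excluded byte when decoding.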