XORSearch allows you to search for strings and embedded PE files by brute-forcing different encodings. I have now added shellcode detection.
This new version of XORSearch integrates Frank Boldewin’s shellcode detector. In his Hack.lu 2009 presentation, Frank explains how he detects shellcode in Microsoft Office documents by searching for byte sequences often used in shellcode.
I integrated Frank’s methods in XORSearch, so that you can use it for any file type, not only Microsoft Office files.
Frank was kind enough to give me his source code for the detection engine. However, I did not integrate his source code as-is. I developed my own engine that uses rules to detect shellcode artifacts. These rules are not hard-coded, but can be externalized, so that you can define your own rules.
Wildcard rule syntax
A wildcard rule is composed of 3 parts: a rule name, a score and a pattern. These are separated by a : character.
Example of a rule:
Find kernel32 base method 1bis:10:64A130000000
The name of this rule is “Find kernel32 base method 1bis”, it has a score of 10, and the pattern is 64A130000000. When XORSearch finds byte pattern 64A130000000, it will report it mentioning rule name “Find kernel32 base method 1bis” and add 10 to the total score. This byte pattern is the following assembly instruction:
MOV EAX, dword [fs:0x30]
This is an instruction often found in shellcode that looks for the base of kernel32.
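As a rough illustration of the name:score:pattern layout (this is not XORSearch's actual code), the following Python sketch splits a rule line and searches a buffer for a pattern without wildcards. The names Rule, parse_rule and scan_plain_hex are hypothetical, and adding the score once per hit is an assumption based on the description above.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    score: int
    pattern: str


def parse_rule(line: str) -> Rule:
    """Split 'name:score:pattern' on the first two ':' characters."""
    name, score, pattern = line.split(":", 2)
    return Rule(name, int(score), pattern)


def scan_plain_hex(rule: Rule, data: bytes) -> list:
    """Return every offset where a wildcard-free hex pattern occurs."""
    needle = bytes.fromhex(rule.pattern)
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


rule = parse_rule("Find kernel32 base method 1bis:10:64A130000000")
# Two NOP bytes, then MOV EAX, dword [fs:0x30], then one more NOP.
sample = b"\x90\x90" + bytes.fromhex("64A130000000") + b"\x90"
hits = scan_plain_hex(rule, sample)
total_score = rule.score * len(hits)  # assumption: the score is added per hit
print(rule.name, hits, total_score)
```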
When assembly instructions reference a register, the register is encoded as bits in the bytes that make up the instruction. For example, pop eax is just one byte: 58. pop ecx is 59, pop edx is 5a, … If you look at the bits of this instruction, they have the following value: 01011RRR. The last 3 bits (RRR) encode the register to use for the pop instruction.
To deal with this, my rule definition language supports wildcards. This is how you encode a pop reg instruction:
(B;01011???)
The B indicates that we want to define a byte using bits and wildcards. 0 and 1 are fixed bit values, and ? is the wildcard: the bit value can be 0 or 1. Thus the pattern (B;01011???) matches bytes 58, 59, 5A, 5B, 5C, 5D, 5E and 5F.
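To check which bytes a bit pattern covers, here is a small Python sketch that enumerates them. It only handles the 0, 1 and ? characters, and the function name matching_bytes is made up for this illustration.

```python
def matching_bytes(bits: str) -> list:
    """List every byte value matched by an 8-character pattern of 0, 1 and ?."""
    if len(bits) != 8 or set(bits) - set("01?"):
        raise ValueError("expected 8 characters drawn from 0, 1 and ?")
    matches = []
    for value in range(256):
        actual = format(value, "08b")
        # A pattern bit matches when it is a wildcard or equals the actual bit.
        if all(p in ("?", a) for p, a in zip(bits, actual)):
            matches.append(value)
    return matches


pop_reg = matching_bytes("01011???")
print([format(b, "02X") for b in pop_reg])
```

Running it on 01011??? lists the eight pop reg opcodes 58 through 5F.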
This wildcard allows us to encode patterns for shellcode instructions that use registers. For example, here is an often used set of instructions to determine the EIP with shellcode:
call label
label:
pop eax
This pattern is encoded for all possible registers with the following rule:
GetEIP method 1:10:E800000000(B;01011???)
Another instruction often found in shellcode is xor reg1, reg1, like xor eax, eax.
You could represent this with the following pattern:
31(B;11??????)
But this pattern matches more instructions than you want. It matches xor eax, eax, xor ecx, ecx, … but also xor eax, ecx, xor eax, edx, … You want this pattern to match the xor instruction for the same register, and not different registers. That is why you can use the following syntax:
31(B;11A??A??)
By using a letter like A, B, …, as a wildcard, you assign a variable name to the wildcard bit pattern. ??? matches 3 bits. A?? also matches 3 bits, and assigns the variable name A to these 3 bits. When you use this bit pattern again, you make sure that the pattern will only be matched if the bit pattern is identical. Pattern ?????? matches 6 bits regardless of their value. Pattern A??A?? also matches 6 bits, but the first 3 bits must have the same value as the last 3 bits.
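The variable idea can be sketched in Python as follows. This is one reading of the description above, not the engine's real implementation: a letter together with the ? characters that follow it is treated as one named field, and the hypothetical match_byte function keeps the bound values in a dictionary so a caller could reuse them for later bytes.

```python
import re

# A token is a fixed bit, or a wildcard field: an optional variable letter
# followed by '?' characters, e.g. '???', 'A??' or 'B??'.
TOKEN = re.compile(r"[01]|[A-Z]\?*|\?+")


def match_byte(bits: str, value: int, variables: dict) -> bool:
    """Match one byte against a bit pattern; named fields are kept in `variables`."""
    if not re.fullmatch(r"[01?A-Z]{8}", bits):
        raise ValueError("expected 8 characters drawn from 0, 1, ? and A-Z")
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    actual = format(value, "08b")
    pending = {}  # bindings made by this byte, committed only on success
    pos = 0
    for token in TOKEN.findall(bits):
        chunk = actual[pos:pos + len(token)]
        pos += len(token)
        if token in ("0", "1"):
            if chunk != token:
                return False
        elif token[0] != "?":
            name = token[0]
            known = variables.get(name, pending.get(name))
            if known is None:
                pending[name] = chunk  # first use: bind the variable
            elif known != chunk:
                return False  # later use: bits must be identical
    variables.update(pending)
    return True


print(match_byte("11A??A??", 0xC0, {}))  # second byte of xor eax, eax (31 C0)
print(match_byte("11A??A??", 0xC8, {}))  # second byte of xor eax, ecx (31 C8)
print(match_byte("11??????", 0xC8, {}))  # anonymous wildcards accept any registers
```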
Here is another example:
Find kernel32 base method 3:10:6830000000(B;01011A??)648B(B;00B??A??)
This pattern matches the following set of assembly instructions:
push 0x30
pop reg1
mov reg2, dword [fs:reg1]
By using bit pattern A?? for the register of the second instruction, and B??A?? for the registers of the third instruction, you make sure that the third instruction uses the same register for indexing as the second instruction.
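To see how such a pattern breaks down into pieces, here is a hedged Python sketch that splits a pattern into runs of fixed hex bytes and (B;...) elements. The tokenize function and the "hex"/"bits" labels are invented for this example, and other element types are deliberately left out.

```python
import re

# Either a (B;........) element with 8 bit characters, or a run of hex byte pairs.
ELEMENT = re.compile(r"\(B;([01?A-Z]{8})\)|((?:[0-9A-Fa-f]{2})+)")


def tokenize(pattern: str) -> list:
    """Split a rule pattern into ('hex', digits) and ('bits', bit_pattern) elements."""
    elements = []
    pos = 0
    while pos < len(pattern):
        m = ELEMENT.match(pattern, pos)
        if m is None:
            raise ValueError(f"unexpected text at offset {pos}: {pattern[pos:]!r}")
        if m.group(1) is not None:
            elements.append(("bits", m.group(1)))
        else:
            elements.append(("hex", m.group(2).upper()))
        pos = m.end()
    return elements


for kind, text in tokenize("6830000000(B;01011A??)648B(B;00B??A??)"):
    print(kind, text)
```

The four elements line up with the three instructions: 6830000000 is push 0x30, 01011A?? is pop reg1, and 648B followed by 00B??A?? is the mov through the fs segment.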
Up until now, we have looked at sequential assembly instructions. But shellcode patterns can also contain jumps, i.e. non-sequential instructions. Here is an example:
jmp LABEL1
LABEL2:
pop eax
...
...
LABEL1:
call LABEL2
To make it possible to match assembly code patterns with jumps, I introduced the (J;*) pattern in my rule definitions. J stands for a jump, and * represents the number of bytes that make up the displacement of the jump instruction (normally 1 byte or 4 bytes). Here is the rule that encodes the above assembly code pattern:
GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)
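What following a jump means can be sketched in Python like this. It assumes the usual x86 convention that the displacement is a signed little-endian value relative to the end of the instruction; the follow_jump helper and the sample bytes are made up for the illustration and are not taken from XORSearch.

```python
def follow_jump(data: bytes, pos: int, width: int) -> int:
    """Read a signed little-endian displacement at `pos` and return the jump target."""
    raw = data[pos:pos + width]
    if len(raw) != width:
        raise ValueError("displacement runs past the end of the data")
    displacement = int.from_bytes(raw, "little", signed=True)
    target = pos + width + displacement  # relative to the end of the instruction
    if not 0 <= target < len(data):
        raise ValueError("jump target outside the data")
    return target


# offset 0: E9 03000000   jmp LABEL1 (offset 8)
# offset 5: 58            LABEL2: pop eax
# offset 6: 90 90         filler
# offset 8: E8 F8FFFFFF   LABEL1: call LABEL2 (offset 5)
sample = bytes.fromhex("E903000000" "58" "9090" "E8F8FFFFFF")

pos = 0
assert sample[pos] == 0xE9                 # E9
pos = follow_jump(sample, pos + 1, 4)      # (J;4) lands on LABEL1
assert sample[pos] == 0xE8                 # E8
pos = follow_jump(sample, pos + 1, 4)      # (J;4) lands on LABEL2
is_pop_reg = (sample[pos] & 0xF8) == 0x58  # (B;01011???)
print(pos, is_pop_reg)
```

The negative displacement F8FFFFFF is what makes the call go backwards to the pop instruction.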
Finally, Frank’s detector also looks for suspicious strings, like UrlDownloadToFile, WinExec, … You can define rules using a hex pattern to detect these strings, but to facilitate the encoding of these rules, I added the str= keyword, like this:
Suspicious strings:2:str=UrlDownloadToFile
Suspicious strings:2:str=WinExec
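The str= keyword saves you from writing out the hex bytes by hand. A minimal Python sketch of that conversion, assuming plain ASCII and an exact, case-sensitive match (the pattern_to_hex name is hypothetical):

```python
def pattern_to_hex(pattern: str) -> str:
    """Turn a 'str=...' pattern into the equivalent hex pattern; pass others through."""
    prefix = "str="
    if pattern.startswith(prefix):
        # Assumption: the string is matched as its raw ASCII bytes.
        return pattern[len(prefix):].encode("ascii").hex().upper()
    return pattern


print(pattern_to_hex("str=WinExec"))
print(pattern_to_hex("64A130000000"))
```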
Using wildcard rules
To use these shellcode wildcard rules with XORSearch, you use option -w or -W. Option -w allows you to specify your own rule(s); -W uses the built-in rules.
With -w, you can specify your rule as the search argument, or together with option -f, you provide a text file with rules.
Example: XORSearch.exe -w olimpikge.xls "GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)"
With -W, you don’t have to provide the rules; XORSearch will use the built-in rules.
Example: XORSearch.exe -W olimpikge.xls
You can view the built-in rules with option -L:
Function prolog signature:10:558BEC83C4
Function prolog signature:10:558BEC81EC
Function prolog signature:10:558BECEB
Function prolog signature:10:558BECE8
Function prolog signature:10:558BECE9
Indirect function call tris:10:FFB7(B;????????)(B;????????)(B;????????)(B;????????)FF57(B;????????)
GetEIP method 4 FLDZ/FSTENV [esp-12]:10:D9EED97424F4(B;01011???)
GetEIP method 1:10:E800000000(B;01011???)
GetEIP method 2:10:EB(J;1)E8(J;4)(B;01011???)
GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)
GetEIP method 4:10:D9EE9BD97424F4(B;01011???)
Find kernel32 base method 1:10:648B(B;00???101)30000000
Find kernel32 base method 1bis:10:64A130000000
Find kernel32 base method 2:10:31(B;11A??A??)(B;10100A??)30648B(B;00B??A??)
Find kernel32 base method 3:10:6830000000(B;01011A??)648B(B;00B??A??)
Structured exception handling :10:648B(B;00???101)00000000
Structured exception handling bis:10:64A100000000
API Hashing:10:AC84C07407C1CF0D01C7EBF481FF
API Hashing bis:10:AC84C07407C1CF0701C7EBF481FF
Indirect function call:10:FF75(B;A???????)FF55(B;A???????)
Indirect function call bis:10:FFB5(B;A???????)(B;B???????)(B;C???????)(B;D???????)FF95(B;A???????)(B;B???????)(B;C???????)(B;D???????)
OLE file magic number:10:D0CF11E0
Suspicious strings:2:str=UrlDownloadToFile
Suspicious strings:2:str=GetTempPath
Suspicious strings:2:str=GetWindowsDirectory
Suspicious strings:2:str=GetSystemDirectory
Suspicious strings:2:str=WinExec
Suspicious strings:2:str=ShellExecute
Suspicious strings:2:str=IsBadReadPtr
Suspicious strings:2:str=IsBadWritePtr
Suspicious strings:2:str=CreateFile
Suspicious strings:2:str=CloseHandle
Suspicious strings:2:str=ReadFile
Suspicious strings:2:str=WriteFile
Suspicious strings:2:str=SetFilePointer
Suspicious strings:2:str=VirtualAlloc
Suspicious strings:2:str=GetProcAddr
Suspicious strings:2:str=LoadLibrary
I derived these rules from the source code Frank gave me. Testing these rules on different benign and malicious files revealed 2 things: a couple of rules generated a lot of false positives, and brute-forcing the ROT encoding also generated a lot of false positives. So I removed these rules, and I added an option to disable encodings (option -d). For example, with option -d 3 I disable the brute-forcing of the ROT encoding (1: XOR 2: ROL 3: ROT 4: SHIFT 5: ADD).
When looking for shellcode, you want several rules to trigger. If just one or two rules trigger, they are likely false positives.
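If you post-process the results yourself, that rule of thumb could be sketched in Python like this. The triage function, the hit format and the default of three distinct rules are assumptions made for this example; XORSearch itself only reports the hits and the score.

```python
def triage(hits: list, min_rules: int = 3) -> tuple:
    """Summarize (rule name, score) hits: total score, distinct rules, worth a look."""
    total = sum(score for _, score in hits)
    distinct = len({name for name, _ in hits})
    # One or two triggered rules are treated as likely false positives.
    return total, distinct, distinct >= min_rules


strong = [
    ("GetEIP method 1", 10),
    ("Find kernel32 base method 1bis", 10),
    ("Suspicious strings", 2),
]
weak = [("Suspicious strings", 2), ("Suspicious strings", 2)]

print(triage(strong))
print(triage(weak))
```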