Didier Stevens

Monday 29 September 2014

Update: XORSearch With Shellcode Detector

Filed under: My Software,Update — Didier Stevens @ 0:00

XORSearch lets you search for strings and embedded PE files by brute-forcing different encodings. I have now added shellcode detection.

This new version of XORSearch integrates Frank Boldewin’s shellcode detector. In his Hack.lu 2009 presentation, Frank explains how he detects shellcode in Microsoft Office documents by searching for byte sequences often used in shellcode.

I integrated Frank’s methods in XORSearch, so that you can use it for any file type, not only Microsoft Office files.


Frank was kind enough to give me his source code for the detection engine. However, I did not integrate his source code as-is. I developed my own engine that uses rules to detect shellcode artifacts. These rules are not hard-coded, but can be externalized, so that you can define your own rules.

Wildcard rule syntax

A wildcard rule is composed of 3 parts: a rule name, a score and a pattern. These are separated by a : character.

Example of a rule:

Find kernel32 base method 1bis:10:64A130000000

The name of this rule is “Find kernel32 base method 1bis”, it has a score of 10, and the pattern is 64A130000000. When XORSearch finds byte pattern 64A130000000, it will report it mentioning rule name “Find kernel32 base method 1bis” and add 10 to the total score. This byte pattern is the following assembly instruction:

MOV EAX, dword [fs:0x30]

This is an instruction often found in shellcode that looks for the base of kernel32.
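To make the three-part layout concrete, here is a minimal Python sketch that splits a rule line into name, score and pattern. It is only an illustration, not XORSearch's own parser (XORSearch is written in C); the `Rule` type and `parse_rule` function are hypothetical names, and the sketch assumes the rule name itself never contains a : character.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    score: int
    pattern: str


def parse_rule(line: str) -> Rule:
    """Split 'name:score:pattern' into its three parts."""
    # Assumption: the name holds no ':'; everything after the
    # second ':' is treated as the pattern.
    name, score, pattern = line.strip().split(":", 2)
    return Rule(name, int(score), pattern)


rule = parse_rule("Find kernel32 base method 1bis:10:64A130000000")
print(rule.name)                         # Find kernel32 base method 1bis
print(rule.score)                        # 10
print(bytes.fromhex(rule.pattern).hex()) # 64a130000000
```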

When assembly instructions reference a register, the register is encoded as bits in the bytes that make up the instruction. For example, pop eax is just one byte: 58. pop ecx is 59, pop edx is 5a, … If you look at the bits of this instruction, they have the following value: 01011RRR. The last 3 bits (RRR) encode the register to use for the pop instruction.

To deal with this, my rule definition language supports wildcards. This is how you encode a pop reg instruction:

(B;01011???)

The B indicates that we want to define a byte using bits and wildcards. 0 and 1 are fixed bit values, and ? is the wildcard: the bit value can be 0 or 1. Thus the pattern (B;01011???) matches bytes 58, 59, 5A, 5B, 5C, 5D, 5E and 5F.
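The effect of the ? wildcard can be checked with a few lines of Python. This sketch turns the 8-character bit pattern into a mask and a value and lists every byte that fits; `expand_bit_pattern` is a hypothetical helper written for this illustration, not part of XORSearch.

```python
def expand_bit_pattern(bits: str) -> list[int]:
    """Return every byte value matched by 8 characters of 0, 1 and ?."""
    if len(bits) != 8 or set(bits) - set("01?"):
        raise ValueError("expected 8 characters drawn from 0, 1 and ?")
    # mask has a 1 for each fixed bit, value holds the required fixed bits
    mask = int("".join("0" if c == "?" else "1" for c in bits), 2)
    value = int(bits.replace("?", "0"), 2)
    return [b for b in range(256) if (b & mask) == value]


print([f"{b:02X}" for b in expand_bit_pattern("01011???")])
# ['58', '59', '5A', '5B', '5C', '5D', '5E', '5F']
```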

This wildcard allows us to encode patterns for shellcode instructions that use registers. For example, here is a commonly used pair of instructions with which shellcode determines the EIP:

	call label
	pop eax

This pattern is encoded for all possible registers with the following rule:

GetEIP method 1:10:E800000000(B;01011???)
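As an illustration of how such a pattern could be applied to a buffer, the following Python sketch compiles hex bytes and (B;...) wildcards into one (mask, value) pair per byte and scans for matches. It is a simplified stand-in, not the engine in XORSearch: `compile_pattern` and `find_all` are hypothetical names, and variables, jumps and str= are deliberately left out.

```python
import re

# One token is either a (B;xxxxxxxx) wildcard byte or two hex digits.
TOKEN = re.compile(r"\(B;([01?]{8})\)|([0-9A-Fa-f]{2})")


def compile_pattern(pattern: str) -> list[tuple[int, int]]:
    """Turn the pattern into (mask, value) pairs, one per byte."""
    pairs, pos = [], 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"unsupported token at offset {pos}")
        if m.group(1):                      # wildcard byte
            bits = m.group(1)
            mask = int("".join("0" if c == "?" else "1" for c in bits), 2)
            value = int(bits.replace("?", "0"), 2)
        else:                               # plain hex byte such as E8
            mask, value = 0xFF, int(m.group(2), 16)
        pairs.append((mask, value))
        pos = m.end()
    return pairs


def find_all(data: bytes, pattern: str) -> list[int]:
    """Return every offset in data where the pattern matches."""
    pairs = compile_pattern(pattern)
    return [
        start
        for start in range(len(data) - len(pairs) + 1)
        if all((data[start + i] & mask) == value
               for i, (mask, value) in enumerate(pairs))
    ]


sample = bytes.fromhex("90E8000000005B90")  # nop / call +0 / pop ebx / nop
print(find_all(sample, "E800000000(B;01011???)"))  # [1]
```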

Another instruction often found in shellcode is xor reg1, reg1, like xor eax, eax.

You could represent this with the following pattern:

31(B;11??????)

But this pattern matches more instructions than you want. It matches xor eax, eax, xor ecx, ecx, … but also xor eax, ecx, xor eax, edx, … You want this pattern to match the xor instruction for the same register, and not different registers. That is why you can use the following syntax:

31(B;11A??A??)

By using a letter like A, B, …, as a wildcard, you assign a variable name to the wildcard bit pattern. ??? matches 3 bits. A?? also matches 3 bits, and assigns the variable name A to these 3 bits. When you use this bit pattern again, you make sure that the pattern will only be matched if the bit pattern is identical. Pattern ?????? matches 6 bits regardless of their value. Pattern A??A?? also matches 6 bits, but the first 3 bits must have the same value as the last 3 bits.
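A small Python sketch can show the difference a variable makes. It matches one byte per 8-character bit spec and remembers the bits captured by each letter. The reading that a letter starts a field which extends over the ? characters that follow it is my interpretation of the description above, and `match_bits` is a hypothetical helper, not XORSearch code. The example uses the opcode byte 31 followed by the 11A??A?? byte that also appears in the built-in rules.

```python
def match_bits(specs: list[str], data: bytes) -> bool:
    """Match one byte per 8-character spec; a letter names a wildcard field."""
    if len(specs) != len(data):
        return False
    bound: dict[str, str] = {}              # variable name -> captured bits
    for spec, byte in zip(specs, data):
        if len(spec) != 8:
            raise ValueError("each spec must describe exactly 8 bits")
        bits = f"{byte:08b}"
        i = 0
        while i < 8:
            c = spec[i]
            if c in "01":                   # fixed bit
                if bits[i] != c:
                    return False
                i += 1
            elif c == "?":                  # anonymous wildcard bit
                i += 1
            else:                           # letter: field is it plus following ?s
                j = i + 1
                while j < 8 and spec[j] == "?":
                    j += 1
                if bound.setdefault(c, bits[i:j]) != bits[i:j]:
                    return False
                i = j
    return True


xor_same = ["00110001", "11A??A??"]         # byte 31, then both 3-bit fields equal
print(match_bits(xor_same, bytes.fromhex("31C0")))  # True  (xor eax, eax)
print(match_bits(xor_same, bytes.fromhex("31C9")))  # True  (xor ecx, ecx)
print(match_bits(xor_same, bytes.fromhex("31C8")))  # False (xor eax, ecx)
```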

Here is another example:

Find kernel32 base method 3:10:6830000000(B;01011A??)648B(B;00B??A??)

This pattern matches the following set of assembly instructions:

	push 0x30
	pop reg1
	mov reg2, dword [fs:reg1]

By using bit pattern A?? for the register of the second instruction, and B??A?? for the registers of the third instruction, you make sure that the third instruction uses the same register for indexing as the second instruction.
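Going in the other direction, this Python sketch builds the concrete byte sequence that the rule lays out for a given pair of register numbers, which makes the role of A and B visible. `REGS` and `method3_bytes` are hypothetical names used only here. One caveat, stated as an x86 encoding fact rather than a claim about XORSearch: at the bit level the rule also admits A = 100 and A = 101, which in this addressing form select a SIB byte and a 32-bit displacement rather than esp and ebp.

```python
# Standard x86 32-bit register numbering (3-bit codes 0..7).
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def method3_bytes(reg1: int, reg2: int) -> bytes:
    """Bytes for: push 0x30 / pop reg1 / mov reg2, dword [fs:reg1]."""
    if not (0 <= reg1 < 8 and 0 <= reg2 < 8):
        raise ValueError("register numbers must be 0..7")
    push = bytes.fromhex("6830000000")               # 68 30 00 00 00
    pop = bytes([0b01011000 | reg1])                 # (B;01011A??)
    mov = bytes([0x64, 0x8B, (reg2 << 3) | reg1])    # 648B(B;00B??A??)
    return push + pop + mov


print(method3_bytes(REGS.index("eax"), REGS.index("eax")).hex().upper())
# 683000000058648B00
print(method3_bytes(REGS.index("ebx"), REGS.index("esi")).hex().upper())
# 68300000005B648B33
```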

Up until now, we have looked at sequential assembly instructions. But you can also have shellcode patterns with jumps, i.e. non-sequential instructions. Here is an example:

	jmp LABEL1
LABEL2:
	pop eax
	...
LABEL1:
	call LABEL2

To match assembly code patterns with jumps, I introduced the (J;*) pattern in my rule definitions. J stands for a jump, and * represents the number of bytes that make up the displacement of the jump instruction (normally 1 byte or 4 bytes). Here is the rule that encodes the above assembly code pattern:

GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)
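The following Python sketch shows one way to read this rule by hand: after each opcode it decodes the displacement, jumps to the target, and continues matching there. It assumes that (J;4) means "read a 4-byte signed little-endian displacement and continue at the jump target", which is my reading of the description above; `follow` and `match_geteip_method3` are hypothetical names, and this is not the C implementation in XORSearch.

```python
import struct


def follow(data: bytes, pos: int, size: int) -> int:
    """Return the target of a relative jump whose displacement starts at pos."""
    fmt = {1: "<b", 4: "<i"}[size]           # signed 1-byte or 4-byte value
    (disp,) = struct.unpack_from(fmt, data, pos)
    # x86 relative displacements count from the end of the instruction
    return pos + size + disp


def match_geteip_method3(data: bytes, start: int) -> bool:
    """Hand-rolled check for E9(J;4)E8(J;4)(B;01011???) at offset start."""
    try:
        if start < 0 or data[start] != 0xE9:         # jmp rel32
            return False
        pos = follow(data, start + 1, 4)
        if pos < 0 or data[pos] != 0xE8:             # call rel32
            return False
        pos = follow(data, pos + 1, 4)
        return pos >= 0 and (data[pos] & 0xF8) == 0x58   # pop reg
    except (IndexError, struct.error):
        return False


# offset 0: jmp +2 (to offset 7), offset 5: pop eax, offset 6: nop,
# offset 7: call -7 (back to offset 5)
sample = bytes.fromhex("E9020000005890E8F9FFFFFF")
print(match_geteip_method3(sample, 0))  # True
```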

Finally, Frank’s detector also looks for suspicious strings, like UrlDownloadToFile, WinExec, … You can define rules using a hex pattern to detect these strings, but to facilitate the encoding of these rules, I added the str= keyword, like this:

Suspicious strings:2:str=UrlDownloadToFile
Suspicious strings:2:str=WinExec
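To see what str= saves you from typing, this Python sketch rewrites a str= pattern as the equivalent hex byte pattern. It assumes the string is matched as plain ASCII bytes; `str_rule_to_hex` is a hypothetical helper for illustration only.

```python
def str_rule_to_hex(pattern: str) -> str:
    """Rewrite a 'str=...' pattern as the equivalent hex byte pattern."""
    prefix = "str="
    if not pattern.startswith(prefix):
        return pattern                       # already a hex/wildcard pattern
    # Assumption: the string is compared as plain ASCII bytes.
    return pattern[len(prefix):].encode("ascii").hex().upper()


print(str_rule_to_hex("str=WinExec"))  # 57696E45786563
```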

Using wildcard rules

To use these shellcode wildcard rules with XORSearch, you use option -w or -W: -w lets you specify your own rule(s), while -W uses the built-in rules.

With -w, you can specify your rule as the search argument, or together with option -f, you provide a text file with rules.

Example: XORSearch.exe -w olimpikge.xls "GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)"

With -W, you don’t have to provide any rules: XORSearch uses its built-in rules.

Example: XORSearch.exe -W olimpikge.xls

You can view the built-in rules with option -L:

Function prolog signature:10:558BEC83C4
Function prolog signature:10:558BEC81EC
Function prolog signature:10:558BECEB
Function prolog signature:10:558BECE8
Function prolog signature:10:558BECE9
Indirect function call tris:10:FFB7(B;????????)(B;????????)(B;????????)(B;????????)FF57(B;????????)
GetEIP method 4 FLDZ/FSTENV [esp-12]:10:D9EED97424F4(B;01011???)
GetEIP method 1:10:E800000000(B;01011???)
GetEIP method 2:10:EB(J;1)E8(J;4)(B;01011???)
GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)
GetEIP method 4:10:D9EE9BD97424F4(B;01011???)
Find kernel32 base method 1:10:648B(B;00???101)30000000
Find kernel32 base method 1bis:10:64A130000000
Find kernel32 base method 2:10:31(B;11A??A??)(B;10100A??)30648B(B;00B??A??)
Find kernel32 base method 3:10:6830000000(B;01011A??)648B(B;00B??A??)
Structured exception handling :10:648B(B;00???101)00000000
Structured exception handling bis:10:64A100000000
API Hashing:10:AC84C07407C1CF0D01C7EBF481FF
API Hashing bis:10:AC84C07407C1CF0701C7EBF481FF
Indirect function call:10:FF75(B;A???????)FF55(B;A???????)
Indirect function call bis:10:FFB5(B;A???????)(B;B???????)(B;C???????)(B;D???????)FF95(B;A???????)(B;B???????)(B;C???????)(B;D???????)
OLE file magic number:10:D0CF11E0
Suspicious strings:2:str=UrlDownloadToFile
Suspicious strings:2:str=GetTempPath
Suspicious strings:2:str=GetWindowsDirectory
Suspicious strings:2:str=GetSystemDirectory
Suspicious strings:2:str=WinExec
Suspicious strings:2:str=ShellExecute
Suspicious strings:2:str=IsBadReadPtr
Suspicious strings:2:str=IsBadWritePtr
Suspicious strings:2:str=CreateFile
Suspicious strings:2:str=CloseHandle
Suspicious strings:2:str=ReadFile
Suspicious strings:2:str=WriteFile
Suspicious strings:2:str=SetFilePointer
Suspicious strings:2:str=VirtualAlloc
Suspicious strings:2:str=GetProcAddr
Suspicious strings:2:str=LoadLibrary

I derived these rules from the source code Frank gave me. Testing these rules on different benign and malicious files revealed 2 things: a couple of rules generated a lot of false positives, and brute-forcing the ROT encoding also generated a lot of false positives. So I removed these rules, and I added an option to disable encodings (option -d). For example, with option -d 3 I disable the brute-forcing of the ROT encoding (1: XOR 2: ROL 3: ROT 4: SHIFT 5: ADD).
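If you want to script this, a Python wrapper along the following lines could run XORSearch with the built-in rules and ROT brute-forcing disabled on every file in a folder. This is only a sketch: it assumes that -W and -d 3 can be combined and placed before the file name as in the examples above, that XORSearch.exe is on the PATH, and the function names `build_command` and `scan_folder` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(target: Path, disable: str = "3",
                  exe: str = "XORSearch.exe") -> list[str]:
    """Command line for the built-in rules with one encoding disabled."""
    # Assumption: options precede the file name, as in the -W example.
    return [exe, "-W", "-d", disable, str(target)]


def scan_folder(folder: Path) -> None:
    """Run XORSearch once per regular file found directly in folder."""
    for path in sorted(folder.iterdir()):
        if path.is_file():
            subprocess.run(build_command(path), check=False)


print(build_command(Path("olimpikge.xls")))
# ['XORSearch.exe', '-W', '-d', '3', 'olimpikge.xls']
```

Calling scan_folder(Path("some_folder")) would then start one XORSearch process per file and let each report go to the console.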

When looking for shellcode, you want several rules to trigger. If just one or two rules trigger, they are likely false positives.
XORSearch_V1_11_0.zip (https)
MD5: 7313A198033C0A1F69B79F96894462C7
SHA256: 1700D037D7A9902108F3986D75A9BA250ACBD96E38CC43C5B4BC1FB90761B320


  1. Would it be possible to scan multiple files for shell-code with your great tool, like xorsearch.exe -W /malwarefiles/

    Comment by Christopher — Tuesday 30 September 2014 @ 21:41

  2. @Christopher Yes, that would be useful, I have this feature in many other tools. But xorsearch is different. xorsearch is written in pure C, it has no OS dependencies. Thus it can run on Windows, Linux, OSX, … Enumerating the files in a folder is OS dependent, so I don’t know how to do that in pure C.

    Comment by Didier Stevens — Tuesday 30 September 2014 @ 21:56

  3. @Christopher Actually, I’ve an unpublished Python program that can help you with that. Interested?

    Comment by Didier Stevens — Friday 3 October 2014 @ 16:47

  4. can i get source code XorSearch in python bro ? if unpublished maybe you can send me by email … thanks before.. ^_^

    Comment by NothingSecure — Sunday 28 December 2014 @ 13:46

  5. XORSearch is written in C.

    Comment by Didier Stevens — Monday 29 December 2014 @ 9:31
