Didier Stevens

Monday 29 September 2014

Update: XORSearch With Shellcode Detector

Filed under: My Software,Update — Didier Stevens @ 0:00

XORSearch allows you to search for strings and embedded PE-files brute-forcing different encodings. Now I added shellcode detection.

This new version of XORSearch integrates Frank Boldewin’s shellcode detector. In his Hack.lu 2009 presentation, Frank explains how he detects shellcode in Microsoft Office documents by searching for byte sequences often used in shellcode.

I integrated Frank’s methods in XORSearch, so that you can use it for any file type, not only Microsoft Office files.


Frank was kind enough to give me his source code for the detection engine. However, I did not integrated is source code as-is. I developed my own engine that uses rules to detect shellcode artifacts. These rules are not hard-coded, but can be externalized, so that you can define your own rules.

Wildcard rule syntax

A wildcard rule is composed of 3 parts: a rule name, a score and a pattern. These are separated by a : character.

Example of a rule:

Find kernel32 base method 1bis:10:64A130000000

The name of this rule is “Find kernel32 base method 1bis”, it has a score of 10, and the pattern is 64A130000000. When XORSearch finds byte pattern 64A130000000, it will report it mentioning rule name “Find kernel32 base method 1bis” and add 10 to the total score. This byte pattern is the following assembly instruction:

MOV EAX, dword [fs:0x30]

This is an instruction often found in shellcode that looks for the base of kernel32.

When assembly instructions reference a register, the register is encoded as bits in the bytes that make up the instruction. For example, pop eax is just one byte: 58. pop ecx is 59, pop edx is 5a, … If you look at the bits of this instruction, they have the following value: 01011RRR. The last 3 bits (RRR) encode the register to use for the pop instruction.

To deal with this, my rule definition language supports wildcards. This is how you encode a pop reg instruction:


The B indicates that we want to define a byte using bits and wildcards. 0 and 1 are fixed bit values, and ? is the wildcard: the bit value can be 0 or 1. Thus the pattern (B;01011???) matches bytes 58, 59, 5A, 5B, 5C, 5D, 5E and 5F.

This wildcard allows us to encode patterns for shellcode instructions that use registers. For example , here is an often used set of instructions to determine the EIP with shellcode:

	call label
	pop eax

This pattern is encoded for all possible registers with the following rule:

GetEIP method 1:10:E800000000(B;01011???)

Another instruction often found in shellcode is xor reg1, reg1, like xor eax, eax.

You could represent this with the following pattern:


But this pattern matches more instructions than you want. It matches xor eax, eax, xor ecx, ecx, … but also xor eax, ecx, xor eax, edx, … You want this pattern to match the xor instruction for the same register, and not different registers. That is why you can use the following syntax:


By using a letter like A, B, …, as a wildcard, you assign a variable name to the wildcard bit pattern. ??? matches 3 bits. A?? also matches 3 bits, and assigns the variable name A to these 3 bits. When you use this bit pattern again, you make sure that the pattern will only be matched if the bit pattern is identical. Pattern ?????? matches 6 bits regardless of their value. Pattern A??A?? also matches 6 bits, but the first 3 bits must have the same value as the last 3 bits.

Here is another example:

Find kernel32 base method 3:10:6830000000(B;01011A??)648B(B;00B??A??)

This pattern matches the following set of assembly instructions:

	push 0x30
	pop reg1
	mov reg2, dword [fs:reg1]

By using bit pattern A?? for the register of the second instruction, and B??A?? for the registers of the third instruction, you make sure that the third instruction use the same register for indexing as the second instruction.

Up til now, we looked at sequential assembly instructions. But you can also have shellcode patterns with jumps, e.g. non-sequential instructions. Here is an example:

	jmp LABEL1
	pop eax
	call LABEL2

To enable to match assembly code patterns with jumps, I introduced the (J;*) pattern in my rule definitions. J stands for a jump, and * represent the numbers of bytes that make up the displacement of the jump instruction (normally 1 byte or 4 bytes). Here is the rule that encodes the above assembly code pattern:

GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)

Finally, Frank’s detector also looks for suspicious strings, like UrlDownloadToFile, WinExec, … You can define rules using a hex pattern to detect these strings, but to facilitate the encoding of these rules, I added the str= keyword, like this:

Suspicious strings:2:str=UrlDownloadToFile
Suspicious strings:2:str=WinExec

Using wildcard rules

To use these shellcode wildcard rules with XORSearch, you use options -w or -W. -w allows you to specify your own rule(s), -W uses the build-in rules.

With -w, you can specify your rule as the search argument, or together with option -f, you provide a text file with rules.

Example: XORSearch.exe -w olimpikge.xls “GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)”

With -W, you don’t have to provide the rules, XORSearch will use the build-in rules.

Example: XORSearch.exe -W olimpikge.xls

You can view the build-in rules with option -L:

Function prolog signature:10:558BEC83C4
Function prolog signature:10:558BEC81EC
Function prolog signature:10:558BECEB
Function prolog signature:10:558BECE8
Function prolog signature:10:558BECE9
Indirect function call tris:10:FFB7(B;????????)(B;????????)(B;????????)(B;????????)FF57(B;????????)
GetEIP method 4 FLDZ/FSTENV [esp-12]:10:D9EED97424F4(B;01011???)
GetEIP method 1:10:E800000000(B;01011???)
GetEIP method 2:10:EB(J;1)E8(J;4)(B;01011???)
GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)
GetEIP method 4:10:D9EE9BD97424F4(B;01011???)
Find kernel32 base method 1:10:648B(B;00???101)30000000
Find kernel32 base method 1bis:10:64A130000000
Find kernel32 base method 2:10:31(B;11A??A??)(B;10100A??)30648B(B;00B??A??)
Find kernel32 base method 3:10:6830000000(B;01011A??)648B(B;00B??A??)
Structured exception handling :10:648B(B;00???101)00000000
Structured exception handling bis:10:64A100000000
API Hashing:10:AC84C07407C1CF0D01C7EBF481FF
API Hashing bis:10:AC84C07407C1CF0701C7EBF481FF
Indirect function call:10:FF75(B;A???????)FF55(B;A???????)
Indirect function call bis:10:FFB5(B;A???????)(B;B???????)(B;C???????)(B;D???????)FF95(B;A???????)(B;B???????)(B;C???????)(B;D???????)
OLE file magic number:10:D0CF11E0
Suspicious strings:2:str=UrlDownloadToFile
Suspicious strings:2:str=GetTempPath
Suspicious strings:2:str=GetWindowsDirectory
Suspicious strings:2:str=GetSystemDirectory
Suspicious strings:2:str=WinExec
Suspicious strings:2:str=ShellExecute
Suspicious strings:2:str=IsBadReadPtr
Suspicious strings:2:str=IsBadWritePtr
Suspicious strings:2:str=CreateFile
Suspicious strings:2:str=CloseHandle
Suspicious strings:2:str=ReadFile
Suspicious strings:2:str=WriteFile
Suspicious strings:2:str=SetFilePointer
Suspicious strings:2:str=VirtualAlloc
Suspicious strings:2:str=GetProcAddr
Suspicious strings:2:str=LoadLibrary

I derived these rules from the source code Frank gave me. Testing these rules on different benign and malicious files revealed 2 things: a couple of rules generated a lot of false positives, and brute-forcing the ROT encoding also generated a lot of false positives. So I removed these rules, and I added an option to disable encodings (option -d). For example, with option -d 3 I disable the brute-forcing of the ROT encoding (1: XOR 2: ROL 3: ROT 4: SHIFT 5: ADD).

When looking for shellcode, you want several rules to trigger. If just one or two rules trigger, they are likely false positives.
XORSearch_V1_11_0.zip (https)
MD5: 7313A198033C0A1F69B79F96894462C7
SHA256: 1700D037D7A9902108F3986D75A9BA250ACBD96E38CC43C5B4BC1FB90761B320

Sunday 14 September 2014

Update: SpiderMonkey

Filed under: My Software,Update — Didier Stevens @ 15:00

During my PDF training at 44CON I got the idea for a simple modification: now with document.write(), a third file is created. The file is write.bin.log and contains the pure UNICODE data, e.g. without 0xFFFE header.

To extract shellcode now, you no longer need to edit write.uc.log to remove the 0xFFFE header.

I also included binaries for Windows and Linux (compiled on CentOS 6.0) in the ZIP file.

js-1.7.0-mod-b.zip (https)
MD5: 85B369B5650D4C041D21E8574CF09B9A
SHA256: D3827DF7B2EA81EEE91181B2DE045320E1CFEC46EED33F7CD84CA63C3A36BC38

Monday 1 September 2014

Update: Calculating a SSH Fingerprint From a (Cisco) Public Key

Filed under: Encryption,My Software,Update — Didier Stevens @ 20:17

I think there’s more interest for my program to calculate the SSH fingerprint for Cisco IOS since Snowden started with his revelations.

I fixed a bug with 2048 bit (and more) keys.


cisco-calculate-ssh-fingerprint_V0_0_2.zip (https)
MD5: C304299624F12341F9935263304F725B
SHA256: 2F2BF65E6903BE3D9ED99D06F0F38B599079CCE920222D55CC5C3D7350BD20FB

Wednesday 16 July 2014

Update: translate.py

Filed under: My Software,Update — Didier Stevens @ 19:37

Some time ago, Chris John Riley reminded me of a program I had written, published … and forgotten: translate.py. Apparently, it is used in SANS classes.

Looking at this program from 2007, I though: my Python coding style has changed since then, I need to rewrite this.

So here is the new version. It’s backward compatible with the old version (same arguments), but it offers more flexibility, like input/output redirection, allowing it to be used in pipes.

And from now on, I’m going to try to add a man page to all new Python program releases. It’s embedded in the source code, and you view it like this: translate.py –man


Monday 30 June 2014

Update: Stoned Bitcoin

Filed under: Encryption,Forensics,Malware,Update — Didier Stevens @ 0:04

kurt wismer pointed me to this post on pastebin after he read my Stoned Bitcoin blogpost. The author of this pastebin post works out a method to spam the Bitcoin blockchain to cause anti-virus (false) positives.

I scanned through all the Bitcoin transactions (until 24/06/2014) for the addresses listed in this pastebin post (the addresses represent antivirus signatures for 400+ malwares).

All these “malicious” Bitcoin addresses, designed to generate anti-virus false positives,  have been exclusively used in the 8 Bitcoin transactions I mentioned in my previous post.

The pastebin entry was posted on 2014/04/02 19:01:08 UTC.

And here are the 8 transactions with the UTC timestamp of the block in which they appear:

Block: 2014/04/03 23:12:48
Transaction: edb83f04e68bfe78bbfe7ce80d33e85acb9335c96ead5712517b8c70d1f27b38
Block: 2014/04/04 01:10:45
Transaction: 7e49504c7cecea7ea95d78ff14687878ba581a21dc0772805d2925c617514129
Block: 2014/04/04 01:43:25
Transaction: f65895220f04aa0084d9abae938d3f517893e3afbffe25fc9e7073e02331b9ed
Block: 2014/04/04 02:58:13
Transaction: 8a445d12f225a21d36bb78da747efd2e74861fcd033757da572c0434d423acd1
Block: 2014/04/04 04:32:24
Transaction: fcf5cf9893a142897598edfc753bd6162e3638e138fc2feaf4a3477c0cfb65eb
Block: 2014/04/04 04:32:24
Transaction: 2814673f0952b936d578d73197bfd371cefbd73c6294bab16de1575a4c3f6e80
Block: 2014/04/04 09:36:29
Transaction: f09904aaa4fa4a8ec7da06f5e3d318a9b6a218e1a215f9307416fbbadf5a1c8e
Block: 2014/04/04 09:36:29
Transaction: 5dbb9df056c36457228a841d6cc98ac90967bc88411c95372d3c2d92c18060f8

So it took a bit more than 24 hours before someone spammed the Bitcoin blockchain with these transactions designed to trigger false positives.

Thursday 20 March 2014

XORSearch: Finding Embedded Executables

Filed under: My Software,Update — Didier Stevens @ 10:58

Someone mentioned on a forum that he found a picture with an embedded, XORed executable. You can easily identify such embedded executables by xorsearching for the string “This program must be run under Win32″. But if the author or compiler modifies this DOS-stub string, you will not find it.

That’s how I got the idea to add an option to search for PE-files: search for string MZ, read the offset to the IMAGE_NT_HEADER structure (e_lfanew), and check if it starts with string PE.

Example: XORSearch.exe -p test.jpg

Found XOR A2 position 00005D1D: 000000E8 ........!..L.!This program cannot be r
Found XOR A2 position 0001221D: 00000108 ........!..L.!This program cannot be r

We found 2 embedded executables in test.jpg (XOR key A2). Remark we didn’t provide a search string, only option -p.

XORSearch also reports the value of e_lfanew and the string found in the DOS-stub. This allows you to inspect the results for false positives.

This can also be used on unencoded files, like this installation file:

XORSearch.exe -p c8400.msi
Found XOR 00 position 00236400: 000000E8 ........!..L.!This program cannot be r
Found XOR 00 position 00286000: 00000100 ........!..L.!This program cannot be r
Found XOR 00 position 00346800: 000000F8 ........!..L.!This program cannot be r
Found XOR 00 position 003A7200: 00000080 ........!..L.!This program cannot be r
Found XOR 00 position 003AD200: 00000080 ........!..L.!This program cannot be r
Found XOR 00 position 004B4800: 00000108 ........!..L.!This program cannot be r
Found XOR 00 position 004DE600: 000000F8 ........!..L.!This program cannot be r
Found XOR 00 position 004FE200: 000000E0 ........!..L.!This program cannot be r
Found XOR 00 position 00520C00: 000000E0 ........!..L.!This program cannot be r
Found XOR 00 position 00542000: 000000E0 ........!..L.!This program cannot be r
Found XOR 00 position 00562400: 00000100 ........!..L.!This program cannot be r
Found XOR 00 position 0058F800: 000000E0 ........!..L.!This program cannot be r

Finally, I added option -e (exclude). This excludes a particular byte-value from encoding. If you suspect a file is XOR encoded, but that byte 0x00 is not encoded, you use option -e 0x00.

XORSearch_V1_10_0.zip (https)
MD5: 23809A03C63914B0742B7F75B73E1597
SHA256: 97BFBC5E8C59F60E10ABDA2D65DF4200B10BE14662D4A447797B341C9AAE17D8

Monday 23 December 2013

Update: Prefetch File 010 Template

Filed under: Forensics,My Software,Update — Didier Stevens @ 22:01

This update to my Prefetch File 010 Template adds Sections A through D.

PFTemplate_V0_0_2.zip (https)
MD5: 56A98A78BD4E8D1AED88385AF1DD8446
SHA256: E15D721E46FFB8158C6D14C9A38DE4E3DD5DCD0972896441DF17590C540DBCC3

Saturday 14 December 2013

Update: virustotal-submit.py V0.0.3

Filed under: Malware,My Software,Update — Didier Stevens @ 0:21

There is extra error handling in this new version.

virustotal-search and virustotal-submit have their own page now: VirusTotal Tools.

virustotal-submit_V0_0_3.zip (https)
MD5: 3F9F5421F711E2930AB6F80D87DF9E2B
SHA256: 37CCE3E8469DE097912CB23BAC6B909C9C7F5A5CEE09C9279D32BDB9D6E23BCC

Monday 2 December 2013

4 Times Faster virustotal-search.py

Filed under: Malware,My Software,Update — Didier Stevens @ 0:26

This is an important update to virustotal-search.py.

Rereading the VT API, I noticed I missed the fact that the search query accepts up to 4 search terms.

This new version submits 4 hashes at a time, making it up to 4 times faster than previous versions.

virustotal-search_V0_1_0.zip (https)
MD5: 0141D3677F759317034C416EBF9FF30D
SHA256: FE07859C3FA09DA120D3104FF982AF0D78ADFCF099A10E46E254823502DF4EE4

Friday 15 November 2013

Update: find-file-in-file.py Version 0.0.3

Filed under: Forensics,My Software,Update — Didier Stevens @ 12:55

shinnai made an interesting comment when I released my tool to find contained files: he wanted to know if I could add a batch mode.

I guess this batch mode is interesting when you want to check if a large set of files contains a particular file. So I added this features and release it here.

Now you can provide more than one containing-file to find-file-in-file.py: you can just type several files, use wildcards and/or use at-files (@file). When you specify @filename, find-file-in-file.py will search in all the files listed in textfile filename (each file on a separate line).

When you provide only one file to search, then this new version will just work like the previous version.

But if you provide more than one file, then batch mode is enabled. In batch mode, the contained file is searched for in each containing file. If a (partial) match is found, it will be included in the report. If no match is found, no output is produced. If you want output even when no match is found, then use option verbose (-v).

Example for a bunch of MSI files:

find-file-in-file.py msi49.tmp *.msi

File: c8400.msi
003a7200 00005600 (100%)
File: Cisco_Jabber.msi
00295600 00001000 (18%)
00294a00 00000c00 (13%)
00296600 00003a00 (67%)

File msi49.tmp was found in only 2 MSI files.
find-file-in-file_v0_0_3.zip (https)
MD5: 8691158700079C786F6905F0CA0F32BC
SHA256: 84506CED140F309503E723831A9EFB99A8CC213532BEB56E00BC4BA5FE235797

« Previous PageNext Page »

The Rubric Theme. Blog at WordPress.com.


Get every new post delivered to your Inbox.

Join 279 other followers