Didier Stevens

Tuesday 25 November 2014

Update: find-file-in-file.py Version 0.0.4

Filed under: Forensics,My Software,Update — Didier Stevens @ 22:05

Here is the version I talked about in my Bitcoin virus posts.

It also has an embedded man page (use option –man).

find-file-in-file_v0_0_4.zip (https)
MD5: CD381616158BD233D94B368554B824C6
SHA256: FD5C4E3EC99371754E58B93D3D96CBA7A86C230C47FC9C27C9B871ED8BFB9149

Man page:

Usage: find-file-in-file.py [options] file-contained file-containing […]
Find if a file is present in another file

Arguments:
file-containing can be a single file, several files, and/or @file
@file: run the command on each file listed in the text file specified
wildcards are supported
batch mode is enabled when more than one file is specified

Source code put in the public domain by Didier Stevens, no Copyright
Use at your own risk

https://DidierStevens.com

Options:
–version             show program’s version number and exit
-h, –help            show this help message and exit
-m MINIMUM, –minimum=MINIMUM
Minimum length of byte-sequence to find (default 10)
-o, –overlap         Found sequences may overlap
-v, –verbose         Be verbose in batch mode
-p, –partial         Perform partial search of contained file
-O OUTPUT, –output=OUTPUT
Output to file
-b RANGEBEGIN, –rangebegin=RANGEBEGIN
Select the beginning of the contained file (by default
byte 0)
-e RANGEEND, –rangeend=RANGEEND
Select the end of the contained file (by default last
byte)
-x, –hexdump         Hexdump of found bytes
-q, –quiet           Do not output to standard output
–man                 Print manual

Manual:

find-file-in-file is a program to test if one file (the contained
file) can be found inside another file (the containing file).

Here is an example.
We have a file called contained-1.txt with the following content:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
and have a file called containing-1.txt with the following content:
0000ABCDEFGHIJKLM1111NOPQRSTUVWXYZ2222

When we execute the following command:
find-file-in-file.py contained-1.txt containing-1.txt

We get this output:
0x00000004 0x0000000d (50%)
0x00000015 0x0000000d (50%)
Finished

This means that the file contained-1.txt was completely found inside
file containing-1.txt At position 0x00000004 we found a first part
(0x0000000d bytes) and at position 0x00000015 we found a second part
(0x0000000d bytes).

We can use option hexdump (-x) to see which bytes were found:
find-file-in-file.py -x contained-1.txt containing-1.txt
0x00000004 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000015 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished

The containing file may contain the contained file in an arbitrary
order, like file containing-2.txt:
0000NOPQRSTUVWXYZ1111ABCDEFGHIJKLM2222

Example:
find-file-in-file.py -x contained-1.txt containing-2.txt
0x00000015 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000004 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished

The containing file does not need to contain the complete contained
file, like file containing-3.txt:
0000ABCDEFGHIJKLM1111

Example:
find-file-in-file.py -x contained-1.txt containing-3.txt
0x00000004 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
Remaining 13 (50%)

The message “Remaining 13 (50%)” means that the last 13 bytes of the
contained file were not found in the containing file (that’s 50% of
the contained file).

If the contained file starts with a byte sequence not present in the
containing file, nothing will be found. Example with file
contained-2.txt:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ

Nothing is found:
find-file-in-file.py -x contained-2.txt containing-1.txt
Remaining 36 (100%)

If you know how long that initial byte sequence is, you can skip it.
Use option rangebegin (-b) to specify the position in the contained
file from where you want to start searching.
Example:

find-file-in-file.py -x -b 10 contained-2.txt containing-1.txt
0x00000004 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000015 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished

If you want to skip bytes at the end of the contained file, use option
rangeend (-e).

If you don’t know how long that initial byte sequence is, you can
instruct find-file-in-file to “brute-force” it. With option partial
(-p), one byte at a time will be removed from the beginning of the
contained file until a match is found.
Example:

find-file-in-file.py -x -p contained-2.txt containing-1.txt
File: containing-1.txt (partial 0x0a)
0x00000004 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000015 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished

“(partial 0x0a)” tells you that the first 10 bytes of the contained
file were skipped before a match was found.

There are some other options:
-m minimum: find-file-in-file will search for byte sequences of 10
bytes long minimum. If you want to change this minimum, use option -m
minimum.
-o overlap: find-file-in-file will not let byte sequences overlap. Use
option -o overlap to remove this restriction.
-v verbose: be verbose in batch mode (more than one containing file).
-O output: besides writing output to stdout, write the output also to
the given file.
-q quiet: do not output to stdout.

Tuesday 18 November 2014

Update: pecheck.py Version 0.4.0

Filed under: My Software,Update — Didier Stevens @ 21:15

pecheck.py is a wrapper for pefile, ant this update has a couple of new features:

  • accept input from stdin (for pipes)
  • load PeID userdb.txt by default from same directory as pecheck.py
  • extra entry point info

pecheck-v0_4_0.zip (https)
MD5: 27041C56B80B097436076B7366A6F3B2
SHA256: F9C73ED054AE4D5E9F495916D1B028FD8D6E9B2800DCE1993E568E2A2BFD9A71

Wednesday 5 November 2014

XORSearch: Hexdump Support

Filed under: My Software,Update — Didier Stevens @ 22:04

Sometimes I want to check a malware sample with XORSearch, but I can’t because my AV will delete it. My solution is to work with a hexdump of the file.

Option -x allows XORSearch to work with a hexdump.

XORSearch_V1_11_1.zip (https)
MD5: D5EA1E30B2C2C7FEBE7AE7AD6E826BF5
SHA256: 15E9AAE87E7F25CF7966CDF0F8DFCB2648099585D08EAD522737E72C5FACA50A

Monday 27 October 2014

Update: PDFiD With Plugins Part 2

Filed under: My Software,PDF,Update — Didier Stevens @ 8:40

The second feature in this new version of PDFiD is selection. With this, you can select PDFs using criteria you provide.

Example:

pdfid.py -S “pdf.javascript.count > 0″ *.pdf

This command will select all files with extension .pdf in the current directory that are PDFs and have a /JavaScript count larger than zero. The selection expression you provide is a Python expression. Here is a list off attributes to use in your selection expressions:

pdf.version
pdf.filename
pdf.errorOccured
pdf.errorMessage
pdf.isPDF
pdf.header

pdf.keywords[keywordname].count
pdf.keywords[keywordname].hexcode

pdf.keywords['/AA'].count
pdf.keywords['/Root'].count # if option -a and if /Root present in PDF

pdf.obj.count
pdf.obj.hexcode
pdf.endobj.count
pdf.endobj.hexcode
pdf.stream.count
pdf.stream.hexcode
pdf.endstream.count
pdf.endstream.hexcode
pdf.xref.count
pdf.xref.hexcode
pdf.trailer.count
pdf.trailer.hexcode
pdf.startxref.count
pdf.startxref.hexcode
pdf.page.count
pdf.page.hexcode
pdf.encrypt.count
pdf.encrypt.hexcode
pdf.objstm.count
pdf.objstm.hexcode
pdf.js.count
pdf.js.hexcode
pdf.javascript.count
pdf.javascript.hexcode
pdf.aa.count
pdf.aa.hexcode
pdf.openaction.count
pdf.openaction.hexcode
pdf.acroform.count
pdf.acroform.hexcode
pdf.jbig2decode.count
pdf.jbig2decode.hexcode
pdf.richmedia.count
pdf.richmedia.hexcode
pdf.launch.count
pdf.launch.hexcode
pdf.embeddedfile.count
pdf.embeddedfile.hexcode
pdf.xfa.count
pdf.xfa.hexcode
pdf.colors_gt_2_24.count
pdf.colors_gt_2_24.hexcode

Be careful if you are going to use this in an automated scenario where you don’t control the selection expression. This expression is evaluated in Python with the eval function, and there is no input validation.

 

Monday 20 October 2014

Update: PDFiD With Plugins Part 1

Filed under: My Software,PDF,Update — Didier Stevens @ 8:51

Almost from the beginning when I released PDFiD, people asked me for anti-virus like feature: that PDFiD would tell you if a PDF was malicious or not. Some people even patched PDFiD with a scoring feature.

But I didn’t want to develop an “anti-virus” for PDFs; PDFiD is a triage tool.

Now you can develop your own scoring system with plugins.

Plugins are loaded with option -p, like this:

20141020-102902

I provide 3 plugins: plugin_triage.py, plugin_nameobfuscation.py and plugin_embeddedfile.py. You can run more than one plugin by separating their names with a comma: pdfid.py -p plugin_triage,plugin_embeddedfile js.pdf

Or you can use an @-file: a text file with the names of the plugins you want to run.

To output the result as CSV file, use option -c, and to write the output to a file, use option -o. With option -m, you can provide a minimum score the plugin has to produce for its output to be displayed.

Plugins are Python classes, I’ll explain how to make your own in a later post.

plugin_triage.py produces a score of 1.0 when the PDF requires further analysis, and 0.0 if not.

plugin_nameobfuscation.py produces a score of 1.0 when name obfuscation is used in the PDF.

plugin_embeddedfile.py produces a score of 0.9 when an embedded file is present, and 1.0 when name obfuscation is also used.
pdfid_v0_2_1.zip (https)
MD5: 7463412536678B321276F8720F52DE81
SHA256: F1B4728DD2CE455B863B930E12C6DEC952CB95C0BB3D6924136A6E49ACA877C2

Monday 29 September 2014

Update: XORSearch With Shellcode Detector

Filed under: My Software,Update — Didier Stevens @ 0:00

XORSearch allows you to search for strings and embedded PE-files brute-forcing different encodings. Now I added shellcode detection.

This new version of XORSearch integrates Frank Boldewin’s shellcode detector. In his Hack.lu 2009 presentation, Frank explains how he detects shellcode in Microsoft Office documents by searching for byte sequences often used in shellcode.

I integrated Frank’s methods in XORSearch, so that you can use it for any file type, not only Microsoft Office files.

20140928-135402

Frank was kind enough to give me his source code for the detection engine. However, I did not integrated is source code as-is. I developed my own engine that uses rules to detect shellcode artifacts. These rules are not hard-coded, but can be externalized, so that you can define your own rules.

Wildcard rule syntax

A wildcard rule is composed of 3 parts: a rule name, a score and a pattern. These are separated by a : character.

Example of a rule:

Find kernel32 base method 1bis:10:64A130000000

The name of this rule is “Find kernel32 base method 1bis”, it has a score of 10, and the pattern is 64A130000000. When XORSearch finds byte pattern 64A130000000, it will report it mentioning rule name “Find kernel32 base method 1bis” and add 10 to the total score. This byte pattern is the following assembly instruction:

MOV EAX, dword [fs:0x30]

This is an instruction often found in shellcode that looks for the base of kernel32.

When assembly instructions reference a register, the register is encoded as bits in the bytes that make up the instruction. For example, pop eax is just one byte: 58. pop ecx is 59, pop edx is 5a, … If you look at the bits of this instruction, they have the following value: 01011RRR. The last 3 bits (RRR) encode the register to use for the pop instruction.

To deal with this, my rule definition language supports wildcards. This is how you encode a pop reg instruction:

(B;01011???)

The B indicates that we want to define a byte using bits and wildcards. 0 and 1 are fixed bit values, and ? is the wildcard: the bit value can be 0 or 1. Thus the pattern (B;01011???) matches bytes 58, 59, 5A, 5B, 5C, 5D, 5E and 5F.

This wildcard allows us to encode patterns for shellcode instructions that use registers. For example , here is an often used set of instructions to determine the EIP with shellcode:

	call label
label:
	pop eax

This pattern is encoded for all possible registers with the following rule:

GetEIP method 1:10:E800000000(B;01011???)

Another instruction often found in shellcode is xor reg1, reg1, like xor eax, eax.

You could represent this with the following pattern:

31(B;11??????)

But this pattern matches more instructions than you want. It matches xor eax, eax, xor ecx, ecx, … but also xor eax, ecx, xor eax, edx, … You want this pattern to match the xor instruction for the same register, and not different registers. That is why you can use the following syntax:

31(B;11A??A??)

By using a letter like A, B, …, as a wildcard, you assign a variable name to the wildcard bit pattern. ??? matches 3 bits. A?? also matches 3 bits, and assigns the variable name A to these 3 bits. When you use this bit pattern again, you make sure that the pattern will only be matched if the bit pattern is identical. Pattern ?????? matches 6 bits regardless of their value. Pattern A??A?? also matches 6 bits, but the first 3 bits must have the same value as the last 3 bits.

Here is another example:

Find kernel32 base method 3:10:6830000000(B;01011A??)648B(B;00B??A??)

This pattern matches the following set of assembly instructions:

	push 0x30
	pop reg1
	mov reg2, dword [fs:reg1]

By using bit pattern A?? for the register of the second instruction, and B??A?? for the registers of the third instruction, you make sure that the third instruction use the same register for indexing as the second instruction.

Up til now, we looked at sequential assembly instructions. But you can also have shellcode patterns with jumps, e.g. non-sequential instructions. Here is an example:

	jmp LABEL1
LABEL2:
	pop eax
	...
	...
LABEL1:
	call LABEL2

To enable to match assembly code patterns with jumps, I introduced the (J;*) pattern in my rule definitions. J stands for a jump, and * represent the numbers of bytes that make up the displacement of the jump instruction (normally 1 byte or 4 bytes). Here is the rule that encodes the above assembly code pattern:

GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)

Finally, Frank’s detector also looks for suspicious strings, like UrlDownloadToFile, WinExec, … You can define rules using a hex pattern to detect these strings, but to facilitate the encoding of these rules, I added the str= keyword, like this:

Suspicious strings:2:str=UrlDownloadToFile
Suspicious strings:2:str=WinExec

Using wildcard rules

To use these shellcode wildcard rules with XORSearch, you use options -w or -W. -w allows you to specify your own rule(s), -W uses the build-in rules.

With -w, you can specify your rule as the search argument, or together with option -f, you provide a text file with rules.

Example: XORSearch.exe -w olimpikge.xls “GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)”

With -W, you don’t have to provide the rules, XORSearch will use the build-in rules.

Example: XORSearch.exe -W olimpikge.xls

You can view the build-in rules with option -L:

Function prolog signature:10:558BEC83C4
Function prolog signature:10:558BEC81EC
Function prolog signature:10:558BECEB
Function prolog signature:10:558BECE8
Function prolog signature:10:558BECE9
Indirect function call tris:10:FFB7(B;????????)(B;????????)(B;????????)(B;????????)FF57(B;????????)
GetEIP method 4 FLDZ/FSTENV [esp-12]:10:D9EED97424F4(B;01011???)
GetEIP method 1:10:E800000000(B;01011???)
GetEIP method 2:10:EB(J;1)E8(J;4)(B;01011???)
GetEIP method 3:10:E9(J;4)E8(J;4)(B;01011???)
GetEIP method 4:10:D9EE9BD97424F4(B;01011???)
Find kernel32 base method 1:10:648B(B;00???101)30000000
Find kernel32 base method 1bis:10:64A130000000
Find kernel32 base method 2:10:31(B;11A??A??)(B;10100A??)30648B(B;00B??A??)
Find kernel32 base method 3:10:6830000000(B;01011A??)648B(B;00B??A??)
Structured exception handling :10:648B(B;00???101)00000000
Structured exception handling bis:10:64A100000000
API Hashing:10:AC84C07407C1CF0D01C7EBF481FF
API Hashing bis:10:AC84C07407C1CF0701C7EBF481FF
Indirect function call:10:FF75(B;A???????)FF55(B;A???????)
Indirect function call bis:10:FFB5(B;A???????)(B;B???????)(B;C???????)(B;D???????)FF95(B;A???????)(B;B???????)(B;C???????)(B;D???????)
OLE file magic number:10:D0CF11E0
Suspicious strings:2:str=UrlDownloadToFile
Suspicious strings:2:str=GetTempPath
Suspicious strings:2:str=GetWindowsDirectory
Suspicious strings:2:str=GetSystemDirectory
Suspicious strings:2:str=WinExec
Suspicious strings:2:str=ShellExecute
Suspicious strings:2:str=IsBadReadPtr
Suspicious strings:2:str=IsBadWritePtr
Suspicious strings:2:str=CreateFile
Suspicious strings:2:str=CloseHandle
Suspicious strings:2:str=ReadFile
Suspicious strings:2:str=WriteFile
Suspicious strings:2:str=SetFilePointer
Suspicious strings:2:str=VirtualAlloc
Suspicious strings:2:str=GetProcAddr
Suspicious strings:2:str=LoadLibrary

I derived these rules from the source code Frank gave me. Testing these rules on different benign and malicious files revealed 2 things: a couple of rules generated a lot of false positives, and brute-forcing the ROT encoding also generated a lot of false positives. So I removed these rules, and I added an option to disable encodings (option -d). For example, with option -d 3 I disable the brute-forcing of the ROT encoding (1: XOR 2: ROL 3: ROT 4: SHIFT 5: ADD).

When looking for shellcode, you want several rules to trigger. If just one or two rules trigger, they are likely false positives.
XORSearch_V1_11_0.zip (https)
MD5: 7313A198033C0A1F69B79F96894462C7
SHA256: 1700D037D7A9902108F3986D75A9BA250ACBD96E38CC43C5B4BC1FB90761B320

Sunday 14 September 2014

Update: SpiderMonkey

Filed under: My Software,Update — Didier Stevens @ 15:00

During my PDF training at 44CON I got the idea for a simple modification: now with document.write(), a third file is created. The file is write.bin.log and contains the pure UNICODE data, e.g. without 0xFFFE header.

To extract shellcode now, you no longer need to edit write.uc.log to remove the 0xFFFE header.

I also included binaries for Windows and Linux (compiled on CentOS 6.0) in the ZIP file.

js-1.7.0-mod-b.zip (https)
MD5: 85B369B5650D4C041D21E8574CF09B9A
SHA256: D3827DF7B2EA81EEE91181B2DE045320E1CFEC46EED33F7CD84CA63C3A36BC38

Monday 1 September 2014

Update: Calculating a SSH Fingerprint From a (Cisco) Public Key

Filed under: Encryption,My Software,Update — Didier Stevens @ 20:17

I think there’s more interest for my program to calculate the SSH fingerprint for Cisco IOS since Snowden started with his revelations.

I fixed a bug with 2048 bit (and more) keys.

20111221-225407

cisco-calculate-ssh-fingerprint_V0_0_2.zip (https)
MD5: C304299624F12341F9935263304F725B
SHA256: 2F2BF65E6903BE3D9ED99D06F0F38B599079CCE920222D55CC5C3D7350BD20FB

Wednesday 16 July 2014

Update: translate.py

Filed under: My Software,Update — Didier Stevens @ 19:37

Some time ago, Chris John Riley reminded me of a program I had written, published … and forgotten: translate.py. Apparently, it is used in SANS classes.

Looking at this program from 2007, I though: my Python coding style has changed since then, I need to rewrite this.

So here is the new version. It’s backward compatible with the old version (same arguments), but it offers more flexibility, like input/output redirection, allowing it to be used in pipes.

And from now on, I’m going to try to add a man page to all new Python program releases. It’s embedded in the source code, and you view it like this: translate.py –man

20140716-212254

Monday 30 June 2014

Update: Stoned Bitcoin

Filed under: Encryption,Forensics,Malware,Update — Didier Stevens @ 0:04

kurt wismer pointed me to this post on pastebin after he read my Stoned Bitcoin blogpost. The author of this pastebin post works out a method to spam the Bitcoin blockchain to cause anti-virus (false) positives.

I scanned through all the Bitcoin transactions (until 24/06/2014) for the addresses listed in this pastebin post (the addresses represent antivirus signatures for 400+ malwares).

All these “malicious” Bitcoin addresses, designed to generate anti-virus false positives,  have been exclusively used in the 8 Bitcoin transactions I mentioned in my previous post.

The pastebin entry was posted on 2014/04/02 19:01:08 UTC.

And here are the 8 transactions with the UTC timestamp of the block in which they appear:

Block: 2014/04/03 23:12:48
Transaction: edb83f04e68bfe78bbfe7ce80d33e85acb9335c96ead5712517b8c70d1f27b38
Block: 2014/04/04 01:10:45
Transaction: 7e49504c7cecea7ea95d78ff14687878ba581a21dc0772805d2925c617514129
Block: 2014/04/04 01:43:25
Transaction: f65895220f04aa0084d9abae938d3f517893e3afbffe25fc9e7073e02331b9ed
Block: 2014/04/04 02:58:13
Transaction: 8a445d12f225a21d36bb78da747efd2e74861fcd033757da572c0434d423acd1
Block: 2014/04/04 04:32:24
Transaction: fcf5cf9893a142897598edfc753bd6162e3638e138fc2feaf4a3477c0cfb65eb
Block: 2014/04/04 04:32:24
Transaction: 2814673f0952b936d578d73197bfd371cefbd73c6294bab16de1575a4c3f6e80
Block: 2014/04/04 09:36:29
Transaction: f09904aaa4fa4a8ec7da06f5e3d318a9b6a218e1a215f9307416fbbadf5a1c8e
Block: 2014/04/04 09:36:29
Transaction: 5dbb9df056c36457228a841d6cc98ac90967bc88411c95372d3c2d92c18060f8

So it took a bit more than 24 hours before someone spammed the Bitcoin blockchain with these transactions designed to trigger false positives.

Next Page »

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 239 other followers