Didier Stevens

Wednesday 18 February 2015

Analyzing A Fraudulent Document With Error Level Analysis

Filed under: Forensics,My Software,PDF — Didier Stevens @ 0:00

Some time ago I had the chance to try out an image forensic method (Error Level Analysis) on a PDF. It was a fraudulent document (a form), but with a special characteristic: the criminal converted the original form (a PDF) to JPEG, edited the JPEG with a raster graphics editor, and then inserted the edited JPEG in a PDF document. This gave me the opportunity to try out Error Level Analysis (ELA) on a “text document”.

I can’t share the PDF, but I recreated one to use in this blogpost.

First I search for images in the PDF document:

pdf-parser.py -s image example-edited.pdf

Result:

obj 4 0
 Type: 
 Referencing: 6 0 R

  <<
    /Font
    /XObject
      <<
        /Im4 6 0 R
      >>
    /ProcSet [/PDF/Text/ImageC/ImageI/ImageB]
  >>


obj 6 0
 Type: /XObject
 Referencing: 
 Contains stream

  <<
    /Type /XObject
    /Subtype /Image
    /Width 680
    /Height 965
    /BitsPerComponent 8
    /ColorSpace /DeviceRGB
    /Filter /DCTDecode
    /Length 233133
  >>

The image is in object 6. I extract the image:

pdf-parser.py -o 6 -d example-edited.jpeg example-edited.pdf

Here it is:

example-edited

If you Google for Error Level Analysis, you’ll find a couple of websites that provide online image forensics. But that was not an option for me, I could not share the document.

I found this C program for ELA, and later I wrote my own Python program (what else?), that I’ll use for this example:

image-forensics-ela.py example-edited.jpeg example-edited-ela.png

example-edited-ela

The colored pixels reveal the word I edited. You can see it better when I overlay the 2 images:

image-overlay.py -a 0.6 example-edited.jpeg example-edited-ela.png example-edited-overlay.png

example-edited-overlay

FYI: there is also a GIMP plugin for ELA.

You can download the examples and programs here:

blogpost-ela-files.zip (https)
MD5: 4F3071A9162C5CA8B7B10A41F662093A
SHA256: CBA786368D7BAF65E1E9F854C315BFB60FF89910429106513A0C41C180D8FCAB

Sunday 15 February 2015

Update: YARA Rule JPEG_EXIF_Contains_eval

Filed under: Forensics,Malware,Update — Didier Stevens @ 11:21

Now that YARA version 3.3.0 supports word boundaries in regular expressions, I’ve updated my YARA Rule for Detecting JPEG Exif With eval().

yara-rules-V0.0.5.zip (https)
MD5: 298EB636B3A3CB6A073815A83A6D1BA6
SHA256: EA00D044A3A0FE29265817407E382034593E0DAAD9887416E7FC128DA24B8830

Thursday 22 January 2015

Converting PEiD Signatures To YARA Rules

Filed under: Forensics,Malware,My Software — Didier Stevens @ 0:56

I converted Jim Clausing’s PEiD rules to YARA rules so that I can use them to detect executable code in suspect Microsoft Office Documents with my oledump tool.

Of course, I wrote a program to do this automatically: peid-userdb-to-yara-rules.py

This program converts PEiD signatures to YARA rules. These signatures are typically found in file userdb.txt. Since PEiD signature names don’t need to be unique, and can contain characters that are not allowed in YARA rules, the name of the YARA rule is prefixed with PEiD_ and a running counter, and non-alphanumeric characters are converted to underscores (_).
Signatures that can not be parsed are ignored.

Here is an example:
PEiD signature:

 [!EP (ExE Pack) V1.0 -> Elite Coding Group]
 signature = 60 68 ?? ?? ?? ?? B8 ?? ?? ?? ?? FF 10
 ep_only = true

Generated YARA rule:

 rule PEiD_00001__EP__ExE_Pack__V1_0____Elite_Coding_Group_
 {
     meta:
         description = "[!EP (ExE Pack) V1.0 -> Elite Coding Group]"
         ep_only = "true"
     strings:
         $a = {60 68 ?? ?? ?? ?? B8 ?? ?? ?? ?? FF 10}
     condition:
         $a
 }

PEiD signatures have an ep_only property that can be true or false. This property specifies if the signature has to be found at the PE file’s entry point (true) or can be found anywhere (false).
This program will convert all signatures, regardless of the value of the ep_only property. Use option -e to convert only rules with ep_only property equal to true or false.

Option -p generates rules that use YARA’s pe module. If a signature has ep_only property equal to true, then the YARA rule’s condition becomes $a at pe.entry_point instead of just $a.

Example:

 import "pe"

 rule PEiD_00001__EP__ExE_Pack__V1_0____Elite_Coding_Group_
 {
     meta:
         description = "[!EP (ExE Pack) V1.0 -> Elite Coding Group]"
         ep_only = "true"
     strings:
         $a = {60 68 ?? ?? ?? ?? B8 ?? ?? ?? ?? FF 10}
     condition:
         $a at pe.entry_point
 }

Specific signatures can be excluded with option -x. This option takes a file that contains signatures to ignore (signatures like 60 68 ?? ?? ?? ?? B8 ?? ?? ?? ?? FF 10, not names like [!EP (ExE Pack) V1.0 -> Elite Coding Group]).

Download my YARA Rules.

peid-userdb-to-yara-rules_V0_0_1.zip (https)
MD5: D5B9B6FA7EC50A107A70419D30FEC9ED
SHA256: F8A12B5522B92AE7E3EDF11ACFAEEA7FDCC7FBDA8DC827D288A2D92B2B2CA5E2

Tuesday 20 January 2015

YARA Rule: Detecting JPEG Exif With eval()

Filed under: Forensics,Malware — Didier Stevens @ 20:39

My first release of 2015 was a new YARA rule to detect JPEG images with an eval() function inside their Exif data.

Such images are not new, but I needed an example to develop a complex YARA rule:

rule JPEG_EXIF_Contains_eval
{
    meta:
        author = "Didier Stevens (https://DidierStevens.com)"
        description = "Detect eval function inside JPG EXIF header (http://blog.sucuri.net/2013/07/malware-hidden-inside-jpg-exif-headers.html)"
        method = "Detect JPEG file and EXIF header ($a) and eval function ($b) inside EXIF data"
    strings:
        $a = {FF E1 ?? ?? 45 78 69 66 00}
        $b = /\Weval\s*\(/
    condition:
        uint16be(0x00) == 0xFFD8 and $a and $b in (@a + 0x12 .. @a + 0x02 + uint16be(@a + 0x02) - 0x06)
}

Here is an example of such an image:

20150120-204954

The YARA rule has 3 conditions that must be satisfied:

  1. JPEG magic header FFD8, tested with: uint16be(0x00) == 0xFFD8
  2. Exif structure: FF E1 ?? ?? 45 78 69 66 00
  3. eval function inside Exif data, tested with a regular expression: \Weval\s*\(

Condition 1 is straightforward: the file must start with FFD8. I’m using test uint16be(0x00) == 0xFFD8 instead of searching for {FF D8} at 0x00. FF D8 is a short string, searching for {FF D8} can cause performance problems (you’ll get a warning from YARA when it compiles rules with such short strings).

Condition 2 checks for the presence of the Exif data header. Bytes 3 and 4 (?? ??) encode the length of the Exif Data.

Condition 3 checks for the presence of the eval function. To reduce the number of false positives that would occur when searching for string eval, we use a regular expression that matches string eval, possibly followed by whitespace characters (\s*), and an opening parenthesis: \(. And we don’t want letters or numbers before the string eval (we don’t want to match a string like deval), eval must be the start of a word. To achieve this with regular expressions, you use a word boundary: \b. So our regular expression would be \beval\s*\(. Unfortunately, YARA’s regular expression engine does not support word boundaries, so I had to come up with something else. I match any character that is not alphanumeric: \W. Be warned that there is a small difference between \W and \b. \b also matches the beginning of a string (like $), while \W has to match a character. So the regular expression I use is \Weval\s*\(.

The eval function must also be found inside the Exif data. We don’t want to trigger on the eval function if it is found somewhere else in the image. That’s where YARA’s in ( .. ) syntax comes in.

The first 18 bytes of the Exif structure are various headers which we ignore, so our eval function $b must start at @a + 0x12 or further.

The total size of the Exif structure is given by expression 0x02 + uint16be(@a + 0x02). We add this to the start of the Exif header (@a): @a + 0x02 + uint16be(@a + 0x02). And finally, we have to subtract the size of the string matched by the regular expression. Unfortunately, YARA has no function to calculate this length. So we will use the minimum length our regular expression can match: 6 characters. So our eval function $b must start no further than @a + 0x02 + uint16be(@a + 0x02) – 0x06. Putting all this together gives: $b in (@a + 0x12 .. @a + 0x02 + uint16be(@a + 0x02) – 0x06)

FYI: Victor told me that he plans to add a string length function to YARA, so our condition will then become: $b in (@a + 0x12 .. @a + 0x02 + uint16be(@a + 0x02) – &b)

You can find all my YARA rules here: YARA Rules.

Wednesday 17 December 2014

Introducing oledump.py

Filed under: Forensics,Malware,My Software — Didier Stevens @ 0:07

If you follow my video blog, you’ve seen my oledump videos and downloaded the preview version. Here is the “official” release.

oledump.py is a program to analyze OLE files (Compound File Binary Format). These files contain streams of data. oledump allows you to analyze these streams.

Many applications use this file format, the best known is MS Office. .doc, .xls, .ppt, … are OLE files (docx, xlsx, … is the new file format: XML insize ZIP).

Run oledump on an .xls file and it will show you the streams:

20141216-223150

The letter M next to stream 7, 8, 9 and 10 indicate that the stream contains VBA macros.

You can select a stream to dump its content:

20141216-223233

The source code of VBA macros is compressed when stored inside a stream. Use option -v to decompress the VBA macros:

20141216-223705

You can write plugins (in Python) to analyze streams. I developed 3 plugins. Plugin plugin_http_heuristics.py uses a couple of tricks to extract URLs from malicious, obfuscated VBA macros, like this:

20141216-224228

You might have noticed that the file analyzed in the above screenshot is a zip file. Like many of my analysis programs, oledump.py can analyze a file inside a (password protected) zip file. This allows you to store your malware samples in password protected zip files (password infected), and then analyze them without having to extract them.

If you install the YARA Python module, you can scan the streams with YARA rules:

20141216-224952

And if you suspect that the content of a stream is encoded, for example with XOR, you can try to brute-force the XOR key with a simple decoder I provide (or you can develop your own decoder in Python):

20141216-225911

This program requires Python module OleFileIO_PL: http://www.decalage.info/python/olefileio

oledump_V0_0_3.zip (https)
MD5: 9D5AA950C9BFDB16D63D394D622C6767
SHA256: 44D8C675881245D3336D6AB6F9D7DAF152B14D7313A77CB8F84A71B62E619A70

Tuesday 16 December 2014

YARA Rules

Filed under: Forensics,Malware — Didier Stevens @ 0:00

Here are some YARA rules I developed.

contains_pe_file will find embedded PE files.

maldoc is a set of rules derived from Frank Boldewin’s OfficeMalScanner signatures, that I also use in my XORSearch program. Their goal is to find shellcode embedded in documents.

20141215-160602

yara-rules-V0.0.1.zip (https)
MD5: 4D869BD838E662E050BBFCB0B89732E4
SHA256: 0CA778EAD97FF43CF7961E3C17A88B77E8782D082CE170FC779543D67B58FC72

Monday 15 December 2014

router-forensics.net

Filed under: Forensics,Networking — Didier Stevens @ 10:20

Together with Xavier Mertens I proposed a Brucon 5×5 project. Our project was accepted, and we bought 23 Cisco routers to teach memory forensics on network devices.

21 routers are used for workshops, and 2 routers are online.

If you want to practice memory forensics with real Cisco IOS devices, go to http://router-forensics.net.

Tuesday 25 November 2014

Update: find-file-in-file.py Version 0.0.4

Filed under: Forensics,My Software,Update — Didier Stevens @ 22:05

Here is the version I talked about in my Bitcoin virus posts.

It also has an embedded man page (use option –man).

find-file-in-file_v0_0_4.zip (https)
MD5: CD381616158BD233D94B368554B824C6
SHA256: FD5C4E3EC99371754E58B93D3D96CBA7A86C230C47FC9C27C9B871ED8BFB9149

Man page:

Usage: find-file-in-file.py [options] file-contained file-containing […]
Find if a file is present in another file

Arguments:
file-containing can be a single file, several files, and/or @file
@file: run the command on each file listed in the text file specified
wildcards are supported
batch mode is enabled when more than one file is specified

Source code put in the public domain by Didier Stevens, no Copyright
Use at your own risk
https://DidierStevens.com

Options:
–version             show program’s version number and exit
-h, –help            show this help message and exit
-m MINIMUM, –minimum=MINIMUM
Minimum length of byte-sequence to find (default 10)
-o, –overlap         Found sequences may overlap
-v, –verbose         Be verbose in batch mode
-p, –partial         Perform partial search of contained file
-O OUTPUT, –output=OUTPUT
Output to file
-b RANGEBEGIN, –rangebegin=RANGEBEGIN
Select the beginning of the contained file (by default
byte 0)
-e RANGEEND, –rangeend=RANGEEND
Select the end of the contained file (by default last
byte)
-x, –hexdump         Hexdump of found bytes
-q, –quiet           Do not output to standard output
–man                 Print manual

Manual:

find-file-in-file is a program to test if one file (the contained
file) can be found inside another file (the containing file).

Here is an example.
We have a file called contained-1.txt with the following content:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
and have a file called containing-1.txt with the following content:
0000ABCDEFGHIJKLM1111NOPQRSTUVWXYZ2222

When we execute the following command:
find-file-in-file.py contained-1.txt containing-1.txt

We get this output:
0x00000004 0x0000000d (50%)
0x00000015 0x0000000d (50%)
Finished

This means that the file contained-1.txt was completely found inside
file containing-1.txt At position 0x00000004 we found a first part
(0x0000000d bytes) and at position 0x00000015 we found a second part
(0x0000000d bytes).

We can use option hexdump (-x) to see which bytes were found:
find-file-in-file.py -x contained-1.txt containing-1.txt
0x00000004 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000015 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished

The containing file may contain the contained file in an arbitrary
order, like file containing-2.txt:
0000NOPQRSTUVWXYZ1111ABCDEFGHIJKLM2222

Example:
find-file-in-file.py -x contained-1.txt containing-2.txt
0x00000015 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000004 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished

The containing file does not need to contain the complete contained
file, like file containing-3.txt:
0000ABCDEFGHIJKLM1111

Example:
find-file-in-file.py -x contained-1.txt containing-3.txt
0x00000004 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
Remaining 13 (50%)

The message “Remaining 13 (50%)” means that the last 13 bytes of the
contained file were not found in the containing file (that’s 50% of
the contained file).

If the contained file starts with a byte sequence not present in the
containing file, nothing will be found. Example with file
contained-2.txt:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ

Nothing is found:
find-file-in-file.py -x contained-2.txt containing-1.txt
Remaining 36 (100%)

If you know how long that initial byte sequence is, you can skip it.
Use option rangebegin (-b) to specify the position in the contained
file from where you want to start searching.
Example:

find-file-in-file.py -x -b 10 contained-2.txt containing-1.txt
0x00000004 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000015 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished

If you want to skip bytes at the end of the contained file, use option
rangeend (-e).

If you don’t know how long that initial byte sequence is, you can
instruct find-file-in-file to “brute-force” it. With option partial
(-p), one byte at a time will be removed from the beginning of the
contained file until a match is found.
Example:

find-file-in-file.py -x -p contained-2.txt containing-1.txt
File: containing-1.txt (partial 0x0a)
0x00000004 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000015 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished

“(partial 0x0a)” tells you that the first 10 bytes of the contained
file were skipped before a match was found.

There are some other options:
-m minimum: find-file-in-file will search for byte sequences of 10
bytes long minimum. If you want to change this minimum, use option -m
minimum.
-o overlap: find-file-in-file will not let byte sequences overlap. Use
option -o overlap to remove this restriction.
-v verbose: be verbose in batch mode (more than one containing file).
-O output: besides writing output to stdout, write the output also to
the given file.
-q quiet: do not output to stdout.

Thursday 24 July 2014

Stoned Bitcoin: My Analysis Tools

Filed under: Encryption,Forensics,Malware,My Software — Didier Stevens @ 0:00

The most interesting thing about Stoned Bitcoin for me, was to work out a method to find these Bitcoin transactions.

When this was mentioned on Twitter, I did a string search through the Bitcoin blockchain for string STONED: no hits.

Some time later I used my find-file-in-file tool. I got a copy of the Stoned Virus (md5 74A6DBB7A60915FE2111E580ACDEEAB7) and searched through the blockchain: again, no hits.

Although this means the blockchain doesn’t contain the start bytes of the Stoned Virus, it could still contain other parts of the virus. So I randomly selected a sequence of bytes from the virus, and used my tool again: I got a hit!

The command: find-file-in-file.py -s 0xFC 74A6DBB7A60915FE2111E580ACDEEAB7.vir blk00129.dat

The output:

0171c33d 00000010 (6%)
Remaining 244 (93%)

These are the bytes I found: 07 00 BA 80 00 CD 13 EB 49 90 B9 03 00 BA 00 01

How to find the transaction containing this byte sequence? A Bitcoin transaction (binary form) starts with a version number (unsigned 32 bit integer, little-endian), this number is currently 1. The ID of a transaction is the SHA-256 hash of the SHA-256 hash of all the bytes in the transaction, and this reversed and expressed in hexadecimal notation. Armed with this information, I was able to find the transaction: f09904aaa4fa4a8ec7da06f5e3d318a9b6a218e1a215f9307416fbbadf5a1c8e.

Finally, I updated my find-file-in-file tool so that I could do partial searches (and a couple of other features), and I wrote a Python script to parse and search the Bitcoin blockchain.

This is what you can do with the new version of find-file-in-file:

20140723-234257

Option partial allows you to search for parts of the file.

Option hexdump does a hexdump of the found bytes.

And options rangebegin and rangeend allow you to limit what you are searching for by specifying the range to search for. This is necessary for the Stoned Virus, because it ends with a sequence of 0x00 bytes, and such sequences are certainly not specific to the Stoned Virus, but omni-present in the blockchain.

Soon I will release these tools.

Monday 30 June 2014

Update: Stoned Bitcoin

Filed under: Encryption,Forensics,Malware,Update — Didier Stevens @ 0:04

kurt wismer pointed me to this post on pastebin after he read my Stoned Bitcoin blogpost. The author of this pastebin post works out a method to spam the Bitcoin blockchain to cause anti-virus (false) positives.

I scanned through all the Bitcoin transactions (until 24/06/2014) for the addresses listed in this pastebin post (the addresses represent antivirus signatures for 400+ malwares).

All these “malicious” Bitcoin addresses, designed to generate anti-virus false positives,  have been exclusively used in the 8 Bitcoin transactions I mentioned in my previous post.

The pastebin entry was posted on 2014/04/02 19:01:08 UTC.

And here are the 8 transactions with the UTC timestamp of the block in which they appear:

Block: 2014/04/03 23:12:48
Transaction: edb83f04e68bfe78bbfe7ce80d33e85acb9335c96ead5712517b8c70d1f27b38
Block: 2014/04/04 01:10:45
Transaction: 7e49504c7cecea7ea95d78ff14687878ba581a21dc0772805d2925c617514129
Block: 2014/04/04 01:43:25
Transaction: f65895220f04aa0084d9abae938d3f517893e3afbffe25fc9e7073e02331b9ed
Block: 2014/04/04 02:58:13
Transaction: 8a445d12f225a21d36bb78da747efd2e74861fcd033757da572c0434d423acd1
Block: 2014/04/04 04:32:24
Transaction: fcf5cf9893a142897598edfc753bd6162e3638e138fc2feaf4a3477c0cfb65eb
Block: 2014/04/04 04:32:24
Transaction: 2814673f0952b936d578d73197bfd371cefbd73c6294bab16de1575a4c3f6e80
Block: 2014/04/04 09:36:29
Transaction: f09904aaa4fa4a8ec7da06f5e3d318a9b6a218e1a215f9307416fbbadf5a1c8e
Block: 2014/04/04 09:36:29
Transaction: 5dbb9df056c36457228a841d6cc98ac90967bc88411c95372d3c2d92c18060f8

So it took a bit more than 24 hours before someone spammed the Bitcoin blockchain with these transactions designed to trigger false positives.

Next Page »

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 262 other followers