Didier Stevens

Tuesday 10 November 2015

Update: oledump V0.0.20

Filed under: My Software,Update — Didier Stevens @ 0:00

Option -c calculates extra data per stream. This data is displayed per stream. Only the MD5 hash of the content of the stream is calculated.
Example:
C:\Demo>oledump.py -c Book1.xls
1:      4096 ‘\x05DocumentSummaryInformation’ ff1773dce227027d410b09f8f3224a56
2:      4096 ‘\x05SummaryInformation’ b46068f38a3294ca9163442cb8271028
3:      4096 ‘Workbook’ d6a5bebba74fb1adf84c4ee66b2bf8dd

In stead of adding more calculations to option -c, I added option -E (extra) which allows the user to specify which extra info needs to be displayed. From the man page:

If you need more data than the MD5 of each stream, use option -E
(extra). This option takes a parameter describing the extra data that
needs to be calculated and displayed for each stream. The following
variables are defined:
  %INDEX%: the index of the stream
  %INDICATOR%: macro indicator
  %LENGTH%': the length of the stream
  %NAME%: the printable name of the stream
  %MD5%: calculates MD5 hash
  %SHA1%: calculates SHA1 hash
  %SHA256%: calculates SHA256 hash
  %ENTROPY%: calculates entropy
  %HEADHEX%: display first 20 bytes of the stream as hexadecimal
  %HEADASCII%: display first 20 bytes of the stream as ASCII
  %TAILHEX%: display last 20 bytes of the stream as hexadecimal
  %TAILASCII%: display last 20 bytes of the stream as ASCII
  %HISTOGRAM%: calculates a histogram
                 this is the prevalence of each byte value (0x00 through 0xFF)
                 at least 3 numbers are displayed separated by a comma:
                 number of values with a prevalence > 0
                 minimum values with a prevalence > 0
                 maximum values with a prevalence > 0
                 each value with a prevalence > 0
  %BYTESTATS%: calculates byte statistics
                 byte statistics are 5 numbers separated by a comma:
                 number of NULL bytes
                 number of control bytes
                 number of whitespace bytes
                 number of printable bytes
                 number of high bytes

The parameter for -E may contain other text than the variables, which
will be printed. Escape characters \n and \t are supported.
Example displaying the MD5 and SHA256 hash per stream, separated by a
space character:
C:\Demo>oledump.py -E "%MD5% %SHA256%" Book1.xls
  1:      4096 '\x05DocumentSummaryInformation' ff1773dce227027d410b09f8f3224a56 2817c0fbe2931a562be17ed163775ea5e0b12aac203a095f51ffdbd5b27e7737
  2:      4096 '\x05SummaryInformation' b46068f38a3294ca9163442cb8271028 2c3009a215346ae5163d5776ead3102e49f6b5c4d29bd1201e9a32d3bfe52723
  3:      4096 'Workbook' d6a5bebba74fb1adf84c4ee66b2bf8dd 82157e87a4e70920bf8975625f636d84101bbe8f07a998bc571eb8fa32d3a498

If the extra parameter starts with !, then it replaces the complete
output line (in stead of being appended to the output line).
Example:
C:\Demo>oledump.py -E "!%INDEX% %MD5%" Book1.xls
1 ff1773dce227027d410b09f8f3224a56
2 b46068f38a3294ca9163442cb8271028
3 d6a5bebba74fb1adf84c4ee66b2bf8dd

To include extra data with each use of oledump, define environment
variable OLEDUMP_EXTRA with the parameter that should be passed to -E.
When environment variable OLEDUMP_EXTRA is defined, option -E can be
ommited. When option -E is used together with environment variable
OLEDUMP_EXTRA, the parameter of option -E is used and the environment
variable is ignored.

oledump_V0_0_20.zip (https)
MD5: 715B33E8E090F2A061DB2EA5A913055F
SHA256: 056CC911AEDFFB48B756F1B941E14660EBA8B613C65B1026F5DA77FB3047DAE3

Monday 9 November 2015

byte-stats.py

Filed under: My Software — Didier Stevens @ 0:00

I have a new tool that calculates byte statistics for files, like entropy. I used it recently to help me recover images from a ransomware infection, as described in these SANS ISC Diary entries:

 

Usage: byte-stats.py [options] [files ...]
Calculate byte statistics

files:
wildcards are supported
@file: run command on each file listed in the text file specified

Source code put in the public domain by Didier Stevens, no Copyright
Use at your own risk
https://DidierStevens.com

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -m, --man             Print manual
  -d, --descending      Sort descending
  -k, --keys            Sort on keys in stead of counts
  -b BUCKET, --bucket=BUCKET
                        Size of bucket (default is 10240 bytes)
  -l, --list            Print list of bucket property
  -p PROPERTY, --property=PROPERTY
                        Property to list: encwph
  -a, --all             Print all byte stats
  -s, --sequence        Detect simple sequences
  -f FILTER, --filter=FILTER
                        Minimum length of sequence for displaying (default 0)

Manual:

byte-stats is a tool to calculate byte statistics of the content of files. It
helps to determine the type or content of a file.

Let's start with some examples.
all.bin is a 256-byte large file, containing all possible byte values. The
bytes are ordered: the first byte is 0x00, the second one is 0x01, the third
one is 0x02, ... and the last one is 0xFF.

$byte-stats.py all.bin

Byte ASCII Count     Pct
0x00           1   0.39%
0x01           1   0.39%
0x02           1   0.39%
0x03           1   0.39%
0x04           1   0.39%
...
0xfb           1   0.39%
0xfc           1   0.39%
0xfd           1   0.39%
0xfe           1   0.39%
0xff           1   0.39%

Size: 256

                   File(s)
Entropy:           8.000000
NULL bytes:               1   0.39%
Control bytes:           27  10.55%
Whitespace bytes:         6   2.34%
Printable bytes:         94  36.72%
High bytes:             128  50.00%

First byte-stats.py will display a histogram of byte values found in the
file(s). The first column is the byte value in hex (Byte), the second column is
its ASCII value, third column tells us how many times the byte value appears
(Count) and the last column is the percentage (Pct).
This histogram is sorted by Count (ascending). To change the order use option
-d (descending), to sort by byte value use option -k (key).
By default, the first 5 and last 5 entries of the histogram are displayed. To
display all values, use option -a (all).

After the histogram, the size of the file(s) is displayed.

Finally, the following statistics for the files(s) are displayed:
* Entropy (between 0.0 and 8.0).
* Number and percentage of NULL bytes (0x00).
* Number and percentage of Control bytes (0x01 through 0x1F, excluding
whitespace bytes and including 0x7F).
* Number and percentage of Whitespace bytes (0x09 through 0x0D and 0x20).
* Number and percentage of Printable bytes (0x21 through 0x7E).
* Number and percentage of High bytes (0x80 through 0xFF).

byte-stats.py will also split the file in equally sized parts (called buckets)
and perform the same calculations for these buckets. The default size of a
bucket is 10KB (10240 bytes), but can be chosen with option -b (bucket). If the
file is smaller than the bucket size, no bucket calculations are performed. If
the file size is not an exact multiple of the bucket size, then no calculations
are done for the last bucket (because it is incomplete).

Here is an example with buckets (file random.bin just contains random bytes):

$byte-stats.py random.bin

Byte ASCII Count     Pct
0xce         242   0.32%
0x14         248   0.33%
0x52 R       251   0.34%
0xba         251   0.34%
0x3e >       256   0.34%
...
0x2e .       332   0.44%
0x45 E       336   0.45%
0xc9         336   0.45%
0x1b         338   0.45%
0x75 u       344   0.46%

Size: 74752  Bucket size: 10240  Bucket count: 7

                   File(s)           Minimum buckets   Maximum buckets
Entropy:           7.997180          7.981543          7.984125
                   Position:         0x0000f000        0x00005000
NULL bytes:             303   0.41%        34   0.33%        44   0.43%
Control bytes:         7888  10.55%      1046  10.21%      1117  10.91%
Whitespace bytes:      1726   2.31%       220   2.15%       254   2.48%
Printable bytes:      27278  36.49%      3680  35.94%      3812  37.23%
High bytes:           37557  50.24%      5096  49.77%      5211  50.89%

Besides the file size (74752), the size of the bucket (10240) and the number of
buckets (7) is displayed.
And next to the entropy and byte counters for the complete file, the entropy
and byte counters are calculated for each bucket. The minimum values for the
bucket entropy and byte counters are displayed (Minimum buckets), and also the
maximum values (Maximum buckets).
Position gives the start of the bucket with minimum entropy and maximum entropy
in hexadecimal.
A significant difference between the overal statistics and bucket statistics
can indicate a file that is not uniform in its content.
Like in this picture "encrypted" by ransomware:

$byte-stats.py picture.jpg.ransom

Byte ASCII Count     Pct
0x44 D      1172   0.13%
0x16        1310   0.15%
0x22 "      1371   0.16%
0xc2        1421   0.16%
0x17        1437   0.16%
...
0x7a z      7958   0.91%
0x82        8006   0.91%
0x7e ~      8571   0.98%
0x80       22232   2.53%
0x00       23873   2.72%

Size: 877456  Bucket size: 10240  Bucket count: 85

                   File(s)           Minimum buckets   Maximum buckets
Entropy:           7.815519          5.156678          7.981628
                   Position:         0x00019000        0x00005000
NULL bytes:           23873   2.72%         8   0.08%      1643  16.04%
Control bytes:        92243  10.51%        98   0.96%      1275  12.45%
Whitespace bytes:     16241   1.85%         1   0.01%       263   2.57%
Printable bytes:     303975  34.64%      2476  24.18%      5219  50.97%
High bytes:          441124  50.27%      3728  36.41%      6772  66.13%

The entropy for the file is 7.815519 (encrypted or compressed), but there is
one part of the file (bucket) with an entropy of (5.156678). This part is not
encrypted or compressed.
To locate this part, option -l (list) can be used to list the entropy values
for each bucket:

$byte-stats.py -l picture.jpg.ransom

0x00000000 7.978380
0x00002800 7.979475
0x00005000 7.981628
0x00007800 7.267890
0x0000a000 6.579047
0x0000c800 6.798210
0x0000f000 6.733402
0x00011800 6.496882
0x00014000 5.743983
0x00016800 5.488550
0x00019000 5.156678
0x0001b800 5.330629
0x0001e000 6.057448
0x00020800 6.425884
0x00023000 6.880007
0x00025800 6.856647
...

The bucket starting at position 0x00019000 has the lowest entropy.

A list for the other properties (NULL bytes, ...) can be produced by using
option -l together with option -p (property). For example options "-l -p n"
will produce a list of the number of NULL bytes for each bucket.

Option -s (sequence) instructs byte-stats to search for simple byte sequences.
A simple byte sequence is a sequence of bytes where the difference (unsigned)
between 2 consecutive bytes is a constant.
Example:

$byte-stats.py -s picture.jpg.ransom

Byte ASCII Count     Pct
0x44 D      1172   0.13%
0x16        1310   0.15%
0x22 "      1371   0.16%
0xc2        1421   0.16%
0x17        1437   0.16%
...
0x7a z      7958   0.91%
0x82        8006   0.91%
0x7e ~      8571   0.98%
0x80       22232   2.53%
0x00       23873   2.72%

Size: 877456  Bucket size: 10240  Bucket count: 85

                   File(s)           Minimum buckets   Maximum buckets
Entropy:           7.815519          5.156678          7.981628
                   Position:         0x00019000        0x00005000
NULL bytes:           23873   2.72%         8   0.08%      1643  16.04%
Control bytes:        92243  10.51%        98   0.96%      1275  12.45%
Whitespace bytes:     16241   1.85%         1   0.01%       263   2.57%
Printable bytes:     303975  34.64%      2476  24.18%      5219  50.97%
High bytes:          441124  50.27%      3728  36.41%      6772  66.13%

Position    Length Diff Bytes
0x00013984:    246  128 0x8000800080008000800080008000800080008000...
0x00013c01:    206  128 0x0080008000800080008000800080008000800080...
0x0001b186:    205  128 0x8000800080008000800080008000800080008000...
0x0001b406:    205  128 0x8000800080008000800080008000800080008000...
0x0001b906:    204  128 0x8000800080008000800080008000800080008000...
0x0001bb86:    204  128 0x8000800080008000800080008000800080008000...
0x0001be06:    200  128 0x8000800080008000800080008000800080008000...
0x0001c086:    200  128 0x8000800080008000800080008000800080008000...
0x0001c306:    200  128 0x8000800080008000800080008000800080008000...
0x0001c586:    196  128 0x8000800080008000800080008000800080008000...

Position is the start of the detected sequence, Length is the number of bytes
in the sequence, Diff is the difference (unsigned) between 2 consecutive bytes
and Bytes displays the hex values of the start of the sequence.
By default, the 10 longest sequences are displayed. All sequences (minimum 3
bytes long) can be displayed with option -a. To sort the sequences by position
use option -k (key). To filter the sequences by length, use option -f.

Sequence detection is useful as an extra check when the entropy and byte
counters indicate the file is random:

$byte-stats.py -s not-random.bin

Byte ASCII Count     Pct
0x00          16   0.39%
0x01          16   0.39%
0x02          16   0.39%
0x03          16   0.39%
0x04          16   0.39%
...
0xfb          16   0.39%
0xfc          16   0.39%
0xfd          16   0.39%
0xfe          16   0.39%
0xff          16   0.39%

Size: 4096

                   File(s)
Entropy:           8.000000
NULL bytes:              16   0.39%
Control bytes:          432  10.55%
Whitespace bytes:        96   2.34%
Printable bytes:       1504  36.72%
High bytes:            2048  50.00%

Position    Length Diff Bytes
0x00000000:   4096    1 0x000102030405060708090a0b0c0d0e0f10111213...

byte-stats_V0_0_3.zip (https)
MD5: 4287A94EC56E0BF5A936C2A16DA7F2B4
SHA256: 310B15865B332FF62F2C70CE441D322491DB79BC5D1C8D8BBC9A7245005491B5

Sunday 8 November 2015

Update: translate.py V2.1.0

Filed under: My Software,Update — Didier Stevens @ 0:00

Translate is a Python tool to translate files; you give it a Python expression that converts the input file byte per byte to the output file.

In this update, I added option -f (fullread) to process files in one go, and not byte per byte.

It works just like the byte per byte process, but in stead of a Python expression that transform a byte, you provide a Python function that transforms a string. This Python function must take a string as argument (the content of the file) and return a string as argument (the converted file).

I used this in my “Analysis Of An Office Maldoc With Encrypted Payload (Slow And Clean)” post.

translate_v2_1_0.zip (https)
MD5: AF8B1FB7A48AFC519F7656763A95980C
SHA256: 6C65ABE811263E1F687DEDB0A1064C141FFEEA5105BE3C925972BC0B9CE73FC0

Saturday 7 November 2015

Analysis Of An Office Maldoc With Encrypted Payload: oledump plugin

Filed under: maldoc,Malware,My Software,Reverse Engineering — Didier Stevens @ 0:00

After a quick and dirty analysis and a “slow and clean” analysis of a malicious document, we can integrate the Python decoder function into a plugin: the plugin_dridex.py

First we add function IpkfHKQ2Sd to the plugin. The function uses the array module, so we need to import it (line 30):

20151106-222710

Then we can add the IpkfHKQ2Sd function (line 152):

20151106-222928

And then we can add function IpkfHKQ2Sd to the list in line 217:

20151106-223132

This is the code that tries different decoding functions that take 2 arguments: a secret and a key.

I also added code (from plugin_http_heuristics) to support Chr concatenations:

20151106-223608

The result is that the plugin can now extract the URLs from this sample:

20151106-222050

Download:
oledump_V0_0_19.zip (https)
MD5: DBE32C21C564DB8467D0064A7D4D92BC
SHA256: 7F8DCAA2DE9BB525FB967B7AEB2F9B06AEB5F9D60357D7B3D14DEFCB12FD3F94

Friday 6 November 2015

Analysis Of An Office Maldoc With Encrypted Payload (Slow And Clean)

Filed under: maldoc,Malware,My Software,Reverse Engineering — Didier Stevens @ 0:00

In my previous post we used VBA and Excel to decode the URL and the PE file.

In this  post we will use Python. I translated the VBA decoding function IpkfHKQ2Sd to Python:

20151105-223017

Now we can decode the URL using Python:

20151105-223901

And also decode the downloaded file with my translate program and the IpkfHKQ2Sd function:

20151105-224328

20151105-224636

 

Thursday 5 November 2015

Analysis Of An Office Maldoc With Encrypted Payload (Quick And Dirty)

Filed under: maldoc,Malware,My Software,Reverse Engineering — Didier Stevens @ 0:00

The malicious office document we’re analyzing is a downloader: 0e73d64fbdf6c87935c0cff9e65fa3be

oledump reveals VBA macros in the document, but the plugins are not able to extract a URL:

20151104-194727

Let’s use a new plugin that I wrote: plugin_vba_dco. This plugin searches for Declare statements and CreateObject calls:

20151104-194827

In the first half of the output (1) we see all lines containing the Declare or CreateObject keyword. In the second half of the output (2) we see all lines containing calls to declared functions or created objects.

Although the code is obfuscated (obfuscation of strings and variable names), the output of this plugin allows us to guess that Ci8J27hf2 is probably a XMLHTTP object, because of the .Open, .send, .Status, … methods and properties.

The Open method of the XMLHTTP object takes 3 parameters: the HTTP method, the URL and a boolean (asynchronous or synchronous call):

20151104-195006

As we can see, the third parameter is False and the first 2 parameters are the return value of a function called IpkfHKQ2Sd. This function takes 2 parameters: 2 strings. The first string is the result of concatenated Chr functions, and the second string is a literal string. Since the Open method requires the HTTP method and URL as strings, is very likely that function IpkfHKQ2Sd is a decoding function that takes 2 strings as input (meaningless to us) and returns a meaningful string.

Here is the original IpkfHKQ2Sd function. It’s heavily obfuscated:

20151104-195102

Here is the same function that I deobfuscated. I didn’t change the function name, but I removed all useless code, renamed variables and added indentation:

20151104-195144

We can now see that this function uses a key (sKey) and XOR operations to decode a secret string (sSecret). And now we can also see that this is just a string manipulation function. It does not contain malicious or dangerous statements or function calls. So it is safe to use in a VBA interpreter, we don’t need to translate it into another language like Python.

We are going to use this deobfuscated function in a new spreadsheet to decode the URL parameter:

20151104-195359

In the VBA editor of this new spreadsheet, we have the deobfuscated IpkfHKQ2Sd function and a test subroutine that calls the IpkfHKQ2Sd function with strings that we found in the .Open method for the URL argument. The decoded string returned by function IpkfHKQ2Sd is displayed via MsgBox. Executing this test subroutine reveals the URL:

20151104-195410

Downloading this file, we see it’s not a JPEG file, but contrary to what we could expect, it’s neither an EXE file:

20151104-195912

Searching for .responseBody in the VBA code, we see that the downloaded file (present in .responseBody) is passed as an argument to function IpkfHKQ2Sd:

20151104-195823

This means that the downloaded file is also encoded. It needs to be decoded with the same function as we used for the URL: function IpkfHKQ2Sd (but with another key).

To convert this file with the deobfuscated function in our spreadsheet, we need to load the file in the spreadsheet, decode it, and save the decoded file to disk. This can be done with my FileContainer.xls tool (to be released). First we load the encoded file in the FileContainer:

20151104-200044

20151104-200105

FileContainer supports file conversion: we have to use command C and push the Process Files button:

20151104-200125

Here is the default conversion function Convert. This default function doesn’t change the file: the output is equal to the input:

20151104-200214

To decode the file, we need to update the Convert function to call the decoding function IpkfHKQ2Sd with the right key. Like this:

20151104-200424

And then, when we convert the file, we obtain an EXE file:

20151104-200952

This EXE turns out to be Dridex malware: 50E3407557500FCD0D81BB6E3B026404

Remark: reusing code from malware is dangerous unless we know exactly what the code does. To decode the downloaded file quickly, we reused the decoding VBA function IpkfHKQ2Sd (I did not translate it into another language like Python). But to be sure it was not malicious, I deobfuscated it first. The deobfuscation process gave me the opportunity to look at each individual statement, thereby giving me insight into the code and come to the conclusion that this function is not dangerous. We could also have used the obfuscated function, but then we ran the risk that malware would execute because we did not fully understand what the obfuscated function did.

Translating the obfuscating function to another language doesn’t make it less dangerous, but it allows us to execute it in a non-Windows environment (like Linux), thereby preventing Windows malware from executing.

Wednesday 4 November 2015

Overview of Content Published In October

Filed under: Announcement — Didier Stevens @ 15:31

Here is an overview of content I published in October:

Blog posts:

Videoblog posts:

SANS ISC Diary entries:

« Previous Page

Blog at WordPress.com.