Didier Stevens

Monday 27 January 2020

Update: hash.py Version 0.0.8

Filed under: My Software,Update — Didier Stevens @ 0:00

In this new version of hash.py, a tool to calculate hashes, I add “hash” checksum8.

Checksum8 calculates the sum of all bytes contained in the provided file(s), each byte is interpreted as an unsigned, 8-bit integer.

I recently had to validate that the path of a URL was a “valid” Meterpreter identifier. When the least significant byte of the 8-bit checksum of the path is equal to 92 (0x5C), then we have a valid URL for a Windows Meterpreter stager.

Take this URL: Could this be a “Windows Meterpreter” URL? Let’s calculate the checksum of RVdP:

The 8-bit checksum of RVdP is 0x015C. The least significant byte is 0x5C, or 92: this matches URI_CHECKSUM_INITW, e.g. this could indeed be a URL used by a reverse http Meterpreter payload.

Besides this new feature, hash.py comes with other features like “pack expressions” and various bug fixes.

hash_V0_0_8.zip (https)
MD5: 03F928332874447F6198A9FDE46E3AA7
SHA256: 80C493639CA7160D1455FABA38A2A04556240326D4BA78B8207CA8FF8B09E1B2

Sunday 26 January 2020

Update: format-bytes.py Version 0.0.11

Filed under: My Software,Update — Didier Stevens @ 0:00

As announced in my previous blog post, this new version of format-bytes.py adds a pack expression (#p#) and other features and (Python 3) bug fixes.

A pack expression is another “here filename”, like #h# for hexadecimal data (which now accepts spaces too).

When format-bytes.py is given a filename as argument, the content of that file is read and processed.

File arguments that start with character # have special meaning. These are not processed as actual files on disk (except when option –literalfilenames is used), but as file arguments that specify how to “generate” the file content. Generating the file content with a # file argument means that the file content is not read from disk, but generated in memory based on the characteristics provided via the file argument. For example, file argument #ABCDE specifies a file containing exactly 5 bytes: ASCII characters A, B, C, D and E.

File arguments that start with #p# are a notational convention to pack a Python expression to generate data (using Python module struct): a “pack expression”.
The string after #p# must contain 2 expressions separated by a # character, like #p#I#123456.
The first expression (I in this example) is the format string for the Python struct.pack function, and the second expression (123456 in this example) is a Python expression that needs to be packed by struct.pack.
In this example, format string I represents an unsigned, 32-bit, little-endian integer, and thus #p#I#123456 generates byte sequence 40E20100 (hexadecimal).
Remark that the Python expression is evaluated with Python’s eval function: this can be abused to achieve arbitrary code execution. Don’t use this in a situation where you have no control over arguments.

I introduced “pack expressions” because I had an IPv4 number represented as a decimal integer, and I needed the dotted quad representation. format-bytes.py will represent 4 bytes as a dotted quad, but I still had to convert a decimal integer to 4 bytes. Hence the introduction of pack expressions (#p#).

For example, number 3232235786 is IPv4 address

Pack expression #p#>I#3232235786 converts number 3232235786 to 4 bytes: >I is the struct format specifier for a big-endian, unsigned 32-bit integer. Remark that I enclose this pack expression in double-quotes (“), as most shells will interpret character > as file redirection if not escaped.

Because of CVE-2020-0601, I also introduced Object Identifier aka OID (DER) decoding. In DER encoding, an OID starts with byte 6 (excluding flags) followed by one byte indicating the length of the bytes representing the OID.

Hexadecimal sequence “06 07 2a 86 48 ce 3d 01 01” is the DER value for OID 1.2.840.10045.1.1.

I also added support for environment variable DSS_DEFAULT_HASH_ALGORITHMS to let you choose your favorite hashing algorithm, in case it is no longer MD5 🙂 .

And last, some (Python 3) bug fixes.


format-bytes_V0_0_11.zip (https)
MD5: D73D5FA410F882F03176CF5FD3E0D90A
SHA256: 34B37CA4E45E4EF0F36F5460CAD429343C0AE993297C104AA8A29C2EE4E7904F

Saturday 25 January 2020

Update: cut-bytes.py Version 0.0.11

Filed under: My Software,Update — Didier Stevens @ 21:59

Some bug fixes and new features (pack expression #p# and spaces allowed for #h#), to be covered in more detail in the next blog post on format-bytes.py.

cut-bytes_V0_0_11.zip (https)
SHA256: C805CBD23E09D80EB2AF39F8F940CC9188EF7F6B27197D018DA95093AC5D0932

Monday 6 January 2020

Analysis Of Unusual ZIP Files

Filed under: Malware,My Software — Didier Stevens @ 0:00

Intrigued by a blog post from SpiderLabs on a special ZIP file they found, I took a closer look myself.

That special ZIP file is a concatenation of 2 ZIP files, the first containing a single PNG file (with extension .jpg) and the second a single EXE file (malware). Various archive managers and security products handle this file differently, some “seeing” only the PNG file, others only the EXE file.

My zipdump.py tool reports the following for this special ZIP file:

zipdump.py is essentially a wrapper for Python’s zipfile module, and this module parses ZIP files “starting from the end of the file”. That’s why it finds the second ZIP file (appended to the first ZIP file), containing the malicious EXE file.

To help with the analysis of such special/malformed ZIP files, I added an option (-f –find) to zipdump. This option scans the content of the provided file looking for ZIP records. ZIP records start with ASCII string PK followed by 2 bytes to indicate the record type (byte values less than 16).

Here I use option “-f list” to list all PK records found in a ZIP file containing a single text file:

This is how a normal ZIP file containing a single file looks on the inside.

The file starts with a “local file header”, a PK record that starts with ASCII characters PK followed by bytes 0x03 and 0x04 (that’s 50 4B 03 04 in hexadecimal). In zipdump’s report, such a PK record is identified with PK0304. This header is followed by the contained file (usually compressed).

Then there is a “central directory header”, a PK record that starts with ASCII characters PK followed by bytes 0x01 and 0x02 (that’s 50 4B 01 02 in hexadecimal). In zipdump’s report, such a PK record is identified with PK0102. This header contains an offset pointing to the corresponding PK0304 record.

And at the end of the ZIP file, there is a “end of central directory”, a PK record that starts with ASCII characters PK followed by bytes 0x05 and 0x06 (that’s 50 4B 05 06 in hexadecimal). In zipdump’s report, such a PK record is identified with PK0506. This header contains an offset pointing to the first PK0102 record.

A ZIP file containing 2 files looks like this, when scanned with zipdump’s option -f list:

Starting with 2 PK0304 records (one for each contained file), followed by 2 PK0102 records, and 1 PK0506 record.

Armed with this knowledge, we take a look at our malicious ZIP file:

We see 2 PK0506 records, and this is unusual.

We see the following sequence of records twice: PK0304, PK0102, PK0506.

From our previous examples, we can now understand that this sample contains 2 ZIP files.

Remark that zipdump assigned an index to both PK0506 records: 1 and 2. This index can be used to select one of the 2 ZIP files for further analysis. Like in this example, where I select the first ZIP file:

Using option “-f 1” (in stead of “-f list”) selects the first ZIP file in the provide sample, and lists its content.

It can then be further analyzed with zipdump like usual, for example, selecting the first file (order.jpg) inside the first ZIP file for an hex/ascii dump:

Likewise, “-f 2” will select the second ZIP file found inside the sample:

-f is a new option that I added for special/malformed ZIP files, but this is a work in progress, as there are many ways to malform ZIP files.

For example, I created a PoC malformed ZIP file that contains a single file, with reversed PK record order. Here is the output for the normal and “reversed” zip files (malformed, e.g. PK records order reversed):

This file can be opened with Windows Explorer, but there are tools and libraries than can not handle it. Like Python’s zipfile module:

I will further develop zipdump to handle malformed ZIP files as best as possible.

The current version (zipdump 0.0.16) is just a start:

  • it parses only 3 PK record types (PK0304, PK0102 and PK0506), other types are ignored
  • it does minimal parsing of these records: for example, there is no parsing/checking of offsets in this version

And finally, I also created a video showing how to use this new feature:

Tuesday 31 December 2019

YARA “Ad Hoc Rules”

Filed under: My Software — Didier Stevens @ 14:42

Several of my tools support YARA rules.

And of those tools, many support what I like to call “Ad Hoc rules” (or Here rules).

An Ad Hoc YARA rule is a rule that isn’t stored in a file, but is passed via the command line, and is generated ad hoc by the tool for you.

Take for example oledump.py.

When you issue the command “oledump.py -y trojan.yara sample.vir”, oledump will load all the rules found inside file trojan.yara, and scan the streams of document sample.vir with these rules.

But if you want to search for a simple string, say “virus.exe”, then you have to create a YARA rule to search for this string, store it inside a file, and pass this file to oledump via option -y.

Ad hoc rules make this process simpler. Ad hoc rules start with #.

To generate an ad hoc rule for a string, use prefix #s#. Like this:

oledump.py -y “#s#virus.exe sample.vir”

This will generate the following YARA rule:

rule string {strings: $a = “virus.exe” ascii wide nocase condition: $a}

You can also use #x# for hexadecimal, oledump.py -y “#x#D0 CF 11 E0” sample.vir:

rule hexadecimal {strings: $a = { D0 CF 11 E0 } condition: $a}

And #r# for regular expressions, oledump.py -y “#r#[a-z]+” sample.vir

rule regex {strings: $a = / [a-z]+ / ascii wide nocase condition: $a}

And you can also pass YARA rules literally (#), hexadecimal encoded (#h#) and base64 encoded (#b#).

And finally, for passing rules literally with double-quotes (“), you can use #q#: this will replace every single quote (‘) with a double quote (“).


Sunday 29 December 2019

Update: pdf-parser.py Version 0.7.4 and pdfid.py Version 0.2.7

Filed under: My Software,PDF,Update — Didier Stevens @ 0:00

This is a bug fix version.

pdf-parser_V0_7_4.zip (https)
MD5: 51C6925243B91931E7FCC1E39A7209CF
SHA256: FC318841952190D51EB70DAFB0666D7D19652C8839829CC0C3871BBF7E155B6A

pdfid_v0_2_7.zip (https)
MD5: F1852F238386681C2DC40752669B455B
SHA256: FE2B59FE458ECBC1F91A40095FB1536E036BDD4B7B480907AC4E387D9ADB6E60

Saturday 28 December 2019

Update: zipdump.py Version 0.0.16

Filed under: My Software,Update — Didier Stevens @ 0:00

This new version of zipdump.py, a tool to analyze ZIP files, adds option -f to scan for PK records and adds support for Python 3.

More details in an upcoming blog post.

zipdump_v0_0_16.zip (https)
MD5: 616654BDAFFDA1DDE074E6D1A41E8A42
SHA256: F3B6D52BA32D6BA3836D0919F2BBC262F043EF6E26D173DD0965735D4F3B5598

Wednesday 25 December 2019


Filed under: My Software — Didier Stevens @ 13:52

I regularly want to test the behavior of applications opening files downloaded from the Internet.

On Windows, files downloaded from the Internet (with Internet Explorer or Edge, for example) have metadata in an Alternate Data Stream to indicate their origin. This is the Zone.Identifier ADS.

To simulate a download, I will add the ADS myself, and I often refer to my own blog post here and here, as I don’t remember the exact syntax and numbers.

Until recently.

Now, I wrote a small Go program that helps me creating (and removing) the appropriate ADS for a mark-of-web (Zone.Identifier).

Just running zoneidentifier with a filename, will add a Zone.Identifier ADS for zone 3 (Internet) to the file. Like this:

Option -id is used to specify a different zone ID, like this:

And option -remove is used to remove a Zone.Identifier ADS:

zoneidentifier_V0_0_1.zip (https)
MD5: CB1EB21013C6124CB3C1320F6A12207F
SHA256: E867AE693CB5EEA8CF0D252421E347B1309D7F36C9C6A427F7361CD5DD619839

Thursday 19 December 2019

Update: oledump.py Version 0.0.44

Filed under: maldoc,My Software,Update — Didier Stevens @ 0:00

This new version of oledump adds option -f to find embedded ole files, making the analysis of .DWG files with embedded VBA macros (for example) easier.

And there is a new plugin: plugin_version_vba.py. This helps with determining the VBA version.

Here is a video showing the analysis of .DWG files with option -f:

oledump_V0_0_44.zip (https)
MD5: 2BB2CD027327FFD8857CDADC1C988133
SHA256: 1A9C951E95E2FE0FDF3A3DC8E331205BC65C617953F0E30ED3E6AC045F4DD0C0

Monday 9 December 2019

Update: oledump.py Version 0.0.43

Filed under: My Software,Update — Didier Stevens @ 0:00

This new version of oledump.py adds support for Python 3. Several plugins and decoders were also updated for Python 3.

There’s a new option to include storages in the overview: –storages.

And option –decompress now does also VBA decompression (it was zlib only). This helps to decompress the dir stream of documents with VBA macros:

And I added type 1009 to plugin_msg.py: Compressed RTF.

oledump_V0_0_43.zip (https)
MD5: F98A06CED73C4FC2CA153B7E751746B5
SHA256: 4FE1DBAB822CEC2489328CE3D4D272400F23F1FAD266C9D89B49D9F83F3AA27F

Next Page »

Blog at WordPress.com.