Didier Stevens

Monday 6 January 2020

Analysis Of Unusual ZIP Files

Filed under: Malware,My Software — Didier Stevens @ 0:00

Intrigued by a blog post from SpiderLabs on a special ZIP file they found, I took a closer look myself.

That special ZIP file is a concatenation of 2 ZIP files, the first containing a single PNG file (with extension .jpg) and the second a single EXE file (malware). Various archive managers and security products handle this file differently, some “seeing” only the PNG file, others only the EXE file.

My zipdump.py tool reports the following for this special ZIP file:

zipdump.py is essentially a wrapper for Python’s zipfile module, and this module parses ZIP files “starting from the end of the file”. That’s why it finds the second ZIP file (appended to the first ZIP file), containing the malicious EXE file.

To help with the analysis of such special/malformed ZIP files, I added an option (-f, --find) to zipdump. This option scans the content of the provided file, looking for ZIP records. ZIP records start with the ASCII string PK, followed by 2 bytes that indicate the record type (byte values less than 16).
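The scan can be approximated in a few lines of Python. This is a minimal sketch under my own assumptions, not zipdump's actual code. It uses a regular expression for PK followed by 2 bytes below 16, so it can also report false positives inside compressed data. The in-memory test ZIP file is hypothetical.

```python
import io
import re
import zipfile

# Candidate PK record: ASCII "PK" followed by 2 bytes with values below 16
PK_RECORD = re.compile(rb"PK[\x00-\x0f][\x00-\x0f]")


def find_pk_records(data: bytes) -> list:
    """Return (offset, name) tuples such as (0, 'PK0304') for every candidate PK record."""
    records = []
    for match in PK_RECORD.finditer(data):
        record_type = match.group()[2:]
        records.append((match.start(), "PK%02X%02X" % (record_type[0], record_type[1])))
    return records


if __name__ == "__main__":
    # Build a ZIP file in memory with a single, stored (uncompressed) text file
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(zipfile.ZipInfo("test.txt", date_time=(1980, 1, 1, 0, 0, 0)), b"hello")
    for offset, name in find_pk_records(buffer.getvalue()):
        print("0x%08x %s" % (offset, name))
```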

Here I use option “-f list” to list all PK records found in a ZIP file containing a single text file:

This is how a normal ZIP file containing a single file looks on the inside.

The file starts with a “local file header”, a PK record that starts with ASCII characters PK followed by bytes 0x03 and 0x04 (that’s 50 4B 03 04 in hexadecimal). In zipdump’s report, such a PK record is identified with PK0304. This header is followed by the contained file (usually compressed).

Then there is a “central directory header”, a PK record that starts with ASCII characters PK followed by bytes 0x01 and 0x02 (that’s 50 4B 01 02 in hexadecimal). In zipdump’s report, such a PK record is identified with PK0102. This header contains an offset pointing to the corresponding PK0304 record.

And at the end of the ZIP file, there is an “end of central directory” record, a PK record that starts with ASCII characters PK followed by bytes 0x05 and 0x06 (that’s 50 4B 05 06 in hexadecimal). In zipdump’s report, such a PK record is identified with PK0506. This header contains an offset pointing to the first PK0102 record.
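To make these pointers concrete, here is a minimal Python sketch that follows them in a well-formed ZIP file. It uses the field offsets from PKWARE's APPNOTE specification: the central directory offset sits at byte 16 of the PK0506 record, and the local header offset sits at byte 42 of each PK0102 record. The function name follow_offsets is hypothetical, and the sketch assumes nothing is prepended to the ZIP file.

```python
import io
import struct
import zipfile


def follow_offsets(data: bytes) -> list:
    """Follow PK0506 -> PK0102 -> PK0304 in a single, well-formed ZIP file.

    Returns (filename, offset of its PK0304 record) for each PK0102 record.
    Assumes no data precedes the ZIP file and the archive comment has no PK0506 bytes.
    """
    eocd = data.rfind(b"PK\x05\x06")
    if eocd < 0:
        raise ValueError("no PK0506 record found")
    # PK0506: total number of entries (byte 10), central directory size (12) and offset (16)
    entries, _cd_size, cd_offset = struct.unpack_from("<HII", data, eocd + 10)
    result = []
    position = cd_offset
    for _ in range(entries):
        if data[position:position + 4] != b"PK\x01\x02":
            raise ValueError("expected a PK0102 record at offset 0x%x" % position)
        # PK0102: name/extra/comment lengths at byte 28, PK0304 offset at byte 42
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, position + 28)
        (local_offset,) = struct.unpack_from("<I", data, position + 42)
        name = data[position + 46:position + 46 + name_len].decode("utf-8", "replace")
        result.append((name, local_offset))
        position += 46 + name_len + extra_len + comment_len
    return result


if __name__ == "__main__":
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("first.txt", b"first file")
        archive.writestr("second.txt", b"second file")
    for name, offset in follow_offsets(buffer.getvalue()):
        print("PK0102 for %s points to PK0304 at offset 0x%x" % (name, offset))
```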

A ZIP file containing 2 files looks like this, when scanned with zipdump’s option -f list:

It starts with 2 PK0304 records (one for each contained file), followed by 2 PK0102 records and 1 PK0506 record.

Armed with this knowledge, we take a look at our malicious ZIP file:

We see 2 PK0506 records, and this is unusual.

We see the following sequence of records twice: PK0304, PK0102, PK0506.

From our previous examples, we can now understand that this sample contains 2 ZIP files.
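The effect can be reproduced without malware. This Python sketch uses placeholder file contents, and the name payload.exe is hypothetical. It concatenates 2 ZIP files and counts the PK0506 records. It then shows that Python's zipfile module, which locates the PK0506 record at the end of the file, only lists the file in the second ZIP file.

```python
import io
import zipfile


def make_zip(name: str, content: bytes) -> bytes:
    """Return the bytes of an in-memory ZIP file that contains a single file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(name, content)
    return buffer.getvalue()


# Harmless stand-ins: order.jpg is named in this post, payload.exe is hypothetical
sample = make_zip("order.jpg", b"not a picture") + make_zip("payload.exe", b"not a program")

print("PK0506 records:", sample.count(b"PK\x05\x06"))
with zipfile.ZipFile(io.BytesIO(sample)) as archive:
    # zipfile uses the PK0506 record at the end of the file, i.e. that of the second ZIP file
    print("zipfile lists:", archive.namelist())
```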

Remark that zipdump assigned an index to both PK0506 records: 1 and 2. This index can be used to select one of the 2 ZIP files for further analysis. Like in this example, where I select the first ZIP file:

Using option “-f 1” (instead of “-f list”) selects the first ZIP file in the provided sample and lists its content.

It can then be analyzed further with zipdump as usual, for example by selecting the first file (order.jpg) inside the first ZIP file for a hex/ASCII dump:

Likewise, “-f 2” will select the second ZIP file found inside the sample:
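Selecting a ZIP file by the index of its PK0506 record can be sketched as follows. This shows one possible approach and is not necessarily how zipdump implements -f. The central directory size and offset stored in the PK0506 record are subtracted from the record's position to find where that ZIP file starts. carve_zip and list_zip are hypothetical names.

```python
import io
import struct
import zipfile

PK0506 = b"PK\x05\x06"


def carve_zip(data: bytes, index: int) -> bytes:
    """Return the ZIP file that ends with the index-th (1-based) PK0506 record in data.

    Hypothetical helper: assumes each embedded ZIP file is well formed, so that its
    central directory sits right before its PK0506 record.
    """
    positions = []
    position = data.find(PK0506)
    while position >= 0:
        positions.append(position)
        position = data.find(PK0506, position + 1)
    if not 1 <= index <= len(positions):
        raise IndexError("there are %d PK0506 records" % len(positions))
    eocd = positions[index - 1]
    # PK0506: central directory size (byte 12), its offset (16) and comment length (20)
    cd_size, cd_offset, comment_len = struct.unpack_from("<IIH", data, eocd + 12)
    start = eocd - cd_size - cd_offset  # position of this ZIP file's first byte
    return data[start:eocd + 22 + comment_len]


def list_zip(data: bytes, index: int) -> list:
    """List the files inside the index-th ZIP file found in data."""
    with zipfile.ZipFile(io.BytesIO(carve_zip(data, index))) as archive:
        return archive.namelist()
```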

-f is a new option that I added for special/malformed ZIP files, but this is a work in progress, as there are many ways to malform ZIP files.

For example, I created a PoC malformed ZIP file that contains a single file, with reversed PK record order. Here is the output for the normal and the “reversed” ZIP file (malformed, i.e. with the order of the PK records reversed):

This file can be opened with Windows Explorer, but there are tools and libraries that cannot handle it, like Python’s zipfile module:

I will further develop zipdump to handle malformed ZIP files as well as possible.

The current version (zipdump 0.0.16) is just a start:

  • it parses only 3 PK record types (PK0304, PK0102 and PK0506); other types are ignored
  • it does minimal parsing of these records: for example, there is no parsing/checking of offsets in this version

And finally, I also created a video showing how to use this new feature:

Tuesday 31 December 2019

YARA “Ad Hoc Rules”

Filed under: My Software — Didier Stevens @ 14:42

Several of my tools support YARA rules.

And of those tools, many support what I like to call “Ad Hoc rules” (or Here rules).

An Ad Hoc YARA rule is a rule that isn’t stored in a file, but is passed via the command line, and is generated ad hoc by the tool for you.

Take for example oledump.py.

When you issue the command “oledump.py -y trojan.yara sample.vir”, oledump will load all the rules found inside file trojan.yara, and scan the streams of document sample.vir with these rules.

But if you want to search for a simple string, say “virus.exe”, then you have to create a YARA rule to search for this string, store it inside a file, and pass this file to oledump via option -y.

Ad hoc rules make this process simpler. Ad hoc rules start with #.

To generate an ad hoc rule for a string, use prefix #s#. Like this:

oledump.py -y "#s#virus.exe" sample.vir

This will generate the following YARA rule:

rule string {strings: $a = "virus.exe" ascii wide nocase condition: $a}

You can also use #x# for hexadecimal, as in oledump.py -y "#x#D0 CF 11 E0" sample.vir, which generates:

rule hexadecimal {strings: $a = { D0 CF 11 E0 } condition: $a}

And #r# for regular expressions, as in oledump.py -y "#r#[a-z]+" sample.vir, which generates:

rule regex {strings: $a = /[a-z]+/ ascii wide nocase condition: $a}

And you can also pass YARA rules literally (#), hexadecimal encoded (#h#) and base64 encoded (#b#).

And finally, for passing rules literally with double quotes ("), you can use #q#: this will replace every single quote (') with a double quote (").
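The expansion rules above can be summarized in a short Python sketch. It is reconstructed from this description, and the function name adhoc_yara_rule is hypothetical. It is not the code used by oledump and my other tools. If yara-python is installed, the resulting string could be passed to yara.compile(source=...).

```python
import base64
import binascii


def adhoc_yara_rule(argument: str) -> str:
    """Expand an ad hoc rule argument (starting with #) into YARA rule source text."""
    if not argument.startswith("#"):
        raise ValueError("not an ad hoc rule: it should start with #")
    prefix, value = argument[:3], argument[3:]
    if prefix == "#s#":  # string: ASCII and UTF-16, case-insensitive
        return 'rule string {strings: $a = "%s" ascii wide nocase condition: $a}' % value
    if prefix == "#x#":  # hexadecimal string
        return "rule hexadecimal {strings: $a = { %s } condition: $a}" % value
    if prefix == "#r#":  # regular expression
        return "rule regex {strings: $a = /%s/ ascii wide nocase condition: $a}" % value
    if prefix == "#h#":  # complete rule, hexadecimal encoded
        return binascii.unhexlify(value).decode()
    if prefix == "#b#":  # complete rule, base64 encoded
        return base64.b64decode(value).decode()
    if prefix == "#q#":  # complete rule, with ' standing in for "
        return value.replace("'", '"')
    return argument[1:]  # complete rule, passed literally after the #
```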


Sunday 29 December 2019

Update: pdf-parser.py Version 0.7.4 and pdfid.py Version 0.2.7

Filed under: My Software,PDF,Update — Didier Stevens @ 0:00

This is a bug fix version.

pdf-parser_V0_7_4.zip (https)
MD5: 51C6925243B91931E7FCC1E39A7209CF
SHA256: FC318841952190D51EB70DAFB0666D7D19652C8839829CC0C3871BBF7E155B6A

pdfid_v0_2_7.zip (https)
MD5: F1852F238386681C2DC40752669B455B
SHA256: FE2B59FE458ECBC1F91A40095FB1536E036BDD4B7B480907AC4E387D9ADB6E60

Saturday 28 December 2019

Update: zipdump.py Version 0.0.16

Filed under: My Software,Update — Didier Stevens @ 0:00

This new version of zipdump.py, a tool to analyze ZIP files, adds option -f to scan for PK records and adds support for Python 3.

More details in an upcoming blog post.

zipdump_v0_0_16.zip (https)
MD5: 616654BDAFFDA1DDE074E6D1A41E8A42
SHA256: F3B6D52BA32D6BA3836D0919F2BBC262F043EF6E26D173DD0965735D4F3B5598

Wednesday 25 December 2019


Filed under: My Software — Didier Stevens @ 13:52

I regularly want to test the behavior of applications opening files downloaded from the Internet.

On Windows, files downloaded from the Internet (with Internet Explorer or Edge, for example) have metadata in an Alternate Data Stream to indicate their origin. This is the Zone.Identifier ADS.

To simulate a download, I add the ADS myself, and I often refer to my own blog posts (here and here), as I don’t remember the exact syntax and numbers.

Until recently.

Now I wrote a small Go program that helps me create (and remove) the appropriate ADS for a mark-of-the-web (Zone.Identifier).

Just running zoneidentifier with a filename will add a Zone.Identifier ADS for zone 3 (Internet) to the file, like this:

Option -id is used to specify a different zone ID, like this:

And option -remove is used to remove a Zone.Identifier ADS:
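For readers who prefer Python over the Go program, here is a minimal sketch of the same idea. The function names are hypothetical. It writes the usual ADS content, a [ZoneTransfer] section with a ZoneId value, and uses the filename:streamname syntax, which only works on NTFS volumes under Windows.

```python
import os

# Windows URL security zone IDs
ZONES = {0: "Local Machine", 1: "Local Intranet", 2: "Trusted Sites", 3: "Internet", 4: "Restricted Sites"}


def zone_identifier_content(zone_id: int = 3) -> str:
    """Minimal content of a Zone.Identifier ADS for the given zone (3 = Internet)."""
    if zone_id not in ZONES:
        raise ValueError("unknown zone ID: %d" % zone_id)
    return "[ZoneTransfer]\r\nZoneId=%d\r\n" % zone_id


def add_zone_identifier(filename: str, zone_id: int = 3) -> None:
    """Create or overwrite the ADS (Windows, NTFS only); newline="" keeps the CRLFs as-is."""
    with open(filename + ":Zone.Identifier", "w", newline="") as stream:
        stream.write(zone_identifier_content(zone_id))


def remove_zone_identifier(filename: str) -> None:
    """Delete the Zone.Identifier ADS (Windows, NTFS only)."""
    os.remove(filename + ":Zone.Identifier")
```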

zoneidentifier_V0_0_1.zip (https)
MD5: CB1EB21013C6124CB3C1320F6A12207F
SHA256: E867AE693CB5EEA8CF0D252421E347B1309D7F36C9C6A427F7361CD5DD619839

Thursday 19 December 2019

Update: oledump.py Version 0.0.44

Filed under: maldoc,My Software,Update — Didier Stevens @ 0:00

This new version of oledump adds option -f to find embedded OLE files, making it easier to analyze .DWG files with embedded VBA macros (for example).
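One simple way to locate embedded OLE files is to search for the OLE header magic D0 CF 11 E0 A1 B1 1A E1. This Python sketch illustrates that approach and is not oledump's code. Each offset it returns is a candidate start of an embedded OLE file, which could then be handed to an OLE parser such as olefile.

```python
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def find_ole_files(data: bytes) -> list:
    """Return the offsets of all OLE header signatures found in data."""
    offsets = []
    position = data.find(OLE_MAGIC)
    while position >= 0:
        offsets.append(position)
        position = data.find(OLE_MAGIC, position + 1)
    return offsets
```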

And there is a new plugin: plugin_version_vba.py. This helps with determining the VBA version.

Here is a video showing the analysis of .DWG files with option -f:

oledump_V0_0_44.zip (https)
MD5: 2BB2CD027327FFD8857CDADC1C988133
SHA256: 1A9C951E95E2FE0FDF3A3DC8E331205BC65C617953F0E30ED3E6AC045F4DD0C0

Monday 9 December 2019

Update: oledump.py Version 0.0.43

Filed under: My Software,Update — Didier Stevens @ 0:00

This new version of oledump.py adds support for Python 3. Several plugins and decoders were also updated for Python 3.

There’s a new option to include storages in the overview: --storages.

And option --decompress now also does VBA decompression (it was zlib only). This helps to decompress the dir stream of documents with VBA macros:
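VBA decompression follows the algorithm in Microsoft's [MS-OVBA] specification (section 2.4.1). Here is a compact Python sketch of that algorithm, written from the specification rather than taken from oledump. The format consists of a signature byte 0x01, followed by chunks with a 2-byte header. Inside each chunk, flag bytes announce literal bytes and 2-byte copy tokens.

```python
def vba_decompress(data: bytes) -> bytes:
    """Decompress an [MS-OVBA] CompressedContainer (sketch)."""
    if not data or data[0] != 0x01:
        raise ValueError("missing signature byte 0x01")
    out = bytearray()
    position = 1
    while position < len(data):
        header = int.from_bytes(data[position:position + 2], "little")
        chunk_end = min(position + (header & 0x0FFF) + 3, len(data))  # size field = chunk size - 3
        position += 2
        chunk_start = len(out)
        if not header & 0x8000:  # flag bit clear: 4096 uncompressed bytes
            out += data[position:position + 4096]
            position += 4096
            continue
        while position < chunk_end:
            flags = data[position]  # one flag bit per following token, LSB first
            position += 1
            for bit in range(8):
                if position >= chunk_end:
                    break
                if not (flags >> bit) & 1:  # literal token: copy one byte
                    out.append(data[position])
                    position += 1
                    continue
                token = int.from_bytes(data[position:position + 2], "little")  # copy token
                position += 2
                difference = len(out) - chunk_start
                bit_count = max((difference - 1).bit_length(), 4)  # ceil(log2(difference)), at least 4
                length = (token & (0xFFFF >> bit_count)) + 3
                offset = (token >> (16 - bit_count)) + 1
                for _ in range(length):  # byte by byte, since source and target may overlap
                    out.append(out[-offset])
        position = chunk_end
    return bytes(out)
```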

And I added type 1009 to plugin_msg.py: Compressed RTF.

oledump_V0_0_43.zip (https)
MD5: F98A06CED73C4FC2CA153B7E751746B5
SHA256: 4FE1DBAB822CEC2489328CE3D4D272400F23F1FAD266C9D89B49D9F83F3AA27F

Sunday 8 December 2019

Update: numbers-to-string.py Version 0.0.9

Filed under: My Software,Update — Didier Stevens @ 19:34

This is just a bugfix version (Python 3).

numbers-to-string_v0_0_9.zip (https)
MD5: C5629F102FCF58E5CFF24472D35AFF22
SHA256: 5B1CA43EDFD7BA66CF44FB552BD7882AEB13A8765017F9F865071E187410EE63

Monday 18 November 2019

Update: tcp-honeypot.py Version 0.0.7

Filed under: My Software,Networking,Update — Didier Stevens @ 0:00

This new version of tcp-honeypot.py, a simple TCP honeypot and listener, brings THP_ECHO and option -f as new features.

THP_ECHO can be used to send back any incoming data (echo), like this:

dListeners = {4444: {THP_LOOP: 10, THP_ECHO: None,},}

THP_ECHO also accepts a function, whose goal is to transform the incoming data and return it. Here is an example with a lambda function that converts all lowercase letters to uppercase:

dListeners = {4444: {THP_LOOP: 10, THP_ECHO: lambda x: x.upper(),},}

If persistence is required across function calls, a custom class can also be provided. This class has to implement a method named Process (input: incoming data, output: transformed data). Consult the man page (option -m) for more details.
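Here is a hypothetical class of that kind, sketched in Python. It numbers each reply, so state survives between calls. Two details are assumptions to check in the man page: whether tcp-honeypot.py expects the class itself or an instance, and whether the data arrives as bytes or as str.

```python
class CountingEcho:
    """Hypothetical THP_ECHO processor that keeps state across calls."""

    def __init__(self):
        self.count = 0  # number of calls so far

    def Process(self, data):
        """Return the incoming data (assumed to be bytes), prefixed with a call counter."""
        self.count += 1
        return b"%d: " % self.count + data
```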

And option -f (format) can be used to change the output format of data. Possible values are: repr, x, X, a, A, b, B.
  • repr (the default) outputs the data on a single line, using Python’s repr function
  • a is an ASCII/HEX dump over several lines; A is an ASCII/HEX dump too, but with duplicate lines removed
  • x is a HEX dump over several lines; X is a HEX dump without whitespace
  • b is a BASE64 dump over several lines; B is a BASE64 dump without whitespace
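To illustrate these formats, here is a rough Python approximation. The exact layout (16 bytes per line, an offset column, upper-case hex, and * for repeated rows) is an assumption and may differ from the actual output of tcp-honeypot.py. format_data is a hypothetical name.

```python
import base64


def format_data(data: bytes, fmt: str = "repr") -> str:
    """Approximate the -f output formats listed above (exact layouts are assumptions)."""
    if fmt == "repr":
        return repr(data)
    if fmt == "X":
        return data.hex().upper()
    if fmt == "B":
        return base64.b64encode(data).decode()
    if fmt == "b":
        return base64.encodebytes(data).decode().rstrip("\n")  # 76 characters per line
    if fmt not in ("x", "a", "A"):
        raise ValueError("unknown format: %r" % fmt)
    lines = []
    previous = None
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        if fmt == "A" and chunk == previous:  # collapse repeated rows into one "*"
            if lines[-1] != "*":
                lines.append("*")
            continue
        previous = chunk
        hex_part = " ".join("%02X" % byte for byte in chunk)
        if fmt == "x":
            lines.append(hex_part)
        else:
            ascii_part = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in chunk)
            lines.append("%08X: %-47s  %s" % (offset, hex_part, ascii_part))
    return "\n".join(lines)
```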



Tuesday 12 November 2019

Steganography and Malware

Filed under: Malware,My Software — Didier Stevens @ 0:00

I was reading about malware using WAV files and steganography to download payloads without triggering detection systems.

For example, here is a WAV file with a hidden, embedded PE file. The PE file is encoded in the least significant bit of 16-bit integers that encode PCM sound.

I was wondering how I could extract this embedded file with my tools. There was no easy solution, because many of my tools operate on byte streams, but here I have to operate on a bit stream. So I made an update to my format-bytes.py tool.

Using my tool file-magic.py, I get confirmation that this is a sound file (.WAV) with 16-bit PCM data.

And here is an ASCII/HEX dump of the beginning of the file made with cut-bytes.py:

The data chunk starts with magic sequence ‘data’ (in yellow), followed by the size of the data chunk (in green), and then the data itself: 16-bit, little-endian signed integers (in red).

To extract the least significant bit of each 16-bit, little-endian signed integer and assemble them into bytes, I use the latest version of format-bytes.py.

This is the command that I use:

format-bytes.py -a -f "bitstream=f:<H,b:0,j:<" #c#['data']+8: DB043392816146BBE6E9F3FE669459FEA52A82A77A033C86FD5BC2F4569839C9.wav.vir

With option -f, I specify a bitstream format.

f:<H means that the format of the data is little-endian (<), unsigned 16-bit integers (H). I could also specify a signed 16-bit integer (h), but this doesn’t matter here, as I’m not going to use the sign of the integers.

b:0 means that I extract the least-significant bit (position 0) of each 16-bit integer.

j:< means that I assemble (join) these bits into bytes from least significant to most significant (<).
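The same bitstream decoding can be sketched in Python with the struct module. This is an illustration of the f:, b: and j: parameters, not the implementation of format-bytes.py. The function name and its defaults are my own.

```python
import struct


def bitstream(data: bytes, fmt: str = "<H", bit: int = 0, join: str = "<") -> bytes:
    """Extract one bit from each integer in data and join these bits into bytes.

    fmt:  struct format of the integers ("<H": little-endian, unsigned 16-bit)
    bit:  position of the bit to extract (0 = least significant bit)
    join: "<" puts the first bit in the least significant position of each byte,
          ">" puts it in the most significant position
    """
    size = struct.calcsize(fmt)
    usable = len(data) - len(data) % size  # ignore a trailing partial integer
    bits = [(value >> bit) & 1 for (value,) in struct.iter_unpack(fmt, data[:usable])]
    result = bytearray()
    for start in range(0, len(bits) - 7, 8):  # only complete groups of 8 bits
        byte = 0
        for index, value in enumerate(bits[start:start + 8]):
            byte |= value << (index if join == "<" else 7 - index)
        result.append(byte)
    return bytes(result)
```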

The data starts 8 bytes into the data chunk, i.e. 8 bytes after the start of magic sequence ‘data’ (4 bytes for ‘data’ itself and 4 bytes for the size). I define this with cut-expression #c#['data']+8:.
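The fixed +8 works because the chunk ID and the chunk size take 4 bytes each. A slightly more robust alternative is to walk the chunks instead of searching for the string data, which could also occur elsewhere in the file. Here is a sketch of that approach, assuming a standard RIFF/WAVE layout (wav_data_offset is a hypothetical name):

```python
import struct


def wav_data_offset(wav: bytes) -> int:
    """Return the offset of the first sample in the 'data' chunk of a RIFF/WAVE file."""
    if wav[:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    position = 12  # first chunk follows the RIFF header
    while position + 8 <= len(wav):
        chunk_id = wav[position:position + 4]
        (chunk_size,) = struct.unpack_from("<I", wav, position + 4)
        if chunk_id == b"data":
            return position + 8  # skip the 4-byte ID 'data' and the 4-byte chunk size
        position += 8 + chunk_size + (chunk_size & 1)  # chunks are padded to an even size
    raise ValueError("no 'data' chunk found")
```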

When I run this command, and perform an ASCII dump, I get this output for the beginning of the stream:

I can indeed see an executable (MZ), but it is preceded by 4 bytes. These 4 bytes are the length of the embedded file. As described in the article, the length is big-endian encoded. Hence I use a similar command to extract the length, but with j:>, as can be seen here:

The length is 733696 bytes, and this matches the IOCs from the article.
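Because j:> joins the bits most significant first, reading the 4 resulting bytes as a big-endian integer is the same as reading the first 32 extracted bits as one 32-bit number, most significant bit first. Here is a Python sketch of that step. It assumes 16-bit little-endian samples with the bit in position 0, and embedded_length is a hypothetical name.

```python
import struct


def embedded_length(samples: bytes) -> int:
    """Read a 32-bit length from the LSBs of the first 32 samples, most significant bit first."""
    if len(samples) < 64:
        raise ValueError("need at least 32 16-bit samples")
    length = 0
    for (sample,) in struct.iter_unpack("<H", samples[:64]):
        length = (length << 1) | (sample & 1)
    return length
```

The embedded file then follows in the same LSB-first stream, 4 bytes (32 samples) further.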

Then I use my tool pecheck.py to search for PE files inside the byte stream (-l P), like this:

MD5 7cb0e1e2cf4a9bf450a350a759490057 is indeed the hash of the malicious DLL encoded in this WAV file.





