Didier Stevens

Monday 6 January 2020

Analysis Of Unusual ZIP Files

Filed under: Malware,My Software — Didier Stevens @ 0:00

Intrigued by a blog post from SpiderLabs on a special ZIP file they found, I took a closer look myself.

That special ZIP file is a concatenation of 2 ZIP files, the first containing a single PNG file (with extension .jpg) and the second a single EXE file (malware). Various archive managers and security products handle this file differently, some “seeing” only the PNG file, others only the EXE file.

My zipdump.py tool reports the following for this special ZIP file:

zipdump.py is essentially a wrapper for Python’s zipfile module, and this module parses ZIP files “starting from the end of the file”. That’s why it finds the second ZIP file (appended to the first ZIP file), containing the malicious EXE file.

To help with the analysis of such special/malformed ZIP files, I added an option (-f –find) to zipdump. This option scans the content of the provided file looking for ZIP records. ZIP records start with ASCII string PK followed by 2 bytes to indicate the record type (byte values less than 16).

Here I use option “-f list” to list all PK records found in a ZIP file containing a single text file:

This is how a normal ZIP file containing a single file looks on the inside.

The file starts with a “local file header”, a PK record that starts with ASCII characters PK followed by bytes 0x03 and 0x04 (that’s 50 4B 03 04 in hexadecimal). In zipdump’s report, such a PK record is identified with PK0304. This header is followed by the contained file (usually compressed).

Then there is a “central directory header”, a PK record that starts with ASCII characters PK followed by bytes 0x01 and 0x02 (that’s 50 4B 01 02 in hexadecimal). In zipdump’s report, such a PK record is identified with PK0102. This header contains an offset pointing to the corresponding PK0304 record.

And at the end of the ZIP file, there is a “end of central directory”, a PK record that starts with ASCII characters PK followed by bytes 0x05 and 0x06 (that’s 50 4B 05 06 in hexadecimal). In zipdump’s report, such a PK record is identified with PK0506. This header contains an offset pointing to the first PK0102 record.

A ZIP file containing 2 files looks like this, when scanned with zipdump’s option -f list:

Starting with 2 PK0304 records (one for each contained file), followed by 2 PK0102 records, and 1 PK0506 record.

Armed with this knowledge, we take a look at our malicious ZIP file:

We see 2 PK0506 records, and this is unusual.

We see the following sequence of records twice: PK0304, PK0102, PK0506.

From our previous examples, we can now understand that this sample contains 2 ZIP files.

Remark that zipdump assigned an index to both PK0506 records: 1 and 2. This index can be used to select one of the 2 ZIP files for further analysis. Like in this example, where I select the first ZIP file:

Using option “-f 1” (in stead of “-f list”) selects the first ZIP file in the provide sample, and lists its content.

It can then be further analyzed with zipdump like usual, for example, selecting the first file (order.jpg) inside the first ZIP file for an hex/ascii dump:

Likewise, “-f 2” will select the second ZIP file found inside the sample:

-f is a new option that I added for special/malformed ZIP files, but this is a work in progress, as there are many ways to malform ZIP files.

For example, I created a PoC malformed ZIP file that contains a single file, with reversed PK record order. Here is the output for the normal and “reversed” zip files (malformed, e.g. PK records order reversed):

This file can be opened with Windows Explorer, but there are tools and libraries than can not handle it. Like Python’s zipfile module:

I will further develop zipdump to handle malformed ZIP files as best as possible.

The current version (zipdump 0.0.16) is just a start:

  • it parses only 3 PK record types (PK0304, PK0102 and PK0506), other types are ignored
  • it does minimal parsing of these records: for example, there is no parsing/checking of offsets in this version

And finally, I also created a video showing how to use this new feature:

Friday 3 January 2020

Overview of Content Published in the 2010s

Filed under: Announcement — Didier Stevens @ 0:00

Here is an overview of content I published in the 2010s:

Blog posts:

YouTube videos:

Videoblog posts:

SANS ISC Diary entries:

NVISO blog posts:

Thursday 2 January 2020

Overview of Content Published in 2019

Filed under: Announcement — Didier Stevens @ 0:00

Here is an overview of content I published in 2019:

Blog posts:

YouTube videos:

Videoblog posts:

SANS ISC Diary entries:

NVISO blog posts:

Wednesday 1 January 2020

Overview of Content Published in December

Filed under: Announcement — Didier Stevens @ 0:00

Here is an overview of content I published in December:

Blog posts:

YouTube videos:

Videoblog posts:

SANS ISC Diary entries:

Tuesday 31 December 2019

YARA “Ad Hoc Rules”

Filed under: My Software — Didier Stevens @ 14:42

Several of my tools support YARA rules.

And of those tools, many support what I like to call “Ad Hoc rules” (or Here rules).

An Ad Hoc YARA rule is a rule that isn’t stored in a file, but is passed via the command line, and is generated ad hoc by the tool for you.

Take for example oledump.py.

When you issue the command “oledump.py -y trojan.yara sample.vir”, oledump will load all the rules found inside file trojan.yara, and scan the streams of document sample.vir with these rules.

But if you want to search for a simple string, say “virus.exe”, then you have to create a YARA rule to search for this string, store it inside a file, and pass this file to oledump via option -y.

Ad hoc rules make this process simpler. Ad hoc rules start with #.

To generate an ad hoc rule for a string, use prefix #s#. Like this:

oledump.py -y “#s#virus.exe sample.vir”

This will generate the following YARA rule:

rule string {strings: $a = “virus.exe” ascii wide nocase condition: $a}

You can also use #x# for hexadecimal, oledump.py -y “#x#D0 CF 11 E0” sample.vir:

rule hexadecimal {strings: $a = { D0 CF 11 E0 } condition: $a}

And #r# for regular expressions, oledump.py -y “#r#[a-z]+” sample.vir

rule regex {strings: $a = / [a-z]+ / ascii wide nocase condition: $a}

And you can also pass YARA rules literally (#), hexadecimal encoded (#h#) and base64 encoded (#b#).

And finally, for passing rules literally with double-quotes (“), you can use #q#: this will replace every single quote (‘) with a double quote (“).

 

Sunday 29 December 2019

Update: pdf-parser.py Version 0.7.4 and pdfid.py Version 0.2.7

Filed under: My Software,PDF,Update — Didier Stevens @ 0:00

This is a bug fix version.

pdf-parser_V0_7_4.zip (https)
MD5: 51C6925243B91931E7FCC1E39A7209CF
SHA256: FC318841952190D51EB70DAFB0666D7D19652C8839829CC0C3871BBF7E155B6A

pdfid_v0_2_7.zip (https)
MD5: F1852F238386681C2DC40752669B455B
SHA256: FE2B59FE458ECBC1F91A40095FB1536E036BDD4B7B480907AC4E387D9ADB6E60

Saturday 28 December 2019

Update: zipdump.py Version 0.0.16

Filed under: My Software,Update — Didier Stevens @ 0:00

This new version of zipdump.py, a tool to analyze ZIP files, adds option -f to scan for PK records and adds support for Python 3.

More details in an upcoming blog post.

zipdump_v0_0_16.zip (https)
MD5: 616654BDAFFDA1DDE074E6D1A41E8A42
SHA256: F3B6D52BA32D6BA3836D0919F2BBC262F043EF6E26D173DD0965735D4F3B5598

Wednesday 25 December 2019

zoneidentifier.exe

Filed under: My Software — Didier Stevens @ 13:52

I regularly want to test the behavior of applications opening files downloaded from the Internet.

On Windows, files downloaded from the Internet (with Internet Explorer or Edge, for example) have metadata in an Alternate Data Stream to indicate their origin. This is the Zone.Identifier ADS.

To simulate a download, I will add the ADS myself, and I often refer to my own blog post here and here, as I don’t remember the exact syntax and numbers.

Until recently.

Now, I wrote a small Go program that helps me creating (and removing) the appropriate ADS for a mark-of-web (Zone.Identifier).

Just running zoneidentifier with a filename, will add a Zone.Identifier ADS for zone 3 (Internet) to the file. Like this:

Option -id is used to specify a different zone ID, like this:

And option -remove is used to remove a Zone.Identifier ADS:

zoneidentifier_V0_0_1.zip (https)
MD5: CB1EB21013C6124CB3C1320F6A12207F
SHA256: E867AE693CB5EEA8CF0D252421E347B1309D7F36C9C6A427F7361CD5DD619839

Thursday 19 December 2019

Update: oledump.py Version 0.0.44

Filed under: maldoc,My Software,Update — Didier Stevens @ 0:00

This new version of oledump adds option -f to find embedded ole files, making the analysis of .DWG files with embedded VBA macros (for example) easier.

And there is a new plugin: plugin_version_vba.py. This helps with determining the VBA version.

Here is a video showing the analysis of .DWG files with option -f:

oledump_V0_0_44.zip (https)
MD5: 2BB2CD027327FFD8857CDADC1C988133
SHA256: 1A9C951E95E2FE0FDF3A3DC8E331205BC65C617953F0E30ED3E6AC045F4DD0C0

Monday 16 December 2019

Analyzing .DWG Files With Embedded VBA Macros

Filed under: maldoc,Malware — Didier Stevens @ 0:00

AutoCAD’s drawing files (.dwg) can contain VBA macros. The .dwg format is a proprietary file format. There is some documentation, for example here.

When VBA macros are stored inside a .dwg file, an OLE file is embedded inside the .dwg file. There’s a quick-and-dirty way to find this embedded file inside the .dwg file: search for magic sequence D0CF11E0.

My tool cut-bytes.py can be used to search for the first occurrence of byte sequence D0CF11E0 and extract all bytes starting from this sequence until the end of the .dwg file. This can be done with cut-expression [D0CF11E0]: and pipe the result into oledump.py, like this:

Next, oledump can be used to conduct the analysis as usual, for example by extracting the VBA macro source code:

There is also a more structured approach to locate the embedded OLE file inside a .dwg file. When one looks at a .dwg file with a hexadecimal editor, the following can be seen:

First there is a magic sequence identifying this as a .dwg file: AC1032. This sequence varies with the file format version, but since many, many years, it starts with AC10. You can find more details regarding this magic sequence here and here.

At position 0x24 (36 decimal), there is a 32-bit little-endian integer. This is a pointer to the embedded OLE file (this pointer is NULL when no OLE file with VBA macros is embedded).

In our example, this pointer value is 0x00008080. And here is what can be found at this position inside the .dwg file:

First there is a 16-byte long header. At position 8 inside this header, there is a 32-bit little-endian integer that represents the length of the embedded file. 0x00001C00 in our example. And after the header one can find the embedded OLE file (notice magic sequence D0CF11E0).

This information can then be used to extract the OLE file from the .dwg like, like this:

Achieving exactly he same result as the quick-and-dirty method. The reason we don’t have to figure out the length of embedded OLE the file using the quick-and-dirty method, is that oledump ignores all bytes appended to an OLE file.

I will adapt my oledump.py tool to extract macros directly from .dwg files, without the need of a tool like cut-bytes.py, but I will probably implement something like the quick-and-dirty method, as this method would potentially work for other file formats with embedded OLE files, not only .dwg files.

 

« Previous PageNext Page »

Blog at WordPress.com.