Analyzing Malformed ZIP Files

Wednesday 15 April 2020

Filed under: Forensics,maldoc,My Software — Didier Stevens @ 0:00

With version 0.0.16 (we are now at version 0.0.18), I updated my zipdump.py tool to handle (deliberately) malformed ZIP files. My zipdump tool uses Python’s zipfile module to analyze ZIP files.

Now, zipdump has an option (-f) to scan arbitrary binary files for ZIP records.

I will show here how this feature can be used, by analyzing a sample Xavier Mertens wrote a diary entry about. This sample is a Word document with macros, an OOXML (Office Open XML format) file (.docm). It is malformed, because 1) there’s an extra byte at the beginning and 2) there’s a byte missing at the end.

When you use my zipdump tool to look at the file, you get an error:

Using option -f l (list), we can find all PKZIP records inside arbitrary, binary files:

When using option -f with value l, a listing is created of all PKZIP records found in the file, plus any extra data. Some entries in this report have an index that can be used to select the entry.

In this example, 2 entries can be selected:

p: extra bytes at the beginning of the file (prefix)

1: an end-of-central-directory record (PK0506 end)
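The idea behind such a listing can be sketched in a few lines of Python. This is a rough illustration, not the actual zipdump code: the function names scan_zip_records and build_sample are hypothetical, the labels other than "PK0506 end" are only illustrative, and the sample is a synthetic ZIP built in memory and damaged in the same way as the document described above (one extra newline byte in front, last byte removed).

```python
import io
import zipfile

# Record signatures defined by the ZIP format. The labels are illustrative.
SIGNATURES = {
    b"PK\x03\x04": "PK0304 file",
    b"PK\x01\x02": "PK0102 dir",
    b"PK\x05\x06": "PK0506 end",
}


def scan_zip_records(data):
    """Return (prefix, records): the bytes before the first signature, and
    a sorted list of (offset, label) for every signature found."""
    records = []
    for signature, label in SIGNATURES.items():
        position = data.find(signature)
        while position != -1:
            records.append((position, label))
            position = data.find(signature, position + 1)
    records.sort()
    prefix = data[:records[0][0]] if records else data
    return prefix, records


def build_sample():
    """Hypothetical test sample: a valid one-file ZIP with a newline
    prepended and the final byte cut off."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        info = zipfile.ZipInfo("hello.txt", date_time=(2020, 4, 15, 0, 0, 0))
        archive.writestr(info, b"hello")
    return b"\n" + buffer.getvalue()[:-1]


prefix, records = scan_zip_records(build_sample())
print("prefix:", prefix)
for offset, label in records:
    print(hex(offset), label)
```

A plain signature search like this can report false positives, because the same four bytes may occur by chance inside compressed data. It is good enough to locate the prefix and the end-of-central-directory record in a small sample.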

Using option -f p, we can select the prefix (extra data at the beginning of the file) for further analysis:

From this hex/ASCII dump, we learn that there is one extra byte at the beginning of the ZIP file, and that it is a newline character (0x0A).

Using option -f 1, we can select the EOCD record to analyze the ZIP file:

As this generates an error, we need to take a closer look at the EOCD record by adding option -i (info):

With this info, we understand that the missing byte leaves the comment length field one byte short, and this causes the error seen in the previous image.

ZIP files can contain comments (for the ZIP container, and also for individual files): these are stored at the end of the PKZIP records, preceded by a 2-byte, little-endian integer. This integer is the length of the comment. If there is no comment, this integer is zero (0x0000).
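To make the layout concrete, here is a minimal sketch that parses an end-of-central-directory record with Python's struct module. The 22-byte layout follows the ZIP format specification. The function name parse_eocd and the dictionary keys are hypothetical, chosen for this illustration, and this is not how zipdump itself reports the record.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"
# Little-endian fields: signature, disk number, disk holding the central
# directory, entries on this disk, total entries, central directory size,
# central directory offset, comment length. Total: 22 bytes.
EOCD_STRUCT = struct.Struct("<4s4H2LH")


def parse_eocd(data):
    """Parse the last EOCD record found in data (hypothetical helper)."""
    offset = data.rfind(EOCD_SIGNATURE)
    if offset == -1:
        raise ValueError("no end-of-central-directory signature found")
    record = data[offset:offset + EOCD_STRUCT.size]
    missing = EOCD_STRUCT.size - len(record)
    if missing > 0:
        raise ValueError("EOCD record is %d byte(s) short" % missing)
    fields = EOCD_STRUCT.unpack(record)
    comment_length = fields[7]
    comment_start = offset + EOCD_STRUCT.size
    return {
        "offset": offset,
        "total_entries": fields[4],
        "directory_size": fields[5],
        "directory_offset": fields[6],
        "comment_length": comment_length,
        "comment": data[comment_start:comment_start + comment_length],
    }


# A synthetic EOCD record without comment: the last two bytes are 0x00 0x00.
complete = EOCD_STRUCT.pack(EOCD_SIGNATURE, 0, 0, 1, 1, 55, 44, 0)
print(parse_eocd(complete)["comment_length"])

try:
    parse_eocd(complete[:-1])  # last byte removed, as in the sample
except ValueError as error:
    print(error)
```

With the final byte removed, only one of the two bytes of the comment length field is left, so the record can no longer be unpacked.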

Hence, the byte we are missing here is a NULL (0x00) byte. We can append a NULL byte to the sample, and then we should be able to analyze the ZIP file. Instead of modifying the sample, I use my tool cut-bytes.py to add a single NULL byte to the file (suffix option: -s #h#00) and then pipe this into zipdump:
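The same repair can be tried in memory with Python's zipfile module alone. The sketch below is an assumption-laden illustration rather than a replacement for the cut-bytes.py and zipdump pipeline: the helper names make_malformed_sample and can_open are hypothetical, and the sample is a synthetic one-file ZIP (with placeholder content, not real VBA macros) that is damaged like the document discussed here.

```python
import io
import zipfile


def make_malformed_sample():
    """Build a small valid ZIP in memory and return it together with a
    damaged copy: one newline byte prepended, last byte removed."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/vbaProject.bin", b"placeholder content")
    valid = buffer.getvalue()
    return valid, b"\n" + valid[:-1]


def can_open(data):
    """Return True if the zipfile module accepts data as a ZIP archive."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            archive.namelist()
        return True
    except zipfile.BadZipFile:
        return False


valid, malformed = make_malformed_sample()
# Append the missing NULL byte without touching the original bytes.
repaired = malformed + b"\x00"

print(can_open(malformed))
print(can_open(repaired))
with zipfile.ZipFile(io.BytesIO(repaired)) as archive:
    print(archive.namelist())
```

Only the missing byte at the end is restored here. The extra byte at the beginning is left in place, because the zipfile module tolerates data in front of the first record and adjusts the offsets accordingly.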

File 5 (vbaProject.bin) contains the VBA macros, and can be piped into oledump.py:

I also created a video:

zipdump_v0_0_18.zip (https)
MD5: 34DC469E8CD4E5D3E9520517DEFED888
SHA256: 270B26217755D7ECBCB6D642FBB349856FAA1AE668DB37D8D106B37D062FADBB

4 Comments »

Hi Didier!

Sometimes I have to triage the incoming emails of our company and their attachments.
Regarding zip archives, I see malformed files with two forms of the payload: [blob] + [zip] and [zip] + [blob]

I have written my own tools to analyze this kind of file, and I have made a tiny change to the Python module “zipfile”.
I submitted my change to python.org: https://bugs.python.org/issue40301

Could you take a look at the request? I appreciate your feedback.

Comment by Massimo Sala — Friday 17 April 2020 @ 23:28
You should make your request for Python 3. In my opinion, you have no chance of getting your change accepted 3 months after the Python 2.7 code freeze.

Comment by Didier Stevens — Sunday 19 April 2020 @ 8:51
Yes, I see. I appreciate python isn’t compiled: after the freeze… it is a breeze to patch the sources. On intranet legacy servers, the compilation of tools is a real pain (toolchain, libraries, dependencies, and so on). Compliments for your tools and articles.

Comment by Massimo Sala — Tuesday 21 April 2020 @ 7:05
