Didier Stevens

Friday 22 May 2020

Update: oledump.py Version 0.0.50

Filed under: maldoc,Malware,My Software,Update — Didier Stevens @ 0:00

This new version brings updates to plugin plugin_biff.py.

This plugin can now produce a CSV list of cell values and formulas (option -c) or a JSON file of values and formulas (option -j).

Cell references are in RC format (row-column), but can also be produced in letters-numbers format (LN, option -r LN).

CSV or JSON output can be piped into my ad-hoc decoding programs.


oledump_V0_0_50.zip (https)
MD5: 30EB6A0E0924E72350B268ADDE4E4EC7
SHA256: 870167AE5576B169EB52572788D04F1FFCEC5C8AFDEBCC59FE3B8B01CBDE6CD9

Wednesday 15 April 2020

Analyzing Malformed ZIP Files

Filed under: Forensics,maldoc,My Software — Didier Stevens @ 0:00

With version 0.0.16 (we are now at version 0.0.18), I updated my zipdump.py tool to handle (deliberately) malformed ZIP files. My zipdump tool uses Python’s ZIP module to analyze ZIP files.

Now, zipdump has a an option (-f) to scan arbitrary binary files for ZIP records.

I will show here how this feature can be used, by analyzing a sample Xavier Mertens wrote a diary entry about. This sample is a Word document with macros, an OOXML (Office Open XML format) file (.docm). It is malformed, because 1) there’s an extra byte at the beginning and 2) there’s a byte missing at the end.

When you use my zipdump tool to look at the file, you get an error:

Using option -f l (list), we can find all PKZIP records inside arbitrary, binary files:

When using option -f with value l, a listing will be created of all PKZIP records found in the file, plus extra data. Some of these entries in this report will have an index, that can be used to select the entry.

In this example, 2 entries can be selected:

p: extra bytes at the beginning of the file (prefix)

1: an end-of-central-directory record (PK0506 end)

Using option -f p, we can select the prefix (extra data at the beginning of the file) for further analysis:

And from this hex/ascii dump, we learn that there is one extra byte at the beginning of the ZIP file, and that it is a newline characters (0x0A).

Using option -f 1, we can select the EOCD record to analyze the ZIP file:

As this generates an error, we need to take a closer look at the EOCD record by adding option -i (info):

With this info, we understand that the missing byte makes that the comment length field is one byte short, and this causes the error seen in previous image.

ZIP files can contain comments (for the ZIP container, and also for individual files): these are stored at the end of the PKZIP records, preceded by a 2-byte long, little-endian integer. This integer is the length of the comment. If there is no comment, this integer is zero (0x00).

Hence, the byte we are missing here is a NULL (0x00) byte. We can append a NULL byte to the sample, and then we should be able to analyze the ZIP file. In stead of modifying the sample, I use my tool cut-bytes.py to add a single NULL byte to the file (suffix option: -s #h#00) and then pipe this into zipdump:

File 5 (vbaProject.bin) contains the VBA macros, and can be piped into oledump.py:

I also created a video:

zipdump_v0_0_18.zip (https)
MD5: 34DC469E8CD4E5D3E9520517DEFED888
SHA256: 270B26217755D7ECBCB6D642FBB349856FAA1AE668DB37D8D106B37D062FADBB

Wednesday 11 March 2020

CLSIDs in OLE Files

Filed under: maldoc,My Software — Didier Stevens @ 0:00

Directory entries in “OLE” files (Compound File Binary Format) have a GUID field. Like this “Root Entry” inside a binary Word document file (Doc1.doc):

The GUID value found in this directory entry is: 00020906-0000-0000-C000-000000000046 (the endianness of GUIDs is mixed-endian: it’s a mix of little-endian and big-endian).

This GUID is a COM class id (CLSID) for Word.

You can display the CLSID with oledump.py using option -E to display extra information. Use parameter %CLSID% to display the CLSID, like this:

No class IDs were displayed in this output, and that’s because all the CLSID fields in the directory entries of these streams are zero (16 0x00 bytes). Most of the time, streams in Office documents have no CLSID. You’re more likely to find CLSIDs inside the directory entries of storages. To include storages in oledump’s output, use option –storages like this:

Starting with version 0.0.46, oledump.py will also display the Root Entry. And as can be seen in the above output, the Root Entry of this .doc file has a CLSID.

Philippe Lagadec, the developer of olefile and oletools, maintains a list of CLSIDs relevant to Office documents.

When oletools is installed, oledump.py looks up CLSIDs in this list when you use parameter %CLSIDDESC% (CLSID description). Here is the same command as before, but with parameter %CLSIDDESC%:

This result shows that 00020906-0000-0000-C000-000000000046 is COM Object “Microsoft Word 97-2003 Document (Word.Document.8)”.

 

Class IDs can also be found inside some streams, and that’s why I developed a new oledump.py plugin: plugin_clsid.py.

This plugin searches for CLSIDs (defined in oletools) inside streams. Like in this malicious document:

With the class IDs found in this stream, one can quickly conclude that this must be an exploit for the URL moniker.

And here is the Root Entry CLSID for this document:

 

 

Monday 10 February 2020

Update: oledump.py Version 0.0.45

Filed under: maldoc,My Software,Update — Didier Stevens @ 0:00

This new version of oledump.py has a feature to display Ad Hoc YARA rules using option –verbose.

In this example, I show a string Ad Hoc YARA rule to search for string attri (-y #s#attri). By including option –verbose, the YARA rule generated by oledump for string attri is displayed first:

Plugin plugin_http_heuristics has a new option: -c –contains.

By default, plugin_http_heuristics looks for (obfuscated) strings that start with keywords (http:// and https:// by default). Option -c changes this behavior: when this option is used, the keywords are searched in the entire string, and not just at the start.

In this example, I use this feature to search for the filename of the dropped executable (strings containing “.exe”):

And I also include plugin_vba: this is an old plugin that I failed to release. It searches for string concatenation in VBA code.

Video:

oledump_V0_0_45.zip (https)
MD5: FB9694358CCEAE4AFDFCF97FDA0D5205
SHA256: FB75B1E19E5067751E2DE1AD21826245B7E11EDBE03278566484754F606F3965

Thursday 19 December 2019

Update: oledump.py Version 0.0.44

Filed under: maldoc,My Software,Update — Didier Stevens @ 0:00

This new version of oledump adds option -f to find embedded ole files, making the analysis of .DWG files with embedded VBA macros (for example) easier.

And there is a new plugin: plugin_version_vba.py. This helps with determining the VBA version.

Here is a video showing the analysis of .DWG files with option -f:

oledump_V0_0_44.zip (https)
MD5: 2BB2CD027327FFD8857CDADC1C988133
SHA256: 1A9C951E95E2FE0FDF3A3DC8E331205BC65C617953F0E30ED3E6AC045F4DD0C0

Monday 16 December 2019

Analyzing .DWG Files With Embedded VBA Macros

Filed under: maldoc,Malware — Didier Stevens @ 0:00

AutoCAD’s drawing files (.dwg) can contain VBA macros. The .dwg format is a proprietary file format. There is some documentation, for example here.

When VBA macros are stored inside a .dwg file, an OLE file is embedded inside the .dwg file. There’s a quick-and-dirty way to find this embedded file inside the .dwg file: search for magic sequence D0CF11E0.

My tool cut-bytes.py can be used to search for the first occurrence of byte sequence D0CF11E0 and extract all bytes starting from this sequence until the end of the .dwg file. This can be done with cut-expression [D0CF11E0]: and pipe the result into oledump.py, like this:

Next, oledump can be used to conduct the analysis as usual, for example by extracting the VBA macro source code:

There is also a more structured approach to locate the embedded OLE file inside a .dwg file. When one looks at a .dwg file with a hexadecimal editor, the following can be seen:

First there is a magic sequence identifying this as a .dwg file: AC1032. This sequence varies with the file format version, but since many, many years, it starts with AC10. You can find more details regarding this magic sequence here and here.

At position 0x24 (36 decimal), there is a 32-bit little-endian integer. This is a pointer to the embedded OLE file (this pointer is NULL when no OLE file with VBA macros is embedded).

In our example, this pointer value is 0x00008080. And here is what can be found at this position inside the .dwg file:

First there is a 16-byte long header. At position 8 inside this header, there is a 32-bit little-endian integer that represents the length of the embedded file. 0x00001C00 in our example. And after the header one can find the embedded OLE file (notice magic sequence D0CF11E0).

This information can then be used to extract the OLE file from the .dwg like, like this:

Achieving exactly he same result as the quick-and-dirty method. The reason we don’t have to figure out the length of embedded OLE the file using the quick-and-dirty method, is that oledump ignores all bytes appended to an OLE file.

I will adapt my oledump.py tool to extract macros directly from .dwg files, without the need of a tool like cut-bytes.py, but I will probably implement something like the quick-and-dirty method, as this method would potentially work for other file formats with embedded OLE files, not only .dwg files.

 

Monday 21 October 2019

Quickpost: ExifTool, OLE Files and FlashPix Files

Filed under: Forensics,maldoc,Malware,Quickpost — Didier Stevens @ 0:00

ExifTool can misidentify VBA macro files as FlashPix files.

The binary file format of Office documents (.doc, .xls) uses the Compound File Binary Format, what I like to refer as OLE files. These files can be analyzed with my tool oledump.py.

Starting with Office 2007, the default file format (.docx, .docm, .xlsx, …) is Office Open XML: OOXML. It’s in essence a ZIP container with XML files inside. However, VBA macros inside OOXML files (.docm, .xlsm) are not stored as XML files, they are still stored inside an OLE file: the ZIP container contains a file with name vbaProject.bin. That is an OLE file containing the VBA macros.

This can be observed with my zipdump.py tool:

oledump.py can look inside the ZIP container to analyze the embedded vbaProject.bin file:

And of course, it can handle an OLE file directly:

When ExifTool is given a vbaProject.bin file for analysis, it will misidentify it as a picture file: a FlashPix file.

That’s because when ExifTool doesn’t have enough metadata or an identifying extension to identify an OLE file, it will fall back to FlashPix file detection. That’s because FlashPix files are also based on the OLE file format, and AFAIK ExifTool started out as an image tool:

That is why on VirusTotal, vbaProject.bin files from OOXML files with macros, will be misidentified as FlashPix files:

When the extension of a vbaProject.bin file is changed to .doc, ExifTool will misidentify it as a Word document:

ExifTool is not designed to identify VBA macro files (vbaProject.bin). These files are not Office documents, neither pictures. But since they are also OLE files, ExifTool tries to guess what they are, based on the extension, and if that doesn’t help, it falls back to the FlashPix file format (based on OLE).

There’s no “bug” to fix, you just need to be aware of this particular behavior of ExifTool: it is a tool to extract information from media formats, when it analyses an OLE file and doesn’t have enough metadata/proper file extension, it will fall back to FlashPix identification.

 


Quickpost info


Monday 30 September 2019

Update Of My PDF Tools

Filed under: maldoc,Malware,My Software,PDF,Update — Didier Stevens @ 19:16

This is an update of my PDF tools.

There are a couple of bug fixes for pdf-parser and pdfid.

And 2 new features in pdf-parser, inspired by a private training on maldoc analysis I gave last week. I often get good ideas from my students, and sometimes, even I get a good idea in class 🙂 .

Option -o can now be used to select multiple objects: separate the indices by a comma.

There’s a new environment variable, PDFPARSER_OPTIONS, that can be used to provide extra options you want to include with each execution of pdf-parser.py. This is useful for option -O, an option to parse stream objects.

It’s actually best to always parse stream objects, i.e. always use option -O. But I decided not to make this an option that is on by default, so that the behavior of pdf-parser would remain unchanged. I consider this important for the many people that rely on a predictable behavior of pdf-parser, like teachers and students of infosec trainings where my tools are used/mentioned.

However, always including option -O is tedious and error prone. So now you can have best of both worlds, by defining an environment variable with name PDFPARSER_OPTIONS and value -O.

And finally, I started to add a man page (option -m), like I do with many of my other tools. This is a work in progress: for the moment, it points to my free PDF analysis e-book that explains the use of pdfid and pdf-parser.

pdf-parser_V0_7_3.zip (https)
MD5: 7EB1713631D255B36BC698CD2422C7EB
SHA256: D4D5AC9C26A9D8FEF65CE58A769D3F64A737860DC26606068CCDD3F04FDEA0D7

pdfid_v0_2_6.zip (https)
MD5: 9CCE332914A6C76410F04B7C35DA3155
SHA256: 95F7C91EEFB561F3F3BE9809ED339D85E7109BAA7E128EF056651EE018DBDBA0

Wednesday 7 August 2019

Downloading Executables Over DNS: Capture Files

Filed under: maldoc,Networking — Didier Stevens @ 0:00

In my BruCON training “Malicious Documents For Red Teams” (October 2019), we will cover downloading of files over DNS. I Tweeted about downloading Mimikatz via DNS-over-HTTPS with an Excel sheet.

I’m not releasing the Python code to serve files via DNS, nor the VBA code to download files over DNS/DoH: this is reserved for the attendees of my training.

But here I am sharing capture files of the downloads via DNS, so that you can understand how traffic looks like, and how to detect it.

Capture files inside the ZIP container (password is infected):

  1. 1-dns-txt.pcap: downloading of files via DNS TXT records, EICAR file (binary, hexadecimal and BASE64 encoded) and Mimikatz.exe (BASE64 encoded)
  2. 2-DoH-txt.pcap: downloading of Mimikatz.exe via DNS TXT records via dns.google.com (Google’s DNS over HTTPS)
  3. 3-DoH-txt-domain-fronting.pcap: same as 2, but with domain fronting (www.google.com)
  4. 4-DoH-txt.pcapng: same as 2, but in a PCAPNG file with decryption keys
  5. 5-DoH-txt.pcapng: same as 4, but with shorter DNS TXT records (to help with decryption)

DNS_TXT_captures.zip (https)
MD5: 5DB5091B9B641E9B8DA0E29CE9870981
SHA256: 49858B8BBA851B86EAB2DB6C5F329C5B587A3B1C7EB1A1E6028BCFBCDF445ECC

Friday 15 March 2019

Maldoc: Excel 4.0 Macro

Filed under: maldoc,Malware,My Software — Didier Stevens @ 0:00

MD5 007de2c71861a3e1e6d70f7fe8f4ce9b is a malicious document: a spreadsheet with Excel 4.0 macros.

Excel 4.0 macros predate VBA macros: they are composed of functions placed inside cells of a macro sheet.

These macros are not stored in dedicated VBA streams, but as BIFF records in the Workbook stream.

Spreadsheets with Excel 4.0 macros can be analyzed with oledump.py and plugin plugin_biff.py.

Option -x of plugin_biff will select all BIFF records relevant for the analysis of Excel 4.0 macros:

In this output, we have all the BIFF records necessary to 1) determine that this is a malicious document and 2) report what this maldoc does.

The first BIFF record, BOUNDSHEET, tells us that the spreadsheet contains a Excel 4.0 macro sheet that is hidden.

The third BIFF LABEL record tells us that there is a cell with name Auto_Open: the macros will execute when the spreadsheet is opened.

And then we have BIFF FORMULA records that tell us that something is CONCATENATEd and EXECuted.

The BIFF STRING record provides us with the exact command (msiexec …) that will be executed.

The latest version of plugin_biff contains much larger lists of tokens and functions used in formula expressions. Of course, it’s still possible that tokens and/or functions are used unknown by my plugin. This is now clearly indicated in the output:

*UNKNOWN FUNCTION* is reported when a function number is unknown. The function number is always reported. Here, for the sake of this example, a crippled version of plugin_biff reports functions with number 0x0037 and 0x0150. In the released version of plugin_biff, functions 0x0037 and 0x0150 are identified as RETURN and CONCATENATE respectively.

*INCOMPLETE FORMULA PARSING* is reported when a formula expression can not be fully parsed. Left of the warning *INCOMPLETE FORMULA PARSING*, the partially parsed expression can be found, and right of the warning, the remaining, unparsed expression is reported as a Python string. If the remainder contains bytes that could be potentially dangerous functions like EXEC, then this is reported too.

The complete analysis of the maldoc is explained in this video:

Next Page »

Blog at WordPress.com.