Didier Stevens

Monday 21 October 2019

Quickpost: ExifTool, OLE Files and FlashPix Files

Filed under: Forensics,maldoc,Malware,Quickpost — Didier Stevens @ 0:00

ExifTool can misidentify VBA macro files as FlashPix files.

The binary file format of Office documents (.doc, .xls) is the Compound File Binary Format, which I like to refer to as OLE files. These files can be analyzed with my tool oledump.py.

Starting with Office 2007, the default file format (.docx, .docm, .xlsx, …) is Office Open XML: OOXML. It’s in essence a ZIP container with XML files inside. However, VBA macros inside OOXML files (.docm, .xlsm) are not stored as XML files: they are still stored inside an OLE file. The ZIP container contains a file with name vbaProject.bin, and that is an OLE file containing the VBA macros.
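As a minimal sketch of this layout (not how zipdump.py or oledump.py work internally), the following Python code looks inside a ZIP container for members that start with the OLE signature D0 CF 11 E0 A1 B1 1A E1. The function names and the tiny demo container are hypothetical; the demo is a stand-in, not a valid Word document.

```python
import io
import zipfile

# Header signature of the Compound File Binary Format (OLE files).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def find_ole_members(ooxml_bytes):
    """Return the names of ZIP members that start with the OLE signature."""
    names = []
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as container:
        for info in container.infolist():
            with container.open(info) as member:
                if member.read(len(OLE_MAGIC)) == OLE_MAGIC:
                    names.append(info.filename)
    return names


def build_demo_docm():
    """Build a tiny stand-in for a .docm file (hypothetical, not a valid document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as container:
        container.writestr("[Content_Types].xml", "<Types/>")
        container.writestr("word/document.xml", "<document/>")
        # Only the signature is real here; the rest is zero padding.
        container.writestr("word/vbaProject.bin", OLE_MAGIC + b"\x00" * 504)
    return buffer.getvalue()


if __name__ == "__main__":
    print(find_ole_members(build_demo_docm()))
```

On a real .docm or .xlsm file, you would pass the file content read in binary mode to find_ole_members.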

This can be observed with my zipdump.py tool:

oledump.py can look inside the ZIP container to analyze the embedded vbaProject.bin file:

And of course, it can handle an OLE file directly:

When ExifTool is given a vbaProject.bin file for analysis, it will misidentify it as a picture file: a FlashPix file.

That’s because when ExifTool doesn’t have enough metadata or an identifying extension to identify an OLE file, it will fall back to FlashPix file detection. FlashPix files are also based on the OLE file format, and AFAIK ExifTool started out as an image tool:

That is why on VirusTotal vbaProject.bin files from OOXML files with macros will be misidentified as FlashPix files:

When the extension of a vbaProject.bin file is changed to .doc, ExifTool will misidentify it as a Word document:

ExifTool is not designed to identify VBA macro files (vbaProject.bin). These files are neither Office documents nor pictures. But since they are also OLE files, ExifTool tries to guess what they are based on the extension, and if that doesn’t help, it falls back to the FlashPix file format (based on OLE).

There’s no “bug” to fix. You just need to be aware of this particular behavior of ExifTool: it is a tool to extract information from media formats, and when it analyzes an OLE file without enough metadata or a proper file extension, it falls back to FlashPix identification.

 




Monday 30 September 2019

Update Of My PDF Tools

Filed under: maldoc,Malware,My Software,PDF,Update — Didier Stevens @ 19:16

This is an update of my PDF tools.

There are a couple of bug fixes for pdf-parser and pdfid.

And 2 new features in pdf-parser, inspired by a private training on maldoc analysis I gave last week. I often get good ideas from my students, and sometimes, even I get a good idea in class 🙂 .

Option -o can now be used to select multiple objects: separate the indices by a comma.

There’s a new environment variable, PDFPARSER_OPTIONS, that can be used to provide extra options you want to include with each execution of pdf-parser.py. This is useful for option -O, an option to parse stream objects.

It’s actually best to always parse stream objects, i.e. always use option -O. But I decided not to make this an option that is on by default, so that the behavior of pdf-parser would remain unchanged. I consider this important for the many people that rely on a predictable behavior of pdf-parser, like teachers and students of infosec trainings where my tools are used/mentioned.

However, always including option -O is tedious and error-prone. So now you can have the best of both worlds, by defining an environment variable with name PDFPARSER_OPTIONS and value -O.
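To illustrate the idea of an options environment variable, here is a minimal Python sketch that prepends the options found in PDFPARSER_OPTIONS to the command-line arguments. This is not the actual code of pdf-parser.py: the function name effective_arguments and the use of shlex are assumptions made for this example.

```python
import os
import shlex


def effective_arguments(argv, environ=None, variable="PDFPARSER_OPTIONS"):
    """Return the arguments with the options from the environment variable prepended.

    Hypothetical helper: it only illustrates the mechanism, it is not taken
    from pdf-parser.py.
    """
    environ = os.environ if environ is None else environ
    # shlex.split handles quoting, so a value like '-O -k "/URI"' also works.
    extra = shlex.split(environ.get(variable, ""))
    return extra + list(argv)


if __name__ == "__main__":
    # With PDFPARSER_OPTIONS=-O, every invocation behaves as if -O was typed.
    print(effective_arguments(["-a", "sample.pdf"], {"PDFPARSER_OPTIONS": "-O"}))
    # Without the variable, the arguments are left unchanged.
    print(effective_arguments(["-a", "sample.pdf"], {}))
```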

And finally, I started to add a man page (option -m), like I do with many of my other tools. This is a work in progress: for the moment, it points to my free PDF analysis e-book that explains the use of pdfid and pdf-parser.

pdf-parser_V0_7_3.zip (https)
MD5: 7EB1713631D255B36BC698CD2422C7EB
SHA256: D4D5AC9C26A9D8FEF65CE58A769D3F64A737860DC26606068CCDD3F04FDEA0D7

pdfid_v0_2_6.zip (https)
MD5: 9CCE332914A6C76410F04B7C35DA3155
SHA256: 95F7C91EEFB561F3F3BE9809ED339D85E7109BAA7E128EF056651EE018DBDBA0

Wednesday 7 August 2019

Downloading Executables Over DNS: Capture Files

Filed under: maldoc,Networking — Didier Stevens @ 0:00

In my BruCON training “Malicious Documents For Red Teams” (October 2019), we will cover downloading of files over DNS. I Tweeted about downloading Mimikatz via DNS-over-HTTPS with an Excel sheet.

I’m not releasing the Python code to serve files via DNS, nor the VBA code to download files over DNS/DoH: this is reserved for the attendees of my training.

But here I am sharing capture files of the downloads via DNS, so that you can understand what the traffic looks like, and how to detect it.

Capture files inside the ZIP container (password is infected):

  1. 1-dns-txt.pcap: downloading of files via DNS TXT records, EICAR file (binary, hexadecimal and BASE64 encoded) and Mimikatz.exe (BASE64 encoded)
  2. 2-DoH-txt.pcap: downloading of Mimikatz.exe via DNS TXT records via dns.google.com (Google’s DNS over HTTPS)
  3. 3-DoH-txt-domain-fronting.pcap: same as 2, but with domain fronting (www.google.com)
  4. 4-DoH-txt.pcapng: same as 2, but in a PCAPNG file with decryption keys
  5. 5-DoH-txt.pcapng: same as 4, but with shorter DNS TXT records (to help with decryption)

DNS_TXT_captures.zip (https)
MD5: 5DB5091B9B641E9B8DA0E29CE9870981
SHA256: 49858B8BBA851B86EAB2DB6C5F329C5B587A3B1C7EB1A1E6028BCFBCDF445ECC

Friday 15 March 2019

Maldoc: Excel 4.0 Macro

Filed under: maldoc,Malware,My Software — Didier Stevens @ 0:00

MD5 007de2c71861a3e1e6d70f7fe8f4ce9b is a malicious document: a spreadsheet with Excel 4.0 macros.

Excel 4.0 macros predate VBA macros: they are composed of functions placed inside cells of a macro sheet.

These macros are not stored in dedicated VBA streams, but as BIFF records in the Workbook stream.
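As a rough sketch of what a BIFF record looks like (this is not the code of plugin_biff.py), each record in the Workbook stream starts with a 2-byte record type and a 2-byte size, both little-endian, followed by the record data. The function name and the short table of record names are chosen for this example, and the demo stream is synthetic.

```python
import struct

# A few record types relevant here; the full list is much longer.
RECORD_NAMES = {
    0x0006: "FORMULA",
    0x000A: "EOF",
    0x0085: "BOUNDSHEET",
    0x0207: "STRING",
    0x0809: "BOF",
}


def iter_biff_records(stream):
    """Yield (record_type, data) tuples from the bytes of a Workbook stream."""
    offset = 0
    while offset + 4 <= len(stream):
        record_type, size = struct.unpack_from("<HH", stream, offset)
        data = stream[offset + 4:offset + 4 + size]
        yield record_type, data
        offset += 4 + size


if __name__ == "__main__":
    # Synthetic stream: a BOF record, a dummy BOUNDSHEET payload and an EOF record.
    demo = (
        struct.pack("<HH", 0x0809, 2) + b"\x00\x06"
        + struct.pack("<HH", 0x0085, 3) + b"abc"
        + struct.pack("<HH", 0x000A, 0)
    )
    for record_type, data in iter_biff_records(demo):
        name = RECORD_NAMES.get(record_type, "unknown")
        print("0x%04x %-10s %d bytes" % (record_type, name, len(data)))
```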

Spreadsheets with Excel 4.0 macros can be analyzed with oledump.py and plugin plugin_biff.py.

Option -x of plugin_biff will select all BIFF records relevant for the analysis of Excel 4.0 macros:

In this output, we have all the BIFF records necessary to 1) determine that this is a malicious document and 2) report what this maldoc does.

The first BIFF record, BOUNDSHEET, tells us that the spreadsheet contains an Excel 4.0 macro sheet that is hidden.
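How a BOUNDSHEET record carries this information can be sketched in Python. The layout assumed here is my reading of the BIFF8 format: a 4-byte stream position, one byte whose low 2 bits give the visibility, one byte with the sheet type, and then the sheet name as a short string (length byte, flags byte, characters). The function name and the demo record are hypothetical; this is not the code of plugin_biff.py.

```python
import struct

SHEET_STATES = {0: "visible", 1: "hidden", 2: "very hidden"}
SHEET_TYPES = {
    0x00: "worksheet or dialog sheet",
    0x01: "Excel 4.0 macro sheet",
    0x02: "chart sheet",
    0x06: "VBA module",
}


def parse_boundsheet(data):
    """Decode the data of a BOUNDSHEET record (assumed BIFF8 layout)."""
    position, state, sheet_type, length, flags = struct.unpack_from("<IBBBB", data, 0)
    raw_name = data[8:]
    if flags & 0x01:
        # High-byte flag set: the name is stored as 2 bytes per character.
        name = raw_name[:length * 2].decode("utf-16-le")
    else:
        name = raw_name[:length].decode("latin-1")
    return {
        "position": position,
        "state": SHEET_STATES.get(state & 0x03, "unknown"),
        "type": SHEET_TYPES.get(sheet_type, "unknown"),
        "name": name,
    }


if __name__ == "__main__":
    # Synthetic record data: a hidden Excel 4.0 macro sheet named Macro1.
    demo = struct.pack("<IBBBB", 0x1234, 1, 1, 6, 0) + b"Macro1"
    print(parse_boundsheet(demo))
```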

The third BIFF LABEL record tells us that there is a cell with name Auto_Open: the macros will execute when the spreadsheet is opened.

And then we have BIFF FORMULA records that tell us that something is CONCATENATEd and EXECuted.

The BIFF STRING record provides us with the exact command (msiexec …) that will be executed.

The latest version of plugin_biff contains much larger lists of tokens and functions used in formula expressions. Of course, it’s still possible that a document uses tokens and/or functions that are unknown to my plugin. This is now clearly indicated in the output:

*UNKNOWN FUNCTION* is reported when a function number is unknown. The function number is always reported. Here, for the sake of this example, a crippled version of plugin_biff reports functions with number 0x0037 and 0x0150. In the released version of plugin_biff, functions 0x0037 and 0x0150 are identified as RETURN and CONCATENATE respectively.

*INCOMPLETE FORMULA PARSING* is reported when a formula expression cannot be fully parsed. Left of the warning *INCOMPLETE FORMULA PARSING*, the partially parsed expression can be found, and right of the warning, the remaining, unparsed expression is reported as a Python string. If the remainder contains bytes that could be potentially dangerous functions like EXEC, then this is reported too.

The complete analysis of the maldoc is explained in this video:

Thursday 7 March 2019

Analyzing a Phishing PDF with /ObjStm

Filed under: maldoc,Malware,My Software,PDF — Didier Stevens @ 0:00

I got hold of a phishing PDF where the /URI is hiding inside a stream object (/ObjStm).

First I start the analysis with pdfid.py:

There is no /URI reported, but note that the PDF contains 5 stream objects (/ObjStm). These can contain /URIs. In the past, I would search and decompress these stream objects with pdf-parser.py, and then pipe the result through pdfid.py, in order to detect /URIs (or other objects that require further analysis).

Since pdf-parser.py version 0.7.0, I prefer another method: using option -O to let pdf-parser.py extract and parse the objects inside stream objects.
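To give an idea of what looking inside a /ObjStm involves, here is a minimal Python sketch. It is not the implementation of pdf-parser.py, and it rests on several assumptions: the stream uses /FlateDecode without a predictor, and the values of /N (number of objects) and /First (offset of the first object) were already read from the stream dictionary. The function name and the demo data are hypothetical.

```python
import zlib


def parse_object_stream(compressed, count, first):
    """Return a dict {object number: raw object bytes} for one /ObjStm stream.

    count is the /N value and first is the /First value of the stream dictionary.
    """
    data = zlib.decompress(compressed)
    # The stream starts with count pairs of integers: object number and offset.
    tokens = data[:first].split()
    numbers = [int(token) for token in tokens[:count * 2]]
    pairs = list(zip(numbers[0::2], numbers[1::2]))
    objects = {}
    for index, (object_number, offset) in enumerate(pairs):
        start = first + offset
        if index + 1 < len(pairs):
            end = first + pairs[index + 1][1]
        else:
            end = len(data)
        objects[object_number] = data[start:end].strip()
    return objects


if __name__ == "__main__":
    # Synthetic object stream with two objects, numbered 43 and 44.
    object_43 = b"<< /S /URI /URI (http://example.com) >> "
    object_44 = b"<< /Type /Catalog >>"
    header = b"43 0 44 %d " % len(object_43)
    compressed = zlib.compress(header + object_43 + object_44)
    for number, content in parse_object_stream(compressed, 2, len(header)).items():
        flag = "contains /URI" if b"/URI" in content else ""
        print(number, content, flag)
```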

With option -a (here combined with option -O), I can get statistics and keywords just like with pdfid:

Now I can see that there is a /URI inside the PDF (object 43).

Thus I can use option -k to get the value of /URI entries, combined with option -O to look inside stream objects:

And here I have the /URI.

Another method is to select object 43:

From this output, we also see that object 43 is inside stream object 16.

Remark: if you use option -O on a PDF that does not contain stream objects (/ObjStm), pdf-parser will behave as if you didn’t provide this option. Hence, if you want, you can always use option -O to analyze PDFs.

Monday 31 December 2018

New Tool: msoffcrypto-crack.py

Filed under: Encryption,maldoc,My Software — Didier Stevens @ 0:00

This is a new tool to recover the password of encrypted MS Office documents. I quickly put together this script to help with the analysis of encrypted, malicious documents.

This tool relies completely on Python module msoffcrypto to decrypt MS Office documents.

Since this is a Python tool based on a Python library, don’t expect fast password recovery. This is more a convenience program.

It can recover passwords using a built-in password list, or you can provide your own list via option -p.

The tool can also decrypt the encrypted MS Office document if the password is recovered: use option -o to achieve this. Otherwise, the tool just displays the recovered password.

Like many of my tools, it can take its input from stdin and provide the decrypted document via stdout.

It’s developed with Python 2, and also tested on Python 3.

Read the man page for all the details: option -m.

msoffcrypto-crack_V0_0_1.zip (https)
MD5: F67060E0DE62727A1A69D0FD6F39013A
SHA256: 1466B94B56595BA0B91F0A2606F699E1D737E964F3F1A4DFDF7EAA47843DD063

Wednesday 14 November 2018

Video: Analyzing PowerPoint Maldocs with oledump Plugin plugin_ppt

Filed under: maldoc — Didier Stevens @ 0:00

I produced a video for my blog post “Analyzing PowerPoint Maldocs with oledump Plugin plugin_ppt“:

Thursday 25 October 2018

Analyzing PowerPoint Maldocs with oledump Plugin plugin_ppt

Filed under: maldoc,My Software — Didier Stevens @ 0:00

VBA macros inside a PowerPoint document are not stored directly inside streams, but as records in the “PowerPoint Document” stream. I have a plugin to parse the records of the “PowerPoint Document” stream, but I failed to extract the embedded, compressed OLE file with the macros. Until a recent tweet by @AngeAlbertini brought this up again. On his sample too I failed to extract the compressed OLE file, but then I remembered I had fixed a problem with zlib extraction in pdf-parser.py. Taking this code into plugin_ppt.py fixed the decompression problems.

VBA macros in a PowerPoint document do not appear directly in streams:

Plugin plugin_ppt parses records found in stream “PowerPoint Document”:

Each line represents a record, prefixed by an index generated by the plugin (to easily reference records). Records with a C indicator (like 1 and 435) contain sub-records. Records prefixed with ! contain an embedded object.

Record 441 (RT_ExternalOleObjectStg) interests us because it contains an OLE file with VBA macros.

Plugin option -s can be used to select this record:

Plugin option -a can then be used to do a hex/ASCII dump:

The first four bytes are the size, and then follows the zlib compressed OLE file (as indicated by 0x78).
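This layout can be sketched in a few lines of Python. The sketch assumes that the size is a little-endian 32-bit integer giving the size of the decompressed data, and it works on synthetic data rather than a real record; the function name is hypothetical and this is not the code of plugin_ppt.py.

```python
import struct
import zlib

# Header signature of the Compound File Binary Format (OLE files).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def extract_compressed_ole(record_data):
    """Return (declared size, decompressed bytes) for the data of one record."""
    (declared_size,) = struct.unpack_from("<I", record_data, 0)
    payload = record_data[4:]
    if payload[:1] != b"\x78":
        raise ValueError("payload does not start with a zlib header byte (0x78)")
    return declared_size, zlib.decompress(payload)


if __name__ == "__main__":
    # Synthetic record: size field followed by zlib compressed stand-in OLE data.
    fake_ole = OLE_MAGIC + bytes(504)
    record = struct.pack("<I", len(fake_ole)) + zlib.compress(fake_ole)
    size, ole = extract_compressed_ole(record)
    print(size, len(ole), ole.startswith(OLE_MAGIC))
```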

This OLE file can be decompressed and extracted with option -e, but take care to use option -q (quiet) so that oledump will only report the output of the plugin, and nothing else. This can then be piped into a second instance of oledump:

And now we can extract the VBA macros:

oledump_V0_0_38.zip (https)
MD5: C1D7F71A390497A516F67D798BA25128
SHA256: 4CADEE69D024E9242CDA0CE3A9C22BCB1CAFF9D5BA2D946519C6B7C18F895B81

Wednesday 24 October 2018

Update: oledump.py Version 0.0.38

Filed under: maldoc,My Software,Update — Didier Stevens @ 0:00

This new version of oledump.py includes a new plugin to extract VBA code from PowerPoint files and an update to plugin plugin_http_heuristics.

plugin_http_heuristics was updated to increase the chance of success for the XOR dictionary attack, triggered by a maldoc sample I analyzed.

Two new options were added: -e and -k.

By default, plugin_http_heuristics searches for keywords http: and https:. Using option -e, this list is extended with keywords msxml, adodb, shell, c:\, cmd and powershell.

With option -k, the default keyword list is replaced by your own list (using , as separator). Here I look for ftp (which is not present). Note that http is no longer detected:

oledump_V0_0_38.zip (https)
MD5: C1D7F71A390497A516F67D798BA25128
SHA256: 4CADEE69D024E9242CDA0CE3A9C22BCB1CAFF9D5BA2D946519C6B7C18F895B81

Thursday 7 June 2018

Encrypted OOXML Documents

Filed under: Encryption,maldoc — Didier Stevens @ 0:00

The Office Open XML format, introduced with MS Office 2007, is essentially composed of XML files stored inside a ZIP container.

When an OOXML file (like a .docx file) is protected with a password for reading, it is encrypted. The encrypted OOXML file is stored inside a Compound File Binary Format file, or what I like to call an OLE file. This is the “old” MS Office file format (like .doc), the default file format used before MS Office 2007.

This is what an encrypted .docx file looks like, when analyzed with oledump:

Stream EncryptedPackage contains the encrypted document, and stream EncryptionInfo contains information necessary to help with the decryption of stream EncryptedPackage.

The structure of stream EncryptedPackage is simple:

First there’s an integer with the size of the encrypted document, followed by the encrypted document. If we decode the binary data for the integer with format-bytes.py, we get the size 11841:
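The same decoding can be sketched in Python. The sketch assumes that the integer is an 8-byte little-endian unsigned value, and that the encrypted data after it can be slightly longer than the reported size because of padding. The function name and the demo stream are hypothetical.

```python
import struct


def split_encrypted_package(stream):
    """Return (size of the document, encrypted bytes) for an EncryptedPackage stream."""
    (size,) = struct.unpack_from("<Q", stream, 0)
    return size, stream[8:]


if __name__ == "__main__":
    # Synthetic stream: size 11841 followed by zero bytes standing in for
    # the encrypted document (padded to a multiple of 16 bytes).
    demo = struct.pack("<Q", 11841) + bytes(11856)
    size, encrypted = split_encrypted_package(demo)
    print(size, len(encrypted))
```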

The EncryptionInfo stream starts with binary data, the version format, and is then followed by more binary data, or XML data, depending on the version:

The first bytes specify the major and minor version used for the EncryptionInfo stream. This example is mostly XML:

Which can be further parsed with xmldump.py:

To help identify what version is used, I developed an oledump plugin named plugin_office_crypto:

Depending on the version, different tools can be used to decrypt office documents.

Python program msoffcrypto-tool can only decrypt agile encryption (for the moment, it’s a work in progress).

C program msoffice-crypt can decrypt standard, extended and agile encryption.
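To tell these encryption types apart, the version fields at the start of the EncryptionInfo stream can be read with a few lines of Python. This is only a sketch, not the code of plugin_office_crypto: it assumes that the major and minor versions are 2-byte little-endian integers, and the mapping of version numbers to encryption types is my reading of Microsoft's MS-OFFCRYPTO documentation, where what I call extended encryption is named extensible encryption. The function name is hypothetical.

```python
import struct


def encryption_type(stream):
    """Return (major, minor, type name) for the bytes of an EncryptionInfo stream."""
    major, minor = struct.unpack_from("<HH", stream, 0)
    if (major, minor) == (4, 4):
        kind = "agile"
    elif major in (2, 3, 4) and minor == 2:
        kind = "standard"
    elif major in (3, 4) and minor == 3:
        kind = "extensible"
    else:
        kind = "unknown"
    return major, minor, kind


if __name__ == "__main__":
    # Synthetic start of an agile EncryptionInfo stream: version 4.4,
    # 4 more bytes of binary data, then the beginning of the XML data.
    demo = struct.pack("<HHI", 4, 4, 0x40) + b"<?xml"
    print(encryption_type(demo))
```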

 

Sometimes, malicious documents will be encrypted to try to avoid detection. The victim will have to enter the password to open the document. There is one exception though: Excel documents encrypted with password VelvetSweatshop.

 
