Didier Stevens

Tuesday 12 November 2019

Steganography and Malware

Filed under: Malware,My Software — Didier Stevens @ 0:00

I was reading about malware using WAV files and steganography to download payloads without triggering detection systems.

For example, here is a WAV file with a hidden, embedded PE file. The PE file is encoded in the least significant bit of 16-bit integers that encode PCM sound.

I was wondering how I could extract this embedded file with my tools. There was no easy solution, because many of my tools operate on byte streams, but here I have to operate on a bit stream. So I made an update to my format-bytes.py tool.

Using my tool file-magic.py, I get confirmation that this is a sound file (.WAV) with 16-bit PCM data.

And here is an ASCII/HEX dump of the beginning of the file made with cut-bytes.py:

The data chunk starts with magic sequence ‘data’ (in yellow), followed by the size of the data chunk (in green), and then the data itself: 16-bit, little-endian signed integers (in red).

To extract the least significant bit of each 16-bit, little-endian signed integer and assemble them into bytes, I use the latest version of format-bytes.py.

This is the command that I use:

format-bytes.py -a -f “bitstream=f:<H,b:0,j:<” #c#[‘data’]+8: DB043392816146BBE6E9F3FE669459FEA52A82A77A033C86FD5BC2F4569839C9.wav.vir

With option -f, I specify a bitstream format.

f:<H means that the format of the data is little-endian (<), unsigned 16-bit integers (H). I could also specify a signed 16-bit integer (h), but this doesn’t matter here, as I’m not going to use the sign of the integers.

b:0 means that I extract the least-significant bit (position 0) of each 16-bit integer.

j:< means that I assemble (join) these bits into bytes from least significant to most significant (<).

The data starts 8 bytes into the data chunk, e.g. 8 bytes after magic sequence ‘data’. I define this with cut-expression #c#[‘data’]+8:.

When I run this command, and perform an ASCII dump, I get this output for the beginning of the stream:

I can indeed see an executable (MZ), but it is preceded by 4 bytes. These 4 bytes are the length of the embedded file. As described in the article, the length is big-endian encoded. Hence I use a similar command to extract the length, but with j:>, as can be seen here:

The length is 733696 bytes, and this matches the IOCs from the article.

Then I use my tool pecheck.py to search for PE files inside the byte stream (-l P), like this:

MD5 7cb0e1e2cf4a9bf450a350a759490057 is indeed the hash of the malicious DLL encoded in this WAV file.

 

 

 

 

 

Monday 21 October 2019

Quickpost: ExifTool, OLE Files and FlashPix Files

Filed under: Forensics,maldoc,Malware,Quickpost — Didier Stevens @ 0:00

ExifTool can misidentify VBA macro files as FlashPix files.

The binary file format of Office documents (.doc, .xls) uses the Compound File Binary Format, what I like to refer as OLE files. These files can be analyzed with my tool oledump.py.

Starting with Office 2007, the default file format (.docx, .docm, .xlsx, …) is Office Open XML: OOXML. It’s in essence a ZIP container with XML files inside. However, VBA macros inside OOXML files (.docm, .xlsm) are not stored as XML files, they are still stored inside an OLE file: the ZIP container contains a file with name vbaProject.bin. That is an OLE file containing the VBA macros.

This can be observed with my zipdump.py tool:

oledump.py can look inside the ZIP container to analyze the embedded vbaProject.bin file:

And of course, it can handle an OLE file directly:

When ExifTool is given a vbaProject.bin file for analysis, it will misidentify it as a picture file: a FlashPix file.

That’s because when ExifTool doesn’t have enough metadata or an identifying extension to identify an OLE file, it will fall back to FlashPix file detection. That’s because FlashPix files are also based on the OLE file format, and AFAIK ExifTool started out as an image tool:

That is why on VirusTotal, vbaProject.bin files from OOXML files with macros, will be misidentified as FlashPix files:

When the extension of a vbaProject.bin file is changed to .doc, ExifTool will misidentify it as a Word document:

ExifTool is not designed to identify VBA macro files (vbaProject.bin). These files are not Office documents, neither pictures. But since they are also OLE files, ExifTool tries to guess what they are, based on the extension, and if that doesn’t help, it falls back to the FlashPix file format (based on OLE).

There’s no “bug” to fix, you just need to be aware of this particular behavior of ExifTool: it is a tool to extract information from media formats, when it analyses an OLE file and doesn’t have enough metadata/proper file extension, it will fall back to FlashPix identification.

 


Quickpost info


Monday 30 September 2019

Update Of My PDF Tools

Filed under: maldoc,Malware,My Software,PDF,Update — Didier Stevens @ 19:16

This is an update of my PDF tools.

There are a couple of bug fixes for pdf-parser and pdfid.

And 2 new features in pdf-parser, inspired by a private training on maldoc analysis I gave last week. I often get good ideas from my students, and sometimes, even I get a good idea in class 🙂 .

Option -o can now be used to select multiple objects: separate the indices by a comma.

There’s a new environment variable, PDFPARSER_OPTIONS, that can be used to provide extra options you want to include with each execution of pdf-parser.py. This is useful for option -O, an option to parse stream objects.

It’s actually best to always parse stream objects, i.e. always use option -O. But I decided not to make this an option that is on by default, so that the behavior of pdf-parser would remain unchanged. I consider this important for the many people that rely on a predictable behavior of pdf-parser, like teachers and students of infosec trainings where my tools are used/mentioned.

However, always including option -O is tedious and error prone. So now you can have best of both worlds, by defining an environment variable with name PDFPARSER_OPTIONS and value -O.

And finally, I started to add a man page (option -m), like I do with many of my other tools. This is a work in progress: for the moment, it points to my free PDF analysis e-book that explains the use of pdfid and pdf-parser.

pdf-parser_V0_7_3.zip (https)
MD5: 7EB1713631D255B36BC698CD2422C7EB
SHA256: D4D5AC9C26A9D8FEF65CE58A769D3F64A737860DC26606068CCDD3F04FDEA0D7

pdfid_v0_2_6.zip (https)
MD5: 9CCE332914A6C76410F04B7C35DA3155
SHA256: 95F7C91EEFB561F3F3BE9809ED339D85E7109BAA7E128EF056651EE018DBDBA0

Thursday 13 June 2019

New Tool: amsiscan.py

Filed under: Malware,My Software — Didier Stevens @ 0:00

amsiscan.py is a Python script that uses Windows 10’s AmsiScanBuffer function to scan input for malware.

It reads one or more files or stdin.

The AmsiScanBuffer function returns 5 possible values when it is called for a scan:

AMSI_RESULT_CLEAN
AMSI_RESULT_NOT_DETECTED
AMSI_RESULT_BLOCKED_BY_ADMIN_START
AMSI_RESULT_BLOCKED_BY_ADMIN_END
AMSI_RESULT_DETECTED

Example:

amsiscan_V0_0_1.zip (https)
MD5: 47E50599E0CFAF1D27416E68394289A0
SHA256: 044E41D7F31D8333CB5295FD6E430933CA67F9AC37CD400D38189C96AE48544D

Wednesday 12 June 2019

Update: virustotal-search.py Version 0.1.5

Filed under: Malware,My Software,Update — Didier Stevens @ 0:00

virustotal-search.py is a tool to query VirusTotal via its public API for file reports by providing hashes to search for.

This new version adds searching for URLs. Use option -t to select the type of search you want: file (default) or url.

Like this:

Option -e can be used to include extra information (present in the JSON reply) not included by default.

For example, a default file search does not include sha256 hashes:

But you can include it with option “-e sha256” like this:

The public API can also be used for queries for domain names and IP addresses. These queries are much simpler than file and url, and therefor, I developed a very generic program to query APIs. This will be released soon.

virustotal-search_V0_1_5.zip (https)
MD5: 2155347687726A321D1ADBB9C9B81CFD
SHA256: 4F614C9D01C694AEAA16F7D5E4DBFBCF37E8E8D01D382C1137F401612D02E110

Saturday 20 April 2019

Extracting “Stack Strings” from Shellcode

Filed under: Malware,My Software,Reverse Engineering — Didier Stevens @ 0:00

A couple of years ago, I wrote a Python script to enhance Radare2 listings: the script extract strings from stack frame instructions.

Recently, I combined my tools to achieve the same without a 32-bit disassembler: I extract the strings directly from the binary shellcode.

What I’m looking for is sequences of instructions like this: mov dword [ebp – 0x10], 0x61626364. In 32-bit code, that’s C7 45 followed by one byte (offset operand) and 4 bytes (value operand).

Or: C7 45 10 64 63 62 61. I can write a regular expression for this instruction, and use my tool re-search.py to extract it from the binary shellcode. I want at least 2 consecutive mov … instructions: {2,}.

I’m using option -f because I want to process a binary file (re-search.py expects text files by default).

And I’m using option -x to produce hexadecimal output (to simplify further processing).

I want to get rid of the bytes for the instruction and the offset operand. I do this with sed:

I could convert this back to text with my tool hex-to-bin.py:

But that’s not ideal, because now all characters are merged into a single line.

My tool python-per-line.py gives a better result by processing this hexadecimal input line per line:

Remark that I also use function repr to escape unprintable characters like 00.

This output provides a good overview of all API functions called by this shellcode.

If you take a close look, you’ll notice that the last strings are incomplete: that’s because they are missing one or two characters, and these are put on the stack with another mov instruction for single or double bytes. I can accommodate my regular expression to take these instructions into account:

This is the complete command:

re-search.py -x -f "(?:\xC7\x45.....){2,}(?:(?:\xC6\x45..)|(?:\x66\xC7\x45...))?" shellcode.bin.vir | sed "s/66c745..//g" | sed "s/c[67]45..//g" | python-per-line.py -e "import binascii" "repr(binascii.a2b_hex(line))"

Friday 15 March 2019

Maldoc: Excel 4.0 Macro

Filed under: maldoc,Malware,My Software — Didier Stevens @ 0:00

MD5 007de2c71861a3e1e6d70f7fe8f4ce9b is a malicious document: a spreadsheet with Excel 4.0 macros.

Excel 4.0 macros predate VBA macros: they are composed of functions placed inside cells of a macro sheet.

These macros are not stored in dedicated VBA streams, but as BIFF records in the Workbook stream.

Spreadsheets with Excel 4.0 macros can be analyzed with oledump.py and plugin plugin_biff.py.

Option -x of plugin_biff will select all BIFF records relevant for the analysis of Excel 4.0 macros:

In this output, we have all the BIFF records necessary to 1) determine that this is a malicious document and 2) report what this maldoc does.

The first BIFF record, BOUNDSHEET, tells us that the spreadsheet contains a Excel 4.0 macro sheet that is hidden.

The third BIFF LABEL record tells us that there is a cell with name Auto_Open: the macros will execute when the spreadsheet is opened.

And then we have BIFF FORMULA records that tell us that something is CONCATENATEd and EXECuted.

The BIFF STRING record provides us with the exact command (msiexec …) that will be executed.

The latest version of plugin_biff contains much larger lists of tokens and functions used in formula expressions. Of course, it’s still possible that tokens and/or functions are used unknown by my plugin. This is now clearly indicated in the output:

*UNKNOWN FUNCTION* is reported when a function number is unknown. The function number is always reported. Here, for the sake of this example, a crippled version of plugin_biff reports functions with number 0x0037 and 0x0150. In the released version of plugin_biff, functions 0x0037 and 0x0150 are identified as RETURN and CONCATENATE respectively.

*INCOMPLETE FORMULA PARSING* is reported when a formula expression can not be fully parsed. Left of the warning *INCOMPLETE FORMULA PARSING*, the partially parsed expression can be found, and right of the warning, the remaining, unparsed expression is reported as a Python string. If the remainder contains bytes that could be potentially dangerous functions like EXEC, then this is reported too.

The complete analysis of the maldoc is explained in this video:

Thursday 7 March 2019

Analyzing a Phishing PDF with /ObjStm

Filed under: maldoc,Malware,My Software,PDF — Didier Stevens @ 0:00

I got hold of a phishing PDF where the /URI is hiding inside a stream object (/ObjStm).

First I start the analysis with pdfid.py:

There is no /URI reported, but remark that the PDF contains 5 stream objects (/ObjStm). These can contain /URIs. In the past, I would search and decompress these stream objects with pdf-parser.py, and then pipe the result through pdfid.py, in order to detect /URIs (or other objects that require further analysis).

Since pdf-parser.py version 0.7.0, I prefer another method: using option -O to let pdf-parser.py extract and parse the objects inside stream objects.

With option -a (here combined with option -O), I can get statistics and keywords just like with pdfid:

Now I can see that there is a /URI inside the PDF (object 43).

Thus I can use option -k to get the value of /URI entries, combined with option -O to look inside stream objects:

And here I have the /URI.

Another method, is to select object 43:

From this output, we also see that object 43 is inside stream object 16.

Remark: if you use option -O on a PDF that does not contain stream objects (/ObjStm), pdf-parser will behave as if you didn’t provide this option. Hence, if you want, you can always use option -O to analyze PDFs.

Monday 20 August 2018

Obtaining Malware Samples for Analysis

Filed under: Announcement,Malware — Didier Stevens @ 0:00

In my malware analysis blog posts and videos, I always try to include the hash or VirusTotal link of the sample(s) I analyze. If I don’t, it means I’m not at liberty to share the hash.

For every video that I post on YouTube, I create a corresponding video blog post (https://videos.DidierStevens.com) with more info like the sample’s hash and a link to VirusTotal.

In the description of the YouTube video, you will find a link to the video blog post.

Example:

I will often use the MD5 hash, but since I include a link to VirusTotal, you can consult the report and find other hashes like sha256 in that report.

Regarding MD5: I don’t worry about hash collisions for malware samples. Actually, if there is an MD5 hash collision, VirusTotal will inform me, and that would make my day 🙂 .

Don’t ask me for the malware samples I analyze, I don’t host or send these malware samples. If you or your organization have a VirusTotal Intelligence subscription, you can download the sample from VirusTotal.

If you don’t, there are several free repositories online (sometimes they require free registration). Lenny Zeltser has a list of repositories.

 

 

Wednesday 25 July 2018

Extracting DotNetToJScript’s PE Files

Filed under: Forensics,Malware — Didier Stevens @ 0:00

I added a new option (-I, –ignorehex) to base64dump.py to make the extraction of the PE file inside a JScript script generated with DotNetToJScript a bit easier.

DotNetToJScript is James Forshaw‘s “tool to generate a JScript which bootstraps an arbitrary .NET Assembly and class”.

Here is an example of a script generated by James’ tool:

The serialized .NET object is embedded as a string concatenation of BASE64 strings, assigned to variable serialized_obj.

With re-search.py, I extract all strings from the script (e.g. strings delimited by double quotes):

The first 3 strings are not part of the BASE64 encoded object, hence I get rid of them (there are no unwanted strings at the end):

And now I have BASE64 characters, I just have to get rid of the doubles quotes and the newlines (base64dump searches for continuous strings of BASE64 characters). With base64dump‘s -w option I can get rid of whitespace (including newlines), and with option -i I can get rid of the double-quote character. Unfortunately, escaping of this character (\”) works on Windows, but then cmd.exe gets confused for the next pipe (it expects a closing double-quote). That’s why I introduced option -I, to specify characters with their hexadecimal value. Double-quote is 0x22, thus I use option -I 22:

This is the serialized object, and it contains the .NET assembly I want to analyze. .NET assemblies are .DLLs, e.g. PE files. With my YARA rule to detect PE files, I can find it inside the serialized data:

A PE file was found, and it starts at position 0x04C7. I can cut this data out with option -c:

Another method to find the start of the PE file, is to use a cut expression that searches for ‘MZ’, like this:

If there is more than one instance of string MZ, different cut-expressions must be tried to find the real start of the PE file. For example, this is the cut-expression to select data starting with the second instance of string MZ: -c “[‘MZ’]2:”

It’s best to pipe the cut-out data into pecheck, to validate that it is indeed a PE file:

pecheck also helps with finding the length of the PE file (with the given cut-expression, I select all data until the end of the serialized data).

Remark that there is an overlay (bytes appended to the end of the PE file), and that it starts at position 0x1400. Since I don’t expect an overlay in this .NET assembly, the overlay is not part of the PE file, but it is part of the serialization meta data.

Hence I can cut out the PE file precisely like this:

This PE file can be saved to disk now for reverse-engineering.

I have not read the .NET serialization format specification, but I can make an educated guess. Right before the PE file, there is the following data:

Remark the first 4 bytes (5 bytes before the beginning of the PE file): 00 14 00 00. That’s 0x1400 as a little-endian 32-bit integer, exactly the length of the PE file, 5120 bytes:

So that’s most likely another method to determine the length of the PE file.

 

Next Page »

Blog at WordPress.com.