Didier Stevens

Thursday 13 June 2019

New Tool: amsiscan.py

Filed under: Malware,My Software — Didier Stevens @ 0:00

amsiscan.py is a Python script that uses Windows 10’s AmsiScanBuffer function to scan input for malware.

It reads one or more files or stdin.

The AmsiScanBuffer function returns 5 possible values when it is called for a scan:



amsiscan_V0_0_1.zip (https)
MD5: 47E50599E0CFAF1D27416E68394289A0
SHA256: 044E41D7F31D8333CB5295FD6E430933CA67F9AC37CD400D38189C96AE48544D

Wednesday 12 June 2019

Update: virustotal-search.py Version 0.1.5

Filed under: Malware,My Software,Update — Didier Stevens @ 0:00

virustotal-search.py is a tool to query VirusTotal via its public API for file reports by providing hashes to search for.

This new version adds searching for URLs. Use option -t to select the type of search you want: file (default) or url.

Like this:

Option -e can be used to include extra information (present in the JSON reply) not included by default.

For example, a default file search does not include sha256 hashes:

But you can include it with option “-e sha256” like this:

The public API can also be used for queries for domain names and IP addresses. These queries are much simpler than file and url, and therefor, I developed a very generic program to query APIs. This will be released soon.

virustotal-search_V0_1_5.zip (https)
MD5: 2155347687726A321D1ADBB9C9B81CFD
SHA256: 4F614C9D01C694AEAA16F7D5E4DBFBCF37E8E8D01D382C1137F401612D02E110

Saturday 20 April 2019

Extracting “Stack Strings” from Shellcode

Filed under: Malware,My Software,Reverse Engineering — Didier Stevens @ 0:00

A couple of years ago, I wrote a Python script to enhance Radare2 listings: the script extract strings from stack frame instructions.

Recently, I combined my tools to achieve the same without a 32-bit disassembler: I extract the strings directly from the binary shellcode.

What I’m looking for is sequences of instructions like this: mov dword [ebp – 0x10], 0x61626364. In 32-bit code, that’s C7 45 followed by one byte (offset operand) and 4 bytes (value operand).

Or: C7 45 10 64 63 62 61. I can write a regular expression for this instruction, and use my tool re-search.py to extract it from the binary shellcode. I want at least 2 consecutive mov … instructions: {2,}.

I’m using option -f because I want to process a binary file (re-search.py expects text files by default).

And I’m using option -x to produce hexadecimal output (to simplify further processing).

I want to get rid of the bytes for the instruction and the offset operand. I do this with sed:

I could convert this back to text with my tool hex-to-bin.py:

But that’s not ideal, because now all characters are merged into a single line.

My tool python-per-line.py gives a better result by processing this hexadecimal input line per line:

Remark that I also use function repr to escape unprintable characters like 00.

This output provides a good overview of all API functions called by this shellcode.

If you take a close look, you’ll notice that the last strings are incomplete: that’s because they are missing one or two characters, and these are put on the stack with another mov instruction for single or double bytes. I can accommodate my regular expression to take these instructions into account:

This is the complete command:

re-search.py -x -f "(?:\xC7\x45.....){2,}(?:(?:\xC6\x45..)|(?:\x66\xC7\x45...))?" shellcode.bin.vir | sed "s/66c745..//g" | sed "s/c[67]45..//g" | python-per-line.py -e "import binascii" "repr(binascii.a2b_hex(line))"

Friday 15 March 2019

Maldoc: Excel 4.0 Macro

Filed under: maldoc,Malware,My Software — Didier Stevens @ 0:00

MD5 007de2c71861a3e1e6d70f7fe8f4ce9b is a malicious document: a spreadsheet with Excel 4.0 macros.

Excel 4.0 macros predate VBA macros: they are composed of functions placed inside cells of a macro sheet.

These macros are not stored in dedicated VBA streams, but as BIFF records in the Workbook stream.

Spreadsheets with Excel 4.0 macros can be analyzed with oledump.py and plugin plugin_biff.py.

Option -x of plugin_biff will select all BIFF records relevant for the analysis of Excel 4.0 macros:

In this output, we have all the BIFF records necessary to 1) determine that this is a malicious document and 2) report what this maldoc does.

The first BIFF record, BOUNDSHEET, tells us that the spreadsheet contains a Excel 4.0 macro sheet that is hidden.

The third BIFF LABEL record tells us that there is a cell with name Auto_Open: the macros will execute when the spreadsheet is opened.

And then we have BIFF FORMULA records that tell us that something is CONCATENATEd and EXECuted.

The BIFF STRING record provides us with the exact command (msiexec …) that will be executed.

The latest version of plugin_biff contains much larger lists of tokens and functions used in formula expressions. Of course, it’s still possible that tokens and/or functions are used unknown by my plugin. This is now clearly indicated in the output:

*UNKNOWN FUNCTION* is reported when a function number is unknown. The function number is always reported. Here, for the sake of this example, a crippled version of plugin_biff reports functions with number 0x0037 and 0x0150. In the released version of plugin_biff, functions 0x0037 and 0x0150 are identified as RETURN and CONCATENATE respectively.

*INCOMPLETE FORMULA PARSING* is reported when a formula expression can not be fully parsed. Left of the warning *INCOMPLETE FORMULA PARSING*, the partially parsed expression can be found, and right of the warning, the remaining, unparsed expression is reported as a Python string. If the remainder contains bytes that could be potentially dangerous functions like EXEC, then this is reported too.

The complete analysis of the maldoc is explained in this video:

Thursday 7 March 2019

Analyzing a Phishing PDF with /ObjStm

Filed under: maldoc,Malware,My Software,PDF — Didier Stevens @ 0:00

I got hold of a phishing PDF where the /URI is hiding inside a stream object (/ObjStm).

First I start the analysis with pdfid.py:

There is no /URI reported, but remark that the PDF contains 5 stream objects (/ObjStm). These can contain /URIs. In the past, I would search and decompress these stream objects with pdf-parser.py, and then pipe the result through pdfid.py, in order to detect /URIs (or other objects that require further analysis).

Since pdf-parser.py version 0.7.0, I prefer another method: using option -O to let pdf-parser.py extract and parse the objects inside stream objects.

With option -a (here combined with option -O), I can get statistics and keywords just like with pdfid:

Now I can see that there is a /URI inside the PDF (object 43).

Thus I can use option -k to get the value of /URI entries, combined with option -O to look inside stream objects:

And here I have the /URI.

Another method, is to select object 43:

From this output, we also see that object 43 is inside stream object 16.

Remark: if you use option -O on a PDF that does not contain stream objects (/ObjStm), pdf-parser will behave as if you didn’t provide this option. Hence, if you want, you can always use option -O to analyze PDFs.

Monday 20 August 2018

Obtaining Malware Samples for Analysis

Filed under: Announcement,Malware — Didier Stevens @ 0:00

In my malware analysis blog posts and videos, I always try to include the hash or VirusTotal link of the sample(s) I analyze. If I don’t, it means I’m not at liberty to share the hash.

For every video that I post on YouTube, I create a corresponding video blog post (https://videos.DidierStevens.com) with more info like the sample’s hash and a link to VirusTotal.

In the description of the YouTube video, you will find a link to the video blog post.


I will often use the MD5 hash, but since I include a link to VirusTotal, you can consult the report and find other hashes like sha256 in that report.

Regarding MD5: I don’t worry about hash collisions for malware samples. Actually, if there is an MD5 hash collision, VirusTotal will inform me, and that would make my day 🙂 .

Don’t ask me for the malware samples I analyze, I don’t host or send these malware samples. If you or your organization have a VirusTotal Intelligence subscription, you can download the sample from VirusTotal.

If you don’t, there are several free repositories online (sometimes they require free registration). Lenny Zeltser has a list of repositories.



Wednesday 25 July 2018

Extracting DotNetToJScript’s PE Files

Filed under: Forensics,Malware — Didier Stevens @ 0:00

I added a new option (-I, –ignorehex) to base64dump.py to make the extraction of the PE file inside a JScript script generated with DotNetToJScript a bit easier.

DotNetToJScript is James Forshaw‘s “tool to generate a JScript which bootstraps an arbitrary .NET Assembly and class”.

Here is an example of a script generated by James’ tool:

The serialized .NET object is embedded as a string concatenation of BASE64 strings, assigned to variable serialized_obj.

With re-search.py, I extract all strings from the script (e.g. strings delimited by double quotes):

The first 3 strings are not part of the BASE64 encoded object, hence I get rid of them (there are no unwanted strings at the end):

And now I have BASE64 characters, I just have to get rid of the doubles quotes and the newlines (base64dump searches for continuous strings of BASE64 characters). With base64dump‘s -w option I can get rid of whitespace (including newlines), and with option -i I can get rid of the double-quote character. Unfortunately, escaping of this character (\”) works on Windows, but then cmd.exe gets confused for the next pipe (it expects a closing double-quote). That’s why I introduced option -I, to specify characters with their hexadecimal value. Double-quote is 0x22, thus I use option -I 22:

This is the serialized object, and it contains the .NET assembly I want to analyze. .NET assemblies are .DLLs, e.g. PE files. With my YARA rule to detect PE files, I can find it inside the serialized data:

A PE file was found, and it starts at position 0x04C7. I can cut this data out with option -c:

Another method to find the start of the PE file, is to use a cut expression that searches for ‘MZ’, like this:

If there is more than one instance of string MZ, different cut-expressions must be tried to find the real start of the PE file. For example, this is the cut-expression to select data starting with the second instance of string MZ: -c “[‘MZ’]2:”

It’s best to pipe the cut-out data into pecheck, to validate that it is indeed a PE file:

pecheck also helps with finding the length of the PE file (with the given cut-expression, I select all data until the end of the serialized data).

Remark that there is an overlay (bytes appended to the end of the PE file), and that it starts at position 0x1400. Since I don’t expect an overlay in this .NET assembly, the overlay is not part of the PE file, but it is part of the serialization meta data.

Hence I can cut out the PE file precisely like this:

This PE file can be saved to disk now for reverse-engineering.

I have not read the .NET serialization format specification, but I can make an educated guess. Right before the PE file, there is the following data:

Remark the first 4 bytes (5 bytes before the beginning of the PE file): 00 14 00 00. That’s 0x1400 as a little-endian 32-bit integer, exactly the length of the PE file, 5120 bytes:

So that’s most likely another method to determine the length of the PE file.


Tuesday 24 April 2018

SpiderMonkey and STDIN

Filed under: Malware,My Software — Didier Stevens @ 0:00

With most of my tools, I try to support input via STDIN.

It’s also possible to provide JavaScript scripts for parsing to SpiderMonkey via STDIN. You can pass filename – to js for processing STDIN input:

I often store malware in password protected ZIP files, these files can be analyzed too provided you use zipdump.py:

And with option -e, it’s also possible to change output type via the command line:

Sunday 21 January 2018

Quickpost: Retrieving Malware Via Tor On Windows

Filed under: Malware,Quickpost — Didier Stevens @ 22:46

I sometimes retrieve malware over Tor, just as a simple trick to use another IP address than my own. I don’t do anything particular to be anonymous, just use Tor in its default configuration.

On Linux, its easy: I install tor and torsocks packages, then start tor, and use wget or curl with torsocks, like this:

torsocks wget URL

torsocks curl URL

On Windows, its a bit more difficult, because the torsocks trick doesn’t work.

I run Tor (Windows Expert Bundle) without any configuration:

This will give me a Socks listener, that curl can use:

curl --socks5-hostname http://www.didierstevens.com

option –socks5-hostname makes curl use the Socks listener provided by Tor to make connections and perform DNS requests (option –socks5 does not use the Socks listener for DNS request, just for connections).

wget has no option to use a Socks listener, but it can use an HTTP(S) proxy.

Privoxy is a filtering proxy that I can use to help wget to talk to Tor like this.

I make 2 changes to Privoxy’s configuration config.txt:

1) I change line 811 from “toggle 1” to “toggle 0” to configure Privoxy as a normal proxy, without filtering.

2) I add this line 1363: “forward-socks5t / .”, this makes Privoxy use Tor.

Then I launch Privoxy:

And then I can use wget like this:

wget -e use_proxy=yes -e http_proxy= -e https_proxy= URL

Port 8118 is Privoxy’s port. If you want, you can also put these options in a configuration file.

Often, my wget command will be a bit more complex (I’ll explain this in another blog post, but it’s based on this ISC diary entry):

wget -d -o 01.log -U "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" -e use_proxy=yes -e http_proxy= -e https_proxy= --no-check-certificate URL


I can also use Tor browser in stead of Tor, but then I need to connect to port 9150.

Quickpost info

Tuesday 31 October 2017

Analyzing A Malicious Document Cleaned By Anti-Virus

Filed under: maldoc,Malware — Didier Stevens @ 0:00

@futex90 shared a sample with me detected by many anti-virus programs on VirusTotal but, according to oledump.py, without VBA macros:

I’ve seen this once before: this is a malicious document that has been cleaned by an anti-virus program. The macros have been disabled by orphaning the streams containing macros, just like when a file is deleted from a filesystem, it’s the index that is deleted but not the content. FYI: olevba will find macros.

Using the raw option, it’s possible to extract the macros:

I was able to find back the original malicious document: f52ea8f238e57e49bfae304bd656ad98 (this sample was analyzed by Talos).

The anti-virus that cleaned this file, just changed 13 bytes in total to orphan the macro streams and change the storage names:

This can be clearly seen using oledir:


Next Page »

Blog at WordPress.com.