Didier Stevens

Tuesday 12 November 2019

Steganography and Malware

Filed under: Malware,My Software — Didier Stevens @ 0:00

I was reading about malware using WAV files and steganography to download payloads without triggering detection systems.

For example, here is a WAV file with a hidden, embedded PE file. The PE file is encoded in the least significant bit of 16-bit integers that encode PCM sound.

I was wondering how I could extract this embedded file with my tools. There was no easy solution, because many of my tools operate on byte streams, but here I have to operate on a bit stream. So I made an update to my format-bytes.py tool.

Using my tool file-magic.py, I get confirmation that this is a sound file (.WAV) with 16-bit PCM data.

And here is an ASCII/HEX dump of the beginning of the file made with cut-bytes.py:

The data chunk starts with magic sequence ‘data’ (in yellow), followed by the size of the data chunk (in green), and then the data itself: 16-bit, little-endian signed integers (in red).

To extract the least significant bit of each 16-bit, little-endian signed integer and assemble them into bytes, I use the latest version of format-bytes.py.

This is the command that I use:

format-bytes.py -a -f “bitstream=f:<H,b:0,j:<” #c#[‘data’]+8: DB043392816146BBE6E9F3FE669459FEA52A82A77A033C86FD5BC2F4569839C9.wav.vir

With option -f, I specify a bitstream format.

f:<H means that the format of the data is little-endian (<), unsigned 16-bit integers (H). I could also specify a signed 16-bit integer (h), but this doesn’t matter here, as I’m not going to use the sign of the integers.

b:0 means that I extract the least-significant bit (position 0) of each 16-bit integer.

j:< means that I assemble (join) these bits into bytes from least significant to most significant (<).

The data starts 8 bytes into the data chunk, e.g. 8 bytes after magic sequence ‘data’. I define this with cut-expression #c#[‘data’]+8:.

When I run this command, and perform an ASCII dump, I get this output for the beginning of the stream:

I can indeed see an executable (MZ), but it is preceded by 4 bytes. These 4 bytes are the length of the embedded file. As described in the article, the length is big-endian encoded. Hence I use a similar command to extract the length, but with j:>, as can be seen here:

The length is 733696 bytes, and this matches the IOCs from the article.

Then I use my tool pecheck.py to search for PE files inside the byte stream (-l P), like this:

MD5 7cb0e1e2cf4a9bf450a350a759490057 is indeed the hash of the malicious DLL encoded in this WAV file.







  1. […] In a great application of testing theories, research, and writing your own tools when no others exist, Didier Stevens thinks about ways of obfuscating payloads to avoid detection systems and uses Stevens’ own Python tools. Steganography and Malware  […]

    Pingback by Week 46 – 2019 – This Week In 4n6 — Sunday 17 November 2019 @ 21:41

  2. […] Steganography and Malware […]

    Pingback by Overview of Content Published in November | Didier Stevens — Sunday 8 December 2019 @ 9:36

  3. […] Blog post: Steganography and Malware […]

    Pingback by Stego & Cryptominers – Didier Stevens Videos — Sunday 2 February 2020 @ 13:07

RSS feed for comments on this post. TrackBack URI

Leave a Reply (comments are moderated)

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Blog at WordPress.com.