Didier Stevens

Wednesday 22 June 2022

Examples Of Encoding Reversing

Filed under: Forensics,Malware,Reverse Engineering — Didier Stevens @ 15:08

I recently created 2 blog posts with corresponding videos for the reversing of encodings.

The first one is on the ISC diary: “Decoding Obfuscated BASE64 Statistically“. The payload is encoded with a variation of BASE64, and I show how to analyze the encoded payload to figure out how to decode it.

And this is the video for this diary entry:

And on this blog, I have another example, more complex, where the encoding is a variation of hexadecimal encoding, with some obfuscation: “Another Exercise In Encoding Reversing“.

And here is the video:

Monday 20 June 2022

Another Exercise In Encoding Reversing

Filed under: Forensics,Malware,Reverse Engineering — Didier Stevens @ 23:50

I also recorded a video for this blog post.

In this blog post, I will show how to decode a payload encoded in a variation of hexadecimal encoding, by performing statistical analysis and guessing some of the “plaintext”.

I do have the decoder too now (a .NET assembly), but here I’m going to show how you can try to decode a payload like this without having the decoder.

The payload looks like this:

Seeing all these letters, I thought: this is lowercase Netbios Name encoding. That is an encoding where each byte is represented by 2 hexadecimal characters, but the characters are all letters, in stead of digits and letters. Since my tool base64dump.py can handle netbios name encoding, I let it try all encodings:

That failed: no netbios encoding was found. Only base64 and 2 variants of base85, but that doesn’t decode to anything I recognize. Plus, for the last 2 decodings, only 17 unique characters were found. That makes it very unlikely that it is indeed base64 or base85.

Next I use my tools byte-stats.py to produce statistics for the bytes found inside the payload:

There are 17 unique bytes used to encode this payload. The ranges are:

  • abcdef
  • i
  • opqrstuvw
  • y

This is likely some form of variant of hexadecimal encoding (16 characters) with an extra character (17 in total).

To analyze and try to decode this, I’m making a custom Python program based on my Python template for processing binary files.

You will find this default processing code in the template:

I am replacing this default code with the following code (I will post a link to the complete program at the end of this blog post):

The content of the file is in variable data. These are bytes.

Since I’m actually dealing with letters only, I’m converting these bytes to characters and store this into variable encodedpayload.

The next piece of code, starting with “data = []” and ending with “data = bytes(data)”, will read two characters from the encodedpayload, and try to convert them from an hexadecimal byte to a byte. If that fails (ValueError), that pair of characters is just ignored.

And then, the last statement, I do an hexadecimal/ascii dump of the data that I was able to convert. This gives me the following:

That doesn’t actually make me any wiser.

Looking at the statistics produced by byte-stats.py, I see that there are 2 letters that appear most frequently, around 9% of the time: d and q.

I do know that the payload is a Windows executable (PE file). PE files that are not packed, contain a lot of NULL bytes. Character 0 is by far the most frequent when we do a frequency analysis of the hexadecimal representation of a “classic” PE file. It often has a frequency of 20% or higher.

That is not the case here for letters d and q. So I don’t know which letter represents digit 0.

Let’s make a small modification to the program, and represent each pair of characters that couldn’t be decoded as hexadecimal, by a NULL byte (data.append(0):

This code produces the following output:

And that is still not helpful.

Since I know this is a PE file, I know the file has to start with the letters MZ. That’s 4D5A in hexadecimal.

The encoded payload starts with ydua. So let’s assume that this represents MZ (4D5A in hexadecimal), thus y is 4, d is d, u is 5 and a is a.

I will now add a small dictionary (dSubstitute) with this translation, and add code to do a search and replace for each of these letters (that’s the for loop):

This code produces the following output:

Notice that apart from MZ, letters DO also appear. DO is 444F in hexadecimal, and is part of the well-known string found at the beginning of (most) PE files: !This program cannot be run in DOS mode

I will know use this string to try to match more letters with hexadecimal digits (I’m assuming the PE file contains this string).

I add the following lines to print out string “!This program cannot be run in DOS mode” in hexadecimal:

This results in the following output:

Notice that the letter T is represented as 54 in hexadecimal. Hexadecimal digits 5 and 4 are part of the digits we already decoded. 5 is u and and 4 is y.

I add code to find the position of the first occurrence of string uy inside the encoded payload:

And this is the output:

Position 86. That’s at the beginning of the payload, so it’s possible that I have found the location of the encoded string “!This program cannot be run in DOS mode”.

I will now add code that does the following: for each letter of the encoded string, I will lookup the corresponding hexadecimal digit in the hexadecimal representation of the unencoded string, and add this decoding pair to the dictionary. If the letter that I add to the dictionary is already present in the dictionary, I compare the stored hexadecimal digit for that letter with the one I looked up, and if they are different, I generate an exception. Because if that happens, I don’t have a one-to-one relationship, and my hypothesis that this is a variant of hexadecimal, is wrong. This is the extra code:

After completing the dictionary, I do a return. I don’t want to do the decoding yet, I just want to make sure that no exception is generated by finding 2 different hexadecimal digits. This is the output:

No exception was thrown: we have a one-to-one relationship.

Next I add 2 lines to see how many and what letters I have inside the dictionary:

This is the output:

That is 14 letters (we have 17 in total). That’s a great result.

I remove the return statement now, to let the decoding take place:

Giving this result:

That is a great result. Not only do I see strings MZ and “!This program cannot be run in DOS mode”, but also PE, .text, .data, .rdata, …

I am now adding code to see which letters I’m still missing:

Giving me this output:

The letters I still need to match to hexadecimal digits are: b, c and q.

I want to know where these letters are found inside the partially decoded payload, and for that I add the following code:

Giving me this result:

The letter q appears very soon: as the 6th character.

Let’s compare this with the start of another, well-known PE file: notepad.exe:

So notepad.exe starts with 4d5a90000300000004

And the partially decoded payload starts with: 4d5a9q03qq04

Let’s put that right under each other:



If I replace q with 000, I match the beginning of notepad.exe.



I add this to the dictionary:

And run the program:

That starts to look like a completely decoded PE file.

But I still have letters b and c.

I’m adding some code to see which hexadecimal characters are left unpaired with a letter:


Hexadecimal digits b and c have not been paired with a letter.

Now, since a translates to a, d to d, e to e and f to f, I’m going to guess that b translates to b and c to c.

I’m adding code to write the decoded payload to disk:

And after running one more time my script, I’m using my tool pe-check.py to validate that I have indeed a properly decoded PE file:

This looks good.

From the process memory dump I have for this malware, I know that I’m dealing with a Cobalt Strike beacon. Let’s check with my 1768.py tool:

This is indeed a Cobalt Strike beacon.

The encoding that I reversed here, is used by GootLoader to encode beacons. It’s an hexadecimal representation, where the decimal digits have been replaced by letters other that abcdef. With an extra twist: while letter v represents digit 0, letter q represent digits 000.

The complete analysis & decoding script can be found here.

Friday 27 May 2022

PoC: Cobalt Strike mitm Attack

Filed under: Encryption,Hacking,Malware — Didier Stevens @ 0:00

I did this about 6 months ago, but this blog post didn’t get posted back then. I’m posting it now.

I made a small Proof-of-Concept: cs-mitm.py is a mitmproxy script that intercepts Cobalt Strike traffic, decrypts it and injects its own commands.

In this video, a malicious beacon is terminated by sending it a sleep command followed by an exit command. I just included the sleep command to show that it’s possible to do this for more than one command.

I selected this malicious beacon for this PoC because it uses one of the leaked private keys, enabling the script to decrypt the metadata and obtain the necessary AES and HMAC keys.

The PoC does not support malleable C2 data transforms, but the code to do this can be taken from my other cs-* tools.

Monday 4 April 2022

.ISO Files With Office Maldocs & Protected View in Office 2019 and 2021

Filed under: maldoc,Malware,Uncategorized — Didier Stevens @ 0:00

We have seen ISO files being used to deliver malicious documents via email. There are different variants of this attack.

One of the reasons to do this, is to evade “mark-of-web propagation”.

When a file (attached to an email, or downloaded from the Internet) is saved to disk on a Windows system, Microsoft applications will mark this file as coming from the Internet. This is done with a ZoneIdentifier Alternate Data Stream (like a “mark-of-web”).

When a Microsoft Office application, like Word, opens a document with a ZoneIdentifier ADS, the document is opened in Protected View (e.g., sandboxed).

But when an Office document is stored inside an ISO file, and that ISO has a ZoneIdentifier ADS, then Word will not open the document in Protected View. That is something I observed 5 years ago.

But this has changed recently. When exactly, I don’t know (update: August 2021).

But when I open an Office document stored inside an ISO file marked with a ZoneIdentifier ADS, Office 2021 will open the document in protected view:

With an unpatched version of Office 2019, that I installed a year ago, that same file is not opened in Protected View:

After updating Office:

Word’s behavior has changed:

The file is now opened in Protected View.

If you want to test this yourself, you can use my ZoneIdentifier tool to easily settings a “mark-of-web” without having to download your test file from the Internet:

Or you can just add the ZoneIdentifier ADS with notepad.

I did the same test with Office 2016, I updated an old version and: the document is not opened in Protected View.

I don’t know exactly when Microsoft Office 2019 was updated so that it would open documents in Protected View when they are inside an ISO file marked as originating from the Internet. But if you do know, please post a comment.

Update: this change happened in August 2021. See comments below. Thanks Philippe.

Saturday 11 December 2021

MiTM Cobalt Strike Network Traffic

Filed under: Encryption,Hacking,Malware,My Software — Didier Stevens @ 10:14

I made a small PoC. cs-mitm. py is a mitmproxy script that intercepts Cobalt Strike traffic, decrypts it and injects its own commands. In this video, a malicious beacon is terminated by sending it an exit command. I selected a malicious beacon that uses one of the leaked private keys.

The script does not support data transforms, but that can be easily added, for example with code found in cs-parse-traffic.py.

Thursday 21 October 2021

“Public” Private Cobalt Strike Keys

Filed under: Encryption,Malware,My Software — Didier Stevens @ 18:05

I found 6 private keys used by malicious Cobalt Strike servers. There’s a significant number of malicious CS servers on the Internet that reuse these keys, thus allowing us to decrypt their C2 traffic. For the details, I recommend reading the following blog post I wrote “Cobalt Strike: Using Known Private Keys To Decrypt Traffic – Part 1“.

I integrated these keys in the database (1768.json) of my tool 1768.py (starting version 0.0.8).

Whenever you analyze a beacon with 1768.py that uses a public key with a known private key, the report will point this out:

And when you use option verbose, the private key will be included:

If you want to integrated these 6 keys in your own tools: be my guest. You can find these key pairs in 1768.json.

Sunday 26 September 2021

Patching A Java .class File

Filed under: 010 Editor,Forensics,Hacking,Malware — Didier Stevens @ 0:00

010 Editor is one of few commercial applications that I use daily. It’s a powerful binary editor with scripting and templates.

I recently had to patch a Java .class file: extend a string inside that class. Before going the route of decompiling / editing / recompiling, I tried with 010 Editor.

Here is the file opened inside the editor:

When opening the file, 010 Editor recognized the .class extension and installed and ran the template for .class files. That’s what I wanted to know: is there a template for .class files? Yes, there is!

Here is how you can apply a template manually, in case the file extension is not the original extension:

And this is how the template results look like:

Under the hex/ascii dump, the template results are displayed: a set of nested fields that match the internal structure of .class file. For example, the first field I selected here, u4 magic, is the magic header of a .class file: CAFEBABE.

The string I want to extend is this one:

I need to extend string “1.2 (20210922)”. Into something like “1.2 (20210922a)”.

Doing so will make the string longer, thus I need to add a byte to the file (trivial), but I also need to make sure that the binary structure of .java files remain valid: for example, if there is something in that structure like a field length, I need to change the field length too.

I’m not familiar with the internal structure of .class files, that why I’m using 010 Editor’s .class template, hoping that the template will make it clear to me what needs to be changed.

To find the template result field I need to modify, I position my cursor on the string I want to modify inside the ASCII dump, I right-click and select “Jump To Template Variable”:

Which selects the corresponding template variable:

So my cursor was on the 10th byte (bytes[9]) of the string, which is part of template variable cp_info constant_pool[27]. From that I gather that the string I want to modify is inside a pool of constants.

I can select that template variable:

And here I can see which bytes inside the .class file were selected. It’s not only the string, but also bytes that represent the tag and length. The length is 14, that’s indeed the length of the string I want to extend. Since I want to add 1 character, I change the length from 14 to 15: I can do that inside the template results by double-clicking the value 14, I don’t need to make that change inside the hexdump:

Next I need to add a character to the string. I can do that in the ASCII dump:

I have to make sure that the editor is in insert mode (INS), so that when I type characters, they are inserted at the cursor, in stead of overwriting existing bytes:

And then I can type my extra character:

So I have changed the constant string I wanted to change. Maybe there are more changes to make to the internal structure of this .class file, like other length fields … I don’t know. But what I do as an extra check is: save the modified file and run the template again. It runs without errors, and the result looks good.

So I guess there are no more changes to make, and I decide to tryout my modified .class file and see what happens: it works, so there are no other changes to make.

Monday 26 April 2021

Quickpost: Decrypting Cobalt Strike Traffic

Filed under: Encryption,Malware,My Software,Quickpost — Didier Stevens @ 0:00

I have been looking at several samples of Cobalt Strike beacons used in malware attacks. Although work is still ongoing, I already want to share my findings.

Cobalt Strike beacons communicating over HTTP encrypt their data with AES (unless a trial version is used). I found code to decrypt/encrypt such data in the PyBeacon and Geacon Github repositories.

This code works if you know the AES key: which is not a problem in the use cases of the code above, as it is developed to simulate a beacon. Beacons generate their own AES key, and thus these beacon simulations also generate their own AES key.

But what if you’re analyzing real beacons used in malware attacks? How do you obtain the AES key?

I found a way to extract the keys (AES and HMAC) from process memory of a running beacon.

I use the following procdump command to prepare process memory dumps:

procdump -mp -w -s 1 -n 5 malware.exe

Then I start the beacon malware.exe in a malware analysis virtual machine while capturing traffic with Wireshark.

My new tool cs-extract-key.py looks in the dumped process memory for the unencrypted (RSA encryption) metadata that a beacon sends to the C2. This metadata contains the AES en HMAC keys.


This method does not always work: the metadata is overwritten after some time, so the process dump needs to be taken quickly after the beacon is started. And there are also cases were this metadata can not be found (I suspect this is version bound).

For those cases, my tool has another way of obtaining the keys. I extract the encrypted data of the first post of the beacon to the C2 (this is called a callback in the PyBeacon code):

And then I provide this to my tool, together with the process dump. My tool will then proceed with a dictionary attack: extract all possible AES and HMAC keys from the process dump, and try do authenticate and decrypt the callback. If this works, the keys have been found:

And once I have obtained the keys, I can pass them to my traffic decoding program that I have updated to include decryption (and that I have renamed to cs-parse-http-traffic.py):

Quickpost info

Friday 12 February 2021

Quickpost: oledump.py plugin_biff.py: Remove Sheet Protection From Spreadsheets

Filed under: Malware,My Software,Quickpost — Didier Stevens @ 0:00

My new version of plugin_biff.py has a new option: –hexrecord.

Here I’ll show how I use this to remove the sheet protection from malicious spreadsheets.

If you want to open a malicious spreadsheet (for example with Excel 4 macros) in a sandbox, to inspect its content with Excel, chances are that it is protected.

I’m not talking about encryption (this is something that can be handled with my tool msoffcrypto-crack.py), but about sheet protection.

Enabling sheet protection can be done in Excel as follows:

Although you have to provide a password, that password is not used to derive an encryption key. An .xls file with sheet protection is not encrypted.

If you use my tool oledump.py together with plugin_biff.py, you can select all BIFF records that have the string “protect” in their name or description (-O protect). This will give you different records that govern sheet protection.

First, let’s take a look at an empty, unprotected (and unencrypted) .xls spreadsheet. With option -O protect I select the appropriate records, and with option -a I get an hex/ascii dump of the record data:

We can see that there are several records, and that their data is all NULL (0x00) bytes.

When we do the same for a spreadsheet with sheet protection, we get a different view:

First of all we have 4 extra records, and their data isn’t zero: the flags are set to 1 (01 00 little-endian) and the Protection Password data is AB94. That is the hash of the password (P@ssw0rd) we typed to create this sheet.

To remove this sheet protection, we just need to set all data to 0x00. This is something that can be done with an hex editor.

First use option -R instead of option -a:

This will give you the complete records (type, length and data) in hexadecimal. Next you can search for each record using this hexadecimal data with an hex editor and set the data bytes to 0x00.

Searching for the first record 120002000100:

Setting the data to 0x00: 0100 -> 0000

Do this for the 4 records, and then save the spreadsheet under a different name (keep the original intact).

Now you can open the spreadsheet, and the sheet protection is gone. You can now unhide hidden sheets for example.

Quickpost info

Thursday 28 January 2021

Update: XORSelection.1sc Version 6.0

Filed under: 010 Editor,Encryption,Malware,My Software — Didier Stevens @ 0:00

I released an update to my 010 Editor script XORSelection.1sc.

010 is a binary editor with a scripting engine. XORSelection.1sc is a script I wrote years ago, that will XOR-encode a (partial) file open in the editor.

The first version just accepted a printable, arbitrary-length string as XOR-key. Later versions accepted an hexadecimal key too, and introduced various options.

With version 6.0, I add support for a dynamic XOR-key. That is a key that changes while it is being used. It can change, one byte at-a-time, before or after each XOR operation at byte-level is executed.

Hence option cb means change before, and ca means change after. Watch this video to understand exactly how the key changes (if you want to skip the part explaining my script XORSelection, you can jump directly to the dynamic XOR-key explanation).


I made this update to my XORSelection script, because I had to “manually” decode a Cobalt Strike beacon that was XOR-encoded with a changing XOR key (it is part of a WebLogic server attack). Later I included this decoding in my Cobalt Strike beacon analysis tool 1768.py.

The decoding shellcode is in the first 62 bytes (0x3E) of the file:

After the shellcode comes the XOR-key, the size and the beacon:

We can decode the beacon size, that is XOR-encoded with key 0x3F0882FB, as follows. First we select the bytes to be decoded:

Then we launch 010 Editor script XORSelection.1sc:

Provide the XOR key (prefix 0x is to indicate that the key is provide as hexadecimal byte values):

And then, after pressing OK, the bytes that contain the beacon size are decoded by XOR-ing them with the provided key:

This beacon size (bytes 00 14 04 00) is a little-endian, 32-bit integer: 0x041400.

To decode the beacon, we select the encoded beacon and launch script XORSelection.1sc again:

This time, we need to provide an option to change the XOR-decoding process. We press OK without entering a value, this will make the next prompt appear, where we can provide options:

The option we need to use to decode this Cobalt Strike beacon, is cb: change before.

In the next prompt, we can provide the XOR-key:

And we end up with the decoded beacon (you can see parts of the PE file that is the beacon):

Remark that you can enter “h” at the option prompt, to get a help screen:

I made this video explaining how to use this new option, and also explaining how the XOR key is changed exactly when using option change before (cb) or change after (ca).

If you want to skip the part explaining my script XORSelection, you can jump directly to the dynamic XOR-key explanation.

XORSelection_V6_0.zip (https)
MD5: C1872C275B59E236906D38B2302F3F4B
SHA256: 1970A506299878FAC2DDD193F9CE230FD717854AC1C85554610DDD95E04DE9E9


Next Page »

Blog at WordPress.com.