Didier Stevens

Saturday 20 April 2019

Extracting “Stack Strings” from Shellcode

Filed under: Malware,My Software,Reverse Engineering — Didier Stevens @ 0:00

A couple of years ago, I wrote a Python script to enhance Radare2 listings: the script extract strings from stack frame instructions.

Recently, I combined my tools to achieve the same without a 32-bit disassembler: I extract the strings directly from the binary shellcode.

What I’m looking for is sequences of instructions like this: mov dword [ebp – 0x10], 0x61626364. In 32-bit code, that’s C7 45 followed by one byte (offset operand) and 4 bytes (value operand).

Or: C7 45 10 64 63 62 61. I can write a regular expression for this instruction, and use my tool re-search.py to extract it from the binary shellcode. I want at least 2 consecutive mov … instructions: {2,}.

I’m using option -f because I want to process a binary file (re-search.py expects text files by default).

And I’m using option -x to produce hexadecimal output (to simplify further processing).

I want to get rid of the bytes for the instruction and the offset operand. I do this with sed:

I could convert this back to text with my tool hex-to-bin.py:

But that’s not ideal, because now all characters are merged into a single line.

My tool python-per-line.py gives a better result by processing this hexadecimal input line per line:

Remark that I also use function repr to escape unprintable characters like 00.

This output provides a good overview of all API functions called by this shellcode.

If you take a close look, you’ll notice that the last strings are incomplete: that’s because they are missing one or two characters, and these are put on the stack with another mov instruction for single or double bytes. I can accommodate my regular expression to take these instructions into account:

This is the complete command:

re-search.py -x -f "(?:\xC7\x45.....){2,}(?:(?:\xC6\x45..)|(?:\x66\xC7\x45...))?" shellcode.bin.vir | sed "s/66c745..//g" | sed "s/c[67]45..//g" | python-per-line.py -e "import binascii" "repr(binascii.a2b_hex(line))"

Blog at WordPress.com.