Usage: translate.py [options] [file-in] [file-out] command [script]
Translate bytes according to a Python expression

Example: translate.py -o svchost.exe.dec svchost.exe 'byte ^ 0x10'
"byte" is the current byte in the file, 'byte ^ 0x10' does an XOR 0x10
Extra functions:
  rol(byte, count)
  ror(byte, count)
  shl(bytes, count)
  shr(bytes, count)
  IFF(expression, valueTrue, valueFalse)
  Sani1(byte)
  Sani2(byte)
  ZlibD(bytes)
  ZlibRawD(bytes)
  GzipD(bytes)
Variable "position" is an index into the input file, starting at 0

Source code put in the public domain by Didier Stevens, no Copyright
Use at your own risk
https://DidierStevens.com

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -o OUTPUT, --output=OUTPUT
                        Output file (default is stdout)
  -s SCRIPT, --script=SCRIPT
                        Script with definitions to include
  -f, --fullread        Full read of the file
  -r REGEX, --regex=REGEX
                        Regex to search input file for and apply function to
  -R FILTERREGEX, --filterregex=FILTERREGEX
                        Regex to filter input file for and apply function to
  -e EXECUTE, --execute=EXECUTE
                        Commands to execute
  -2 SECONDBYTESTREAM, --secondbytestream=SECONDBYTESTREAM
                        Second bytestream
  -l, --literalfilenames
                        Do not interpret filenames
  -m, --man             print manual

Manual:

Translate.py is a Python script to perform bitwise operations on files (like XOR, ROL/ROR, ...). You specify the bitwise operation to perform as a Python expression, and pass it as a command-line argument.

    translate.py malware -o malware.decoded "byte ^ 0x10"

This will read file malware, perform XOR 0x10 on each byte (this is, expressed in Python: byte ^ 0x10), and write the result to file malware.decoded.

byte is a variable containing the current byte from the input file. Your expression has to evaluate to the modified byte. When your expression evaluates to None, no byte will be written to output. This can be used to delete bytes from the input.
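As a rough illustration of the byte-per-byte model described above, the sketch below evaluates an expression once per input byte with byte and position in scope, and writes nothing when the result is None. The function name translate_bytes is hypothetical, and this is not the tool's actual source.

```python
def translate_bytes(data, expression):
    """Apply a Python expression to every byte of data (illustrative sketch)."""
    out = bytearray()
    for position, byte in enumerate(data):
        # The expression sees the current byte and its zero-based position.
        result = eval(expression, {"byte": byte, "position": position})
        if result is None:
            continue  # None means: write no byte, i.e. delete this one
        out.append(result)  # the result must be an int in range 0..255
    return bytes(out)


print(translate_bytes(b"ABC", "byte ^ 0x10"))                     # b'QRS'
print(translate_bytes(b"ABC", "None if byte == 0x42 else byte"))  # b'AC'
```

The second call shows the deletion case: the expression returns None for the byte 0x42, so that byte never reaches the output.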
For complex manipulation, you can define your own functions in a script file and load this with translate.py, like this:

    translate.py malware -o malware.decoded "Process(byte)" process.py

process.py must contain the definition of function Process. Function Process must return the modified byte.

Another variable is also available: position. This variable contains the position of the current byte in the input file, starting from 0.

If only part of the file has to be manipulated, while leaving the rest unchanged, you can do it like this:

    def Process(byte):
        if position >= 0x10 and position < 0x20:
            return byte ^ 0x10
        else:
            return byte

This example will perform an XOR 0x10 operation from the 17th byte till the 32nd byte included. All other bytes remain unchanged.

Because Python has built-in shift operators (<< and >>) but no rotate operators, I've defined 2 rotate functions that operate on a byte: rol (rotate left) and ror (rotate right). They accept 2 arguments: the byte to rotate and the number of bit positions to rotate. For example, rol(0x01, 2) gives 0x04.

    translate.py malware -o malware.decoded "rol(byte, 2)"

Another function I defined is IFF (the IF Function): IFF(expression, valueTrue, valueFalse). This function allows you to write conditional code without an if statement. When expression evaluates to True, IFF returns valueTrue, otherwise it returns valueFalse.

And yet 2 other functions I defined are Sani1 and Sani2. They can help you with input/output sanitization:
Sani1 accepts a byte as input and returns the same byte, except if it is a control character. All control characters (except VT, LF and CR) are replaced by a space character (0x20).
Sani2 is like Sani1, but sanitizes even more bytes: it sanitizes control characters like Sani1, and also all bytes equal to 0x80 and higher.
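The rotate and IFF helpers described above could be written roughly as follows. This is a minimal sketch based only on the behaviour stated in the manual, not a copy of the tool's code; reducing count modulo 8 is an assumption added here so that rotations by 0 or by more than 7 bits stay well defined.

```python
def rol(byte, count):
    """Rotate an 8-bit value left by count bit positions."""
    count %= 8  # assumption: rotations wrap around every 8 bits
    return ((byte << count) | (byte >> (8 - count))) & 0xFF


def ror(byte, count):
    """Rotate an 8-bit value right by count bit positions."""
    count %= 8
    return ((byte >> count) | (byte << (8 - count))) & 0xFF


def IFF(expression, valueTrue, valueFalse):
    """Return valueTrue when expression is truthy, otherwise valueFalse."""
    return valueTrue if expression else valueFalse


print(hex(rol(0x01, 2)))  # 0x4, as in the manual
print(hex(ror(0x01, 1)))  # 0x80
print(IFF(5 > 3, "yes", "no"))  # yes
```

Because IFF is an ordinary function, Python evaluates both value arguments before the call, unlike a real if statement. That is harmless for simple byte arithmetic but matters if either argument has side effects.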
IFF lets the partial XOR from the Process example fit on the command line:

    translate.py malware -o malware.decoded "IFF(position >= 0x10 and position < 0x20, byte ^ 0x10, byte)"

By default this program translates individual bytes via the provided Python expression. With option -f (fullread), translate.py reads the input file as one byte sequence and passes it to the function specified by the expression. This function needs to take one string as an argument and return one string (the translated file).

Option -r (regex) uses a regular expression to search through the file and then calls the provided function with a match argument for each matched string. The return value of the function (a string) is used to replace the matched string.
Option -R (filterregex) is similar to option -r (regex), except that it does not operate on the complete file, but on the file filtered for the regex.

Here are 2 examples with a regex. The input file (test-ah.txt) contains the following:

    1234&H41&H42&H43&H444321

The first command will search for strings &Hxx and replace them with the character represented in ASCII by hexadecimal number xx:

    translate.py -r "&H(..)" test-ah.txt "lambda m: chr(int(m.groups()[0], 16))"

Output:

    1234ABCD4321

The second command is exactly the same as the first command, except that it uses option -R instead of -r:

    translate.py -R "&H(..)" test-ah.txt "lambda m: chr(int(m.groups()[0], 16))"

Output:

    ABCD

Option -e (execute) is used to execute Python commands before the command is executed. This can, for example, be used to import modules.
Here is an example to decompress a Flash file (.swf):

    translate.py -f -e "import zlib" sample.swf "lambda b: zlib.decompress(b[8:])"

You can use built-in function ZlibD too, and ZlibRawD for inflating without header, and GzipD for gzip decompression.

Built-in function Xor can be used for XOR decoding with a multi-byte key, like in this example:

    translate.py -f #h#320700130717 "lambda data: Xor(data, b'abc')"

Output:

    Secret

A second file can be used as input with option -2.
The value of the current byte of the second input file is stored in variable byte2 (this too advances byte per byte together with the primary input file).

Example:

    translate.py -2 #021230 #Scbpbt "byte + byte2 - 0x30"

Output:

    Secret

Instead of using an input filename, the content can also be passed in the argument. To achieve this, prefix the text with character #.
If the text to pass via the argument contains control characters or non-printable characters, hexadecimal (#h#) or base64 (#b#) can be used.

Example:

    translate.py #h#89B5B4AEFDB4AEFDBCFDAEB8BEAFB8A9FC "byte ^0xDD"

Output:

    This is a secret!

File arguments that start with #e# are a notational convention to use expressions to generate data. An expression is a single function/string or the concatenation of several functions/strings (using character + as concatenation operator).
Strings can be characters enclosed by single quotes ('example') or hexadecimal strings prefixed by 0x (0xBEEF).
4 functions are available: random, loremipsum, repeat and chr.

Function random takes exactly one argument: an integer (with value 1 or more). Integers can be specified using decimal notation or hexadecimal notation (prefix 0x).
The random function generates a sequence of bytes with a random value (between 0 and 255), the argument specifies how many bytes need to be generated. Note that the random number generator that is used is just the Python random number generator, not a cryptographic random number generator.

Example:

    tool.py #e#random(100)

will make the tool process data consisting of a sequence of 100 random bytes.

Function loremipsum takes exactly one argument: an integer (with value 1 or more).
The loremipsum function generates "lorem ipsum" text (fake Latin), the argument specifies the number of sentences to generate.

Example: #e#loremipsum(2) generates this text:

    Ipsum commodo proin pulvinar hac vel nunc dignissim neque eget odio erat magna lorem urna cursus fusce facilisis porttitor congue eleifend taciti. Turpis duis suscipit facilisi tristique dictum praesent natoque sem mi egestas venenatis per dui sit sodales est condimentum habitasse ipsum phasellus non bibendum hendrerit.

Function chr takes one argument or two arguments.
chr with one argument takes an integer between 0 and 255, and generates a single byte with the value specified by the integer.
chr with two arguments takes two integers between 0 and 255, and generates a byte sequence with the values specified by the integers.
For example #e#chr(0x41,0x45) generates data ABCDE.

Function repeat takes two arguments: an integer (with value 1 or more) and a byte sequence. This byte sequence can be a quoted string of characters (single quotes), like 'ABCDE', or a hexadecimal string prefixed with 0x, like 0x4142434445.
The repeat function will create a sequence of bytes consisting of the provided byte sequence (the second argument) repeated as many times as specified by the first argument.
For example, #e#repeat(3, 'AB') generates byte sequence ABABAB.

When more than one function needs to be used, the byte sequences generated by the functions can be concatenated with the + operator.
For example, #e#repeat(10,0xFF)+random(100) will generate a byte sequence of 10 FF bytes followed by 100 random bytes.

To prevent the tool from processing file arguments with wildcard characters or special initial characters (@ and #) differently, but to process them as normal files, use option --literalfilenames.
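The worked examples in the manual above (regex replace versus regex filter, multi-byte XOR, the second byte stream, and the #h# input) can be reproduced in plain Python to check the stated outputs. The helper names below (xor_key, regex_replace, regex_filter, hex_to_byte) are hypothetical stand-ins, and reading -R as "keep only the translated matches" is an assumption drawn from the example output; none of this is the tool's own code.

```python
import re


def xor_key(data, key):
    """XOR data with a repeating multi-byte key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def regex_replace(data, pattern, func):
    """Like option -r: replace every match, keep the rest of the data."""
    return re.sub(pattern, func, data)


def regex_filter(data, pattern, func):
    """Like option -R (as assumed here): keep only the translated matches."""
    return b"".join(func(m) for m in re.finditer(pattern, data))


def hex_to_byte(match):
    """Turn the two hex digits captured by &H(..) into a single byte."""
    return bytes([int(match.group(1), 16)])


sample = b"1234&H41&H42&H43&H444321"
print(regex_replace(sample, rb"&H(..)", hex_to_byte))  # b'1234ABCD4321'
print(regex_filter(sample, rb"&H(..)", hex_to_byte))   # b'ABCD'

# Multi-byte XOR key, input given as hexadecimal (#h#320700130717)
print(xor_key(bytes.fromhex("320700130717"), b"abc"))  # b'Secret'

# Second byte stream: byte + byte2 - 0x30, advancing both inputs together
print(bytes(b + b2 - 0x30 for b, b2 in zip(b"Scbpbt", b"021230")))  # b'Secret'

# Single-byte XOR over hexadecimal input
encoded = bytes.fromhex("89B5B4AEFDB4AEFDBCFDAEB8BEAFB8A9FC")
print(bytes(b ^ 0xDD for b in encoded))  # b'This is a secret!'
```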
translate_v2_5_12.zip (http)
MD5: 4B0C79AF8A1D41BA735C5030912E6C28
SHA256: 899109A9D787D6781AEB0569330A01709063BB3FD58F4AED068A57951B230F88
This is _very_ inefficient:
def rol(byte, count):
    while count > 0:
        byte = (byte <> 7) & 0xFF
        count -= 1
    return byte
This should work faster:
def rol(byte, count):
    byte = (byte <> (8 – count)) & 0xFF
    return byte
Comment by io — Thursday 10 July 2008 @ 15:58
I believe you wanted to write this:
def rol(byte, count):
    byte = (byte << count | byte >> (8 - count)) & 0xFF
    return byte
You’ll be surprised by the gain in performance: about 10%
Translating a 3MB file with the original ROL (rolling 4 bits) takes 168 seconds, translating the same file with the faster ROL takes 155 seconds.
There is a huge overhead in the translation of each byte by the eval function:
outbyte = eval(command)
For every byte, Python has to parse, compile and execute the command. Parsing and compiling takes much more time than the loop in the original ROL command. This is _very_ inefficient, but _very_ flexible. You can provide your own Python expression without having to edit the translate program.
I used a loop in the ROL and ROR commands for didactic reasons. Manipulating bits is very foreign for most people, even programmers. I believe my version is more readable and understandable, and thus extendable by other people.
But you’re right, removing inner loops adds to the performance. But in this specific case, most CPU cycles go to the eval function, and not to the loop.
Anyways, thanks for your comment, I’ll have to think about how to include your code. Maybe I can leave the original ROL and use your code for the ROR 😉
Comment by Didier Stevens — Thursday 10 July 2008 @ 21:05
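One way to keep the flexibility of eval while avoiding the repeated parse and compile step discussed above is to compile the command once and evaluate the resulting code object for each byte. The sketch below is an assumption about how that could look, not how translate.py is implemented; the function names are hypothetical and no timings are claimed.

```python
def rol(byte, count):
    """Rotate an 8-bit value left (single-expression variant)."""
    count %= 8
    return ((byte << count) | (byte >> (8 - count))) & 0xFF


def translate_eval_string(data, command):
    """Evaluate the command string for every byte (parsed on each call)."""
    scope = {"rol": rol}
    out = bytearray()
    for position, byte in enumerate(data):
        scope["byte"] = byte
        scope["position"] = position
        out.append(eval(command, scope))
    return bytes(out)


def translate_precompiled(data, command):
    """Compile the command once, then evaluate the code object per byte."""
    code = compile(command, "<command>", "eval")
    scope = {"rol": rol}
    out = bytearray()
    for position, byte in enumerate(data):
        scope["byte"] = byte
        scope["position"] = position
        out.append(eval(code, scope))
    return bytes(out)


data = bytes(range(256))
# Both variants must produce identical output; only the per-byte cost differs.
print(translate_eval_string(data, "rol(byte, 4)") ==
      translate_precompiled(data, "rol(byte, 4)"))  # True
```

The standard timeit module can be used to compare the two variants on a given machine and input size.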
I think the posting process somehow managed to steal some of my text (especially since the first function is copy&pasted from your post), but yes, that’s what I wanted to write. 🙂
And _especially_ for didactic reasons I think the code should be as good as possible, since other people are learning from it. The folks who don’t understand bit operations should probably stay away from decryption & malware analysis altogether… might do more harm than good. 😉
As for the optimality of the rest of the code, I’ve only skimmed it, I’m afraid. I was actually looking for an efficient way of doing ROL/ROR in .py, and that’s how I stumbled over your code. I have plenty of experience with Python, and by accident I work in the malware analysis industry myself. 🙂 Getting back to the efficiency issue, I’m probably going to write a ROL/ROR module in C/asm to make it efficient enough. That code might even be worth including…
Cheers!
Comment by io — Friday 11 July 2008 @ 12:27
hi,
this is nice … thanks
Comment by sanjeev — Wednesday 21 April 2010 @ 10:40
Hey Didier,
I used the code from translate to build a McAfee .BUP file decoder. Here’s the code:
# McAfee .BUP File XOR converter
# Based on Didier Stevens "Translate.py"
# https://blog.didierstevens.com/programs/translate/
#
# Usage: nobup.py file.bup
#
#####################################################
import sys

if len(sys.argv) != 2:
    print('usage: ./nobup.py file.bup')
else:
    encoded = open(sys.argv[1], 'rb')
    bup = sys.argv[1] + '.decoded.bin'
    decoded = open(bup, 'wb')
    command = 'byte ^ 0x6A'
    position = 0
    while True:
        # Read one byte at a time until end of file
        inbyte = encoded.read(1)
        if not inbyte:
            break
        byte = ord(inbyte)
        outbyte = eval(command)
        # bytearray writes correctly on both Python 2 and Python 3
        decoded.write(bytearray([outbyte]))
        position += 1
    encoded.close()
    decoded.close()
Comment by Lucas Lyon — Saturday 10 July 2010 @ 2:22
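For a fixed single-byte XOR like the 0x6A key in the decoder above, eval is not needed at all. A hedged alternative sketch, assuming the whole file fits in memory, builds a 256-entry lookup table and applies it with bytes.translate. The names decode_bup and decode_file are hypothetical.

```python
def decode_bup(data, key=0x6A):
    """XOR every byte of data with key using a 256-entry lookup table."""
    table = bytes(b ^ key for b in range(256))
    return data.translate(table)


def decode_file(in_path, out_path, key=0x6A):
    """Read in_path, XOR-decode its contents, and write them to out_path."""
    with open(in_path, "rb") as encoded:
        data = encoded.read()
    with open(out_path, "wb") as decoded:
        decoded.write(decode_bup(data, key))


# XOR is its own inverse, so decoding twice restores the input.
print(decode_bup(decode_bup(b"hello")))  # b'hello'
```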