To celebrate my 2016 Microsoft MVP award, I’m releasing a new XOR-tool. Because you can never have enough XOR-tools in your toolbox :-).
When data is XOR-encrypted with a repeating key and you know some of the plaintext, you can perform a simple known-plaintext attack: when you XOR the ciphertext with the plaintext, you recover the key-stream.
By “repeating key” I mean the following: let’s assume that the encryption key is “Secret”. Then the first byte of the plaintext is XORed with “S”, the second byte with “e”, the third byte with “c”, …, the sixth byte with “t”. And for the seventh byte, we start again with “S”, then for the eighth byte again with “e”, …
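Repeating-key XOR as described above can be sketched in a few lines of Python. The function name xor_repeating_key is just an illustrative choice, not something taken from xor-kpa.py:

```python
from itertools import cycle


def xor_repeating_key(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the key, restarting the key when it runs out."""
    # cycle(key) yields S, e, c, r, e, t, S, e, ... indefinitely;
    # zip stops as soon as data is exhausted.
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


plaintext = b"This program cannot be run in DOS mode"
ciphertext = xor_repeating_key(plaintext, b"Secret")
# XOR is its own inverse, so the same function also decrypts.
assert xor_repeating_key(ciphertext, b"Secret") == plaintext
```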
When we know some of the plaintext, for example the beginning of the file, and we XOR it with the ciphertext, we obtain the key-stream: SecretSecretSecretSec. From this, it’s simple to extract the repeating key (Secret).
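A minimal sketch of this step, assuming the known plaintext sits at the very start of the file: XOR it with the ciphertext, then take the shortest repeating unit of the resulting keystream as the key. The helper names are hypothetical and this is not xor-kpa.py’s actual code:

```python
from itertools import cycle


def recover_keystream(ciphertext: bytes, known_plaintext: bytes) -> bytes:
    """XOR the known plaintext with the ciphertext bytes at the same position (offset 0 here)."""
    return bytes(c ^ p for c, p in zip(ciphertext, known_plaintext))


def shortest_period(keystream: bytes) -> int:
    """Smallest p such that keystream[i] == keystream[i + p] for every valid i."""
    for p in range(1, len(keystream) + 1):
        if all(keystream[i] == keystream[i + p] for i in range(len(keystream) - p)):
            return p
    return len(keystream)  # only reached for an empty keystream


# Simulate the scenario: encrypt with "Secret", but only know the first 21 bytes.
plaintext = b"This program cannot be run in DOS mode"
ciphertext = bytes(p ^ k for p, k in zip(plaintext, cycle(b"Secret")))
keystream = recover_keystream(ciphertext, plaintext[:21])
key = keystream[:shortest_period(keystream)]
print(keystream)  # b'SecretSecretSecretSec'
print(key)        # b'Secret'
```

Note that this returns the shortest repeating unit: a key like “abab” would come out as “ab”, which decrypts identically.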
I’ve written a small Python program that automates this process: xor-kpa.py.
As an example, I’ve XORed the notepad.exe program with a key. We know that PE files typically contain the string “This program cannot be run in DOS mode”; this string is stored in the text file plaintext.txt. This is how you use xor-kpa:
C:\Demo>xor-kpa.py -e 3 plaintext.txt notepad-ciphertext.exe
Key: Password
Extra: 30
Keystream: rdPasswordPasswordPasswordPasswordPass
This result shows that the recovered keystream is “rdPasswordPasswordPasswordPasswordPass” and that the repeating key is “Password” (the keystream starts with “rd” because the known plaintext is not located at a file offset that is a multiple of the key length). Extra (30) is the difference between the keystream length (38) and the key length (8). The higher the value of Extra, the more confident we can be that we recovered the correct key; when Extra is only 1, the confidence is low. To properly recover the key, the known plaintext must be longer than the key.
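The following is a hypothetical sketch of how such an attack could work when the position of the known plaintext is unknown; it is not the source of xor-kpa.py. It tries the known plaintext at every offset, measures Extra as keystream length minus the shortest repeating period, and rotates the candidate key by offset modulo key length so that it lines up with the start of the file. The test data is synthetic: the string is placed at offset 78 (0x4E, where the DOS stub message usually sits in PE files), with zero bytes as filler:

```python
from itertools import cycle


def shortest_period(keystream: bytes) -> int:
    """Smallest p with keystream[i] == keystream[i + p] for every valid i."""
    for p in range(1, len(keystream) + 1):
        if all(keystream[i] == keystream[i + p] for i in range(len(keystream) - p)):
            return p
    return len(keystream)


def known_plaintext_attack(ciphertext: bytes, known: bytes, min_extra: int = 1):
    """Try the known plaintext at every offset; yield (key, extra, keystream, offset)."""
    for offset in range(len(ciphertext) - len(known) + 1):
        keystream = bytes(c ^ p for c, p in zip(ciphertext[offset:], known))
        period = shortest_period(keystream)
        extra = len(keystream) - period  # bytes that confirm the repetition
        if extra < min_extra:
            continue
        # At this offset the keystream starts at key[offset % period];
        # rotate it back so the key lines up with offset 0 of the file.
        shift = offset % period
        candidate = keystream[:period]
        key = candidate[-shift:] + candidate[:-shift] if shift else candidate
        yield key, extra, keystream, offset


# Synthetic "file": 78 filler bytes, the known string, more filler.
known = b"This program cannot be run in DOS mode"
plain_file = bytes(78) + known + bytes(100)
ciphertext = bytes(p ^ k for p, k in zip(plain_file, cycle(b"Password")))

for key, extra, keystream, offset in known_plaintext_attack(ciphertext, known, min_extra=3):
    print(offset, key, extra, keystream)
```

Among the printed hits, the one at offset 78 reports key b'Password', Extra 30 and keystream b'rdPasswordPasswordPasswordPasswordPass'. The min_extra parameter plays the role of option -e: hits at other offsets with a small Extra would be chance repetitions.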
With option -e you can filter for the minimum value of Extra.
Since the known plaintext is often a short ASCII string, you can also provide it directly as an argument instead of writing it to a text file. To do this, just precede the argument with the character #, like in this example (the double quotes are necessary because of the space characters):
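A sketch of how a script could interpret such an argument, assuming ASCII/UTF-8 text after the #. The function name plaintext_argument is hypothetical, not xor-kpa.py’s actual code. Note that the shell strips the double quotes before Python sees the argument:

```python
def plaintext_argument(argument: str) -> bytes:
    """Return the known plaintext given on the command line.

    An argument starting with '#' is taken literally (everything after the '#');
    anything else is treated as the name of a file containing the plaintext.
    """
    if argument.startswith("#"):
        return argument[1:].encode("utf-8")
    with open(argument, "rb") as f:
        return f.read()


print(plaintext_argument("#This program cannot be run in DOS mode"))
```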
C:\Demo>xor-kpa.py -e 3 "#This program cannot be run in DOS mode" notepad-ciphertext.exe
Key: Password
Extra: 30
Keystream: rdPasswordPasswordPasswordPasswordPass
xor-kpa_V0_0_1.zip (https)
MD5: 4265BB1AFCD470A98070FFBDFCB1B52A
SHA256: CF41CEDE7281459FA47061B366AA9B4A5F579CC9BA46E73098B52EA8CAB6E816
Seen this? https://github.com/ThomasHabets/xor-analyze
No need for (exact) known plaintext.
Comment by Anonymous — Friday 8 January 2016 @ 13:54
Yes.
Update (now that I have access to my test machine):
I wrote this program to decode the XML config file used by Java RATs. That encoded config file is 389 bytes long and the key is 48 bytes. The config file contains a string of 149 random base64 characters.
When I run xor-analyze, it reports a key length of 3:
./xor-analyze -M 100 -l config.pl.vir
xor-analyze version 0.4 by Thomas Habets
Counting coincidences… 100 / 100
Key length is probably 3 (or a factor of it)
I think that, because of the small ciphertext-length/key-length ratio and the large number of random base64 characters, xor-analyze does not have enough statistical data to recover the key.
Even when I generate a frequency table from the decoded config file, it does not recover the key:
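xor-analyze’s output mentions “Counting coincidences”. As a generic illustration of that classic idea (a sketch, not taken from xor-analyze’s source), one can compare the ciphertext with a copy of itself shifted by n bytes. When n is a multiple of the key length, c[i] == c[i+n] exactly when p[i] == p[i+n], so such shifts tend to score higher, but only if the plaintext has a skewed byte distribution and there is enough ciphertext. With 389 bytes, many of them random base64 characters, that signal is weak:

```python
def count_coincidences(ciphertext: bytes, max_shift: int) -> dict:
    """For each shift n, the fraction of positions where ciphertext[i] == ciphertext[i + n]."""
    counts = {}
    for n in range(1, max_shift + 1):
        pairs = len(ciphertext) - n
        if pairs <= 0:
            break
        matches = sum(ciphertext[i] == ciphertext[i + n] for i in range(pairs))
        counts[n] = matches / pairs  # normalise: larger shifts compare fewer pairs
    return counts


# Toy example: all-zero plaintext XORed with the key "abc" is just "abcabc...".
counts = count_coincidences(b"abc" * 10, 6)
best = max(counts, key=counts.get)  # first shift with the highest score
print(best)  # 3
```

Shift 6 scores as high as shift 3 here, which is why such tools report a key length “or a factor of it”.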
./xor-analyze -M 100 config.pl.vir freqconfig
xor-analyze version 0.4 by Thomas Habets
Counting coincidences… 100 / 100
Key length is probably 3 (or a factor of it)
Finding key based on byte frequency… 3 / 3
Checking redundancy… 66.67 %
Probable key: “gst”
Here is the result of xor-kpa.py on this config file:
type prefix-xml.txt
xor-kpa.py -e 2 prefix-xml.txt config.pl.vir
Key: VY999sisosouuqjqhyysuhahyujssddqsad22rhggdsfsdfs
Extra: 6
Keystream: VY999sisosouuqjqhyysuhahyujssddqsad22rhggdsfsdfsVY999s
Comment by Didier Stevens — Friday 8 January 2016 @ 14:00
[…] I added support for ZIP files to xor-kpa.py. […]
Pingback by Update: xor-kpa.py Version 0.0.2 | Didier Stevens — Saturday 30 January 2016 @ 8:48