Didier Stevens

Monday 31 July 2006


Filed under: Reverse Engineering — Didier Stevens @ 17:24

A friend faced the following problem: his company has to provide confidential data to a financial company. To maintain the confidentiality of the data, this financial company provided my friend with a custom-made program to “protect” the data to be provided.

But my friend doesn’t trust unknown programs; he wanted to know exactly what protection this program offered. The financial company didn’t want to provide further details about their program, so my friend called me for help.

To remain confidential, data transferred on public channels must be protected with strong encryption, the implementation of the cryptographic process must be free of errors and the cryptographic keys must be managed securely.

First we get acquainted with the program. In fact, it’s very simple: you start the program, open the file to be protected and save the resulting file with another extension.
So there’s no password to be provided. This is an indication that the cryptographic key is stored in the program. This is no problem, as long as public key cryptography is used. However, if secret-key cryptography is used, the secret key can be retrieved from the program by reverse engineering and can then be used to decrypt the data.

The protected file is much smaller (around 4 times), so compression is involved. A first glance at the protected file with a hex editor (like XVI32) doesn’t reveal much: there’s nothing readable.

One can follow 2 paths to identify if cryptographic methods are used in a program: you can analyze the program and you can analyze the data.

When analyzing the program, the goal is to identify cryptographic algorithms. The cryptographic library can be linked statically or dynamically. For Windows programs, you can use a dependency viewer (like Dependency Walker) to view the imported DLLs. For statically linked programs, you can use FindCrypt2 by Ilfak. It’s an IDA Pro plugin that looks for cryptographic constants in the disassembled code.
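The idea behind a constant search can be sketched in a few lines of Python. This is not how FindCrypt2 is implemented (it works inside IDA Pro and knows many more signatures); it is only a simplified raw-byte scan, and the function name and the tiny signature table are my own choices for illustration:

```python
import struct

# Two well-known constants, chosen for illustration only.
# Real tools carry a much larger signature database.
SIGNATURES = {
    "AES S-box (first 8 bytes)": bytes.fromhex("637c777bf26b6fc5"),
    "MD5/SHA-1 init value 0x67452301 (little-endian)": struct.pack("<I", 0x67452301),
}


def find_crypto_constants(blob: bytes) -> dict:
    """Return {signature name: first offset} for every signature found in blob."""
    hits = {}
    for name, pattern in SIGNATURES.items():
        offset = blob.find(pattern)
        if offset != -1:
            hits[name] = offset
    return hits
```

A hit is only a hint: a 4-byte pattern can easily occur by chance in a large binary, so every match still needs to be confirmed in the disassembly.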

We decide to proceed with the analysis of the protected data. Reverse engineering will come later, but I can’t resist a quick peek at the strings in the program (with BinText). These strings stand out to me:

deflate 1.1.4 Copyright 1995-2002 Jean-loup Gailly 
inflate 1.1.4 Copyright 1995-2002 Mark Adler

They come from the zlib library, an open source compression library that implements the deflate algorithm used by GZip.
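If you don’t have a tool like BinText at hand, the same kind of string listing can be approximated with a short script. This is a minimal sketch that only looks for printable ASCII runs (BinText also handles other encodings); the function name and the minimum length of 6 are assumptions of mine:

```python
import re


def printable_strings(blob: bytes, min_len: int = 6) -> list:
    """Return runs of at least min_len printable ASCII characters found in blob."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]
```

Filtering the result for words like "deflate" or "inflate" is then enough to spot the zlib copyright strings.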

We encrypt the same data file a second time, and save it with another name. The size of the 2 protected files is the same. Comparing these 2 files with JojoDiff (a binary comparison program) shows that the files are almost identical:

jdiff-w32 -lr test1.prt test2.prt 
       1        1 EQL 10 
      11       11 MOD 256 
     267      267 EQL 2380

The -lr options display ASCII output with regions (sequential parts of the binary files).
This result shows us that the first 10 bytes and the last 2380 bytes are the same; only a region of 256 bytes differs. Because most of the file is the same, we can deduce that the protection always uses the same encryption key (the 256 differing bytes are probably a structure of status fields like filename, timestamps and other stuff). So this program doesn’t use “fancy” stuff like session keys, salting, initialization vectors, …
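For two files of the same size, a region listing like the one above can be approximated with a naive byte-by-byte comparison. This is only a sketch, not JojoDiff’s algorithm (JojoDiff also detects inserted and deleted data); the function name is hypothetical:

```python
from itertools import groupby


def diff_regions(a: bytes, b: bytes) -> list:
    """Return (offset, kind, length) runs for two equal-length byte strings.

    Offsets are 1-based to match the jdiff listing; kind is "EQL" or "MOD".
    """
    if len(a) != len(b):
        raise ValueError("this sketch only handles files of equal size")
    regions = []
    offset = 1
    for equal, run in groupby(x == y for x, y in zip(a, b)):
        length = sum(1 for _ in run)
        regions.append((offset, "EQL" if equal else "MOD", length))
        offset += length
    return regions
```

Expect this naive version to be noisier than jdiff: inside a region of unrelated bytes, a few positions will match by pure chance and show up as very short EQL runs.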

Now we protect several other files and compare them: the size is different and only the first 10 bytes are the same.
We formulate our hypothesis for the file format:
Bytes 1-10: header (magic bytes)
Bytes 11-266: status data
Bytes from 267 on: encrypted data
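To work with this hypothesis, it helps to cut a protected file into the three assumed parts. A minimal sketch, where the names are mine and the lengths come straight from the hypothesis above (nothing here is confirmed by the vendor):

```python
from typing import NamedTuple

HEADER_LEN = 10    # bytes 1-10 of the hypothesis
STATUS_LEN = 256   # bytes 11-266 of the hypothesis


class ProtectedFile(NamedTuple):
    header: bytes
    status: bytes
    payload: bytes


def split_protected(blob: bytes) -> ProtectedFile:
    """Split a protected file according to the hypothesized layout."""
    if len(blob) < HEADER_LEN + STATUS_LEN:
        raise ValueError("file is too short for the hypothesized layout")
    body_start = HEADER_LEN + STATUS_LEN
    return ProtectedFile(blob[:HEADER_LEN], blob[HEADER_LEN:body_start], blob[body_start:])
```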

Now we will concentrate on the encrypted data. Strong ciphertext should be hard to distinguish from a series of random bytes. CrypTool is a freeware program which enables you to apply and analyse cryptographic mechanisms. It’s an excellent educational program. We will use it to see how “random” the encrypted data is.

The Analysis / General menu option in CrypTool has several tools to analyze ciphertext (like calculating the entropy), but because we have no clear idea of the results we can expect with strong encryption, we do the following:

We take our data file and use several methods to generate a “transformed” file:

  1. We protect it with the provided program
  2. We ZIP it
  3. We GZip it
  4. We password protect it with ZIP but don’t use compression, just store.
  5. We encrypt it with RSA
  6. We encrypt it with Ncrypt

We analyze each file with the CrypTool cryptanalysis tools and compare the results. I won’t detail each result here, but we have 2 important results.

First, the entropy of our protected file is in the same range as the entropy of the compressed files, rather than the encrypted files:

File                                  Entropy
File 1 (protected by the program)     7.92
File 2 (ZIP)                          7.89
File 3 (GZip)                         7.93
File 4 (ZIP, password, store only)    7.98
File 5 (RSA)                          7.97
File 6 (Ncrypt)                       7.97

The maximum entropy is 8.
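For reference, the entropy meant here is the Shannon entropy per byte, which is where the maximum of 8 (bits) comes from. I assume this is what CrypTool reports; the sketch below is a plain textbook calculation, not CrypTool’s code:

```python
import math
from collections import Counter


def byte_entropy(blob: bytes) -> float:
    """Shannon entropy of blob in bits per byte (0.0 for empty input)."""
    if not blob:
        return 0.0
    total = len(blob)
    counts = Counter(blob)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A file in which all 256 byte values occur equally often scores 8, and a file consisting of a single repeated byte scores 0.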

Second, we find the same periodicity cycle in the protected file and the GZipped file, but at a different offset:

Periodicity analysis of test1.prt: 
No.    Offset    Length    Number of cycles    Cycle content 
1    2637    1    2        .     00
Periodicity analysis of test.gz: 
No.    Offset    Length    Number of cycles    Cycle content 
1    2388    1    2        .     00

The difference in offset is 249 bytes, close to the size of the header and status data (266 bytes)!
This is a strong indication that the protected data is just compressed, not encrypted, and that it’s GZip compressed.
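Another quick check of the GZip suspicion, which we did not do at the time, is to search the protected file for the GZip signature. According to the GZip specification (RFC 1952), a stream starts with the bytes 1f 8b, followed by 08 for the deflate method. The sketch below assumes the program writes a standard GZip header; the function name is hypothetical:

```python
GZIP_MAGIC = b"\x1f\x8b\x08"  # ID1, ID2 and CM=8 (deflate), per RFC 1952


def find_gzip_candidates(blob: bytes) -> list:
    """Return every 0-based offset where a GZip signature appears in blob."""
    offsets = []
    pos = blob.find(GZIP_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(GZIP_MAGIC, pos + 1)
    return offsets
```

A 3-byte signature can also appear by coincidence inside compressed data, so each candidate offset has to be confirmed by actually trying to decompress from there.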

The binary comparison of the protected file and the GZipped file opens our eyes:

jdiff-w32 -lr test1.prt test.gz 
       1        1 MOD 598 
     599      599 DEL 129 
     727      598 EQL 1791

Both files share the same sequence of 1791 bytes!

We revise our hypothesis for the file format:
Bytes 1-10: header (magic bytes)
Bytes 11-266: status data
Bytes from 267 on: GZipped data

I know that jdiff can be confused when comparing files which start differently but then continue identically, so we decide to compare them starting from the end. We binary reverse both files and compare them again:

jdiff-w32 -lr reverse-test1.prt reverse-test.gz 
       1        1 EQL 2370 
    2371     2370 MOD 19

Wow! The GZipped file is almost completely included in the protected file, except for 19 bytes (this is very likely the GZip header which contains, among other things, the original file name).
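Reversing both files and diffing them boils down to measuring how many trailing bytes the two files have in common. That can also be computed directly, without creating reversed copies. A small sketch (the function name is mine):

```python
def common_suffix_len(a: bytes, b: bytes) -> int:
    """Return the number of identical bytes at the end of a and b."""
    count = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        count += 1
    return count
```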

To test our hypothesis, we strip the first 266 bytes from the protected file (with the tail command), name it test.gz and decompress it with the gzip command. Success! We have recovered our original file, and we prove that the so-called “protection” provided by the program is not encryption, just standard compression! It can easily be defeated in a few seconds with 2 simple commands: tail and gzip.
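The same two steps can be written in Python with the standard gzip module. This is a sketch of the idea, not the commands we typed; the function name is hypothetical and the 266-byte prefix comes from our file format hypothesis:

```python
import gzip

PREFIX_LEN = 266  # 10 magic bytes + 256 bytes of "status data"


def unprotect(blob: bytes) -> bytes:
    """Drop the 266-byte prefix and decompress the remaining GZip stream."""
    return gzip.decompress(blob[PREFIX_LEN:])
```

If the hypothesis were wrong, gzip.decompress would raise an exception instead of returning the original data.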

This analysis has taken us about 2 hours. My friend has his answer about the protection level provided by the program. Now it’s up to him to report this to his manager and decide how to proceed.

Later on, I started reverse engineering the program.
The first 10 bytes are a fixed string, the so-called magic bytes, used to identify the file type.
The next 256 bytes are just random bytes generated by the program, and have no meaning whatsoever! The program seeds the RNG with the current time, explaining why protecting the same file twice gives a different 256 byte sequence.
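The effect of seeding with the current time is easy to reproduce. The sketch below uses Python’s own random module as a stand-in, since the actual RNG in the program is not something I document here; the function name and the seed values in the usage note are purely illustrative:

```python
import random


def random_padding(seed: int, length: int = 256) -> bytes:
    """Generate length pseudo-random bytes from a generator seeded with seed."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))
```

Calling random_padding(int(time.time())) one second apart gives two unrelated 256 byte sequences, while the same seed always gives the same sequence. That matches what we saw when protecting the same file twice.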

By now I knew enough to formulate a final, proven hypothesis about the file format:
Bytes 1-10: header (magic bytes)
Bytes 11-266: garbage (random bytes)
Bytes from 267 on: GZipped data

Yet Another Case of Security Through Obscurity. Or, quoting Bruce Schneier, “Snake Oil”!


  1. Nice one

    Comment by Talha — Tuesday 1 August 2006 @ 14:34

  2. […] Filed under: Reverse Engineering — Didier Stevens @ 6:04 One year ago, to the day, I posted YACoSTO. I explained how I reversed a program that “protects” data. This is one of my favorite […]

    Pingback by YACoSTO, One Year Ago « Didier Stevens — Tuesday 31 July 2007 @ 6:04

  3. What does the 10 byte header contain ?
    I know the first and second bytes are constant and that the third byte is for the mode.. what are the rest for ?

    Comment by Dj — Monday 13 August 2007 @ 3:46

  4. The magic bytes sequence.

    Comment by Didier Stevens — Monday 13 August 2007 @ 20:35

  5. Very good. I enjoyed this a lot and learnt about the value of comparing entropy. This one gets filed and saved.

    Comment by Adrian — Friday 7 September 2007 @ 13:10

  6. Terrific work and great write-up! And thanks for all the helpful links.

    I was, out of curiosity, wanting to analyze the entropy of my AxCrypt encrypted file (AxCrypt is great, btw). The CryptoTool you linked is perfect for the job.

    So I’d love to hear what happened when your friend reported to the bank that they got scammed.


    Comment by J_Tom_Moon_79 — Saturday 23 August 2008 @ 0:18

  7. Well…

    They had developed the program themselves, and they saw no problem with it…

    Comment by Didier Stevens — Saturday 23 August 2008 @ 12:06

  8. Just discovered your blog and this great post!
    So your friend had to pretend he never got this analysis from you, right? Because otherwise, as a professional, he should refuse to use the “encryption” tool (*) made by the company to deliver confidential data to them.
    This story is a shame, the company deserves a lawsuit.

    (*) Better call it an “obfuscation” tool.

    Comment by D0R — Thursday 16 October 2008 @ 13:36
