Didier Stevens

Monday 30 October 2006

OllyStepNSearch v0.6.0

Filed under: Reverse Engineering — Didier Stevens @ 10:06

I’ve released a new version of my OllyDbg plugin called OllyStepNSearch.

The new features are:

  • an options dialog
  • Disable After Break option
  • Search in Information Pane
  • a new help function

And this time, there is also a demo movie here on YouTube; a hires (XviD) version can be found here.

Thursday 26 October 2006

Wardriving on the Eurostar

Filed under: Nonsense — Didier Stevens @ 19:47

No big surprise here, I didn’t get a WiFi signal in the Channel Tunnel today. I only logged packets: plenty enough in London, Lille and Brussels, but not in the tunnel.

eurostar_wardriving.jpg

Monday 23 October 2006

Spamdexing “R” Us

Filed under: Malware — Didier Stevens @ 10:17

Still wondering how likely it is to land on a drive-by download page when doing a (Google) search, I analyzed the infamous AOL search data to try to answer this question.

Conclusion: for every 2800 click throughs, 1 landed on a spamdexing site. 1% of the AOL users clicking through landed on a spamdexing site.

The AOL search data was collected over a period of 3 months (01 March, 2006 – 31 May, 2006) and contains 19,442,629 user click through events. A click through event is an entry in the database indicating that the AOL user clicked on a link presented in a Search Engine Result Page (SERP). These are the fields of a click through event entry:

  • AnonID – an anonymous user ID number.
  • Query – the query issued by the user, case shifted with most punctuation removed.
  • QueryTime – the time at which the query was submitted for search.
  • ItemRank – if the user clicked on a search result, the rank of the item on which they clicked is listed.
  • ClickURL – if the user clicked on a search result, the domain portion of the URL in the clicked result is listed.

Of the 19,442,629 user click through events, 15,066 events have a URL of the following format: digits.alphanums.info.

Expressed as a Perl regular expression, this format is: %\d+\.\w+\.info$ (\w is not strictly limited to alphanumerical characters, the underscore character is also included).

I searched for URLs of this format because it was used by the original drive-by download I discovered.

Extracting the main domain (alphanums.info) of the URL of these 15,066 click through events produces a list of 1099 unique domains.
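The filtering and domain extraction described above could look roughly like this in Python. It is only a sketch, not the script used for this research: the exact anchoring of the pattern (an optional http:// prefix) is an assumption, the function names are made up, and the third sample URL is hypothetical (the first two appear later on this page).

```python
import re

# Assumed pattern: optional scheme, digits, a dot, word characters, ".info".
# re.ASCII keeps \d and \w close to Perl's default ASCII behaviour.
SUSPECT = re.compile(r"^(?:https?://)?\d+\.(\w+\.info)$", re.ASCII)

def main_domain(click_url):
    """Return the alphanums.info part of a suspect URL, or None."""
    match = SUSPECT.match(click_url)
    return match.group(1) if match else None

def unique_domains(click_urls):
    """Collect the distinct main domains among the suspect URLs."""
    return sorted({d for d in map(main_domain, click_urls) if d})

sample = [
    "http://613.6x2q1y.info",
    "http://4859.4rhw0hk.info",
    "http://17.6x2q1y.info",   # hypothetical second subdomain of the same domain
    "http://www.google.com",   # does not match the format
]
print(unique_domains(sample))
```

Several numbered subdomains collapse into one main domain, which is why 15,066 events reduce to a much shorter list of domains.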

I wrote a script to retrieve and analyze one page for each of these 1099 domains. 874 of these pages have the same look and feel as the original drive-by download page.

This leads me to believe that these 874 domains form a network of sites that use spamdexing techniques to rank high in SERPs. They use

  • lots of keywords
  • lots of links to different domains (874 domains)
  • lots of different IP addresses (352 unique IP addresses)

From now on, I’ll refer to these sites as Spamdexing “R” Us.

These domains are used now (October 2006) for spamdexing, and I assume they were also used for spamdexing 6 months ago (time frame of the AOL data).

Of the 19,442,629 user click through events, 6,988 events landed on a Spamdexing “R” Us site (i.e. one of the 874 domains I identified). This is 0,04%, or around 1 hit per 2800 SERP click throughs! According to some people I talked with, this is an excellent result for Spamdexing “R” Us: for every 2800 SERP click throughs the AOL users executed, 1 landed in their spider web.

Spamdexing “R” Us rank high on the SERPs:

Rank  Click throughs  Share
1     1313            19%
2     836             12%
3     710             10%
4     553             8%
5     542             8%
6     438             6%
7     376             5%
8     390             6%
9     366             5%
10    384             5%

41% of the traffic comes from the 3 highest ranking click throughs.
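The percentages in the table, and the 41% figure, follow from dividing each count by the 6,988 click throughs that landed on these sites. A small Python check, using only the numbers listed above (the variable and function names are illustrative):

```python
# Click throughs on Spamdexing "R" Us sites, per SERP rank (from the table).
clicks_by_rank = {1: 1313, 2: 836, 3: 710, 4: 553, 5: 542,
                  6: 438, 7: 376, 8: 390, 9: 366, 10: 384}
TOTAL = 6988  # all click throughs that landed on these sites

def share(count, total=TOTAL):
    """Percentage of the total represented by count."""
    return 100.0 * count / total

for rank, count in clicks_by_rank.items():
    print(rank, count, f"{share(count):.0f}%")

top3 = sum(clicks_by_rank[r] for r in (1, 2, 3))
print("top 3:", top3, f"{share(top3):.0f}%")
```

The ten ranks listed add up to 5,908 click throughs; the remainder came from results ranked lower than 10.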

How do Spamdexing “R” Us sites compare to the other click through sites in the AOL search data? Ranking all the click throughs per URL shows that Spamdexing “R” Us sites rank high: 142nd place. As a side note, it’s interesting to mention that the number 1 in the ranking is http://www.google.com, with 366,623 click throughs.

Here’s a selection of some well-known sites that are in the same click through range as the Spamdexing “R” Us sites:

Rank  URL                            Click throughs
87    http://www.flickr.com          9369
101   http://www.mtv.com             8760
102   http://www.bbc.co.uk           8739
116   http://www.apple.com           7998
120   http://www.facebook.com        7771
128   http://www.washingtonpost.com  7465
133   http://www.usatoday.com        7300
142   Spamdexing “R” Us              6988
144   http://www.download.com        6929
160   http://www.youtube.com         6259
161   http://www.playboy.com         6241

The AOL search data contains 657,426 unique user IDs. 521,694 users clicked on links in the SERPs, and 4,952 users landed on Spamdexing “R” Us sites. That’s about 1 AOL user per 100 (0,95%) in a 3 month period.
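The two headline ratios of this post (per click through and per user) can be recomputed from the counts given above. A quick Python check, with illustrative variable names:

```python
click_events = 19_442_629    # all click through events in the AOL data
spamdex_clicks = 6_988       # events that landed on one of the 874 domains
clicking_users = 521_694     # users who clicked on at least one SERP link
spamdex_users = 4_952        # users who landed on one of the 874 domains

click_rate = spamdex_clicks / click_events
user_rate = spamdex_users / clicking_users

print(f"{click_rate * 100:.2f}% of click throughs")
print(f"1 hit per {click_events / spamdex_clicks:.0f} click throughs")
print(f"{user_rate * 100:.2f}% of clicking users")
```

The exact quotient is a little under 2800 click throughs per hit, which the text rounds to 2800.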

Some caveats / remarks concerning this research:

  1. I don’t feel I’m prying into AOL users’ private lives: the URLs I analyzed are meaningless and I didn’t analyze the queries.
  2. The published AOL search data is only a fraction of the AOL search data for that time period. I don’t know how the selection was made.
  3. My research is post factum. I assume that the Spamdexing “R” Us sites were already spamdexing sites since 01 March, 2006.
  4. There can be other spamdexing sites in the AOL search data that don’t use digits.alphanums.info URLs.
  5. I crawled the Spamdexing “R” Us sites over a period of a couple of weeks, during which the iframe to the drive-by download site disappeared.
  6. The size of the Spamdexing “R” Us network is probably larger than I mention (874 domains, 352 IP addresses). I only looked at the part of the spider web that trapped AOL users.
  7. I talk about AOL users, but more precisely, I should talk about AOL search users. I suppose not all AOL users use AOL search.
  8. I did not analyze the Query and QueryTime fields.
  9. joy thinks AOL search is powered by Google.
  10. The WHOIS data for the Spamdexing “R” Us sites is complete nonsense.
  11. I don’t know what the relationship is between cleansearch.info, http://www.cucush.info, http://www.veryfastsearch.info and the Spamdexing “R” Us sites.
  12. I’ve found Spamdexing “R” Us pages in English, French, German, Spanish and Italian.
  13. The Spamdexing “R” Us sites use DNS wildcards.
  14. It’s difficult to judge the success of Spamdexing “R” Us without knowing their business model, costs and revenues. If it’s pay per click (0,04%), I don’t know. If it’s installing a bot on the computers of AOL search users, it’s successful (1%).

Monday 16 October 2006

USBVirusScan

Filed under: My Software — Didier Stevens @ 10:09

When Bruce Schneier blogged about USBDumper, I downloaded the program and filed it for some later experimentation.

During our vacation I started programming on a rainy evening, and USBVirusScan was born.

USBVirusScan will launch any program you provide as a command line parameter each time a USB stick is inserted. I use it to start a full virus scan on the inserted USB drive, hence the name.

For example, to start a cmd.exe on each USB drive you insert, you start USBVirusScan like this:

USBVirusScan cmd /k %c:

%c is a placeholder for the drive-letter of the inserted USB drive (yes, that’s %c like C’s printf function, and no, that’s not completely secure, but feel free to adapt it…).
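To illustrate the idea of the placeholder, here is a short Python sketch. It is not the code of USBVirusScan (a C++ program that, as noted above, relies on printf-style formatting); it swaps in a plain string replacement, and the function name expand_command and the drive letters are made up for the example.

```python
def expand_command(template_args, drive_letter):
    """Replace every %c in the argument list with the drive letter."""
    return [arg.replace("%c", drive_letter) for arg in template_args]

# The two command lines used as examples in this post.
print(expand_command(["cmd", "/k", "%c:"], "E"))
print(expand_command(["wscript", "log.vbs", "%c"], "F"))
```

A literal replacement only touches %c, so other printf-style sequences in the command line are passed through unchanged.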

USBVirusScan uses a system tray icon and balloons to announce the insertion of a USB drive. If you want to hide this system tray icon, start USBVirusScan with option -i, like this:
USBVirusScan -i cmd /k %c:

You can also hide the command line console with option -c. This only works with Console applications, not with Windows applications.

Here’s a Windows Script example (log.vbs) that will create a log.txt file on the inserted USB drive with the current date & time:

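' log.vbs: append the current date and time to log.txt on the given drive.
' Argument 0 is the drive letter passed in by USBVirusScan (%c).
Const ForAppending = 8

Dim objFSO
Dim objTextFile
Dim strFilename

' VBScript strings have no backslash escapes, so a single backslash is enough.
strFilename = WScript.Arguments.Item(0) & ":\log.txt"

Set objFSO = CreateObject("Scripting.FileSystemObject")

If objFSO.FileExists(strFilename) Then
   ' Open the existing log for appending.
   Set objTextFile = objFSO.OpenTextFile(strFilename, ForAppending)
Else
   Set objTextFile = objFSO.CreateTextFile(strFilename)
End If

objTextFile.WriteLine Now()
objTextFile.Close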

You start it with this command: USBVirusScan.exe wscript log.vbs %c

Example of the content of the log file after inserting the USB drive twice:

14/10/2006 17:05:00
14/10/2006 17:05:21

I used sample code for system tray programming from this Code Project article, and for the rest I generated a new GUID and made some cosmetic changes to the original USBDumper code.

Here is a YouTube movie showing you the program starting a virus scan. A hires (XviD) version can be found here.

Download:

USBVirusScan_V1_0_0.zip (https)

MD5: 7EC0D456717162B84A229CC4A8335B51

This ZIP file contains both the executable and the source code. If you don’t plan to modify the source code of this program, you’ll only need to extract USBVirusScan.exe.

Compiled with Borland’s free C++ 5.5 compiler. Tested on Windows XP SP2 and Windows Vista.

Thursday 12 October 2006

Update 2: Google and the Drive-by Download

Filed under: Malware,Update — Didier Stevens @ 19:44

This is an unexpected result of my post Google and the Drive-by Download:

vanderelstchauffagiste.png

Friday 6 October 2006

Update: Google and the Drive-by Download

Filed under: Malware,Update — Didier Stevens @ 21:49

At the end of my post Google and the Drive-by Download, I wondered how prevalent such query results were.

This is an attempt to answer this question.

Here’s a Perl script that will execute Google queries and look for suspect URLs in the first page with a regular expression (remember, suspect URLs are of the form 123.1a2b3c.info). If you want to use the script on your Windows machine and don’t have a Perl interpreter, you can use ActiveState’s free ActivePerl.

Since I have no list of common Google queries used here in Belgium, I included a simple algorithm in my program to generate its own queries. They look like this: name profession. I feed my program with a list of frequently occurring last-names in Belgium and a list of professions you might want to search for (like a plumber).
Here’s the output of my program:

Suspect queries:

613.6x2q1y.info http://www.google.be/search?hl=fr&q=Thys+Blanchisseur

4859.4rhw0hk.info http://www.google.be/search?hl=fr&q=Gerard+Plombier

Suspect URLs:

4859.4rhw0hk.info

613.6x2q1y.info

2 suspect queries out of 2322 queries (0.0861326442721792%).

About 1 out of 1000 queries (looking for a profession) lists a drive-by download site on the first result page. That’s not too bad, but still a surprising result to me.
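The query generation and the check for suspect URLs can be sketched in Python as follows. This is not the Perl script mentioned above: the function names are invented, the two short lists are just the names and professions visible in the output, the HTML snippet is made up, and no request is actually sent to Google.

```python
import re
from itertools import product
from urllib.parse import urlencode

# Suspect hosts look like 123.1a2b3c.info (digits, word characters, .info).
SUSPECT_HOST = re.compile(r"\b\d+\.\w+\.info\b", re.ASCII)

def build_query_url(name, profession):
    """Build a google.be search URL for a 'name profession' query."""
    params = urlencode({"hl": "fr", "q": f"{name} {profession}"})
    return "http://www.google.be/search?" + params

def find_suspect_hosts(html):
    """Return the distinct suspect host names found in a result page."""
    return sorted(set(SUSPECT_HOST.findall(html)))

names = ["Thys", "Gerard"]                  # tiny illustrative sample
professions = ["Blanchisseur", "Plombier"]  # tiny illustrative sample
urls = [build_query_url(n, p) for n, p in product(names, professions)]
print(urls[0])

page = '<a href="http://613.6x2q1y.info/">x</a> <a href="http://www.google.be/">y</a>'
print(find_suspect_hosts(page))

# 2 suspect queries out of 2322, as reported in the output above.
print(f"{100 * 2 / 2322:.4f}%")
```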

Google and the Drive-by Download

Filed under: Malware — Didier Stevens @ 9:50

I’ve encountered an interesting Drive-by Download and made a movie of a Windows XP SP2 machine getting infected.

Drive-by downloads are nothing new, but it’s the first time I see one where you are directed to the drive-by download site by a normal, innocent Google query.

These are the steps to get infected:

  1. Start Internet Explorer
  2. Go to http://www.google.be
  3. search for vanderelst chauffagiste
  4. click on the first link (like I’m Feeling Lucky)

Searching for vanderelst chauffagiste is a normal, innocent query: I look for a heating technician (chauffagiste) called vanderelst (a common name here in Belgium).

Here is a post of someone (joy) experiencing the same thing when looking for a dentist in Illinois. But apparently joy doesn’t get infected.

I won’t explain how this drive-by download works. My point is rather that spyware makers have found ways to get their infected websites highly ranked by Google when you execute a normal, innocent query. We know that you’re likely to get infected when you look for keygens or cracks, but not when you’re searching for a local dentist.

My Search Engine Optimisation knowledge is very limited, so I cannot explain how they got their sites top listed by Google. According to joy, it has something to do with the fake Google result page they host (see the movie).

The movie is hosted here on YouTube, and you can find a hires version (XviD) here.

First I show that there’s no service32.exe file in the c:\windows directory. You can see that I’m running as local admin, which is a bad idea, but please bear with me.

Next I search for my heating technician with Google and click on the first link (you’ll notice the strange URL of the .info TLD with random subdomains).

The free Kerio Personal Firewall alerts me to programs (spyware) that are being started. I installed the firewall to visualize the infection in action. And I’m feeling stupid today, so I click on Permit.

Notice that the page looks like a Google search result page, but that all entries point to .info sites that are probably also drive-by download sites.

There’s a half minute of inactivity after 1:30 minutes, be patient and you’ll see other programs being started and the service32.exe file appearing in the Windows directory.

Finally, I go to the Virustotal site to get some files scanned by 20+ virus scanners. This part of the movie is rather boring, but I didn’t want to spend much time editing it, feel free to fast forward. The point is that most virus scanners don’t detect the infected files.

I also used Lavasoft’s Ad-Aware SE Personal (freeware) anti-spyware program to scan the machine: no files were detected.

It would be interesting to know how prevalent these sites are in Google query results.

Monday 2 October 2006

Reversing an anonymous proxy

Filed under: Reverse Engineering — Didier Stevens @ 10:08

Unipeak is a free anonymous proxy. It encodes the URLs like this:

http://www.unipeak.com/gethtml.php?_u_r_l_=aHR0cDovL3d3dy5nb29nbGUuY29t (this is http://www.google.com).

Suppose you had to reverse engineer the encoding scheme, how could you proceed? You are in a comfortable position, because you can execute a Chosen Plaintext Attack.

First we need to find out if the encoding scheme is reversible, because it could also be a hash or another key used to access the cache of the proxy (if it’s a caching proxy).

So we add a letter ‘a’ to the encoded URL and see what Unipeak replies:

http://www.unipeak.com/gethtml.php?_u_r_l_=aHR0cDovL3d3dy5nb29nbGUuY29ta

and we see the Google website.

So it’s not a hash, it’s reversible.

We add another ‘a’:

http://www.unipeak.com/gethtml.php?_u_r_l_=aHR0cDovL3d3dy5nb29nbGUuY29taa

and now we get an error message:

unable to connect to http://www.google.comi:80/

It’s definitely reversible.
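One way to explain both observations, sketched in Python under the assumption that the decoder turns each character into a fixed number of bits and silently drops an incomplete trailing byte. The 64-character table used here is the one identified at the end of this post; how Unipeak really decodes is not known, and lenient_decode is an invented name.

```python
import string

# The 64-character table identified at the end of this post.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def lenient_decode(encoded):
    """Decode 6 bits per character and drop any incomplete trailing byte."""
    bits = "".join(format(ALPHABET.index(ch), "06b") for ch in encoded)
    usable = len(bits) - len(bits) % 8
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("ascii")

base = "aHR0cDovL3d3dy5nb29nbGUuY29t"
print(lenient_decode(base))          # the original URL
print(lenient_decode(base + "a"))    # 6 extra bits: not enough for a byte
print(lenient_decode(base + "aa"))   # 12 extra bits: one extra byte
```

With one extra 'a' the 6 added bits are discarded, so the URL is unchanged; with two, the first 8 of the 12 added bits form the byte 01101001, which is the letter i seen in the error message.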

Searching with Google via Unipeak gives another URL:
http://www.unipeak.com/gethtml.php?_u_r_l_=aHR0cDovL3d3dy5nb29nbGUuY29tOjgwL3NlYXJjaA%3D%3D&hl=en&q=unipeak&btnG=Google+Search

This URL starts with the same sequence as our first URL, so it’s probably a simple encoding scheme where the characters are processed from left to right.

So let’s start another experiment, we enter this URL: aaaaaaaaaa

The encoded URL is:

http://www.unipeak.com/gethtml.php?_u_r_l_=YWFhYWFhYWFhYQ==

Very interesting, we also get a repeating pattern, but the cycle is 4 characters long (YWFh).

Ok, now let’s use a trick: we enter a series of U characters. The character U is special because its ASCII encoding written in binary is 01010101. Thus UU is 0101010101010101, UUU is 010101010101010101010101, …

Entering UUUUUUUUUU gives us:

http://www.unipeak.com/gethtml.php?_u_r_l_=VVVVVVVVVVVVVQ==

Another nice sequence!

This is a strong indication that the encoding is done at the bit level: the input is seen as a stream of bits, the bits are grouped in groups of X bits (where X is unknown). Each group is transformed to another sequence of bits by a function F, and the same function F is used for each group. We can also assume that X is even, otherwise we wouldn’t get a sequence of identical characters, but a sequence of identical pairs.
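The bit-level view can be made concrete with a few lines of Python that print the bit stream of an input in groups of a chosen size. The helper name bit_groups is made up, and the group size of 6 used here anticipates the value that is tested further down.

```python
def bit_groups(text, size):
    """Split the ASCII bit stream of text into groups of size bits."""
    bits = "".join(format(b, "08b") for b in text.encode("ascii"))
    return [bits[i:i + size] for i in range(0, len(bits), size)]

print(bit_groups("UUU", 8))   # three identical bytes
print(bit_groups("UUU", 6))   # four identical groups
print(bit_groups("aaa", 6))   # four different groups, repeating every 3 letters
```

For U every group is the same whatever the (even) group size, which fits the run of identical output characters. For a the pattern only repeats after 24 bits, which fits the 4-character cycle seen earlier.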

We perform some extra tests to prove (or disprove) our hypothesis.

We encode sequences of different lengths and compare the length of the cleartext and the cyphertext: the ratio is about 3 to 4, 3 input characters generate 4 output characters (BTW, the fact that we get a cycle of 4 characters for aaaaa… is also a strong indication for this ratio).

So X can be 3, 6, 9, 12, … . Except we assume X is even: 6, 12, …

Let’s test X = 6.

We try URL 000, this gives us MDAw (http://www.unipeak.net/gethtml.php?_u_r_l_=MDAw)
Now 000 is 30 30 30 (in hexadecimal ASCII)

or 00110000 00110000 00110000 in binary, grouped in 8 bits (1 byte)
or 001100 000011 000000 110000 in binary, but grouped in 6 bits (X = 6)

Now increment the first group:

001101 000011 000000 110000

or 00110100 00110000 00110000 in binary, grouped in 8 bits (1 byte)

or 34 30 30 (in hexadecimal ASCII)

or 400

So 000 becomes 400 when you increment the first group of 6 bits.

Testing URL 400 gives NDAw: changing the first 6 bits changes only the first character!

We do the same for the remaining groups:

000 -> 0@0 -> MEAw

000 -> 00p -> MDBw

000 -> 001 -> MDAx
So X is indeed 6, because changing a group of 6 bits at a time changes only one encoded character.
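The test inputs used above (400, 0@0, 00p and 001) can be derived mechanically: regroup the 24 bits of 000 into four 6-bit values, add 1 to one of them, and regroup into bytes. A Python sketch with invented helper names, assuming the input length is a multiple of 3 characters:

```python
def to_groups(text):
    """Split the ASCII bits of text into a list of 6-bit integers."""
    bits = "".join(format(b, "08b") for b in text.encode("ascii"))
    return [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]

def from_groups(groups):
    """Rebuild the text from a list of 6-bit integers."""
    bits = "".join(format(g, "06b") for g in groups)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")

def bump(text, index):
    """Increment the 6-bit group at position index by 1."""
    groups = to_groups(text)
    groups[index] += 1
    return from_groups(groups)

print(to_groups("000"))
for index in range(4):
    print(index, bump("000", index))
```

Each of the four results is then entered as a URL, and only the encoded character at the same position changes.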

And we can also assume that function F is linear, because incrementing the input by 1 increments the output by 1 (M -> N, D -> E, A -> B and w -> x).

Now we could try every possible 6-bit value, and see what the corresponding encoded character is.

We would discover that F maps 0..63 to ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

And this is a very common encoding scheme: base64
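With the scheme identified, Python's standard base64 module is enough to confirm the findings of this post. The sketch below rebuilds the table by encoding every 6-bit value (placed in the top bits of a 3-byte input) and decodes the two Unipeak parameters shown earlier; the variable names are illustrative.

```python
import base64
import string
from urllib.parse import unquote

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

# Encode every 6-bit value v as the first group of a 3-byte input.
table = "".join(
    base64.b64encode(bytes([v << 2, 0, 0])).decode("ascii")[0]
    for v in range(64)
)
print(table == ALPHABET)

# The first URL parameter from this post.
print(base64.b64decode("aHR0cDovL3d3dy5nb29nbGUuY29t").decode("ascii"))

# The search URL parameter: %3D is the URL-encoded '=' padding character.
param = unquote("aHR0cDovL3d3dy5nb29nbGUuY29tOjgwL3NlYXJjaA%3D%3D")
print(base64.b64decode(param).decode("ascii"))

# The test inputs from the experiments.
print(base64.b64encode(b"000"), base64.b64encode(b"U" * 10))
```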
