I’ve released a new version of my OllyDbg plugin called OllyStepNSearch.
The new features are:
- an options dialog
- Disable After Break option
- Search in Information Pane
- a new help function
No big surprise here: I didn’t get a WiFi signal in the Channel Tunnel today. I logged plenty of packets in London, Lille and Brussels, but none in the tunnel.
Still wondering how likely it is to land on a drive-by download page when doing a (Google) search, I analyzed the infamous AOL search data to try to answer this question.
Conclusion: for every 2800 click throughs, 1 landed on a spamdexing site, and about 1% of the AOL users who clicked through landed on a spamdexing site.
The AOL search data was collected over a period of 3 months (01 March, 2006 – 31 May, 2006) and contains 19,442,629 user click through events. A click through event is an entry in the database indicating that the AOL user clicked on a link presented in a Search Engine Result Page (SERP). These are the fields of a click through event entry:
Of the 19,442,629 user click through events, 15,066 events have a URL of the following format: digits.alphanums.info.
Expressed as a Perl regular expression, this format is: %\d+\.\w+\.info$ (strictly speaking, \w is not limited to alphanumeric characters: it also matches the underscore).
I search for URLs of this format because it was used by the original drive-by download I discovered.
Extracting the main domain (alphanums.info) of the URL of these 15,066 click through events produces a list of 1099 unique domains.
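The filtering and domain extraction described above can be sketched in Python. This is a minimal sketch, not the script used for this research: it assumes the click through URLs are available as plain strings, uses a Python version of the \d+\.\w+\.info$ pattern applied to the host name, and treats the last two labels of the host as the main domain. The helper main_domains and the sample URLs are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Python version of the digits.alphanums.info pattern (\w also matches "_")
SUSPECT = re.compile(r"\d+\.\w+\.info$")

def main_domains(urls):
    """Hypothetical helper: return the set of main domains (alphanums.info) of suspect URLs."""
    domains = set()
    for url in urls:
        # Assumption: some URLs may lack a scheme, so add one before parsing
        if "://" not in url:
            url = "http://" + url
        host = urlsplit(url).hostname or ""
        if SUSPECT.search(host):
            # Keep only the last two labels: alphanums.info
            domains.add(".".join(host.split(".")[-2:]))
    return domains

# Hypothetical sample click through URLs
sample = [
    "http://613.6x2q1y.info",
    "http://4859.4rhw0hk.info",
    "http://77.6x2q1y.info",
    "http://www.google.com",
]
print(sorted(main_domains(sample)))
```

Counting the unique main domains is then just len(main_domains(urls)).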
I wrote a script to retrieve and analyze one page for each of these 1099 domains. 874 of these pages have the same look and feel as the original drive-by download page.
This leads me to believe that these 874 domains form a network of sites that use spamdexing techniques to rank high in SERPs. They use
From now on, I’ll refer to these sites as Spamdexing “R” Us.
These domains are being used for spamdexing now (October 2006), and I assume they were also used for spamdexing 6 months ago (the time frame of the AOL data).
Of the 19,442,629 user click through events, 6,988 events landed on a Spamdexing “R” Us site (i.e. one of the 874 domains I identified). This is 0,04%, or around 1 hit per 2800 SERP click throughs! According to some people I talked with, this is an excellent result for Spamdexing “R” Us: for every 2800 SERP click throughs the AOL users executed, 1 landed in their spider web.
Spamdexing “R” Us sites rank high on the SERPs: 41% of their click throughs come from the 3 highest-ranked result positions.
How do Spamdexing “R” Us sites compare to the other click through destinations in the AOL search data? Ranking all the click throughs per URL shows that Spamdexing “R” Us sites rank high: 142nd place. As a side note, the number 1 in the ranking is http://www.google.com, with 366,623 click throughs.
Here’s a selection of some well-known sites that are in the same click through range as the Spamdexing “R” Us sites:
| Rank | Site | Click throughs |
|---|---|---|
| 142 | Spamdexing “R” Us | 6,988 |
The AOL search data contains 657,426 unique user IDs. 521,694 users clicked on links in the SERPs, and 4,952 users landed on Spamdexing “R” Us sites. That’s about 1 clicking AOL user in 100 (0,95%) over a 3-month period.
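The percentages above follow directly from the counts in the text. A quick Python check, using only those counts, reproduces them; the exact ratio works out to about 1 hit per 2,780 click throughs, which the text rounds to 2800.

```python
# Counts quoted from the AOL search data analysis above
click_throughs = 19_442_629   # all click through events
spamdexing_hits = 6_988       # click throughs to Spamdexing "R" Us sites
clicking_users = 521_694      # users who clicked at least one SERP link
spamdexed_users = 4_952       # users who landed on a Spamdexing "R" Us site

hit_rate = 100 * spamdexing_hits / click_throughs    # % of click throughs
clicks_per_hit = click_throughs / spamdexing_hits    # click throughs per hit
user_rate = 100 * spamdexed_users / clicking_users   # % of clicking users

print(f"{hit_rate:.2f}% of click throughs")
print(f"1 hit per {clicks_per_hit:.0f} click throughs")
print(f"{user_rate:.2f}% of clicking users")
```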
Some caveats / remarks concerning this research:
During our vacation I started programming on a rainy evening, and USBVirusScan was born.
USBVirusScan will launch any program you provide as a command line parameter each time a USB stick is inserted. I use it to start a full virus scan on the inserted USB drive, hence the name.
For example, to start cmd.exe on each USB drive you insert, launch USBVirusScan like this:
USBVirusScan cmd /k %c:
%c is a placeholder for the drive letter of the inserted USB drive (yes, that’s %c as in C’s printf function, and no, that’s not completely secure, but feel free to adapt it…).
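This is not USBVirusScan’s actual code (the tool is compiled with Borland’s C++ compiler); it is a minimal Python sketch of how the %c placeholder could be expanded more safely. The helper expand_command is hypothetical, and instead of printf-style formatting it does a plain string replacement of %c only, so other % sequences in the arguments are left untouched.

```python
def expand_command(args, drive_letter):
    """Hypothetical helper: replace every %c in the arguments with the drive letter.

    Uses plain string replacement rather than printf-style formatting, so
    stray specifiers such as %s in the arguments cannot be misinterpreted.
    """
    if len(drive_letter) != 1 or not drive_letter.isalpha():
        raise ValueError("expected a single drive letter, e.g. 'E'")
    return [arg.replace("%c", drive_letter) for arg in args]

# Command line from the example above: USBVirusScan cmd /k %c:
print(expand_command(["cmd", "/k", "%c:"], "E"))
# ['cmd', '/k', 'E:']
```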
USBVirusScan uses a system tray icon and balloons to announce the insertion of a USB drive. If you want to hide this system tray icon, start USBVirusScan with option -i, like this:
USBVirusScan -i cmd /k %c:
You can also hide the command line console with option -c. This only works with console applications, not with Windows (GUI) applications.
Here’s a Windows Script example (log.vbs) that appends the current date & time to a log.txt file on the inserted USB drive, creating the file if it doesn’t exist yet:
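Const ForAppending = 8
Dim objFSO, objTextFile, strFilename

' The drive letter (%c) is passed as the first argument, e.g. E
strFilename = WScript.Arguments.Item(0) & ":\log.txt"
Set objFSO = CreateObject("Scripting.FileSystemObject")
If objFSO.FileExists(strFilename) Then
    ' Log file exists: open it for appending
    Set objTextFile = objFSO.OpenTextFile(strFilename, ForAppending)
Else
    ' First insertion: create the log file
    Set objTextFile = objFSO.CreateTextFile(strFilename)
End If
' Write the current date and time as one line
objTextFile.WriteLine Now()
objTextFile.Close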
You start it with this command:
USBVirusScan.exe wscript log.vbs %c
Example of the content of the log file after inserting the USB drive twice:
14/10/2006 17:05:00
14/10/2006 17:05:21
This ZIP file contains both the executable and the source code. If you don’t plan to modify the source code of this program, you’ll only need to extract USBVirusScan.exe.
Compiled with Borland’s free C++ 5.5 compiler. Tested on Windows XP SP2 and Windows Vista.
This is an unexpected result of my post Google and the Drive-by Download:
At the end of my post Google and the Drive-by Download, I wondered how prevalent such query results were.
This is an attempt to answer this question.
Here’s a Perl script that executes Google queries and uses a regular expression to look for suspect URLs on the first page of results (remember, suspect URLs are of the form 123.1a2b3c.info). If you want to use the script on your Windows machine and don’t have a Perl interpreter, you can use ActiveState’s free ActivePerl.
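The Perl script itself isn’t reproduced here. As a hedged illustration of the same idea, this Python sketch applies a regular expression for 123.1a2b3c.info-style host names to the HTML of a result page that has already been downloaded; fetching the page is left out, and the helper find_suspect_hosts and the sample HTML are hypothetical.

```python
import re

# Host names of the form 123.1a2b3c.info, e.g. inside href="http://..." links
SUSPECT_HOST = re.compile(r"\b\d+\.\w+\.info\b")

def find_suspect_hosts(html):
    """Hypothetical helper: distinct suspect host names in a result page, in order of appearance."""
    seen = []
    for match in SUSPECT_HOST.finditer(html):
        host = match.group(0)
        if host not in seen:
            seen.append(host)
    return seen

# Hypothetical fragment of a search result page
page = ('<a href="http://613.6x2q1y.info/">Result</a> '
        '<a href="http://www.example.com/">Other</a> '
        '<a href="http://613.6x2q1y.info/page">Again</a>')
print(find_suspect_hosts(page))
```

A query is then "suspect" when this list is non-empty for its first result page.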
Since I have no list of common Google queries used here in Belgium, I included a simple algorithm in my program to generate its own queries. They look like this: name profession. I feed my program with a list of frequently occurring last names in Belgium and a list of professions you might want to search for (like plumber).
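The query generator can be sketched as the Cartesian product of the two lists. This is an assumption about how such a generator could work, not the author’s Perl code; the helper build_queries and the short name and profession lists are placeholders, while the URL format follows the program output shown in this post.

```python
from itertools import product
from urllib.parse import urlencode

def build_queries(last_names, professions, base="http://www.google.be/search"):
    """Hypothetical helper: yield one Google search URL per (last name, profession) pair."""
    for name, profession in product(last_names, professions):
        # urlencode turns the space into "+", giving e.g. q=Thys+Blanchisseur
        yield base + "?" + urlencode({"hl": "fr", "q": f"{name} {profession}"})

names = ["Thys", "Gerard"]                  # placeholder last names
professions = ["Blanchisseur", "Plombier"]  # placeholder professions
for url in build_queries(names, professions):
    print(url)
```

With a list of N names and M professions this produces N × M queries.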
Here’s the output of my program:
Suspect queries:
613.6x2q1y.info http://www.google.be/search?hl=fr&q=Thys+Blanchisseur
4859.4rhw0hk.info http://www.google.be/search?hl=fr&q=Gerard+Plombier
Suspect URLs:
4859.4rhw0hk.info
613.6x2q1y.info
2 suspect queries out of 2322 queries (0.0861326442721792%).
About 1 out of 1000 queries (looking for a profession) lists a drive-by download site on the first result page (2 out of 2322, to be precise). That’s not too bad, but it’s still a surprising result to me.
I’ve encountered an interesting Drive-by Download and made a movie of a Windows XP SP2 machine getting infected.
Drive-by downloads are nothing new, but it’s the first time I’ve seen one where you are directed to the drive-by download site by a normal, innocent Google query.
These are the steps to get infected:
Searching for vanderelst chauffagiste is a normal, innocent query: I look for a heating technician (chauffagiste) called vanderelst (a common name here in Belgium).
Here is a post by someone (joy) who experienced the same thing when looking for a dentist in Illinois. But apparently joy didn’t get infected.
I won’t explain how this drive-by download works. My point is rather that spyware makers have found ways to get their infected websites highly ranked by Google when you execute a normal, innocent query. We know that you’re likely to get infected when you look for keygens or cracks, but not when you’re searching for a local dentist.
My knowledge of Search Engine Optimisation is very limited, so I cannot explain how they got their sites top listed by Google. According to joy, it has something to do with the fake Google result page they host (see the movie).
First I show that there’s no service32.exe file in the c:\windows directory. You can see that I’m running as local admin, which is a bad idea, but please bear with me.
Next I search for my heating technician with Google and click on the first link (you’ll notice the strange URL of the .info TLD with random subdomains).
The free Kerio Personal Firewall alerts me to programs (spyware) that are being started. I installed the firewall to visualize the infection in action. And I’m feeling stupid today, so I click on Permit.
Notice that the page looks like a Google search result page, but that all entries point to .info sites that are probably also drive-by download sites.
After 1:30 there’s half a minute of inactivity; be patient and you’ll see other programs being started and the service32.exe file appearing in the Windows directory.
Finally, I go to the VirusTotal site to get some files scanned by 20+ virus scanners. This part of the movie is rather boring, but I didn’t want to spend much time editing it; feel free to fast forward. The point is that most virus scanners don’t detect the infected files.
I also used Lavasoft’s Ad-Aware SE Personal (freeware) anti-spyware program to scan the machine: no files were detected.
It would be interesting to know how prevalent these sites are in Google query results.
Unipeak is a free anonymous proxy. It encodes URLs like this:
Suppose you had to reverse engineer the encoding scheme: how would you proceed? You are in a comfortable position, because you can execute a Chosen Plaintext Attack.
First we need to find out if the encoding scheme is reversible, because it could also be a hash or another key used to access the cache of the proxy (if it’s a caching proxy).
So we add a letter ‘a’ to the encoded URL and see what Unipeak replies:
and we see the Google website.
So it’s probably not a hash: it looks reversible.
We add another ‘a’:
and now we get an error message:
unable to connect to http://www.google.comi:80/
It’s definitely reversible.
Searching with Google via Unipeak gives another URL:
This URL starts with the same sequence as our first URL, so it’s probably a simple encoding scheme where the characters are processed from left to right.
So let’s try another experiment and enter this URL: aaaaaaaaaa
The encoded URL is:
Very interesting: we also get a repeating pattern, but the cycle is 4 characters long (YWFh).
OK, now let’s use a trick: we enter a series of U characters. The character U is special: its ASCII encoding written in binary is 01010101. Thus UU is 0101010101010101, UUU is 010101010101010101010101, …
Entering UUUUUUUUUU gives us:
Another nice sequence!
This is a strong indication that the encoding is done at the bit level: the input is seen as a stream of bits, and the bits are split into groups of X bits (where X is unknown). Each group is transformed into another sequence of bits by a function F, and the same function F is used for each group. We can also assume that X is even, because otherwise we would get a sequence of identical pairs instead of a sequence of identical characters.
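The argument about even group sizes can be checked with a few lines of Python. This minimal sketch (the helper bit_groups is hypothetical) turns a string of U characters into its bit stream and splits it into groups of X bits: with an even X every group is identical, while with an odd X the groups alternate, which would show up as repeating pairs of output characters.

```python
def bit_groups(text, x):
    """Hypothetical helper: split the ASCII bits of text into consecutive x-bit groups.

    Any incomplete trailing group is dropped.
    """
    bits = "".join(f"{byte:08b}" for byte in text.encode("ascii"))
    return [bits[i:i + x] for i in range(0, len(bits) - x + 1, x)]

print(bit_groups("UUUUUU", 6))  # every group is 010101
print(bit_groups("UUUUUU", 5))  # groups alternate between 01010 and 10101
```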
We perform some extra tests to prove (or disprove) our hypothesis.
We encode sequences of different lengths and compare the length of the cleartext and the ciphertext: the ratio is about 3 to 4, i.e. 3 input characters generate 4 output characters (BTW, the fact that we get a cycle of 4 characters for aaaaa… is also a strong indication of this ratio).
So X can be 3, 6, 9, 12, …, but since we assume X is even, that leaves 6, 12, …
Let’s test X = 6.
We try URL 000, this gives us MDAw (http://www.unipeak.net/gethtml.php?_u_r_l_=MDAw)
Now 000 is 30 30 30 (in hexadecimal ASCII)
or 00110000 00110000 00110000 in binary, grouped in 8 bits (1 byte)
or 001100 000011 000000 110000 in binary, but grouped in 6 bits (X = 6)
Now increment the first group:
001101 000011 000000 110000
or 00110100 00110000 00110000 in binary, grouped in 8 bits (1 byte)
or 34 30 30 (in hexadecimal ASCII)
So 000 becomes 400 when you increment the first group of 6 bits.
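The bit manipulation in this experiment can be automated. This Python sketch (the helper increment_group is hypothetical) packs a 3-character string into a 24-bit integer, adds 1 to one 6-bit group, and unpacks the result; it refuses to increment a group that is already all ones, since that would carry into the neighbouring group.

```python
def increment_group(text, group):
    """Hypothetical helper: add 1 to 6-bit group `group` (0 = leftmost) of a 3-byte ASCII string."""
    if len(text) != 3:
        raise ValueError("expected exactly 3 characters (24 bits)")
    value = int.from_bytes(text.encode("ascii"), "big")  # 24-bit integer
    shift = 6 * (3 - group)                              # bit offset of the group
    if (value >> shift) & 0b111111 == 0b111111:
        raise ValueError("group would overflow into its neighbour")
    value += 1 << shift
    return value.to_bytes(3, "big").decode("latin-1")

for group in range(4):
    print(group, increment_group("000", group))
```

Running it for the four groups of 000 gives 400, 0@0, 00p and 001, the inputs used in the tests that follow.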
Testing URL 400 gives NDAw: changing the first 6 bits changes only the first character!
We do the same for the remaining groups:
000 -> 0@0 -> MEAw
000 -> 00p -> MDBw
000 -> 001 -> MDAx
So X is indeed 6, because changing a group of 6 bits at a time changes only one encoded character.
And we can also assume that function F is (at least locally) linear, because incrementing an input group by 1 increments the output character by 1 (M -> N, D -> E, A -> B and w -> x).
Now we could try every possible combination of 6 bits (64 values) and see what the corresponding encoded character is.
We would discover that F maps 0..63 to ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
And this is a very common encoding scheme: base64.
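To close the loop, this Python sketch builds F as a lookup into the 64-character alphabet above, encodes the test inputs one 6-bit group at a time, and compares the result with the standard library’s base64 module. The helper encode_6bit is hypothetical and only handles inputs whose length is a multiple of 3, so the padding rules of real base64 are deliberately left out.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_6bit(text):
    """Hypothetical helper: encode text (length a multiple of 3) by mapping each 6-bit group through F."""
    data = text.encode("ascii")
    if len(data) % 3:
        raise ValueError("this sketch only handles multiples of 3 bytes")
    bits = "".join(f"{byte:08b}" for byte in data)
    # F: 6-bit value 0..63 -> character at that index in the alphabet
    return "".join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))

for sample in ["000", "400", "0@0", "00p", "001", "aaa", "UUU"]:
    ours = encode_6bit(sample)
    ref = base64.b64encode(sample.encode("ascii")).decode("ascii")
    print(sample, ours, ours == ref)
```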