Didier Stevens

Monday 2 October 2006

Reversing an anonymous proxy

Filed under: Reverse Engineering — Didier Stevens @ 10:08

Unipeak is a free anonymous proxy, it encodes the URLs like this:

http://www.unipeak.com/gethtml.php?_u_r_l_=aHR0cDovL3d3dy5nb29nbGUuY29t (this is http://www.google.com).

Suppose you had to reverse engineer the encoding scheme, how could you proceed? You are in a comfortable position, because you can execute a Chosen Plaintext Attack.

First we need to find out if the encoding scheme is reversible, because it could also be a hash or another key used to access the cache of the proxy (if it’s a caching proxy).

So we add a letter ‘a’ to the encoded URL and see what Unipeak replies:

http://www.unipeak.com/gethtml.php?_u_r_l_=aHR0cDovL3d3dy5nb29nbGUuY29ta

and we see the Google website.

So it’s not a hash, it’s reversible.

We add another ‘a':

http://www.unipeak.com/gethtml.php?_u_r_l_=aHR0cDovL3d3dy5nb29nbGUuY29taa

and now we get an error message:

unable to connect to http://www.google.comi:80/

It’s definitely reversible.

Searching with Google via Unipeak gives another URL:

http://www.unipeak.com/gethtml.php?_u_r_l_=aHR0cDovL3d3dy5nb29nbGUuY29tOjgwL3NlYXJjaA%3D%3D&hl=en&q=unipeak&btnG=Google+Search

This URL starts with the same sequence as our first URL, so it’s probably a simple encoding scheme where the characters are processed from left to right.

So let’s start another experiment, we enter this URL: aaaaaaaaaa

The encoded URL is:

http://www.unipeak.com/gethtml.php?_u_r_l_=YWFhYWFhYWFhYQ==

Very interesting, we also get a repeating pattern, but the cycle is 4 characters long (YWFh).

Ok, now let’s use a trick: we enter a series of characters Us. The character U is special, its ASCII encoding written in binary is 01010101. Thus UU is 0101010101010101, UUU is 010101010101010101010101, …

Entering UUUUUUUUUU gives us:

http://www.unipeak.com/gethtml.php?_u_r_l_=VVVVVVVVVVVVVQ==

Another nice sequence!

This is a strong indication that the encoding is done at the bit level: the input is seen as a stream of bits, the bits are grouped in groups of X bits (where X is unknown). Each group is transformed to another sequence of bits by a function F, and the same function F is used for each group. We can also assume that X is even, otherwise we wouldn’t get a sequence of identical characters, but a sequence of identical pairs.

We perform some extra tests to prove (or disprove) our hypothesis.

We encode sequences of different lengths and compare the length of the cleartext and the cyphertext: the ratio is about 3 to 4, 3 input characters generate 4 output characters (BTW, the fact that we get a cycle of 4 characters for aaaaa… is also a strong indication for this ratio).

So X can be 3, 6, 9, 12, … . Except we assume X is even: 6, 12, …

Let’s test X = 6.

We try URL 000, this gives us MDAw (http://www.unipeak.net/gethtml.php?_u_r_l_=MDAw)
Now 000 is 30 30 30 (in hexadecimal ASCII)

or 00110000 00110000 00110000 in binary, grouped in 8 bits (1 byte)
or 001100 000011 000000 110000 in binary, but grouped in 6 bits (X = 6)

Now increment the first group:

001101 000011 000000 110000

or 00110100 00110000 00110000 in binary, grouped in 8 bits (1 byte)

or 34 30 30 (in hexadecimal ASCII)

or 400

So 000 becomes 400 when you increment the first group of 6 bits.

Testing URL 400 gives NDAw: changing the first 6 bits changes only the first character!

We do the same for the remaining groups:

000 -> 0@0 -> MEAw

000 -> 00p -> MDBw

000 -> 001 -> MDAx
So X is indeed 6, because changing a group of 6 bits at a time changes only one encoded character.

And we can also assume that function F is linear, because incrementing the input with 1 increments the output with 1 (M -> N, D -> E, A -> B and w -> x).

Now we could try every possible permutation of 6 bits, and see what the corresponding encoded character is.

We would discover that F maps 0..63 to ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

And this is a very common encoding scheme: base64

4 Comments »

  1. Great work, once again. using U to help determine whether or not bit level modification scheme was being used is brilliant.

    Comment by Ryan — Thursday 5 October 2006 @ 13:51

  2. [...] I just found Didier Stevens’ blog. In this entry, he goes through the process of reverse engineering and encoding scheme which turns out to be base64. [...]

    Pingback by Rei - Through The Wire » Blog Archive » Reversing base64 — Monday 16 October 2006 @ 17:09

  3. Nice work! The site is to help the people to surf the blocked site. The site could use the session ID instead of encoded URL so that it will be hard to decode the url.
    :)

    http://unipeak.net

    http://unipeak.com

    Comment by nicework — Thursday 26 October 2006 @ 0:48


RSS feed for comments on this post. TrackBack URI

Leave a Reply (comments are moderated)

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 230 other followers

%d bloggers like this: