Didier Stevens

ExtractScripts

Extractscripts.py takes an HTML file as its argument and generates a separate file for each script in the input file. I use it to extract (potentially) malicious scripts from a web page and execute them with my patched SpiderMonkey.
Extractscripts is written in Python to be portable across multiple platforms.
Example:
# ll
-rw-r--r-- 1 root root 1131 Apr 21 00:19 index.htm
# extractscripts index.htm
# ll
-rw-r--r-- 1 root root 1131 Apr 21 00:19 index.htm
-rw-r--r-- 1 root root 607 Apr 21 00:21 script.1.javascript
-rw-r--r-- 1 root root 0 Apr 21 00:21 script.2.JavaScript

The name of each script file generated by extractscripts has the following format:

script.counter.language

The counter starts at 1 and is incremented for each new script found in the input file. Language is the value of the language attribute of the <script> tag, copied as written (hence javascript and JavaScript in the example above). The .language part of the name is only present when the tag has a language attribute.
Note that the size of the second file is 0, which means there was nothing between the <script> and </script> tags. This often indicates that the tag has a src attribute that points to the actual script.
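
To make the naming scheme concrete, here is a minimal sketch of the same idea built on Python's standard html.parser module. It is not the code of extractscripts itself; the names ScriptCollector, script_filename and extract_scripts are made up for this illustration, and real-world malicious pages may need more tolerant parsing than this.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect language, src and body of every <script> element (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.scripts = []     # one dict per script, in document order
        self._current = None  # script being read, or None when outside <script>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            attributes = dict(attrs)
            self._current = {
                "language": attributes.get("language"),
                "src": attributes.get("src"),
                "body": "",
            }

    def handle_data(self, data):
        # The parser may deliver a script body in several chunks.
        if self._current is not None:
            self._current["body"] += data

    def handle_endtag(self, tag):
        if tag == "script" and self._current is not None:
            self.scripts.append(self._current)
            self._current = None


def script_filename(counter, language):
    """Build script.counter.language; the language part is optional."""
    name = "script.%d" % counter
    if language:
        name += "." + language
    return name


def extract_scripts(html):
    """Return a list of (filename, body, src) tuples, with the counter starting at 1."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return [
        (script_filename(counter, script["language"]), script["body"], script["src"])
        for counter, script in enumerate(collector.scripts, start=1)
    ]
```

Writing each body to its filename reproduces the listing above. An entry with an empty body and a src value that is not None corresponds to the 0-byte file case: the actual script has to be downloaded separately. Since the language value ends up in a file name, a careful version would also sanitize it first.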

Download:

extractscript.zip (https)

MD5: D40AFBB62A304C20B0BF06DA70B6DBF4
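
If you want to check the download against this hash, something like the following should do. It is a small sketch using Python's hashlib; the function name md5_matches is my own choice for the example, and the path you pass in is wherever you saved the ZIP file.

```python
import hashlib

# Hash published above for extractscript.zip
EXPECTED_MD5 = "D40AFBB62A304C20B0BF06DA70B6DBF4"


def md5_matches(path, expected=EXPECTED_MD5):
    """Return True when the MD5 of the file at path equals the expected hex digest."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    # Compare case-insensitively: the hash above is printed in upper case.
    return digest.hexdigest().upper() == expected.upper()
```

Calling md5_matches("extractscript.zip") returns True only when the file is byte-for-byte identical to the one the hash was computed from.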

4 Comments »

  1. […] Filed under: Malware, My Software — Didier Stevens @ 6:26 ExtractScripts is another one of my little tools I use to analyze […]

    Pingback by ExtractScripts « Didier Stevens — Tuesday 26 June 2007 @ 6:27

  2. […] Filed under: Malware, My Software, Update — Didier Stevens @ 0:06 I’ve updated ExtractScripts to handle comments inside <script> […]

    Pingback by ExtractScripts Update « Didier Stevens — Wednesday 11 July 2007 @ 0:06

  3. good day mr stevens.. i’m one of you fans.. I tried to use you script extract, i’t working properly but when i try to extract my sample script, the output were different with the original.. can you take a look at this path.. http://example010.googlepages.com/sample2.html.tar.gz

    Comment by yip man — Thursday 5 February 2009 @ 6:43

  4. Indeed, I see a difference. Will have to check the code.

    Comment by Didier Stevens — Thursday 5 February 2009 @ 18:43

