Didier Stevens

ExtractScripts

extractscripts.py takes an HTML file as argument and generates a separate file for each script in the input file. I use it to extract (potentially) malicious scripts from a webpage and execute them with my patched SpiderMonkey.
ExtractScripts is written in Python, so it is portable across multiple platforms.
Example:
# ll
-rw-r--r-- 1 root root 1131 Apr 21 00:19 index.htm
# extractscripts index.htm
# ll
-rw-r--r-- 1 root root 1131 Apr 21 00:19 index.htm
-rw-r--r-- 1 root root 607 Apr 21 00:21 script.1.javascript
-rw-r--r-- 1 root root 0 Apr 21 00:21 script.2.JavaScript

The name of each script file generated by extractscripts has the following format:

script.counter.language

The counter starts at 1 and is incremented for each new script found in the input file. The language is the value of the language attribute of the <script> tag. The .language part is only present when the tag has a language attribute.
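The following is a minimal sketch of the same idea using Python's standard html.parser module. It is not the ExtractScripts source, and the names ScriptExtractor, script_filename, extract_scripts and write_scripts are hypothetical. It collects the text between each <script> and </script> pair and names the result following the script.counter.language format described above. Reading the input as UTF-8 is also an assumption.

```python
from html.parser import HTMLParser


class ScriptExtractor(HTMLParser):
    """Collect the body and language attribute of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []        # list of (language or None, body) tuples
        self._in_script = False
        self._language = None
        self._body = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            # Attribute names are lowercased by HTMLParser; values keep their case.
            self._language = dict(attrs).get("language")
            self._body = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._in_script:
            self._body.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self.scripts.append((self._language, "".join(self._body)))
            self._in_script = False


def script_filename(counter, language):
    # script.counter.language, or script.counter when there is no language attribute
    if language:
        return "script.%d.%s" % (counter, language)
    return "script.%d" % counter


def extract_scripts(html):
    """Return (filename, body) pairs, numbering scripts from 1."""
    parser = ScriptExtractor()
    parser.feed(html)
    parser.close()
    return [(script_filename(i, lang), body)
            for i, (lang, body) in enumerate(parser.scripts, start=1)]


def write_scripts(html_path):
    # Write each extracted script to its own file in the current directory.
    with open(html_path, encoding="utf-8", errors="replace") as f:
        html = f.read()
    for name, body in extract_scripts(html):
        with open(name, "w", encoding="utf-8") as out:
            out.write(body)
```

Calling write_scripts("index.htm") would create files such as script.1.javascript next to the input. This sketch makes no attempt to cope with the malformed or deliberately obfuscated HTML that malicious pages often contain.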
Note that the second file is 0 bytes: there was nothing between its <script> and </script> tags. This often indicates that the tag has a src attribute pointing to the actual script, which has to be downloaded separately.
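To follow up on such empty scripts, a sketch along these lines lists the src attribute of every <script> element with nothing between its tags, together with its counter so it can be matched to the corresponding script.counter file. Again, this is an illustration built on html.parser, not part of ExtractScripts; ExternalScriptFinder and find_external_scripts are hypothetical names.

```python
from html.parser import HTMLParser


class ExternalScriptFinder(HTMLParser):
    """Track <script> elements that have an empty body and a src attribute."""

    def __init__(self):
        super().__init__()
        self.external = []       # (counter, src) pairs
        self._counter = 0        # numbered from 1, like script.counter
        self._in_script = False
        self._src = None
        self._has_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._counter += 1
            self._in_script = True
            self._src = dict(attrs).get("src")
            self._has_body = False

    def handle_data(self, data):
        # Any text at all between the tags means the file would not be 0 bytes.
        if self._in_script and data:
            self._has_body = True

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self._in_script = False
            if not self._has_body and self._src:
                self.external.append((self._counter, self._src))


def find_external_scripts(html):
    """Return (counter, src) for each empty <script> that has a src attribute."""
    finder = ExternalScriptFinder()
    finder.feed(html)
    finder.close()
    return finder.external
```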

Download:

extractscript.zip (https)

MD5: D40AFBB62A304C20B0BF06DA70B6DBF4
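To check the download against the published hash, a few lines of Python with the standard hashlib module are enough. This is a sketch; md5_of_file and matches_published_md5 are hypothetical helper names. Because hexdigest() returns lowercase hexadecimal and the hash above is written in uppercase, the comparison normalizes the case first (hexadecimal digits are not case-sensitive).

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 digest of a file as lowercase hexadecimal."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published):
    # Normalize the published value to lowercase before comparing.
    return md5_of_file(path) == published.strip().lower()
```

For example, matches_published_md5("extractscript.zip", "D40AFBB62A304C20B0BF06DA70B6DBF4") returns True only if the file's MD5 matches the value published above.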

10 Comments

  1. […] Filed under: Malware, My Software — Didier Stevens @ 6:26 ExtractScripts is another one of my little tools I use to analyze […]

    Pingback by ExtractScripts « Didier Stevens — Tuesday 26 June 2007 @ 6:27

  2. […] Filed under: Malware, My Software, Update — Didier Stevens @ 0:06 I’ve updated ExtractScripts to handle comments inside <script> […]

    Pingback by ExtractScripts Update « Didier Stevens — Wednesday 11 July 2007 @ 0:06

  3. Good day Mr Stevens, I'm one of your fans. I tried to use your ExtractScripts; it's working properly, but when I try to extract my sample script, the output is different from the original. Can you take a look at this file? http://example010.googlepages.com/sample2.html.tar.gz

    Comment by yip man — Thursday 5 February 2009 @ 6:43

  4. Indeed, I see a difference. Will have to check the code.

    Comment by Didier Stevens — Thursday 5 February 2009 @ 18:43

  5. […] ExtractScript […]

    Pingback by How to deobfuscate an obfuscated javascript file like this? [duplicate] | DL-UAT — Friday 22 May 2015 @ 21:29

  6. […] Rhino Debugger, ExtractScripts, Firebug, SpiderMonkey, V8, JS […]

    Pingback by REMnux: Distribución de Linux especializada en en el análisis de malware | Skydeep — Thursday 20 August 2015 @ 1:49

  7. […] ExtractScripts: Extract JavaScript scripts from an HTML file […]

    Pingback by REMnux Distro Linux Untuk Analisis Malware | Acehlinux.org — Tuesday 29 December 2015 @ 20:27

  8. […] ExtractScripts extractscripts.py Extract JavaScript scripts from an HTML file remnux-didier (APT) https://blog.didierstevens.com/programs/extractscripts/ Examine Browser Malware: JavaScript Firebug firefox, F12 JavaScript debugger for Firefox get-remnux […]

    Pingback by Remnux-A tool for reverse engineering Malware – Infohub — Saturday 8 April 2017 @ 22:40

  9. Question re: your hashes. I performed an MD5 checksum and the case is off: your posted hash is upper case, but my validation shows lower case. Please advise.

    Comment by Anonymous — Wednesday 25 March 2020 @ 3:13

  10. It’s the same; case does not matter here. Cryptographic hashes like MD5 are binary numbers that, by convention, are represented in hexadecimal. Hexadecimal digits are not case-sensitive: the letters a-f are equal to A-F.

    Comment by Didier Stevens — Wednesday 25 March 2020 @ 13:05

