Extractscripts.py takes an HTML file as an argument and generates a separate file for each script in the input file. I use it to extract (potentially) malicious scripts from a webpage and execute them with my patched SpiderMonkey.
Extractscripts is written in Python to be portable across multiple platforms.
Example:
# ll
-rw-r--r-- 1 root root 1131 Apr 21 00:19 index.htm
# extractscripts index.htm
# ll
-rw-r--r-- 1 root root 1131 Apr 21 00:19 index.htm
-rw-r--r-- 1 root root 607 Apr 21 00:21 script.1.javascript
-rw-r--r-- 1 root root 0 Apr 21 00:21 script.2.JavaScript
The name of each script file generated by extractscripts has the following format:
script.counter.language
Counter starts at 1 and is incremented for each new script found in the input file. Language is the value of the language attribute of the <script> tag, copied as written (which is why the two files in the example differ in case). The .language part is only added when the tag has a language attribute.
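The naming rule can be sketched as a small helper. This is only an illustration of the format described above, not code taken from extractscripts; the function name script_filename is hypothetical, and treating an empty language value like a missing attribute is my assumption.

```python
def script_filename(counter, language=None):
    """Build a name of the form script.counter[.language].

    Hypothetical helper, not part of extractscripts itself.
    """
    name = "script.%d" % counter
    # Assumption: an empty language value is treated like a missing attribute.
    if language:
        name += "." + language
    return name


print(script_filename(1, "javascript"))  # script.1.javascript
print(script_filename(2, "JavaScript"))  # script.2.JavaScript
print(script_filename(3))                # script.3
```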
Note that the size of the second file is 0: there was nothing between the <script> and </script> tags. This often indicates that the tag has a src attribute from which the actual script is downloaded.
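To check whether an empty script really points to an external file, something like the following sketch could be used. It relies on Python 3's standard html.parser module, which is not necessarily how extractscripts parses its input; the names ScriptCollector and collect_scripts and the example URL are made up for the illustration.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the language, src and body of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []     # one dict per script, in document order
        self._current = None  # the script being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            attributes = dict(attrs)
            self._current = {
                "language": attributes.get("language"),
                "src": attributes.get("src"),
                "body": "",
            }

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._current is not None:
            self._current["body"] += data

    def handle_endtag(self, tag):
        if tag == "script" and self._current is not None:
            self.scripts.append(self._current)
            self._current = None


def collect_scripts(html):
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.scripts


# Made-up input: one inline script and one empty script with a src attribute.
page = ('<html><script language="javascript">alert(1);</script>'
        '<script language="JavaScript" src="http://example.com/a.js"></script></html>')

for counter, script in enumerate(collect_scripts(page), 1):
    if not script["body"].strip() and script["src"]:
        print("script %d is empty, code is loaded from %s" % (counter, script["src"]))
```

For an empty script, the src value tells you which file to fetch and analyze next.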
Download:
MD5: D40AFBB62A304C20B0BF06DA70B6DBF4