Didier Stevens

Saturday 31 December 2022

Combining zipdump, file-magic And myjson-filter

Filed under: maldoc,Malware — Didier Stevens @ 9:38

In this blog post, I show how you can combine my tools zipdump.py, file-magic.py and myjson-filter.py to select and analyze files of a particular type.

I start with a daily batch of malware files published by Malware Bazaar.

I let it produce JSON output using option –jsonoutput, that can be consumed by some of my tools, like file-magic.py, my tool to identify files based on the content using the libmagic library.

In the output above, we can see that most files are PE files (Windows executables).

For this example, I’m interested in Office files (ole files). I can filter the output of file-magic.py for that with option -r. Libmagic identifies this type of file as “Composite Document File …”, thus I filter for Composite:

This gives me a list of malicious Office documents. I want to extract URLs from them, but I don’t want to extract all of these files from the ZIP container to disk, and do the URL extraction file per file.

I want to do this with a one-liner. 🙂

What I’m going to do, is use file-magic’s option –jsonoutput, so that it augments the json output of zipdump with the file type, and then I use my tool myjson-filter.py to filter that json output for files that are only of a type that contains the word Composite. With this command:

This produces JSON output that contains the content of each file of type Composite, found inside the ZIP container.

This output can be consumed by my tool strings.py, to extract all the strings.

Side note: if you want to know first which files were selected for processing, use option -l:

Let’s pipe the filtered JSON output into strings.py, with options to produce a list of unique strings (-u) that contain the word http (-s http), like this:

I use my tool re-search.py to extract a list of unique URLs:

I filter out common URLs found in Office documents:

And finally, I sort the URLs by domain name using my tool sortcanon.py:

The adobe URLs are not malicious, but the other ones could be.

This one-liner allows me to quickly process daily malware batches, looking for easy IOCs (cleartext URLs in Office documents) without writing any malicious file to disk.

zipdump.py --jsonoutput 2020-10-24.zip | file-magic.py --jsoninput --jsonoutput | myjson-filter.py -t Composite | strings.py --jsoninput -u -s http | re-search.py -u -n url -F officeurls | sortcanon.py -c domain

Remark that by using an option to search for strings with the word http (-s http), I reduce the output of strings to be processed by re-search.py, so that the search is faster. But that limits you (mostly) to URLs with protocol http or https.

Leave out this option if you want to search for all possible protocols, or try -s “://”.

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply (comments are moderated)

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Blog at WordPress.com.