maldoc | Didier Stevens

Tuesday 29 August 2023

Quickpost: PDF/ActiveMime Maldocs YARA Rule

Filed under: maldoc,Malware,Quickpost — Didier Stevens @ 18:07

Here is a YARA rule I developed to detect PDF/ActiveMime maldocs I wrote about in “Quickpost: Analysis of PDF/ActiveMime Polyglot Maldocs“.

It looks for files that start with %PDF- (this header can be obfuscated) and contain string QWN0aXZlTWlt (string ActiveMim in BASE64), possibly obfuscated with whitespace characters.

rule rule_pdf_activemime {
    meta:
        author = "Didier Stevens"
        date = "2023/08/29"
        version = "0.0.1"
        samples = "5b677d297fb862c2d223973697479ee53a91d03073b14556f421b3d74f136b9d,098796e1b82c199ad226bff056b6310262b132f6d06930d3c254c57bdf548187,ef59d7038cfd565fd65bae12588810d5361df938244ebad33b71882dcf683058"
        description = "look for files that start with %PDF- and contain BASE64 encoded string ActiveMim (QWN0aXZlTWlt), possibly obfuscated with extra whitespace characters"
        usage = "if you don't have to care about YARA performance warnings, you can uncomment string $base64_ActiveMim0 and remove all other $base64_ActiveMim## strings"
    strings:
        $pdf = "%PDF-"
//        $base64_ActiveMim0 = /[ \t\r\n]*Q[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim1 = /Q  [ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim2 = /Q \t[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim3 = /Q \r[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim4 = /Q \n[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim5 = /Q\t [ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim6 = /Q\t\t[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim7 = /Q\t\r[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim8 = /Q\t\n[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim9 = /Q\r [ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim10 = /Q\r\t[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim11 = /Q\r\r[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim12 = /Q\r\n[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim13 = /Q\n [ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim14 = /Q\n\t[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim15 = /Q\n\r[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim16 = /Q\n\n[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim17 = /QW [ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim18 = /QW\t[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim19 = /QW\r[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim20 = /QW\n[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim21 = /QWN[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
    condition:
        $pdf at 0 and any of ($base64_ActiveMim*)
}

The regex used to detect characters QWN0aXZlTWlt interspersed with whitespace characters (YARA string $base64_ActiveMim0) has no atoms (for YARA’s Aho-Corasic algorithm) larger than 1 byte, and thus generates a warning, that prohibits its use for hunting with VirusTotal.

That is why I replaced that regex with 21 regexes that all start with 3 fixed bytes and thus allow YARA to select atoms that are large enough.

Quickpost info

Leave a Comment

Quickpost: Analysis of PDF/ActiveMime Polyglot Maldocs

Filed under: maldoc,Malware,My Software,Quickpost — Didier Stevens @ 10:50

jpcert reported a new type of maldoc: “MalDoc in PDF – Detection bypass by embedding a malicious Word file into a PDF file –“.

These maldocs are PDF files that embed a Word document (ActiveMime) in MIME format.

ActiveMime documents can be analyzed by combining my emldump.py tool and oledump.py.

ActiveMime documents were heavily obfuscated in the past, and this is also the case here. As emldump.py version 0.0.11 was only able to handle the obfuscation of 2 of the 3 samples mentioned by jpcert, I released a new version to handle more obfuscation.

Here is an analysis example for sample 5b677d297fb862c2d223973697479ee53a91d03073b14556f421b3d74f136b9d.

Run emldump (version 0.0.12 or later) with option -F to fix the obfuscation of the mime-version header:

To find the part where the ActiveMime file was hidden, use option -E %HEADASCII% to view the first 20 characters of each part:

Here we can see that part 14 is not a JPEG file, but an ActiveMime file.

We extract it and pipe it into oledump.py:

That ActiveMime file contains VBA code:

These maldocs (at least the 3 samples shared by jpcert) can be detected by pdfid with option -e to display extra information:

There are a lot of bytes outside streams (usually for PDFs, there shouldn’t be) and the count of stream and endstream documents is different.

But like I said, these are detections for these 3 samples, it’s possible to modify those samples to remove the anomalies.

Quickpost info

Comments (2)

Sunday 22 January 2023

New Tool: onedump.py

Filed under: maldoc,Malware,My Software — Didier Stevens @ 9:24

This is a new tool (based on my Python template for binary files) to analyze OneNote files.

This version is limited to handling embedded files (for the moment).

As I might still make significant changes to the user interface, I’ve put this tool in my GitHub beta repository.

Leave a Comment

Saturday 31 December 2022

Combining zipdump, file-magic And myjson-filter

Filed under: maldoc,Malware — Didier Stevens @ 9:38

In this blog post, I show how you can combine my tools zipdump.py, file-magic.py and myjson-filter.py to select and analyze files of a particular type.

I start with a daily batch of malware files published by Malware Bazaar.

I let it produce JSON output using option –jsonoutput, that can be consumed by some of my tools, like file-magic.py, my tool to identify files based on the content using the libmagic library.

In the output above, we can see that most files are PE files (Windows executables).

For this example, I’m interested in Office files (ole files). I can filter the output of file-magic.py for that with option -r. Libmagic identifies this type of file as “Composite Document File …”, thus I filter for Composite:

This gives me a list of malicious Office documents. I want to extract URLs from them, but I don’t want to extract all of these files from the ZIP container to disk, and do the URL extraction file per file.

I want to do this with a one-liner. 🙂

What I’m going to do, is use file-magic’s option –jsonoutput, so that it augments the json output of zipdump with the file type, and then I use my tool myjson-filter.py to filter that json output for files that are only of a type that contains the word Composite. With this command:

This produces JSON output that contains the content of each file of type Composite, found inside the ZIP container.

This output can be consumed by my tool strings.py, to extract all the strings.

Side note: if you want to know first which files were selected for processing, use option -l:

Let’s pipe the filtered JSON output into strings.py, with options to produce a list of unique strings (-u) that contain the word http (-s http), like this:

I use my tool re-search.py to extract a list of unique URLs:

I filter out common URLs found in Office documents:

And finally, I sort the URLs by domain name using my tool sortcanon.py:

The adobe URLs are not malicious, but the other ones could be.

This one-liner allows me to quickly process daily malware batches, looking for easy IOCs (cleartext URLs in Office documents) without writing any malicious file to disk.

zipdump.py --jsonoutput 2020-10-24.zip | file-magic.py --jsoninput --jsonoutput | myjson-filter.py -t Composite | strings.py --jsoninput -u -s http | re-search.py -u -n url -F officeurls | sortcanon.py -c domain

Remark that by using an option to search for strings with the word http (-s http), I reduce the output of strings to be processed by re-search.py, so that the search is faster. But that limits you (mostly) to URLs with protocol http or https.

Leave out this option if you want to search for all possible protocols, or try -s “://”.

Leave a Comment

Saturday 10 September 2022

Maldoc Analysis Video – Rehearsed & Unrehearsed

Filed under: maldoc,Malware,My Software,video — Didier Stevens @ 21:41

When I record maldoc analysis videos, I have already analyzed the maldoc prior to recording, and I rehearse the recording.

This time, I also recorded the unrehearsed analysis: when I take the first look at a maldoc I’ve not seen before.

All in this video:

Leave a Comment

Sunday 4 September 2022

Update: oledump.py Version 0.0.70

Filed under: maldoc,My Software,Update,video — Didier Stevens @ 15:38

This is an update to plugin plugin_vba_dco.py, improving generalization and adding option -p.

You can watch this maldoc analysis video to learn how to use the generalization feature of this plugin:

oledump_V0_0_70.zip (http)
MD5: D6EC4FD6B7BE60E01A98922BC06A1E8F
SHA256: E9EE79501A08E896A601F1AFDDB6D3C05D9A2A1FD5899D44AC422DD79E4EF678

Comments (2)

Thursday 16 June 2022

Discovering A Forensic Artifact

Filed under: Forensics,Hacking,maldoc — Didier Stevens @ 0:00

While developing my oledump plugin plugin_olestreams.py, I noticed that the item moniker’s name field (lpszItem) values I observed while analyzing Follina RTF maldocs, had a value looking like _1715622067:

The number after the underscore (_), is derived from the timestamp when the item moniker was created. That timestamp is expressed as an epoch value in local time, to which a constant number is added: 61505155.

I figured this out by doing some tests. 61505155 is an approximation: I might be wrong by a couple of seconds.

Item name _1715622067 is the value you find in Follina maldocs created from this particular RTF template made by chvancooten. 1715622067 minus 61505155 is 1654116912. Converting epoch value 1654116912 to date & time value gives: Wednesday, June 1, 2022 8:55:12 PM. That’s when that RTF document was created.

RTF documents made from this template, can be detected by looking for string 0c0000005f3137313536323230363700 inside the document (you have to look for this hexadecimal string, not case sensitive, because OLE files embedded in RTF are represented in hexadecimal).

Notice that the newest template in that github repository is taken from a cve-2017-0199 RTF template document, and that it no longer contains a item moniker.

But it does contain another timestamp:

This hexadecimal string can also be used for detection purposes: 906660a637b5d201

I used the following YARA rules for a retrohunt (34 matches):

rule follina_rtf_template_1 {
    strings:
        $a = "0c0000005f3137313536323230363700" nocase
    condition:
        $a
}

rule follina_rtf_template_2 {
    strings:
        $a = "906660a637b5d201" nocase
    condition:
        $a
}

Notice that I do not include a test for RTF documents in my rules: these rules also detect Python program follina.py.

And if you are a bit familiar with the RTF syntax, you know that it’s trivial to modify such RTF documents to avoid detection by the above YARA rules.

Later I will spend some time to find the actual code that implements the generation of the item value _XXXXXXXXXX. Maybe you can find it, or you already know where it is located.

Comments (1)

Monday 4 April 2022

.ISO Files With Office Maldocs & Protected View in Office 2019 and 2021

Filed under: maldoc,Malware,Uncategorized — Didier Stevens @ 0:00

We have seen ISO files being used to deliver malicious documents via email. There are different variants of this attack.

One of the reasons to do this, is to evade “mark-of-web propagation”.

When a file (attached to an email, or downloaded from the Internet) is saved to disk on a Windows system, Microsoft applications will mark this file as coming from the Internet. This is done with a ZoneIdentifier Alternate Data Stream (like a “mark-of-web”).

When a Microsoft Office application, like Word, opens a document with a ZoneIdentifier ADS, the document is opened in Protected View (e.g., sandboxed).

But when an Office document is stored inside an ISO file, and that ISO has a ZoneIdentifier ADS, then Word will not open the document in Protected View. That is something I observed 5 years ago.

But this has changed recently. When exactly, I don’t know (update: August 2021).

But when I open an Office document stored inside an ISO file marked with a ZoneIdentifier ADS, Office 2021 will open the document in protected view:

With an unpatched version of Office 2019, that I installed a year ago, that same file is not opened in Protected View:

After updating Office:

Word’s behavior has changed:

The file is now opened in Protected View.

If you want to test this yourself, you can use my ZoneIdentifier tool to easily settings a “mark-of-web” without having to download your test file from the Internet:

Or you can just add the ZoneIdentifier ADS with notepad.

I did the same test with Office 2016, I updated an old version and: the document is not opened in Protected View.

I don’t know exactly when Microsoft Office 2019 was updated so that it would open documents in Protected View when they are inside an ISO file marked as originating from the Internet. But if you do know, please post a comment.

Update: this change happened in August 2021. See comments below. Thanks Philippe.

Comments (2)

Wednesday 29 December 2021

VBA: __SRP_ Streams

Filed under: Forensics,maldoc — Didier Stevens @ 0:00

Office documents with a VBA project that contains streams whose name starts with __SRP_, have had their VBA macros executed at least once.

As Dr. Bontchev describes in the documentation for his pcodedmp tool:

When the p-code has been executed at least once, a further tokenized form of it is stored elsewhere in the document (in streams, the names of which begin with __SRP_, followed by a number).

Thus in my maldoc trainings, I always explain that the presence of __SRP_ streams is an indication that the VBA code has been executed prior to the saving of the document, and vice-versa, that the absence means that the code was not executed (prior to saving).

I recently discovered that these __SRP_ streams are also created when the VBA project is compiled (without running the macros), by selecting menu option “Debug / Compile Project” in the VBA IDE.

Comments (1)

Tuesday 19 January 2021

Video: Maldoc Analysis With CyberChef

Filed under: maldoc,Malware,video — Didier Stevens @ 0:00

In this video, I show how to analyze a .doc malicious document using CyberChef only. This is possible, because the payload is a very long string that can be extracted without having to parse the structure of the .doc file with a tool like oledump.py.

I pasted the recipe on pastebin here.

Comments (3)

Didier Stevens

Tuesday 29 August 2023

Quickpost: PDF/ActiveMime Maldocs YARA Rule

Quickpost: Analysis of PDF/ActiveMime Polyglot Maldocs

Sunday 22 January 2023

New Tool: onedump.py

Saturday 31 December 2022

Combining zipdump, file-magic And myjson-filter

Saturday 10 September 2022

Maldoc Analysis Video – Rehearsed & Unrehearsed

Sunday 4 September 2022

Update: oledump.py Version 0.0.70

Thursday 16 June 2022

Discovering A Forensic Artifact

Monday 4 April 2022

.ISO Files With Office Maldocs & Protected View in Office 2019 and 2021

Wednesday 29 December 2021

VBA: __SRP_ Streams

Tuesday 19 January 2021

Video: Maldoc Analysis With CyberChef

Pages

Top Posts

Categories

Blog Stats

Twitter @DidierStevens

Archives