August | 2023 | Didier Stevens

Tuesday 29 August 2023

Quickpost: PDF/ActiveMime Maldocs YARA Rule

Filed under: maldoc,Malware,Quickpost — Didier Stevens @ 18:07

Here is a YARA rule I developed to detect PDF/ActiveMime maldocs I wrote about in “Quickpost: Analysis of PDF/ActiveMime Polyglot Maldocs“.

It looks for files that start with %PDF- (this header can be obfuscated) and contain string QWN0aXZlTWlt (string ActiveMim in BASE64), possibly obfuscated with whitespace characters.

rule rule_pdf_activemime {
    meta:
        author = "Didier Stevens"
        date = "2023/08/29"
        version = "0.0.1"
        samples = "5b677d297fb862c2d223973697479ee53a91d03073b14556f421b3d74f136b9d,098796e1b82c199ad226bff056b6310262b132f6d06930d3c254c57bdf548187,ef59d7038cfd565fd65bae12588810d5361df938244ebad33b71882dcf683058"
        description = "look for files that start with %PDF- and contain BASE64 encoded string ActiveMim (QWN0aXZlTWlt), possibly obfuscated with extra whitespace characters"
        usage = "if you don't have to care about YARA performance warnings, you can uncomment string $base64_ActiveMim0 and remove all other $base64_ActiveMim## strings"
    strings:
        $pdf = "%PDF-"
//        $base64_ActiveMim0 = /[ \t\r\n]*Q[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim1 = /Q  [ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim2 = /Q \t[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim3 = /Q \r[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim4 = /Q \n[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim5 = /Q\t [ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim6 = /Q\t\t[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim7 = /Q\t\r[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim8 = /Q\t\n[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim9 = /Q\r [ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim10 = /Q\r\t[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim11 = /Q\r\r[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim12 = /Q\r\n[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim13 = /Q\n [ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim14 = /Q\n\t[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim15 = /Q\n\r[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim16 = /Q\n\n[ \t\r\n]*W[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim17 = /QW [ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim18 = /QW\t[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim19 = /QW\r[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim20 = /QW\n[ \t\r\n]*N[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
        $base64_ActiveMim21 = /QWN[ \t\r\n]*0[ \t\r\n]*a[ \t\r\n]*X[ \t\r\n]*Z[ \t\r\n]*l[ \t\r\n]*T[ \t\r\n]*W[ \t\r\n]*l[ \t\r\n]*t/
    condition:
        $pdf at 0 and any of ($base64_ActiveMim*)
}

The regex used to detect characters QWN0aXZlTWlt interspersed with whitespace characters (YARA string $base64_ActiveMim0) has no atoms (for YARA’s Aho-Corasic algorithm) larger than 1 byte, and thus generates a warning, that prohibits its use for hunting with VirusTotal.

That is why I replaced that regex with 21 regexes that all start with 3 fixed bytes and thus allow YARA to select atoms that are large enough.

Quickpost info

Leave a Comment

Quickpost: Analysis of PDF/ActiveMime Polyglot Maldocs

Filed under: maldoc,Malware,My Software,Quickpost — Didier Stevens @ 10:50

jpcert reported a new type of maldoc: “MalDoc in PDF – Detection bypass by embedding a malicious Word file into a PDF file –“.

These maldocs are PDF files that embed a Word document (ActiveMime) in MIME format.

ActiveMime documents can be analyzed by combining my emldump.py tool and oledump.py.

ActiveMime documents were heavily obfuscated in the past, and this is also the case here. As emldump.py version 0.0.11 was only able to handle the obfuscation of 2 of the 3 samples mentioned by jpcert, I released a new version to handle more obfuscation.

Here is an analysis example for sample 5b677d297fb862c2d223973697479ee53a91d03073b14556f421b3d74f136b9d.

Run emldump (version 0.0.12 or later) with option -F to fix the obfuscation of the mime-version header:

To find the part where the ActiveMime file was hidden, use option -E %HEADASCII% to view the first 20 characters of each part:

Here we can see that part 14 is not a JPEG file, but an ActiveMime file.

We extract it and pipe it into oledump.py:

That ActiveMime file contains VBA code:

These maldocs (at least the 3 samples shared by jpcert) can be detected by pdfid with option -e to display extra information:

There are a lot of bytes outside streams (usually for PDFs, there shouldn’t be) and the count of stream and endstream documents is different.

But like I said, these are detections for these 3 samples, it’s possible to modify those samples to remove the anomalies.

Quickpost info

Comments (2)

Update: emldump.py Version 0.0.12

Filed under: My Software,Update — Didier Stevens @ 10:29

This update to emldump.py adds a new feature to fix (-F) some obfuscations.

For the moment, only one obfuscation method is fixed (many are already ignored with option -f –filter), used in polyglot PDF/Word files.

emldump_V0_0_12.zip (http)
MD5: 3847B92460C0485E1238C47C29EF9DE1
SHA256: AFDFB8E78AE7DE56F50EA73D69705B6DACB425FFBD40D6997D64C7C75E3D8A0D

Leave a Comment

Sunday 27 August 2023

Update: sortcanon.py Version 0.0.3

Filed under: My Software,Update — Didier Stevens @ 17:44

Some new options for my tool sortcanon.py to handle more inputs.

A bit of context: when one sorts a list of IPv4 addresses as text, one gets a result as follows. Take this list:

Just sorting this gives this result:

The IPv4 address starting with 185 comes first, because by default, sorting is string based and digit 1 comes before digit 3.

With sortcanon, one can provide a Python function that will be used to interpret the input and achieve the desired sorting. There are a couple of builtin functions, like ipv4. This is the result:

This time, the IPv4 address starting with 185 comes last, because it has the highest most significant byte.

Recently, I had to sort some files where with extra data, like IPv4 addresses with port numbers. Something like this list:

But this did not work:

Because the function that parses IPv4 addresses, does not expect a port number.

I could create a custom function to handle this, but I pursued another solution. I added an option to select the part of the line, that will be used for sorting, with a regular expression. This is done with option -s (select). Like this:

Regular expression “^([^ ]+) ” selects all characters from the beginning of the line (^) until the first space character (excluded). This selection is stored in a capture group (), and the ipv4 sorting function takes this capture group as input, in stead of the complete line.

The list I selected as example, has some duplicate IPv4 addresses:

If I use option -u (unique), duplicate lines are removed:

But of course the lines with identical IPv4 address 53… remain, because the lines themselves are different (different port number).

This is the desired result, most of the time. But I had an exceptional case, where I had to drop duplicate IPv4 addresses, but still keep one port number. This can be done with option –selectoptions u:

sortcanon_V0_0_3.zip (http)
MD5: CF742211DCF5AD893B882658980E6998
SHA256: 44DECFCDCA4966F8A8A2B1172EFA6B706294935C20D6A12C5A68F5D395396A77

Leave a Comment

Saturday 26 August 2023

Overview of Content Published in July

Filed under: Announcement — Didier Stevens @ 8:41

Here is an overview of content I published in July:

Blog posts:

Update: zipdump.py Version 0.0.27

SANS ISC Diary entries:

YARA Error Codes

Leave a Comment

Didier Stevens

Tuesday 29 August 2023

Quickpost: PDF/ActiveMime Maldocs YARA Rule

Quickpost: Analysis of PDF/ActiveMime Polyglot Maldocs

Update: emldump.py Version 0.0.12

Sunday 27 August 2023

Update: sortcanon.py Version 0.0.3

Saturday 26 August 2023

Overview of Content Published in July

Pages

Top Posts

Categories

Blog Stats

Twitter @DidierStevens

Archives