It looks for files that start with %PDF- (this header can be obfuscated) and contain the string QWN0aXZlTWlt (the string ActiveMim encoded in BASE64), possibly obfuscated with whitespace characters.
The regex used to detect the characters QWN0aXZlTWlt interspersed with whitespace characters (YARA string $base64_ActiveMim0) has no atoms (for YARA’s Aho-Corasick algorithm) larger than 1 byte, and thus generates a warning that prohibits its use for hunting with VirusTotal.
That is why I replaced that regex with 21 regexes that all start with 3 fixed bytes and thus allow YARA to select atoms that are large enough.
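To make the detection logic concrete, here is a minimal Python sketch of the same idea. It is not the YARA rule itself: the function name `looks_like_maldoc` is made up for illustration, and the sketch assumes an unobfuscated %PDF- header at offset 0, which is stricter than what the rule accepts.

```python
import base64
import re

# BASE64 encoding of the string "ActiveMim".
MARKER = base64.b64encode(b"ActiveMim")  # b"QWN0aXZlTWlt"

# Allow any amount of whitespace between consecutive marker characters.
MARKER_REGEX = re.compile(rb"\s*".join(re.escape(bytes([c])) for c in MARKER))


def looks_like_maldoc(data: bytes) -> bool:
    """Hypothetical helper: plain PDF header plus whitespace-tolerant marker."""
    # Assumption: header obfuscation is NOT handled in this sketch.
    return data.startswith(b"%PDF-") and MARKER_REGEX.search(data) is not None
```

A pattern like this matches the marker even when it is split over several lines, but it also shows the problem described above: every fixed character can be followed by optional whitespace, so there is no long fixed byte sequence in it.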
These maldocs are PDF files that embed a Word document (ActiveMime) in MIME format.
ActiveMime documents can be analyzed by combining my emldump.py tool and oledump.py.
ActiveMime documents were heavily obfuscated in the past, and this is also the case here. As emldump.py version 0.0.11 was only able to handle the obfuscation of 2 of the 3 samples mentioned by JPCERT, I released a new version to handle more obfuscation.
I added some new options to my tool sortcanon.py to handle more inputs.
A bit of context: when one sorts a list of IPv4 addresses as text, one gets a result as follows. Take this list:
Just sorting this gives this result:
The IPv4 address starting with 185 comes first, because by default, sorting is string based and digit 1 comes before digit 3.
With sortcanon, one can provide a Python function that will be used to interpret the input and achieve the desired sorting. There are a couple of builtin functions, like ipv4. This is the result:
This time, the IPv4 address starting with 185 comes last, because it has the highest most significant byte.
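The difference between the two orderings can be reproduced in a few lines of Python. This is only a sketch of the principle, not sortcanon's own code, and the addresses are made up for illustration.

```python
import ipaddress

# Made-up sample addresses, not the list from this post.
addresses = ["52.1.2.3", "185.10.0.1", "31.13.0.1"]

# Default sort: compares strings, so "1..." sorts before "3..." and "5...".
text_sorted = sorted(addresses)

# Numeric sort: convert each address to its 32-bit integer value first.
numeric_sorted = sorted(addresses, key=lambda a: int(ipaddress.IPv4Address(a)))

print(text_sorted)     # ['185.10.0.1', '31.13.0.1', '52.1.2.3']
print(numeric_sorted)  # ['31.13.0.1', '52.1.2.3', '185.10.0.1']
```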
Recently, I had to sort some files with extra data, like IPv4 addresses with port numbers. Something like this list:
But this did not work:
That is because the function that parses IPv4 addresses does not expect a port number.
I could create a custom function to handle this, but I pursued another solution. I added an option to select the part of the line that will be used for sorting, with a regular expression. This is done with option -s (select). Like this:
Regular expression “^([^ ]+) ” selects all characters from the beginning of the line (^) until the first space character (excluded). This selection is stored in a capture group (), and the ipv4 sorting function takes this capture group as input, instead of the complete line.
The list I selected as an example has some duplicate IPv4 addresses:
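In Python terms, selecting with a capture group and sorting on the selection could look like the sketch below. The helper name `select_key` and the sample lines are hypothetical, and falling back to the whole line when the regex does not match is an assumption of this sketch, not necessarily what sortcanon does.

```python
import ipaddress
import re


def select_key(line, pattern=r"^([^ ]+) "):
    """Hypothetical helper: sort key built from the first capture group."""
    match = re.search(pattern, line)
    # Assumption: use the complete line when the pattern does not match.
    selected = match.group(1) if match else line
    return int(ipaddress.IPv4Address(selected))


# Made-up lines in "address port" form.
lines = ["185.10.0.1 443", "31.13.0.1 8080", "52.1.2.3 80"]
sorted_lines = sorted(lines, key=select_key)
print(sorted_lines)  # ['31.13.0.1 8080', '52.1.2.3 80', '185.10.0.1 443']
```

Only the captured address is parsed; the port number stays part of the line that is output.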
If I use option -u (unique), duplicate lines are removed:
But of course the lines with identical IPv4 address 53… remain, because the lines themselves are different (different port number).
This is the desired result, most of the time. But I had an exceptional case where I had to drop duplicate IPv4 addresses while still keeping one port number. This can be done with option --selectoptions u:
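The idea of making lines unique on the selected part, rather than on the whole line, can be sketched in Python as follows. The function name `unique_by_selection` and the sample lines are hypothetical. Keeping the first line seen for each address is an assumption of this sketch; which port number sortcanon keeps is not stated here.

```python
import re


def unique_by_selection(lines, pattern=r"^([^ ]+) "):
    """Hypothetical helper: keep one line per selected (captured) value."""
    seen = set()
    kept = []
    for line in lines:
        match = re.search(pattern, line)
        # Assumption: use the complete line when the pattern does not match.
        key = match.group(1) if match else line
        if key not in seen:
            # Assumption: the first line seen for a key is the one kept.
            seen.add(key)
            kept.append(line)
    return kept


# Made-up lines: the same address appears with two different ports.
lines = ["31.13.0.1 80", "31.13.0.1 8080", "52.1.2.3 443"]
print(unique_by_selection(lines))  # ['31.13.0.1 80', '52.1.2.3 443']
```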