Didier Stevens

Monday 21 October 2019

Quickpost: ExifTool, OLE Files and FlashPix Files

Filed under: Forensics,maldoc,Malware,Quickpost — Didier Stevens @ 0:00

ExifTool can misidentify VBA macro files as FlashPix files.

The binary file format of Office documents (.doc, .xls) uses the Compound File Binary Format, what I like to refer as OLE files. These files can be analyzed with my tool oledump.py.

Starting with Office 2007, the default file format (.docx, .docm, .xlsx, …) is Office Open XML: OOXML. It’s in essence a ZIP container with XML files inside. However, VBA macros inside OOXML files (.docm, .xlsm) are not stored as XML files, they are still stored inside an OLE file: the ZIP container contains a file with name vbaProject.bin. That is an OLE file containing the VBA macros.

This can be observed with my zipdump.py tool:

oledump.py can look inside the ZIP container to analyze the embedded vbaProject.bin file:

And of course, it can handle an OLE file directly:

When ExifTool is given a vbaProject.bin file for analysis, it will misidentify it as a picture file: a FlashPix file.

That’s because when ExifTool doesn’t have enough metadata or an identifying extension to identify an OLE file, it will fall back to FlashPix file detection. That’s because FlashPix files are also based on the OLE file format, and AFAIK ExifTool started out as an image tool:

That is why on VirusTotal, vbaProject.bin files from OOXML files with macros, will be misidentified as FlashPix files:

When the extension of a vbaProject.bin file is changed to .doc, ExifTool will misidentify it as a Word document:

ExifTool is not designed to identify VBA macro files (vbaProject.bin). These files are not Office documents, neither pictures. But since they are also OLE files, ExifTool tries to guess what they are, based on the extension, and if that doesn’t help, it falls back to the FlashPix file format (based on OLE).

There’s no “bug” to fix, you just need to be aware of this particular behavior of ExifTool: it is a tool to extract information from media formats, when it analyses an OLE file and doesn’t have enough metadata/proper file extension, it will fall back to FlashPix identification.

 


Quickpost info


Tuesday 15 October 2019

PowerShell, Add-Type & csc.exe

Filed under: .NET,Forensics — Didier Stevens @ 0:00

Have you ever noticed that some PowerShell scripts result in the execution of the C# compiler csc.exe?

This happens when a PowerShell script uses cmdlet Add-Type.

Like in this command:

powershell -Command “Add-Type -TypeDefinition \”public class Demo {public int a;}\””

This command just adds the definition of a class (Demo) with one member (a).

When this Add-Type cmdlet is executed, the C# compiler is invoked by PowerShell to compile this class definition (a C# program) into an assembly (DLL) with the .NET type to be used by the PowerShell script.

A temporary file (oj5zlfcy.cmdline in this example) is created inside folder %appdata%\local\temp with extension .cmdline. This is passed as argument to the invoked C# compiler csc.exe, and contains directions to compile a C# program (oj5zlfcy.0.cs):

/t:library /utf8output /R:”System.dll” /R:”C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\System.Management.Automation\v4.0_3.0.0.0__31bf3856ad364e35\System.Management.Automation.dll” /R:”System.Core.dll” /out:”C:\Users\testuser1\AppData\Local\Temp\oj5zlfcy.dll” /debug- /optimize+ /warnaserror /optimize+ “C:\Users\testuser1\AppData\Local\Temp\oj5zlfcy.0.cs”

The C# program (oj5zlfcy.0.cs in this example) contains the class definition passed as argument to cmdlet Add-Type:

public class Demo {public int a;}

Both these files start with a UTF-8 BOM (EF BB BF).

The C# compiler (csc.exe) can invoke compilation tools when necessary, like the resource compiler cvtres.exe.

This results in the creation of several temporary files:

All these files are removed when cmdlet Add-Type terminates.

 

Wednesday 25 July 2018

Extracting DotNetToJScript’s PE Files

Filed under: Forensics,Malware — Didier Stevens @ 0:00

I added a new option (-I, –ignorehex) to base64dump.py to make the extraction of the PE file inside a JScript script generated with DotNetToJScript a bit easier.

DotNetToJScript is James Forshaw‘s “tool to generate a JScript which bootstraps an arbitrary .NET Assembly and class”.

Here is an example of a script generated by James’ tool:

The serialized .NET object is embedded as a string concatenation of BASE64 strings, assigned to variable serialized_obj.

With re-search.py, I extract all strings from the script (e.g. strings delimited by double quotes):

The first 3 strings are not part of the BASE64 encoded object, hence I get rid of them (there are no unwanted strings at the end):

And now I have BASE64 characters, I just have to get rid of the doubles quotes and the newlines (base64dump searches for continuous strings of BASE64 characters). With base64dump‘s -w option I can get rid of whitespace (including newlines), and with option -i I can get rid of the double-quote character. Unfortunately, escaping of this character (\”) works on Windows, but then cmd.exe gets confused for the next pipe (it expects a closing double-quote). That’s why I introduced option -I, to specify characters with their hexadecimal value. Double-quote is 0x22, thus I use option -I 22:

This is the serialized object, and it contains the .NET assembly I want to analyze. .NET assemblies are .DLLs, e.g. PE files. With my YARA rule to detect PE files, I can find it inside the serialized data:

A PE file was found, and it starts at position 0x04C7. I can cut this data out with option -c:

Another method to find the start of the PE file, is to use a cut expression that searches for ‘MZ’, like this:

If there is more than one instance of string MZ, different cut-expressions must be tried to find the real start of the PE file. For example, this is the cut-expression to select data starting with the second instance of string MZ: -c “[‘MZ’]2:”

It’s best to pipe the cut-out data into pecheck, to validate that it is indeed a PE file:

pecheck also helps with finding the length of the PE file (with the given cut-expression, I select all data until the end of the serialized data).

Remark that there is an overlay (bytes appended to the end of the PE file), and that it starts at position 0x1400. Since I don’t expect an overlay in this .NET assembly, the overlay is not part of the PE file, but it is part of the serialization meta data.

Hence I can cut out the PE file precisely like this:

This PE file can be saved to disk now for reverse-engineering.

I have not read the .NET serialization format specification, but I can make an educated guess. Right before the PE file, there is the following data:

Remark the first 4 bytes (5 bytes before the beginning of the PE file): 00 14 00 00. That’s 0x1400 as a little-endian 32-bit integer, exactly the length of the PE file, 5120 bytes:

So that’s most likely another method to determine the length of the PE file.

 

Friday 29 December 2017

Cracking Encrypted PDFs – Conclusion

Filed under: Encryption,Forensics,Hacking,PDF — Didier Stevens @ 0:00

TL;DR: PDFs protected with 40-bit keys can not guarantee confidentiality, even with strong passwords. When you protect your PDFs with a password, you have to encrypt your PDFs with strong passwords and use long enough keys. The PDF specification has evolved over time, and with it, the encryption options you have. There are many encryption options today, you are no longer restricted to 40-bit keys. You can use 128-bit or 256-bit keys too.

There is a trade-off too: the more advanced encryption option you use, the more recent the PDF reader must be to support the encryption option you selected. Older PDF readers are not able to handle 256-bit AES for example.

Since each application capable of creating PDFs will have different options and descriptions for encryption, I can not tell you what options to use for your particular application. There are just too many different applications and versions. But if you are not sure if you selected an encryption option that will use long enough keys, you can always check the /Encrypt dictionary of the PDF you created, for example with my pdf-parser (in this example /Length 128 tells us a 128-bit key is used):

Or you can use QPDF to encrypt an existing PDF (I’ll publish a blog post later with encryption examples for QPDF).

But don’t use 40-bit keys, unless confidentiality is not important to you:

I first showed (almost 4 years ago) how PDFs with 40-bit keys can be decrypted in minutes, using a commercial tool with rainbow tables. This video illustrates this.

Later I showed how this can be done with free, open source tools: Hashcat and John the Ripper. But although I could recover the encryption key using Hashcat, I still had to use a commercial tool to do the actual decryption with the key recovered by Hashcat.

Today, this is no longer the case: in this series of blog posts, I show how to recover the password, how to recover the key and how to decrypt with the key, all with free, open source tools.

Overview of the complete blog post series:

 

Thursday 28 December 2017

Cracking Encrypted PDFs – Part 3

Filed under: Encryption,Forensics,Hacking,PDF — Didier Stevens @ 0:00

I performed a brute-force attack on the password of an encrypted PDF and a brute-force attack on the key of (another) encrypted PDF, both PDFs are part of a challenge published by John August.

The encryption key is derived from the password. it’s not just based on the password only, but also on metadata. This implies that different PDFs encrypted with the same user password, will have different encryption keys.

When you recover the user password of an encrypted PDF, you can just use it with PDF readers like Adobe Reader: they will ask you for the password, you provide it and the PDF will be decrypted and rendered.

But when you recover the key of an encrypted PDF, you can not use it with PDF reader: there is no feature that will allow you to input a key in stead of a password. The only method I knew to decrypt a PDF document with its encryption key, was to use Elcomsoft’s PDF cracking tool:

Now I worked out a second method: I modified the source code of QPDF so that it will accept encryption keys too. It’s a quick and dirty hack, I did not add a new option to QPDF but I “hijacked” the –password option. If the value to the option –password starts with string “key:”, then QPDF will not derive the key from the provided password, but it will use the key provided as hexadecimal characters. Here is how I use it to decrypt the “tough” PDF:

I also made a small modification to the –show-encryption option, to display the encryption key:

Update: I had an email exchange with Jay Berkenbilt, the author of QPDF, and he will look into this patch and possibly add a new key option to QPDF.

If you are interested in my modified version of QPDF, you can find the modified source code files and Windows binaries here:

qpdf-patched.zip (https)
MD5: 57E1A5A232E12B45D0A927181A1E8C3B
SHA256: 6F17E095B38AE72F229A6662216DDCE86057D2BA1C567B07FEF78B8A93413495

Update: this is the complete blog post series:

Wednesday 27 December 2017

Cracking Encrypted PDFs – Part 2

Filed under: Encryption,Forensics,Hacking,PDF — Didier Stevens @ 0:00

After cracking the “easy” PDF of John’s challenge, I’m cracking the “tough” PDF (harder_encryption).

Using the same steps as for the “easy” PDF, I confirm the PDF is encrypted with a user password using 40-bit encryption, and I extract the hash.

Since the password is a long random password, a brute-force attack on the password like I did in the first part will take too long. That’s why I’m going to perform a brute-force attack on the key: using 40-bit encryption means that the key is just 5 bytes long, and that will take about 2 hours on my machine. The key is derived from the password.

I’m using hashcat again, but this time with hash mode 10410 in stead of 10400.
This is the command I’m using:

hashcat-4.0.0\hashcat64.exe --potfile-path=harder_encryption.pot -m 10410 -a 3 -w 3 "harder_encryption - CONFIDENTIAL.hash" ?b?b?b?b?b

I’m using the following options:

  • –potfile-path=harder_encryption.pot : I prefer using a dedicated pot file, but this is optional
  • -m 10410 : this hash mode is suitable to crack the key used for 40-bit PDF encryption
  • -a 3 : I perform a brute force attack (since it’s a key, not a password)
  • -w 3 : I’m using a workload profile that is supposed to speed up cracking on my machine
  • ?b?b?b?b?b : I’m providing a mask for 5 bytes (I want to brute-force keys that are 40 bits long, i.e. 5 bytes)

And here is the result:

The recovered key is 27ce78c81a. I was lucky, it took about 15 minutes to recover this key (again, using GPU GeForce GTX 980M, 2048/8192 MB allocatable, 12MCU). Checking the complete keyspace whould take a bit more than 2 hours.

Now, how can we decrypt a PDF with the key (in stead of the password)? I’ll explain that in the next blog post.

Want a hint? Take a look at my Tweet!

Update: this is the complete blog post series:

Tuesday 26 December 2017

Cracking Encrypted PDFs – Part 1

Filed under: Encryption,Forensics,Hacking,PDF — Didier Stevens @ 17:15

In this series of blog posts, I’ll explain how I decrypted the encrypted PDFs shared by John August (John wanted to know how easy it is to crack encrypted PDFs, and started a challenge).

Here is how I decrypted the “easy” PDF (encryption_test).

From John’s blog post, I know the password is random and short. So first, let’s check out how the PDF is encrypted.

pdfid.py confirms the PDF is encrypted (name /Encrypt):

pdf-parser.py can tell us more:

The encryption info is in object 26:

From this I can conclude that the standard encryption filter was used. This encryption method uses a 40-bit key (usually indicated by a dictionary entry: /Length 40, but this is missing here).

PDFs can be encrypted for confidentiality (requiring a so-called user password /U) or for DRM (using a so-called owner password /O). PDFs encrypted with a user password can only be opened by providing this password. PDFs encrypted with a owner password can be opened without providing a password, but some restrictions will apply (for example, printing could be disabled).

QPDF can be used to determine if the PDF is protected with a user password or an owner password:

This output (invalid password) tells us the PDF document is encrypted with a user password.

I’ve written some blog posts about decrypting PDFs, but because we need to perform a brute-force attack here (it’s a short random password), this time I’m going to use hashcat to crack the password.

First we need to extract the hash to crack from the PDF. I’m using pdf2john.py to do this. Remark that John the Ripper (Jumbo version) is now using pdf2john.pl (a Perl program), because there were some issues with the Python program (pdf2john.py). For example, it would not properly generate a hash for 40-bit keys when the /Length name was not specified (like is the case here). However, I use a patched version of pdf2john.py that properly handles default 40-bit keys.

Here’s how we extract the hash:

This format is suitable for John the Ripper, but not for hashcat. For hashcat, just the hash is needed (field 2), and no other fields.

Let’s extract field 2 (you can use awk instead of csv-cut.py):

I’m storing the output in file “encryption_test – CONFIDENTIAL.hash”.

And now we can finally use hashcat. This is the command I’m using:

hashcat-4.0.0\hashcat64.exe --potfile-path=encryption_test.pot -m 10400 -a 3 -i "encryption_test - CONFIDENTIAL.hash" ?a?a?a?a?a?a

I’m using the following options:

  • –potfile-path=encryption_test.pot : I prefer using a dedicated pot file, but this is optional
  • -m 10400 : this hash mode is suitable to crack the password used for 40-bit PDF encryption
  • -a 3 : I perform a brute force attack (since it’s a random password)
  • ?a?a?a?a?a?a : I’m providing a mask for 6 alphanumeric characters (I want to brute-force passwords up to 6 alphanumeric characters, I’m assuming when John mentions a short password, it’s not longer than 6 characters)
  • -i : this incremental option makes that the set of generated password is not only 6 characters long, but also 1, 2, 3, 4 and 5 characters long

And here is the result:

The recovered password is 1806. We can confirm this with QPDF:

Conclusion: PDFs protected with a 4 character user password using 40-bit encryption can be cracked in a couple of seconds using free, open-source tools.

FYI, I used the following GPU: GeForce GTX 980M, 2048/8192 MB allocatable, 12MCU

Update: this is the complete blog post series:

Monday 10 July 2017

Select Parent Process from VBA

Filed under: Forensics,Hacking,maldoc,Malware,My Software — Didier Stevens @ 0:00

Years ago I wrote a C program to create a new process with a chosen parent process: selectmyparent. And recently I showed what process monitor and system monitor report when you use this tool.

Starting a new process with a chosen parent process can be done from VBA too, as shown in this video (I’m not sharing the VBA code):

Monday 20 March 2017

That Is Not My Child Process!

Filed under: Forensics,Hacking — Didier Stevens @ 0:00

Years ago I released a tool to create a Windows process with selected parent process: SelectMyParent.

You can not blindly trust parent-child process relations in Windows: the parent of a process can be different from the process that created that process.

Here I start selectmyparent from cmd.exe to launch notepad.exe with parent explorer.exe (PID 328):

Process Explorer reports explorer.exe as the parent (and not selectmyparent.exe):

Process Monitor also reports explorer.exe as the parent:

If we look in the call stack of the process creation of notepad.exe, we see 2 frames (6 and 7) with unknown modules:

We should see entries in the call stack for explorer.exe if notepad.exe was started by explorer.exe, but we don’t.

The <unknown> module is actually selectmyparent.exe.

0x11b1461 is the address of the instruction after the call to _main in ___tmainCRTStarup in selectmyparent.exe.

0x11b12a8 is the address of the instruction after the call to CreateProcessW in _main in selectmyparent.exe.

 

System Monitor also reports explorer.exe as the parent:

Finally, Volatility’s pstree command also reports explorer.exe as the parent:

Monday 30 January 2017

Quickpost: Dropbox & Alternate Data Streams

Filed under: Forensics,Quickpost,Reverse Engineering — Didier Stevens @ 0:00

When I got this popup while moving a file from a Dropbox folder, I immediately thought Alternate Data Stream:

20170124-221042

I ran my filescanner on the file, and found an ADS with name com.dropbox.attributes:

20170128-094319

From the Magic HEX value, we can see that the content of the stream (frozen-sea-foam.mp4:com.dropbox.attributes) starts with 0x78 (and the streamsize is 83 bytes). 0x78 hints at zlib deflated data.

If you are not that familiar with magic values, you can use my file-magic tool:

20170128-224649

Trying to decompress the ADS with translate.py gives us JSON data {“dropbox_fileid_local”: {“machineid_attr”: {“data”: “aa4xliox7z5n0qewxOlT3Q==”}}}:

20170128-102645

The data field looks like BASE64, so let’s try to decode it with base64dump.py:

20170128-104110

It decodes with BASE64 to data that looks random. From the names in the JSON data, we can deduce that this is probably a machine ID.

Remark 1: as it could well be my unique machine ID, I altered the value of the ID.

Remark 2: my file-magic.py tool is beta.

Remark 3: if you wonder what the video frozen-sea-foam is, I have it on Instragram.

 


Quickpost info


« Previous PageNext Page »

Blog at WordPress.com.