Didier Stevens

Tuesday 8 February 2022

Windows Explorer: Improper Exif Data Removal

Filed under: Vulnerabilities,Forensics,Windows 7 — Didier Stevens @ 20:53

Windows explorer has an option to remove properties from media files: “Remove Properties and Personal Information”. For example, removing Exif data from JPEG files.

There is an issue with this feature: it does not properly remove Exif data.

Within an open folder (Windows explorer), select a media file (I’m using Canon_40D.jpg), right-click and select properties:

Select Details:

Then click “Remove Properties and Personal Information”:

When you click OK, a new file will be created: Canon_40D – Copy.jpg (I renamed this file to Canon_40D-redacted-W11.jpg, because I tested this first on my Windows 11 machine).

File Canon_40D.jpg contains Exif data pertaining to the camera, like its maker and model:

File Canon_40D-redacted-W11.jpg (the redacted version of file Canon_40D.jpg) contains less Exif data: the maker and model properties have been removed:

Looking at the redacted file with binary editor 010 Editor, I noticed that these properties had not been completely removed. Let me explain.

JPEG files are composed of segments of data, these segments can be analyzed with my tool jpegdump.py.

Here is the output for file Canon_40D.jpg (original file), I’m using option -E to include the SHA256 hash of the data of each segment:

And here is the output for file Canon_40D-redacted-W11.jpg (redacted file):

Notice that all the hashes of the segments are identical, except for the third segment, APP1. This segment contains the Exif data. This means that only the Exif data has changed, nothing else, like the picture itself.

Segments APP1 of both pictures have the same size, 2476 bytes. Although properties have been removed, Microsoft Explorer’s removal feature did not shrink the segment.

When I open the original file (Canon_40D.jpg) with 010 Editor, a template for JPEG files is automatically used to parse the structure of the JPEG file. This can be seen in the Template Results below the hexadecimal dump:

The JPG template is also able to parse Exif data: I drilled down in the template hierarchy, until I found the Exif properties (circled in red). There are 11 properties, the first is Make (tag 0x010F) and the second one is Model (tag 0x0110).

Opening the template DIRENTRY structure for property Make reveals the following fields:

Remark that the string “Canon”, that is the string value of the Make property, is not contained inside the DIRENTRY structure for said property. What it does contain, is the size of the string (6 bytes) and an offset to the string itself (an offset of 146 bytes). Literal string values (StrAscii structures) are stored inside the Exif data structure, after the list of DIRENTRY structures.

It is the same for the Model property:

The Model DIRENTRY structure points to an ASCII string (StrAscii), size is 4 bytes and offset is 152 bytes.

Let’s take a look now, again with 010 Editor, at the cleaned file that was created by clicking “Remove Properties and Personal Information” (Canon_40D-redacted-W11.jpg):

Notice that this file has 8 DIRENTRY structures in stead of 11: 3 Exif properties have been removed (Make, Model & Software). And the StrAscii structures for these 3 properties do not appear in the template result.

However, these 3 StrAscii string values are still inside the APP1 segment:

They are at the exact same location as in the original file (Canon_40D.jpg):

Conclusion: when you use Windows Explorer’s “Remove Properties and Personal Information” feature, Exif properties will be removed, but if these are string properties, they are not completely removed.

Windows Explorer’s “Remove Properties and Personal Information” feature removes DIRENTRY structures (properties), but does not remove StrAscii structures (properties’ string values).

When the Exif data of JPEG files cleaned with this feature is viewed, the orphaned strings will not appear. But when they are viewed with a binary editor, these strings do appear. And of course, they can also be easily visualized with a strings utility (here I’m using my strings utility strings.py):

You will not know to which properties these strings belong, because that information has been erased (DIRENTRY structures). But here, just the string values themselves, are enough to know that this is a Canon camera and that GIMP software was used to produce the final picture.

In case you want to test this yourself and try to reproduce my findings, you can download file Canon_40D.jpg from here. The file I created by using Windows Explorer’s “Remove Properties and Personal Information” feature, Canon_40D-redacted-W11.jpg, has a SHA256 of 8B190028D0F9F2A6F7EDB1DC0955008D73173C32C19C32CE62372C7163EE1223. I tested this on a fully patched Windows 10 machine (21H2) and a fully patched Windows 11 machine. The results where completely identical.

And as I know that some remaining Windows 7 users will want to know if Windows 7 is also affected: a fully patched Windows 7 machine has the same issue (though the cleaned file was different from the W10/W11 file).

If you absolutely want to make sure that all metadata is gone from your media files, do not use Windows Explorer (for the moment). Use another tool. Ideally, use a tool that completely removes the segments containing metadata (APP1, APP2, …).

I did inform Microsoft about this issue.

Wednesday 29 December 2021

VBA: __SRP_ Streams

Filed under: Forensics,maldoc — Didier Stevens @ 0:00

Office documents with a VBA project that contains streams whose name starts with __SRP_, have had their VBA macros executed at least once.

As Dr. Bontchev describes in the documentation for his pcodedmp tool:

When the p-code has been executed at least once, a further tokenized form of it is stored elsewhere in the document (in streams, the names of which begin with __SRP_, followed by a number).

Thus in my maldoc trainings, I always explain that the presence of __SRP_ streams is an indication that the VBA code has been executed prior to the saving of the document, and vice-versa, that the absence means that the code was not executed (prior to saving).

I recently discovered that these __SRP_ streams are also created when the VBA project is compiled (without running the macros), by selecting menu option “Debug / Compile Project” in the VBA IDE.

Sunday 26 September 2021

Patching A Java .class File

Filed under: 010 Editor,Forensics,Hacking,Malware — Didier Stevens @ 0:00

010 Editor is one of few commercial applications that I use daily. It’s a powerful binary editor with scripting and templates.

I recently had to patch a Java .class file: extend a string inside that class. Before going the route of decompiling / editing / recompiling, I tried with 010 Editor.

Here is the file opened inside the editor:

When opening the file, 010 Editor recognized the .class extension and installed and ran the template for .class files. That’s what I wanted to know: is there a template for .class files? Yes, there is!

Here is how you can apply a template manually, in case the file extension is not the original extension:

And this is how the template results look like:

Under the hex/ascii dump, the template results are displayed: a set of nested fields that match the internal structure of .class file. For example, the first field I selected here, u4 magic, is the magic header of a .class file: CAFEBABE.

The string I want to extend is this one:

I need to extend string “1.2 (20210922)”. Into something like “1.2 (20210922a)”.

Doing so will make the string longer, thus I need to add a byte to the file (trivial), but I also need to make sure that the binary structure of .java files remain valid: for example, if there is something in that structure like a field length, I need to change the field length too.

I’m not familiar with the internal structure of .class files, that why I’m using 010 Editor’s .class template, hoping that the template will make it clear to me what needs to be changed.

To find the template result field I need to modify, I position my cursor on the string I want to modify inside the ASCII dump, I right-click and select “Jump To Template Variable”:

Which selects the corresponding template variable:

So my cursor was on the 10th byte (bytes[9]) of the string, which is part of template variable cp_info constant_pool[27]. From that I gather that the string I want to modify is inside a pool of constants.

I can select that template variable:

And here I can see which bytes inside the .class file were selected. It’s not only the string, but also bytes that represent the tag and length. The length is 14, that’s indeed the length of the string I want to extend. Since I want to add 1 character, I change the length from 14 to 15: I can do that inside the template results by double-clicking the value 14, I don’t need to make that change inside the hexdump:

Next I need to add a character to the string. I can do that in the ASCII dump:

I have to make sure that the editor is in insert mode (INS), so that when I type characters, they are inserted at the cursor, in stead of overwriting existing bytes:

And then I can type my extra character:

So I have changed the constant string I wanted to change. Maybe there are more changes to make to the internal structure of this .class file, like other length fields … I don’t know. But what I do as an extra check is: save the modified file and run the template again. It runs without errors, and the result looks good.

So I guess there are no more changes to make, and I decide to tryout my modified .class file and see what happens: it works, so there are no other changes to make.

Saturday 27 March 2021

FileZilla Uses PuTTY’s Registry Fingerprint Cache

Filed under: Encryption,Forensics,Networking — Didier Stevens @ 10:01

Today I figured out that FileZilla uses PuTTY‘s registry key (HKCU\SOFTWARE\SimonTatham\PuTTY\SshHostKeys) to cache SSH fingerprints.

This morning, I connected to my server over SFTP with FileZilla, and got this prompt:

That’s unusual. I logged in over SSH, and my SSH client did not show a warning. I checked the fingerprint on my server, and it matched the one presented by FileZilla.

What’s going on here? I started to search through FileZilla configuration files (XML files) looking for the cached fingerprints, and found nothing. Then I went to the registry, but there’s no FileZilla entry under my HKCU Software key.

Then I’m taking a look with ProcMon to figure out where FileZilla caches its fingerprints. After some searching, I found the answer:

FileZilla uses PuTTY’s registry keys!

And indeed, when I start FileZilla again and allow it to cache the key, it appears in PuTTY’s registry keys.

One last check: I modified the registry entry and started FileZilla again:

And now FileZilla warns me that the key is different. That confirms that FileZilla reads and writes PuTTY’s registry fingerprint cache.

So that answered my question: “Why did FileZilla warn me this morning?” “Because the key was not cached”.

But then I was left with another question: “Why is the key no longer cached, because it was cached?”

Well, I started to remember that some days ago today, I had been experimenting with PuTTY’s registry keys. I most likely deleted that key (PuTTY is not my default SSH client). I verified the last-write timestamp for PuTTY’s registry key, and indeed, 4 days ago it was last written to.


Thanks to Nicolas for pointing out that fzsftp is based on PuTTY:

Friday 12 March 2021

Quickpost: “ProxyLogon PoC” Capture File

Filed under: Forensics,Networking,Quickpost,Vulnerabilities — Didier Stevens @ 18:43

I was able to get the “ProxyLogon PoC” Python script running against a vulnerable Exchange server in a VM. It required some tweaks to the code, and also a change in Exchange permissions, as explained in this tweet by @irsdl.

I created a capture file:

More details will follow.

Update: I added a second capture file (proxylogon-poc-capture-with-keys-and-webshell.pcapng), this one includes a request to the webshell that was installed.

proxylogon-poc-capture-with-keys_V2.zip (https)
MD5: A005AC9CCE0F833C99B5113E79005C7D
SHA256: AA092E099141F8A09F62C3529D8B27624CD11FF348738F78CA9A1E657F999755

Quickpost info

Wednesday 15 April 2020

Analyzing Malformed ZIP Files

Filed under: Forensics,maldoc,My Software — Didier Stevens @ 0:00

With version 0.0.16 (we are now at version 0.0.18), I updated my zipdump.py tool to handle (deliberately) malformed ZIP files. My zipdump tool uses Python’s ZIP module to analyze ZIP files.

Now, zipdump has a an option (-f) to scan arbitrary binary files for ZIP records.

I will show here how this feature can be used, by analyzing a sample Xavier Mertens wrote a diary entry about. This sample is a Word document with macros, an OOXML (Office Open XML format) file (.docm). It is malformed, because 1) there’s an extra byte at the beginning and 2) there’s a byte missing at the end.

When you use my zipdump tool to look at the file, you get an error:

Using option -f l (list), we can find all PKZIP records inside arbitrary, binary files:

When using option -f with value l, a listing will be created of all PKZIP records found in the file, plus extra data. Some of these entries in this report will have an index, that can be used to select the entry.

In this example, 2 entries can be selected:

p: extra bytes at the beginning of the file (prefix)

1: an end-of-central-directory record (PK0506 end)

Using option -f p, we can select the prefix (extra data at the beginning of the file) for further analysis:

And from this hex/ascii dump, we learn that there is one extra byte at the beginning of the ZIP file, and that it is a newline characters (0x0A).

Using option -f 1, we can select the EOCD record to analyze the ZIP file:

As this generates an error, we need to take a closer look at the EOCD record by adding option -i (info):

With this info, we understand that the missing byte makes that the comment length field is one byte short, and this causes the error seen in previous image.

ZIP files can contain comments (for the ZIP container, and also for individual files): these are stored at the end of the PKZIP records, preceded by a 2-byte long, little-endian integer. This integer is the length of the comment. If there is no comment, this integer is zero (0x00).

Hence, the byte we are missing here is a NULL (0x00) byte. We can append a NULL byte to the sample, and then we should be able to analyze the ZIP file. In stead of modifying the sample, I use my tool cut-bytes.py to add a single NULL byte to the file (suffix option: -s #h#00) and then pipe this into zipdump:

File 5 (vbaProject.bin) contains the VBA macros, and can be piped into oledump.py:

I also created a video:

zipdump_v0_0_18.zip (https)
MD5: 34DC469E8CD4E5D3E9520517DEFED888
SHA256: 270B26217755D7ECBCB6D642FBB349856FAA1AE668DB37D8D106B37D062FADBB

Tuesday 28 January 2020

etl2pcapng: Support For Process IDs

Filed under: Forensics,Networking — Didier Stevens @ 0:00

You can start a packet capture on a vanilla Windows machine with command “netsh trace start capture=yes” (and end it with “netsh trace stop”).

This packet capture file, with extension .etl, can not be opened with Wireshark. Until recently, I used Microsoft’s Message Analyzer, but this tool is no longer supported and installation files have been removed from Microsoft’s site.

In comes etl2pcapng, a new open-source utility from Microsoft that converts an .etl file to .pcapng format:

Utility that converts an .etl file containing a Windows network packet capture into .pcapng format“.

I contributed to version 1.3.0 of etl2pcapng, by adding a comment containing the Process ID to each packet. etl files contain metadata (like the PID of the process associated with the network traffic) that got lost when translating to pcapng format. As the pcapng format has no option to store the PID for each packet, but it supports packet comments, I stored the PID inside packet comments:

Notice this warning by Microsoft:

The output pcapng file will have a comment on each packet indicating the PID of the current process when the packet was logged. WARNING: this is frequently not the same as the actual PID of the process which caused the packet to be sent or to which the packet was delivered, since the packet capture provider often runs in a DPC (which runs in an arbitrary process). The user should keep this in mind when using the PID information.

Monday 21 October 2019

Quickpost: ExifTool, OLE Files and FlashPix Files

Filed under: Forensics,maldoc,Malware,Quickpost — Didier Stevens @ 0:00

ExifTool can misidentify VBA macro files as FlashPix files.

The binary file format of Office documents (.doc, .xls) uses the Compound File Binary Format, what I like to refer as OLE files. These files can be analyzed with my tool oledump.py.

Starting with Office 2007, the default file format (.docx, .docm, .xlsx, …) is Office Open XML: OOXML. It’s in essence a ZIP container with XML files inside. However, VBA macros inside OOXML files (.docm, .xlsm) are not stored as XML files, they are still stored inside an OLE file: the ZIP container contains a file with name vbaProject.bin. That is an OLE file containing the VBA macros.

This can be observed with my zipdump.py tool:

oledump.py can look inside the ZIP container to analyze the embedded vbaProject.bin file:

And of course, it can handle an OLE file directly:

When ExifTool is given a vbaProject.bin file for analysis, it will misidentify it as a picture file: a FlashPix file.

That’s because when ExifTool doesn’t have enough metadata or an identifying extension to identify an OLE file, it will fall back to FlashPix file detection. That’s because FlashPix files are also based on the OLE file format, and AFAIK ExifTool started out as an image tool:

That is why on VirusTotal, vbaProject.bin files from OOXML files with macros, will be misidentified as FlashPix files:

When the extension of a vbaProject.bin file is changed to .doc, ExifTool will misidentify it as a Word document:

ExifTool is not designed to identify VBA macro files (vbaProject.bin). These files are not Office documents, neither pictures. But since they are also OLE files, ExifTool tries to guess what they are, based on the extension, and if that doesn’t help, it falls back to the FlashPix file format (based on OLE).

There’s no “bug” to fix, you just need to be aware of this particular behavior of ExifTool: it is a tool to extract information from media formats, when it analyses an OLE file and doesn’t have enough metadata/proper file extension, it will fall back to FlashPix identification.


Quickpost info

Tuesday 15 October 2019

PowerShell, Add-Type & csc.exe

Filed under: .NET,Forensics — Didier Stevens @ 0:00

Have you ever noticed that some PowerShell scripts result in the execution of the C# compiler csc.exe?

This happens when a PowerShell script uses cmdlet Add-Type.

Like in this command:

powershell -Command “Add-Type -TypeDefinition \”public class Demo {public int a;}\””

This command just adds the definition of a class (Demo) with one member (a).

When this Add-Type cmdlet is executed, the C# compiler is invoked by PowerShell to compile this class definition (a C# program) into an assembly (DLL) with the .NET type to be used by the PowerShell script.

A temporary file (oj5zlfcy.cmdline in this example) is created inside folder %appdata%\local\temp with extension .cmdline. This is passed as argument to the invoked C# compiler csc.exe, and contains directions to compile a C# program (oj5zlfcy.0.cs):

/t:library /utf8output /R:”System.dll” /R:”C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\System.Management.Automation\v4.0_3.0.0.0__31bf3856ad364e35\System.Management.Automation.dll” /R:”System.Core.dll” /out:”C:\Users\testuser1\AppData\Local\Temp\oj5zlfcy.dll” /debug- /optimize+ /warnaserror /optimize+ “C:\Users\testuser1\AppData\Local\Temp\oj5zlfcy.0.cs”

The C# program (oj5zlfcy.0.cs in this example) contains the class definition passed as argument to cmdlet Add-Type:

public class Demo {public int a;}

Both these files start with a UTF-8 BOM (EF BB BF).

The C# compiler (csc.exe) can invoke compilation tools when necessary, like the resource compiler cvtres.exe.

This results in the creation of several temporary files:

All these files are removed when cmdlet Add-Type terminates.


Wednesday 25 July 2018

Extracting DotNetToJScript’s PE Files

Filed under: Forensics,Malware — Didier Stevens @ 0:00

I added a new option (-I, –ignorehex) to base64dump.py to make the extraction of the PE file inside a JScript script generated with DotNetToJScript a bit easier.

DotNetToJScript is James Forshaw‘s “tool to generate a JScript which bootstraps an arbitrary .NET Assembly and class”.

Here is an example of a script generated by James’ tool:

The serialized .NET object is embedded as a string concatenation of BASE64 strings, assigned to variable serialized_obj.

With re-search.py, I extract all strings from the script (e.g. strings delimited by double quotes):

The first 3 strings are not part of the BASE64 encoded object, hence I get rid of them (there are no unwanted strings at the end):

And now I have BASE64 characters, I just have to get rid of the doubles quotes and the newlines (base64dump searches for continuous strings of BASE64 characters). With base64dump‘s -w option I can get rid of whitespace (including newlines), and with option -i I can get rid of the double-quote character. Unfortunately, escaping of this character (\”) works on Windows, but then cmd.exe gets confused for the next pipe (it expects a closing double-quote). That’s why I introduced option -I, to specify characters with their hexadecimal value. Double-quote is 0x22, thus I use option -I 22:

This is the serialized object, and it contains the .NET assembly I want to analyze. .NET assemblies are .DLLs, e.g. PE files. With my YARA rule to detect PE files, I can find it inside the serialized data:

A PE file was found, and it starts at position 0x04C7. I can cut this data out with option -c:

Another method to find the start of the PE file, is to use a cut expression that searches for ‘MZ’, like this:

If there is more than one instance of string MZ, different cut-expressions must be tried to find the real start of the PE file. For example, this is the cut-expression to select data starting with the second instance of string MZ: -c “[‘MZ’]2:”

It’s best to pipe the cut-out data into pecheck, to validate that it is indeed a PE file:

pecheck also helps with finding the length of the PE file (with the given cut-expression, I select all data until the end of the serialized data).

Remark that there is an overlay (bytes appended to the end of the PE file), and that it starts at position 0x1400. Since I don’t expect an overlay in this .NET assembly, the overlay is not part of the PE file, but it is part of the serialization meta data.

Hence I can cut out the PE file precisely like this:

This PE file can be saved to disk now for reverse-engineering.

I have not read the .NET serialization format specification, but I can make an educated guess. Right before the PE file, there is the following data:

Remark the first 4 bytes (5 bytes before the beginning of the PE file): 00 14 00 00. That’s 0x1400 as a little-endian 32-bit integer, exactly the length of the PE file, 5120 bytes:

So that’s most likely another method to determine the length of the PE file.


Next Page »

Blog at WordPress.com.