Some time ago I had to figure out if a file was embedded inside another file.
It’s not a file carving problem. I had both files. I just needed to be sure that file A was contained inside file B.
With a hex editor I could find parts of file A inside file B, but it looked like file A was split up and scattered at different locations in file B.
I Googled a bit for a tool, but nothing came up, so I wrote my own Python program.
With my new tool I was able to confirm that file msi49.tmp was inside file c8400.msi:
You can see that file msi49.tmp is one contiguous sequence inside file c8400.msi starting at position 0x3A7200.
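For this simple contiguous case, a plain substring search is enough. Here is a minimal Python sketch of that check. It is not the tool itself: the function name find_contiguous and the sample bytes are made up for illustration, and real use would read the two files from disk first.

```python
def find_contiguous(contained: bytes, containing: bytes) -> int:
    """Return the offset of `contained` inside `containing`, or -1 if absent."""
    return containing.find(contained)


# Hypothetical data standing in for the contents of the two files.
containing = b"HEADER--" + b"payload bytes of the embedded file" + b"--TRAILER"
contained = b"payload bytes of the embedded file"

offset = find_contiguous(contained, containing)
if offset == -1:
    print("not found as one contiguous sequence")
else:
    print("found at position", hex(offset))
```

A result of -1 only means the file is not there in one piece. It can still be present in several pieces.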
But I was more interested in knowing whether file msi49.tmp was also inside file Cisco_Jabber.msi:
And you can see it is, but not as one contiguous sequence. It’s split into 3 sequences.
This tool can also be used to find a downloaded file inside a pcap/pcapng file. I downloaded AnalyzePESig_V0_0_0_2.zip while taking a Wireshark capture.
It can also find a file opened by an application. Here I look into the process dump:
The only limitation is that both files need to be read into memory. When I have time, I’ll turn this into a plugin for the Volatility framework.
The program looks for sequences at least 10 bytes long (the minimum length is an option). If your file is divided into sequences shorter than 10 bytes, my program will not find the embedded file unless you lower the minimum length. Don’t go as low as 1 byte, though, because then you’re likely to match random data.
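To make the minimum-length idea concrete, here is a minimal sketch of one way such a search could work. It is an assumption for illustration, not the actual program: the name find_sequences, the greedy strategy and the choice to stop at the first occurrence of each seed are all made up here. It takes the next min_length bytes of the contained file, looks them up in the containing file, then extends the match for as long as both files agree.

```python
def find_sequences(contained: bytes, containing: bytes, min_length: int = 10):
    """Greedily split `contained` into runs that also occur in `containing`.

    Returns a list of (offset_in_contained, offset_in_containing, length),
    or None as soon as a seed of `min_length` bytes cannot be found.
    This is a hypothetical sketch: it only tries the first occurrence of
    each seed, so it can miss layouts that a smarter search would find.
    """
    sequences = []
    pos = 0
    while pos < len(contained):
        # The final run may be shorter than min_length, so clamp the seed size.
        seed_len = min(min_length, len(contained) - pos)
        found = containing.find(contained[pos:pos + seed_len])
        if found == -1:
            return None
        # Extend the match byte by byte for as long as both files agree.
        length = seed_len
        while (pos + length < len(contained)
               and found + length < len(containing)
               and contained[pos + length] == containing[found + length]):
            length += 1
        sequences.append((pos, found, length))
        pos += length
    return sequences


# Hypothetical data: the file is stored in two pieces, in reverse order.
part1 = b"0123456789abcdef"
part2 = b"GHIJKLMNOPQRSTUV"
result = find_sequences(part1 + part2, b"xx" + part2 + b"yyyy" + part1 + b"zz")
print(result)
```

With this sample data the sketch reports two sequences, the first one located after the second one in the containing data. Lowering min_length makes the seeds shorter, and short seeds match by accident far more often, which is why 1 byte is a bad choice.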
I’m not 100% sure that my program will find all possible cases of embedded files. It’s no problem if the file is one contiguous sequence, or several sequences in logical order. But I still have to review my algorithm to be sure it will also find all possible cases of embedded files with sequences in random order. I think it will, but I need to prove it.