Didier Stevens

Tuesday 25 November 2014

Update: find-file-in-file.py Version 0.0.4

Filed under: Forensics,My Software,Update — Didier Stevens @ 22:05

Here is the version I talked about in my Bitcoin virus posts.

It also has an embedded man page (use option –man).

find-file-in-file_v0_0_4.zip (https)
MD5: CD381616158BD233D94B368554B824C6
SHA256: FD5C4E3EC99371754E58B93D3D96CBA7A86C230C47FC9C27C9B871ED8BFB9149

Man page:

Usage: find-file-in-file.py [options] file-contained file-containing […]
Find if a file is present in another file

Arguments:
file-containing can be a single file, several files, and/or @file
@file: run the command on each file listed in the text file specified
wildcards are supported
batch mode is enabled when more than one file is specified

Source code put in the public domain by Didier Stevens, no Copyright
Use at your own risk
https://DidierStevens.com

Options:
–version             show program’s version number and exit
-h, –help            show this help message and exit
-m MINIMUM, –minimum=MINIMUM
Minimum length of byte-sequence to find (default 10)
-o, –overlap         Found sequences may overlap
-v, –verbose         Be verbose in batch mode
-p, –partial         Perform partial search of contained file
-O OUTPUT, –output=OUTPUT
Output to file
-b RANGEBEGIN, –rangebegin=RANGEBEGIN
Select the beginning of the contained file (by default
byte 0)
-e RANGEEND, –rangeend=RANGEEND
Select the end of the contained file (by default last
byte)
-x, –hexdump         Hexdump of found bytes
-q, –quiet           Do not output to standard output
–man                 Print manual

Manual:

find-file-in-file is a program to test if one file (the contained
file) can be found inside another file (the containing file).

Here is an example.
We have a file called contained-1.txt with the following content:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
and have a file called containing-1.txt with the following content:
0000ABCDEFGHIJKLM1111NOPQRSTUVWXYZ2222

When we execute the following command:
find-file-in-file.py contained-1.txt containing-1.txt

We get this output:
0x00000004 0x0000000d (50%)
0x00000015 0x0000000d (50%)
Finished

This means that the file contained-1.txt was completely found inside
file containing-1.txt At position 0x00000004 we found a first part
(0x0000000d bytes) and at position 0x00000015 we found a second part
(0x0000000d bytes).

We can use option hexdump (-x) to see which bytes were found:
find-file-in-file.py -x contained-1.txt containing-1.txt
0x00000004 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000015 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished

The containing file may contain the contained file in an arbitrary
order, like file containing-2.txt:
0000NOPQRSTUVWXYZ1111ABCDEFGHIJKLM2222

Example:
find-file-in-file.py -x contained-1.txt containing-2.txt
0x00000015 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000004 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished

The containing file does not need to contain the complete contained
file, like file containing-3.txt:
0000ABCDEFGHIJKLM1111

Example:
find-file-in-file.py -x contained-1.txt containing-3.txt
0x00000004 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
Remaining 13 (50%)

The message “Remaining 13 (50%)” means that the last 13 bytes of the
contained file were not found in the containing file (that’s 50% of
the contained file).

If the contained file starts with a byte sequence not present in the
containing file, nothing will be found. Example with file
contained-2.txt:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ

Nothing is found:
find-file-in-file.py -x contained-2.txt containing-1.txt
Remaining 36 (100%)

If you know how long that initial byte sequence is, you can skip it.
Use option rangebegin (-b) to specify the position in the contained
file from where you want to start searching.
Example:

find-file-in-file.py -x -b 10 contained-2.txt containing-1.txt
0x00000004 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000015 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished

If you want to skip bytes at the end of the contained file, use option
rangeend (-e).

If you don’t know how long that initial byte sequence is, you can
instruct find-file-in-file to “brute-force” it. With option partial
(-p), one byte at a time will be removed from the beginning of the
contained file until a match is found.
Example:

find-file-in-file.py -x -p contained-2.txt containing-1.txt
File: containing-1.txt (partial 0x0a)
0x00000004 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000015 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished

“(partial 0x0a)” tells you that the first 10 bytes of the contained
file were skipped before a match was found.

There are some other options:
-m minimum: find-file-in-file will search for byte sequences of 10
bytes long minimum. If you want to change this minimum, use option -m
minimum.
-o overlap: find-file-in-file will not let byte sequences overlap. Use
option -o overlap to remove this restriction.
-v verbose: be verbose in batch mode (more than one containing file).
-O output: besides writing output to stdout, write the output also to
the given file.
-q quiet: do not output to stdout.

1 Comment »

  1. Thanks very much, Didier. This is going to be good tool for me. (From time to time, I have to investigate PostScript files and determine if they are de-optimized badly: they frequently contain fonts, images and other resources multiple times were containing them just once would be good enough. So no “forensic” use-case at all 🙂

    Comment by kurt — Tuesday 2 December 2014 @ 8:04


RSS feed for comments on this post. TrackBack URI

Leave a Reply (comments are moderated)

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: