Didier Stevens

FileScanner

FileScanner is a command-line Windows program that I use to scan disks, folders and files.

It provides information about files and, when present, their Alternate Data Streams (ADS). That information is based on both metadata and file content.

One can create simple rules to detect files based on their content.

There’s a help option:

And an embedded man page:

At the end of this page, a download link and links to older FileScanner blog posts can be found.

Manual:

Usage: FileScanner [options] [@]filename|folder|?f: ...
Version 0.0.0.7
 -m: Manual
 -o filename: Output to file filename
 -O: Output to file with generated filename, format FileScanner-COMPUTERNAME-YYYYMMDD-HHmmSS.csv
 -g: Generate output filename with option -o, format FileScanner-KEYWORD-YYYYMMDD-HHmmSS.csv
 -C folder: Copy output file to folder
 -M folder: Move output file to folder
 -s: Recurse subdirectories
 -l: Follow links (follow link when directory is a reparse point)
 -v: CSV output
 -f: Full file read
 -a analysisfile: Use rules in analysisfile and select rule-matching files only
 -A analysisfile: Use rules in analysisfile and select all files
 -r rule: Provide a single rule via the command-line
 -t timequota: Stop after timequota (expressed in seconds) is reached
 -7: ASCII (7-bits) output
 -e .ext: Scan only files with extension .ext (lowercase)

FileScanner is a Windows program to scan files. It accepts files and folders to scan as command-line arguments.
Here is an example of FileScanner scanning itself: FileScanner-crt-x86.exe FileScanner-crt-x86.exe
Output:

Filename:                             FileScanner-crt-x86.exe
Streamname:                           
ADS:                                  0
Rulename:                             
Filename lowercase:                   filescanner-crt-x86.exe
Basename:                             filescanner-crt-x86
Extension:                            .exe
Filesize:                             318464
Creation time:                        2021/07/06 20:30:51
Last write time:                      2021/07/10 15:36:11
Last access time:                     2021/07/10 16:09:05
Owner name:                           DESKTOP\testuser1
File attributes:                      20
File attributes decode:               A
Magic HEX:                            4d5a9000
Magic ASCII:                          MZ..
EFS Status:                           0

In this example, FileScanner collects metadata and reads at most 4 bytes of the file(s) to scan and their Alternate Data Streams (ADS).
A file is identified by its Filename (the Streamname is empty), and an ADS is identified by its Filename and Streamname. If a file has ADSs, field ADS reports the number of streams (0 if there are no ADSs).
Several fields in this report are self-explanatory. The timestamp fields are in UTC, formatted as YYYY/MM/DD HH:mm:SS.
Field Owner name is the username of the file's owner. When the owner's SID cannot be translated to a username, the SID is reported instead.
File attributes is the hexadecimal value of a file's attributes, and File attributes decode is the text representation of a file's attributes.
Magic HEX is the first 4 bytes (at most) of the file content, in hexadecimal.
Magic ASCII is the first 4 bytes (at most) of the file content, in ASCII (non-printable bytes are displayed as a dot).
EFS Status is a flag indicating if the file is encrypted (using EFS) or not.

When option -f (Full file read) is used, the file is read completely and extra fields are added to the report (hashes, entropy and byte counters).
Here is an example using option -f: FileScanner-crt-x86.exe -f FileScanner-crt-x86.exe
Output:

Filename:                             FileScanner-crt-x86.exe
Streamname:                           
ADS:                                  0
Rulename:                             
Filename lowercase:                   filescanner-crt-x86.exe
Basename:                             filescanner-crt-x86
Extension:                            .exe
Filesize:                             318464
Creation time:                        2021/07/06 20:30:51
Last write time:                      2021/07/10 15:36:11
Last access time:                     2021/07/10 16:09:05
Owner name:                           DESKTOP\testuser1
File attributes:                      20
File attributes decode:               A
Magic HEX:                            4d5a9000
Magic ASCII:                          MZ..
EFS Status:                           0
MD5:                                  fe0a07a62d86c5b479f041c0e3fe0878
SHA1:                                 edd825c342bc90d112fcbb54e61172f39fc3e379
Entropy:                              6.49784
Null bytes:                           60961
Control bytes:                        33493
Whitespace bytes:                     8097
Printable bytes:                      96375
High bytes:                           124658

MD5 and SHA1 are the hashes of the file content, and Entropy is the entropy of the file content (a value between 0.0 and 8.0).
Null bytes is the counter for all bytes equal to 0x00.
Control bytes is the counter for all bytes between 0x01 and 0x1F (included) that are not whitespace, plus all bytes equal to 0x7F.
Whitespace bytes is the counter for all bytes between 0x09 and 0x0D (included) and bytes equal to 0x20.
Printable bytes is the counter for all bytes between 0x21 and 0x7E (included).
High bytes is the counter for all bytes between 0x80 and 0xFF (included).

FileScanner can produce CSV output by using option -v.

Output is written to the console. It can be written to a file instead, by using option -o filename. For example, option -o report.txt will write all output to file report.txt.
Using option -O, output will be written to a file with filename generated by FileScanner (unlike option -o, you don't provide a filename). The format of the generated filename is: FileScanner-COMPUTERNAME-YYYYMMDD-HHmmSS.csv. COMPUTERNAME is the name of the computer where FileScanner is running, and YYYYMMDD-HHmmSS is the local time when FileScanner was started.
Option -g is used together with option -o to generate a filename with a chosen keyword. In this case, the value given with -o is not interpreted as a filename, but as a keyword to generate a filename of the following format: FileScanner-KEYWORD-YYYYMMDD-HHmmSS.csv, where KEYWORD is the value of option -o.
For example, command 'FileScanner-crt-x86.exe -o case_alpha -g FileScanner-crt-x86.exe' results in the generation of a report file with filename FileScanner-case_alpha-20210710-185731.csv.
Option -g can be used to clearly identify report files by including an identifier, like a case name (case_alpha in this example).
When an output option is used (-o, -g, -O), a file counter is displayed on the console to indicate scan progress.
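The generated filename format can be reproduced with the standard C function strftime. This is a minimal sketch, not FileScanner's implementation: build_report_filename is a hypothetical helper, the start time is converted to local time as described above, and for option -O the keyword would be the computer name (how that name is obtained is not shown).

```c
#include <stdio.h>
#include <time.h>

/* Builds "FileScanner-KEYWORD-YYYYMMDD-HHmmSS.csv" from a keyword and the start time.
   Returns 0 on success, -1 if the time cannot be converted or the buffer is too small. */
int build_report_filename(char *out, size_t out_size, const char *keyword, time_t start)
{
    char stamp[16];                       /* "YYYYMMDD-HHmmSS" + NUL */
    struct tm *local = localtime(&start); /* local time, as for -O and -g */
    if (local == NULL || strftime(stamp, sizeof stamp, "%Y%m%d-%H%M%S", local) == 0)
        return -1;
    int n = snprintf(out, out_size, "FileScanner-%s-%s.csv", keyword, stamp);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With keyword case_alpha and a start time of 2021/07/10 18:57:31 local time, this yields FileScanner-case_alpha-20210710-185731.csv, the filename from the example above.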

When FileScanner is given folders as argument, it will scan the files contained in said folders. It will not scan files in subfolders, unless option -s is used.
By default, FileScanner does not follow links. With option -l, directories that are reparse points are followed.

Once FileScanner is done with scanning and the report is completed, it can be copied or moved to a centralized folder, for example a file share on a server. This is done with options -C or -M. For example, 'FileScanner-crt-x86.exe -v -s -O -M \\fileserver\reports C:\'
When this command is executed on different machines in a domain, their reports will be centralized in share reports on the fileserver.

To limit the duration of a scan, option -t timequota can be used: timequota is an integer that is interpreted as a number of seconds. For example, -t 3600 limits a scan to one hour (3600 seconds). Note that the timequota is not checked while a file is being scanned, but only before the next file is scanned: the scan of a large file can therefore run past the timequota.
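The consequence of checking the timequota only between files can be illustrated with a small C simulation. files_scanned_within_quota is hypothetical and uses per-file scan durations instead of a real clock; it only models the rule stated above, not FileScanner's code.

```c
#include <stddef.h>

/* Returns how many files get scanned when the quota is checked only
   before each file; a file that has started is always scanned to the end. */
size_t files_scanned_within_quota(const double *durations, size_t count, double quota_seconds)
{
    double elapsed = 0.0;
    size_t scanned = 0;
    for (size_t i = 0; i < count; i++) {
        if (elapsed >= quota_seconds)
            break;                 /* quota check happens here, between files */
        elapsed += durations[i];   /* scanning a file is never interrupted */
        scanned++;
    }
    return scanned;
}
```

With a quota of 3600 seconds and files taking 3000, 1000 and 5 seconds, the second file is still started (3000 < 3600) and the scan stops after about 4000 seconds, with 2 files scanned.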

The output of FileScanner is Unicode (UTF-8). To force pure ASCII output, use option -7 (7 bits).

Scanning can be limited to files with a particular extension. Use option -e to choose an extension. The extension must be lowercase and start with a dot (.). For example, use option -e .dll to limit scanning to files with extension .dll.
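A filter like -e can be sketched in C as below. has_extension is hypothetical, and lowercasing the file's own extension before comparing it with the lowercase -e value is an assumption of this sketch (it mirrors the Filename lowercase and Extension report fields); FileScanner's exact matching is not documented here.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True when filename's extension (from the last dot) equals ext, e.g. ".dll".
   ext must already be lowercase, as required for option -e. */
bool has_extension(const char *filename, const char *ext)
{
    const char *dot = strrchr(filename, '.');
    if (dot == NULL || strlen(dot) != strlen(ext))
        return false;
    for (size_t i = 0; ext[i] != '\0'; i++)
        if (tolower((unsigned char)dot[i]) != ext[i])  /* compare case-insensitively */
            return false;
    return true;
}
```

Using the last dot means a name like archive.dll.bak has extension .bak and is not selected by -e .dll.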

FileScanner can be given one or more filenames and folders as arguments to scan. It is also possible to provide a textfile with the filenames to scan: use the name of that textfile as argument, prefixed with @ (at). For example, command 'FileScanner-crt-x86.exe @list.txt' will scan all files listed in textfile list.txt. The textfile contains one filename per line; lines starting with # are ignored (this can be used to add comments to the file list).
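Reading such a file list can be sketched in C as follows. list_entry and print_list are hypothetical helpers; skipping empty lines and the fixed 4096-byte line buffer are assumptions of this sketch, not documented FileScanner behavior.

```c
#include <stdio.h>
#include <string.h>

/* Removes the end-of-line characters from line and returns it,
   or returns NULL for a comment line (starting with #) or an empty line. */
char *list_entry(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
    if (line[0] == '#' || line[0] == '\0')
        return NULL;
    return line;
}

/* Prints every filename a list file would contribute to the scan.
   Lines longer than the buffer are not handled in this sketch. */
int print_list(const char *list_path)
{
    char line[4096];
    FILE *f = fopen(list_path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        char *name = list_entry(line);
        if (name != NULL)
            printf("scan: %s\n", name);
    }
    fclose(f);
    return 0;
}
```

Stripping both \r and \n keeps lists written with Windows line endings working.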

FileScanner can also be passed argument ?f: to scan all fixed drives on the computer it is running on.

FileScanner can also use rules to detect files. Rules examine the content of a file, and trigger when their conditions are met. The rule syntax is specific to FileScanner.
Option -r can be used to pass a rule to detect files. For example:
FileScanner-crt-x86.exe -r MZ:start:4D5A c:\Windows
This command will include all files in its report that start with hexadecimal byte values 4D5A, or MZ (simplified: Windows executables start with MZ).
The syntax of a rule is: RULENAME:RULETYPE:conditions.
The name of a rule (RULENAME) can be any identifier, but must be unique when multiple rules are used.
When a rule triggers, its name is added to the Rulename field of the corresponding file/stream.
The type of a rule (RULETYPE) can be:
 start
 content
 icontent
 md5
 sizemd5
 sizemin
 sizemax
 and
 or
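A rule definition can be split into its three parts as sketched below in C. parse_rule and Rule are hypothetical; splitting on the first two colons only is inferred from the sizemd5 example further down, whose condition itself contains a colon.

```c
#include <stdbool.h>
#include <string.h>

typedef struct {
    char *name;
    char *type;
    char *condition;
} Rule;

static const char *const RULE_TYPES[] = {
    "start", "content", "icontent", "md5", "sizemd5", "sizemin", "sizemax", "and", "or"
};

/* Splits "RULENAME:RULETYPE:conditions" in place. Only the first two colons are
   separators, so a condition like "202240:423d..." stays intact. */
bool parse_rule(char *text, Rule *rule)
{
    char *first = strchr(text, ':');
    if (first == NULL)
        return false;
    char *second = strchr(first + 1, ':');
    if (second == NULL)
        return false;
    *first = '\0';
    *second = '\0';
    rule->name = text;
    rule->type = first + 1;
    rule->condition = second + 1;
    if (rule->name[0] == '\0')
        return false;
    /* Accept only the rule types listed above. */
    for (size_t i = 0; i < sizeof RULE_TYPES / sizeof RULE_TYPES[0]; i++)
        if (strcmp(rule->type, RULE_TYPES[i]) == 0)
            return true;
    return false;
}
```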

A 'start' rule is a rule that looks at the beginning of the content of a file, and triggers if there is a match. The beginning of the content of the file is matched with the condition. This condition is the hexadecimal representation of the bytes to match.
Only the necessary bytes of a file are read with a start rule: the complete file is not read, unless option -f is used.
It is possible to provide ASCII data for the condition of a rule, instead of hexadecimal. Prefix the condition with str=, like this:
FileScanner-crt-x86.exe -r MZ:start:str=MZ c:\Windows
This rule is identical to the previous rule, where a hexadecimal representation was used to specify the condition.
UNICODE can also be provided, by using prefix uni=. Note that this only converts pure ASCII to a UNICODE representation (16-bit little-endian), and that it cannot be used with non-ASCII characters. To match non-ASCII characters, the hexadecimal representation must be used.
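The three ways of writing a condition (hexadecimal, str= and uni=) and the matching done by a start rule can be sketched in C as follows. decode_condition and start_matches are hypothetical names; the sketch follows the rules described above, not FileScanner's source.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decodes a condition into bytes: "str=..." (ASCII), "uni=..." (ASCII widened to
   16-bit little-endian), otherwise hexadecimal. Returns the byte count or -1. */
long decode_condition(const char *cond, unsigned char *out, size_t out_size)
{
    size_t n = 0;
    if (strncmp(cond, "str=", 4) == 0 || strncmp(cond, "uni=", 4) == 0) {
        bool wide = cond[0] == 'u';
        for (const char *p = cond + 4; *p != '\0'; p++) {
            if (wide && (unsigned char)*p > 0x7F)
                return -1;                 /* uni= only supports pure ASCII */
            if (n + (wide ? 2 : 1) > out_size)
                return -1;
            out[n++] = (unsigned char)*p;
            if (wide)
                out[n++] = 0x00;           /* high byte of the UTF-16LE code unit */
        }
        return (long)n;
    }
    size_t len = strlen(cond);
    if (len % 2 != 0 || len / 2 > out_size)
        return -1;
    for (size_t i = 0; i < len; i += 2) {
        int hi = hex_value(cond[i]), lo = hex_value(cond[i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
    }
    return (long)n;
}

/* A 'start' rule: does the content begin with the decoded condition bytes? */
bool start_matches(const unsigned char *data, size_t size,
                   const unsigned char *pattern, size_t pattern_size)
{
    return pattern_size <= size && memcmp(data, pattern, pattern_size) == 0;
}
```

Both 4D5A and str=MZ decode to the same two bytes, which is why the two MZ rules above are identical; uni=MZ decodes to the four bytes 4D 00 5A 00.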

A 'content' rule is a rule that scans the content of a file until a match is found. This rule results in the complete scan (read) of a file when no match is found (regardless of option -f). The condition is similar to the start rule.

An 'icontent' rule is a rule that scans the content of a file until a match is found. This match is not case-sensitive. This rule results in the complete scan (read) of a file when no match is found (regardless of option -f). The condition is similar to the start rule.
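The difference between content and icontent can be shown with a naive C search. content_matches is hypothetical, case folding is assumed to apply to ASCII letters only, and the whole content is assumed to be in memory; a real scanner reading a file in chunks must also handle matches that span chunk boundaries.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

static bool bytes_equal(unsigned char a, unsigned char b, bool ignore_case)
{
    return ignore_case ? tolower(a) == tolower(b) : a == b;
}

/* Searches pattern anywhere in data: ignore_case = false models 'content',
   ignore_case = true models 'icontent'. */
bool content_matches(const unsigned char *data, size_t size,
                     const unsigned char *pattern, size_t pattern_size, bool ignore_case)
{
    if (pattern_size == 0 || pattern_size > size)
        return pattern_size == 0;
    for (size_t i = 0; i + pattern_size <= size; i++) {
        size_t j = 0;
        while (j < pattern_size && bytes_equal(data[i + j], pattern[j], ignore_case))
            j++;
        if (j == pattern_size)
            return true;   /* stop at the first match */
    }
    return false;          /* no match: every position was examined */
}
```

The early return explains why these rules only read a complete file when there is no match.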

An 'md5' rule is a rule that computes the MD5 hash of the content of a file, and triggers when it matches the condition. This rule results in the complete scan (read) of a file (regardless of option -f). The condition is an MD5 hash in hexadecimal.

A 'sizemd5' rule is a rule that computes the MD5 hash of the content of a file only if its size matches, and triggers when the hash matches too. When the size matches, this rule results in the complete scan (read) of the file (regardless of option -f). The condition is the file size in bytes, followed by a colon and an MD5 hash in hexadecimal.
Example: FileScanner-crt-x86.exe -r NOTEPAD:sizemd5:202240:423d3ade2f14572c5bd5f546973eb493 c:\Windows
Rule NOTEPAD looks for files with size 202240 and MD5 hash 423d3ade2f14572c5bd5f546973eb493 in the c:\Windows folder.
A sizemd5 rule can be used to quickly scan a complete disk of a computer, for example looking for malware based on its hash. Since only files that have the correct file size have their MD5 hashes calculated, this rule is much faster than an ordinary md5 rule, which results in the hashing of all files.
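The cost saving of sizemd5 comes from checking the cheap metadata (the size) before the expensive operation (reading and hashing). The C sketch below shows that gating; SizeMd5Rule and the functions are hypothetical, and computing the MD5 itself is left to the caller, which only needs to do it when sizemd5_needs_hash returns true.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned long long size;  /* exact file size in bytes */
    char md5[33];             /* lowercase hexadecimal MD5 + NUL */
} SizeMd5Rule;

/* Parses a condition such as "202240:423d3ade2f14572c5bd5f546973eb493". */
bool parse_sizemd5(const char *cond, SizeMd5Rule *rule)
{
    char *end;
    rule->size = strtoull(cond, &end, 10);
    if (end == cond || *end != ':' || strlen(end + 1) != 32)
        return false;
    for (int i = 0; i < 32; i++) {
        unsigned char c = (unsigned char)end[1 + i];
        if (!isxdigit(c))
            return false;
        rule->md5[i] = (char)tolower(c);
    }
    rule->md5[32] = '\0';
    return true;
}

/* Cheap pre-check: only files of exactly the right size are read and hashed. */
bool sizemd5_needs_hash(const SizeMd5Rule *rule, unsigned long long file_size)
{
    return file_size == rule->size;
}

/* Final check, given the hexadecimal MD5 of a file that passed the size check. */
bool sizemd5_matches(const SizeMd5Rule *rule, unsigned long long file_size, const char *md5_hex)
{
    if (!sizemd5_needs_hash(rule, file_size) || strlen(md5_hex) != 32)
        return false;
    for (int i = 0; i < 32; i++)
        if (tolower((unsigned char)md5_hex[i]) != rule->md5[i])
            return false;
    return true;
}
```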

Rules 'sizemin' and 'sizemax' can be used to detect files with a given minimum or maximum size. These types of rules are often used in combination with other rules.

It is possible to create rules that trigger depending on the result of other rules.
This is done with 'and' and 'or' composite rules. An and rule triggers when all its dependent rules trigger, and an or rule triggers when one (or more) of its dependent rules trigger.
Composite rules use short-circuit evaluation: for an and rule, evaluation of the remaining dependent rules stops as soon as a dependent rule is false; for an or rule, it stops as soon as a dependent rule is true.
To use composite rules, several rules need to be defined, which is not possible with option -r.
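Short-circuit evaluation of composite rules can be sketched in C as a recursive function. RuleNode and evaluate are hypothetical; leaf_result stands in for the outcome of a start, content or size rule, and the evaluations counter only exists to make the short-circuiting visible. Cyclic dependencies are not handled.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { RULE_LEAF, RULE_AND, RULE_OR } RuleKind;

typedef struct {
    RuleKind kind;
    bool leaf_result;   /* RULE_LEAF: result of the underlying content/size rule */
    int evaluations;    /* how often this rule was evaluated */
    size_t deps[8];     /* RULE_AND / RULE_OR: indices of the dependent rules */
    size_t dep_count;
} RuleNode;

bool evaluate(RuleNode *rules, size_t index)
{
    RuleNode *r = &rules[index];
    r->evaluations++;
    if (r->kind == RULE_LEAF)
        return r->leaf_result;
    for (size_t i = 0; i < r->dep_count; i++) {
        bool dep = evaluate(rules, r->deps[i]);
        if (r->kind == RULE_AND && !dep)
            return false;   /* and: stop at the first false dependent rule */
        if (r->kind == RULE_OR && dep)
            return true;    /* or: stop at the first true dependent rule */
    }
    return r->kind == RULE_AND;  /* and: all were true; or: none was true */
}
```

For MZ100kb:and:MZ SIZE on a file that does not start with MZ, SIZE is never evaluated.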

When more than one rule needs to be defined, a textfile must be created containing the rules (one per line), and options -a or -A are used to pass this textfile to FileScanner.
Here is an example with textfile rules-example.txt, containing 3 rules:

 MZ:start:str=MZ
 SIZE:sizemin:100000
 MZ100kb:and:MZ SIZE

One rule (MZ) to detect Windows executables, a second rule (SIZE) to detect files of at least 100,000 bytes, and a third rule (MZ100kb) that triggers when both the MZ and SIZE rules trigger.
Use this command to apply these rules to the files scanned by FileScanner: FileScanner-crt-x86.exe -v -a rules-example.txt c:\Windows

If you run this example, you will notice that you don't get the desired results: rule MZ100kb never triggers. That is because FileScanner, by default, stops applying rules to a file once one of the rules has triggered. So if rule MZ triggers for a given file, rules SIZE and MZ100kb are not applied to that file, i.e., they cannot trigger.
It is possible to change this behavior of FileScanner, by including directive 'exhaustive' in the rule file, like this:

 exhaustive
 MZ:start:str=MZ
 SIZE:sizemin:100000
 MZ100kb:and:MZ SIZE

When running the same command with this modified rule file, you will notice that rule MZ100kb triggers this time: it detects Windows executables of at least 100,000 bytes in the c:\windows folder.
But you will also notice that other files are selected: files that only trigger rules MZ and/or SIZE.
If you only want the results of a composite rule, and not of its dependent rules, prefix the names of the dependent rules with a dollar sign ($). Rules whose name starts with a $ behave like other rules, except that they are only evaluated when other rules depend on them, and their results are not included in the report.
Let's apply this to our example:

 exhaustive
 $MZ:start:str=MZ
 $SIZE:sizemin:100000
 MZ100kb:and:$MZ $SIZE

When running the same command with this modified rule file, you will notice that only rule MZ100kb is included in the report. This time, you only see files that are Windows executables of at least 100,000 bytes in the c:\windows folder.
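The effect of the exhaustive directive and of $-prefixed rules on the report can be summarized with a small C sketch. reported_rules is hypothetical; triggers[i] is precomputed and stands for what rule i would return if it were applied (composite results included), which keeps the sketch short.

```c
#include <stdbool.h>
#include <stddef.h>

/* Collects the rule names that end up in the report for one file.
   Rules whose name starts with '$' are never applied on their own and never
   reported; they only serve as dependencies of and/or rules. */
size_t reported_rules(const char *const *names, const bool *triggers, size_t count,
                      bool exhaustive, const char **reported)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        if (names[i][0] == '$')
            continue;
        if (triggers[i]) {
            reported[n++] = names[i];
            if (!exhaustive)
                break;   /* default: stop applying rules after the first match */
        }
    }
    return n;
}
```

For a large executable, the three rule files above give MZ only, then MZ, SIZE and MZ100kb, and finally MZ100kb only.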

Rule files may contain comment lines: they start with a # character.

To provide a rule file to FileScanner, option -a is used. Only files that trigger a rule are included in the report. To include all files in the report, regardless of rule triggering, use option -A instead of option -a.
Since many rule types result in the complete scan of a file, it takes little extra time to also calculate the metadata of files while rules are being applied to them (what takes time is reading the file). Thus it is often useful to combine options -A and -f when rules that scan the complete file are used: while the complete content of a file is read, FileScanner takes the opportunity to calculate metadata like the MD5 and SHA1 hashes of that file.
By combining these options, one receives a report with metadata for all scanned files, and rule identifiers for files that triggered rules.

Selecting all files regardless of rules, done with option -A, can also be achieved by using directive 'selectallfiles'.
When directive selectallfiles is used in a rule file, all files are selected, even when option -a is used instead of option -A.
Directives in a rule file take precedence over command line options.

There are more directives:
 fullfileread -> this is the directive that corresponds with option -f.
 extension= -> this is the directive that corresponds with option -e.
 copyreport= -> this is the directive that corresponds with option -C.
 movereport= -> this is the directive that corresponds with option -M.

FileScanner can take advantage of the backup-privilege: when FileScanner is executed by an elevated administrator (or any elevated user with the backup-privilege), the backup-privilege is used to access all files, even files that are inaccessible due to permissions.

There are different versions of FileScanner, with the following names:
 FileScanner-x86.exe
 FileScanner-x64.exe
 FileScanner-crt-x86.exe
 FileScanner-crt-x64.exe
 FileScanner-crt-auto-x86.exe
 FileScanner-crt-auto-x64.exe

32-bit versions of FileScanner have x86 in their name, 64-bit versions have x64 in their name.
All versions of FileScanner with crt in their name have the C runtime statically linked into the PE file. These versions work even when the corresponding C runtime is not installed (missing Visual C++ Redistributable).
All versions of FileScanner with auto in their name run automatically: they start scanning as soon as they are executed. They also contain a manifest requesting elevation.
The options used for this automatic scan are: all fixed drives are scanned with an embedded rule file and a report is created (same naming convention as with option -O).
The embedded rule file is:

# Last change 2021/07/06
# exhaustive: process all rules, don't stop after the first match
exhaustive
# fullfileread: read the full content of files
fullfileread
# selectallfiles: also report files that didn't trigger rules
selectallfiles
# extension=.txt: limit search to file with .txt extension
# rules:
PK:start:str=PK
$META:icontent:str=MANIFEST.MF
JAR:and:PK $META
CLASS:start:CAFEBABE
MZ:start:4D5A
PDF:start:str=%PDF-
OLE:start:D0CF11E0
RAR:start:526172211A07
$ATTRIBUT:content:00417474726962757400
OLE-VBA:and:OLE $ATTRIBUT
CAB:start:str=MSCF
ARJ:start:EA60
JFIF:start:FFD8FF
PNG:start:89504E47
$VBAPROJECT:content:str=vbaProject.bin
PK-VBA:and:PK $VBAPROJECT
$MSGATTACHMENT:content:uni=__attach_version1.0_
OLE-MSG-ATTACHMENT:and:OLE $MSGATTACHMENT;

It is also possible to rename the executable file of FileScanner, to achieve some automation.
For example, by making a copy of FileScanner-crt-x86.exe with name filescanner-auto-rule-NOTEPAD-sizemd5-202240-423d3ade2f14572c5bd5f546973eb493.exe, you have created a version of FileScanner that will automatically scan all fixed drives of the computer it runs on, using a single rule: NOTEPAD:sizemd5:202240:423d3ade2f14572c5bd5f546973eb493.
Rules can be encoded in the filename of the executable like this, provided the separator : (colon) is replaced by a - (dash).
When filescanner-auto-rule-NOTEPAD-sizemd5-202240-423d3ade2f14572c5bd5f546973eb493.exe is executed, the following output is produced:
 FileScanner auto mode
 Drives: C:\ D:\
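Decoding a rule from the executable name can be sketched in C as below. rule_from_exe_name is hypothetical; it strips the prefix and the .exe extension and turns every dash back into a colon. Replacing every dash is an assumption of this sketch, and under it a rule name that itself contains a dash (like OLE-VBA) could not be encoded this way.

```c
#include <stdbool.h>
#include <string.h>

/* Recovers "NOTEPAD:sizemd5:..." from "filescanner-auto-rule-NOTEPAD-sizemd5-....exe".
   The prefix comparison is case-sensitive for simplicity. */
bool rule_from_exe_name(const char *exe_name, char *rule, size_t rule_size)
{
    static const char prefix[] = "filescanner-auto-rule-";
    size_t prefix_len = sizeof prefix - 1;
    size_t len = strlen(exe_name);
    if (len <= prefix_len + 4 || strncmp(exe_name, prefix, prefix_len) != 0
        || strcmp(exe_name + len - 4, ".exe") != 0)
        return false;
    size_t body_len = len - prefix_len - 4;
    if (body_len + 1 > rule_size)
        return false;
    for (size_t i = 0; i < body_len; i++) {
        char c = exe_name[prefix_len + i];
        rule[i] = (c == '-') ? ':' : c;   /* '-' in the filename stands for ':' */
    }
    rule[body_len] = '\0';
    return true;
}
```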

Such an 'auto' version can be used to prepare a FileScanner that can be given to an inexperienced user to launch a scan on machines. The only thing that user has to do is launch that version of FileScanner (e.g., double-click it), wait for the scan to finish, and send you the report.

Besides encoding a rule in the filename of FileScanner, it is also possible to encode the filename of a rule file.
For example, by making a copy of FileScanner-crt-x86.exe with name filescanner-auto-analysisfile-incident-analysis-file.exe, you have created a version of FileScanner that will automatically scan all fixed drives of the computer it runs on, using a rule file with name incident-analysis-file (without extension).
That FileScanner executable and file incident-analysis-file can then be given to an inexperienced user in the same way: launch the executable, wait for the scan to finish, and send you the report.

Finally, it is also possible to create a version of FileScanner embedding your own rule file.
This can be done by replacing the embedded rule file in the source code, and recompiling FileScanner.
Alternatively, edit the embedded rule file of FileScanner-crt-auto-x86.exe or FileScanner-crt-auto-x64.exe with a binary editor.
Search for a long line of dashes (in UNICODE) inside the PE file: that is the start of the embedded rule file (in UNICODE, UTF-16LE). Replace that content with your rule file (also in UTF-16LE), NULL terminated, making sure not to overflow the existing embedded rule file.
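The binary edit described above can be sketched in C on an in-memory copy of the PE file. The functions are hypothetical and only illustrate the procedure (locate the UTF-16LE dashes, measure the existing NUL-terminated text, overwrite it only if the new text fits); the documented approach is a binary editor, and reading and writing the file is not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Offset of the first run of at least min_run UTF-16LE dashes (2D 00), or -1. */
long find_dash_run(const unsigned char *pe, size_t size, size_t min_run)
{
    for (size_t start = 0; start + 2 * min_run <= size; start++) {
        size_t k = 0;
        while (k < min_run && pe[start + 2 * k] == 0x2D && pe[start + 2 * k + 1] == 0x00)
            k++;
        if (k == min_run)
            return (long)start;
    }
    return -1;
}

/* Byte length of the NUL-terminated UTF-16LE text at offset (terminator excluded),
   or -1 when no terminator is found. */
long utf16le_length(const unsigned char *buf, size_t size, size_t offset)
{
    for (size_t i = offset; i + 1 < size; i += 2)
        if (buf[i] == 0x00 && buf[i + 1] == 0x00)
            return (long)(i - offset);
    return -1;
}

/* Overwrites the embedded text at offset with new_rules (UTF-16LE, no terminator)
   plus a NUL terminator, refusing anything longer than the existing text. */
bool replace_embedded_rules(unsigned char *pe, size_t size, size_t offset,
                            const unsigned char *new_rules, size_t new_bytes)
{
    long old_bytes = utf16le_length(pe, size, offset);
    if (old_bytes < 0 || new_bytes % 2 != 0 || new_bytes > (size_t)old_bytes)
        return false;                  /* would overflow the existing rule file */
    memcpy(pe + offset, new_rules, new_bytes);
    pe[offset + new_bytes] = 0x00;     /* UTF-16LE NUL terminator */
    pe[offset + new_bytes + 1] = 0x00;
    return true;
}
```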

Blog posts from 2014: part 1, part 2, part 3 and part 4.

FileScanner_V0_0_0_8.zip (http)
MD5: 20201A4336F3E5298896EE0962C6C287
SHA256: F0EAE8F989A65509EE2AC793EB23C3FED3F333D10C62C30FF047EE45CD308190
