I have a new tool that calculates byte statistics for files, like entropy. I used it recently to help me recover images from a ransomware infection, as described in these SANS ISC Diary entries:
Usage: byte-stats.py [options] [files ...]
Calculate byte statistics

files: wildcards are supported
@file: run command on each file listed in the text file specified

Source code put in the public domain by Didier Stevens, no Copyright
Use at your own risk
https://DidierStevens.com

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -m, --man             Print manual
  -d, --descending      Sort descending
  -k, --keys            Sort on keys in stead of counts
  -b BUCKET, --bucket=BUCKET
                        Size of bucket (default is 10240 bytes)
  -l, --list            Print list of bucket property
  -p PROPERTY, --property=PROPERTY
                        Property to list: encwph
  -a, --all             Print all byte stats
  -s, --sequence        Detect simple sequences
  -f FILTER, --filter=FILTER
                        Minimum length of sequence for displaying (default 0)

Manual:

byte-stats is a tool to calculate byte statistics of the content of files. It helps to determine the type or content of a file.

Let's start with some examples.

all.bin is a 256-byte large file, containing all possible byte values. The bytes are ordered: the first byte is 0x00, the second one is 0x01, the third one is 0x02, ... and the last one is 0xFF.

$byte-stats.py all.bin
Byte ASCII   Count     Pct
0x00             1   0.39%
0x01             1   0.39%
0x02             1   0.39%
0x03             1   0.39%
0x04             1   0.39%
...
0xfb             1   0.39%
0xfc             1   0.39%
0xfd             1   0.39%
0xfe             1   0.39%
0xff             1   0.39%

Size: 256

                  File(s)
Entropy:          8.000000
NULL bytes:              1   0.39%
Control bytes:          27  10.55%
Whitespace bytes:        6   2.34%
Printable bytes:        94  36.72%
High bytes:            128  50.00%

First byte-stats.py will display a histogram of byte values found in the file(s). The first column is the byte value in hex (Byte), the second column is its ASCII value, third column tells us how many times the byte value appears (Count) and the last column is the percentage (Pct).

This histogram is sorted by Count (ascending). To change the order use option -d (descending), to sort by byte value use option -k (key).

By default, the first 5 and last 5 entries of the histogram are displayed.
To display all values, use option -a (all).

After the histogram, the size of the file(s) is displayed.

Finally, the following statistics for the file(s) are displayed:
* Entropy (between 0.0 and 8.0).
* Number and percentage of NULL bytes (0x00).
* Number and percentage of Control bytes (0x01 through 0x1F, excluding whitespace bytes and including 0x7F).
* Number and percentage of Whitespace bytes (0x09 through 0x0D and 0x20).
* Number and percentage of Printable bytes (0x21 through 0x7E).
* Number and percentage of High bytes (0x80 through 0xFF).

byte-stats.py will also split the file in equally sized parts (called buckets) and perform the same calculations for these buckets. The default size of a bucket is 10KB (10240 bytes), but can be chosen with option -b (bucket). If the file is smaller than the bucket size, no bucket calculations are performed. If the file size is not an exact multiple of the bucket size, then no calculations are done for the last bucket (because it is incomplete).

Here is an example with buckets (file random.bin just contains random bytes):

$byte-stats.py random.bin
Byte ASCII   Count     Pct
0xce           242   0.32%
0x14           248   0.33%
0x52 R         251   0.34%
0xba           251   0.34%
0x3e >         256   0.34%
...
0x2e .         332   0.44%
0x45 E         336   0.45%
0xc9           336   0.45%
0x1b           338   0.45%
0x75 u         344   0.46%

Size: 74752  Bucket size: 10240  Bucket count: 7

                  File(s)            Minimum buckets    Maximum buckets
Entropy:          7.997180           7.981543           7.984125
Position:                            0x0000f000         0x00005000
NULL bytes:            303   0.41%         34   0.33%         44   0.43%
Control bytes:        7888  10.55%       1046  10.21%       1117  10.91%
Whitespace bytes:     1726   2.31%        220   2.15%        254   2.48%
Printable bytes:     27278  36.49%       3680  35.94%       3812  37.23%
High bytes:          37557  50.24%       5096  49.77%       5211  50.89%

Besides the file size (74752), the size of the bucket (10240) and the number of buckets (7) are displayed.

And next to the entropy and byte counters for the complete file, the entropy and byte counters are calculated for each bucket.
The minimum values for the bucket entropy and byte counters are displayed (Minimum buckets), and also the maximum values (Maximum buckets). Position gives the start of the bucket with minimum entropy and maximum entropy in hexadecimal.

A significant difference between the overall statistics and the bucket statistics can indicate a file that is not uniform in its content, as in this picture "encrypted" by ransomware:

$byte-stats.py picture.jpg.ransom
Byte ASCII   Count     Pct
0x44 D        1172   0.13%
0x16          1310   0.15%
0x22 "        1371   0.16%
0xc2          1421   0.16%
0x17          1437   0.16%
...
0x7a z        7958   0.91%
0x82          8006   0.91%
0x7e ~        8571   0.98%
0x80         22232   2.53%
0x00         23873   2.72%

Size: 877456  Bucket size: 10240  Bucket count: 85

                  File(s)            Minimum buckets    Maximum buckets
Entropy:          7.815519           5.156678           7.981628
Position:                            0x00019000         0x00005000
NULL bytes:          23873   2.72%          8   0.08%       1643  16.04%
Control bytes:       92243  10.51%         98   0.96%       1275  12.45%
Whitespace bytes:    16241   1.85%          1   0.01%        263   2.57%
Printable bytes:    303975  34.64%       2476  24.18%       5219  50.97%
High bytes:         441124  50.27%       3728  36.41%       6772  66.13%

The entropy for the file is 7.815519 (encrypted or compressed), but there is one part of the file (bucket) with an entropy of 5.156678. This part is not encrypted or compressed.

To locate this part, option -l (list) can be used to list the entropy values for each bucket:

$byte-stats.py -l picture.jpg.ransom
0x00000000 7.978380
0x00002800 7.979475
0x00005000 7.981628
0x00007800 7.267890
0x0000a000 6.579047
0x0000c800 6.798210
0x0000f000 6.733402
0x00011800 6.496882
0x00014000 5.743983
0x00016800 5.488550
0x00019000 5.156678
0x0001b800 5.330629
0x0001e000 6.057448
0x00020800 6.425884
0x00023000 6.880007
0x00025800 6.856647
...

The bucket starting at position 0x00019000 has the lowest entropy.

A list for the other properties (NULL bytes, ...) can be produced by using option -l together with option -p (property). For example options "-l -p n" will produce a list of the number of NULL bytes for each bucket.
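To make the statistics above concrete, here is a minimal Python sketch of how the entropy, the byte classes and the per-bucket entropy could be computed. It is not taken from the byte-stats.py source: the function names (entropy, classify, bucket_entropies) are hypothetical, and the byte ranges are simply the ones listed in the manual.

```python
import math
from collections import Counter


def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in Counter(data).values())


def classify(data: bytes) -> dict:
    """Count bytes per class, using the ranges given in the manual."""
    counts = {"null": 0, "control": 0, "whitespace": 0, "printable": 0, "high": 0}
    for b in data:
        if b == 0x00:
            counts["null"] += 1
        elif 0x09 <= b <= 0x0D or b == 0x20:
            counts["whitespace"] += 1
        elif b < 0x20 or b == 0x7F:
            counts["control"] += 1
        elif b <= 0x7E:
            counts["printable"] += 1
        else:
            counts["high"] += 1
    return counts


def bucket_entropies(data: bytes, bucket_size: int = 10240):
    """Yield (offset, entropy) for each complete bucket; an incomplete tail is skipped."""
    for offset in range(0, len(data) - bucket_size + 1, bucket_size):
        yield offset, entropy(data[offset:offset + bucket_size])


# Same content as all.bin in the manual: every byte value exactly once.
all_bytes = bytes(range(256))
print(f"Entropy: {entropy(all_bytes):.6f}")  # Entropy: 8.000000
print(classify(all_bytes))

# Synthetic data: one uniform bucket, one low-entropy bucket, and a short tail.
data = bytes(range(256)) * 40 + b"\x00\x80" * 5120 + b"tail"
offset, lowest = min(bucket_entropies(data), key=lambda item: item[1])
print(f"Lowest entropy bucket: 0x{offset:08x} {lowest:.6f}")
```

For the 256 distinct byte values the class counts come out as 1, 27, 6, 94 and 128, the same numbers as in the all.bin output. The synthetic data has its lowest-entropy bucket at the second bucket (offset 0x00002800), which is the kind of outlier that option -l makes visible.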
Option -s (sequence) instructs byte-stats to search for simple byte sequences. A simple byte sequence is a sequence of bytes where the difference (unsigned) between 2 consecutive bytes is a constant. Example:

$byte-stats.py -s picture.jpg.ransom
Byte ASCII   Count     Pct
0x44 D        1172   0.13%
0x16          1310   0.15%
0x22 "        1371   0.16%
0xc2          1421   0.16%
0x17          1437   0.16%
...
0x7a z        7958   0.91%
0x82          8006   0.91%
0x7e ~        8571   0.98%
0x80         22232   2.53%
0x00         23873   2.72%

Size: 877456  Bucket size: 10240  Bucket count: 85

                  File(s)            Minimum buckets    Maximum buckets
Entropy:          7.815519           5.156678           7.981628
Position:                            0x00019000         0x00005000
NULL bytes:          23873   2.72%          8   0.08%       1643  16.04%
Control bytes:       92243  10.51%         98   0.96%       1275  12.45%
Whitespace bytes:    16241   1.85%          1   0.01%        263   2.57%
Printable bytes:    303975  34.64%       2476  24.18%       5219  50.97%
High bytes:         441124  50.27%       3728  36.41%       6772  66.13%

Position    Length Diff Bytes
0x00013984:    246  128 0x8000800080008000800080008000800080008000...
0x00013c01:    206  128 0x0080008000800080008000800080008000800080...
0x0001b186:    205  128 0x8000800080008000800080008000800080008000...
0x0001b406:    205  128 0x8000800080008000800080008000800080008000...
0x0001b906:    204  128 0x8000800080008000800080008000800080008000...
0x0001bb86:    204  128 0x8000800080008000800080008000800080008000...
0x0001be06:    200  128 0x8000800080008000800080008000800080008000...
0x0001c086:    200  128 0x8000800080008000800080008000800080008000...
0x0001c306:    200  128 0x8000800080008000800080008000800080008000...
0x0001c586:    196  128 0x8000800080008000800080008000800080008000...

Position is the start of the detected sequence, Length is the number of bytes in the sequence, Diff is the difference (unsigned) between 2 consecutive bytes and Bytes displays the hex values of the start of the sequence.

By default, the 10 longest sequences are displayed. All sequences (minimum 3 bytes long) can be displayed with option -a. To sort the sequences by position use option -k (key). To filter the sequences by length, use option -f.
Sequence detection is useful as an extra check when the entropy and byte counters indicate the file is random:

$byte-stats.py -s not-random.bin
Byte ASCII   Count     Pct
0x00            16   0.39%
0x01            16   0.39%
0x02            16   0.39%
0x03            16   0.39%
0x04            16   0.39%
...
0xfb            16   0.39%
0xfc            16   0.39%
0xfd            16   0.39%
0xfe            16   0.39%
0xff            16   0.39%

Size: 4096

                  File(s)
Entropy:          8.000000
NULL bytes:             16   0.39%
Control bytes:         432  10.55%
Whitespace bytes:       96   2.34%
Printable bytes:      1504  36.72%
High bytes:           2048  50.00%

Position    Length Diff Bytes
0x00000000:   4096    1 0x000102030405060708090a0b0c0d0e0f10111213...
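The idea behind sequence detection can be sketched in a few lines of Python. This is an illustration of the definition given in the manual (constant unsigned difference between consecutive bytes, minimum 3 bytes), not the code from byte-stats.py. The name find_sequences is hypothetical, and two choices here are assumptions that may differ from the tool: runs of identical bytes (difference 0) are reported as sequences, and adjacent sequences share their boundary byte.

```python
def find_sequences(data: bytes, min_length: int = 3):
    """Return (position, length, diff) for maximal runs of bytes whose
    consecutive difference, taken modulo 256, is constant."""
    sequences = []
    if len(data) < 2:
        return sequences
    start = 0
    diff = (data[1] - data[0]) % 256
    for i in range(2, len(data)):
        current = (data[i] - data[i - 1]) % 256
        if current != diff:
            # The run ends at byte i - 1; keep it if it is long enough.
            if i - start >= min_length:
                sequences.append((start, i - start, diff))
            # The next candidate run starts at the boundary byte.
            start = i - 1
            diff = current
    if len(data) - start >= min_length:
        sequences.append((start, len(data) - start, diff))
    return sequences


# Same layout as not-random.bin: the values 0x00..0xFF repeated 16 times.
not_random = bytes(range(256)) * 16
for position, length, diff in find_sequences(not_random):
    print(f"0x{position:08x}: {length} {diff}")  # 0x00000000: 4096 1

# An alternating 0x80 0x00 pattern has an unsigned difference of 128.
print(find_sequences(b"\x80\x00" * 4))  # [(0, 8, 128)]
```

Taking the difference modulo 256 is what lets the wrap from 0xFF back to 0x00 count as a difference of 1, so the whole 4096-byte file is one sequence even though its entropy is 8.0. Sorting the result by length and keeping the first 10 entries would mimic the default display described above.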
byte-stats_V0_0_3.zip (https)
MD5: 4287A94EC56E0BF5A936C2A16DA7F2B4
SHA256: 310B15865B332FF62F2C70CE441D322491DB79BC5D1C8D8BBC9A7245005491B5