Didier Stevens

Monday 22 February 2021

re-search.py And Custom Validations

Filed under: My Software — Didier Stevens @ 0:00

re-search.py is my tool that uses regular expressions to search through files. You can use regular expressions from a small builtin library, or provide your own.

And these regular expressions can be augmented with extra conditions, like validation with a custom Python function.

I’m going to illustrate this here with a regular expression to match credit card numbers. Credit card numbers have a check digit (calculated with the Luhn algorithm) and I’m going to augment the regular expression to validate the check digit.

I’m using the following regular expression to match credit card numbers (I’m limiting myself to credit card numbers of 16 digits): \b(\d{4}( ?)\d{4}\2\d{4}\2\d{4})\b

This regular expression consists of 4 expressions that each match 4 digits: “\d{4}”. The blocks of 4 digits can be separated by a space character: “ ?”.

I’m putting this in a capture group “( ?)” so that I can refer back to this matched group with backreference \2 (it’s the second capture group, because the complete credit card number is also put in a capture group, i.e. the first capture group).

The reason I’m using a backreference to match the first optional space character, is that I want to match the next 2 separating space characters if and only if a first space character was matched. So I want to match 1111222233334444 and 1111 2222 3333 4444, but not 11112222 3333 4444 for example. Either all 4 groups are separated, or none are separated.

Finally, I put this expression in a capture group, and enclose it with a boundary check “\b”. This is to avoid matching credit card numbers that are immediately preceded or followed by letters or digits.
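The behaviour of the backreference can be checked outside of re-search.py with a few lines of Python. This is only a minimal sketch using Python's re module; the names CCN_PATTERN and find_ccn_candidates are mine and are not part of re-search.py.

```python
import re

# Same pattern as above: group 1 is the complete number,
# group 2 is the optional separator that \2 refers back to.
CCN_PATTERN = re.compile(r'\b(\d{4}( ?)\d{4}\2\d{4}\2\d{4})\b')

def find_ccn_candidates(text):
    """Return every substring of text matched by the 16-digit pattern."""
    return [match.group(1) for match in CCN_PATTERN.finditer(text)]

samples = ['1111222233334444', '1111 2222 3333 4444', '11112222 3333 4444']
for sample in samples:
    # The first 2 samples match completely, the third one gives an empty list.
    print(repr(sample), find_ccn_candidates(sample))
```

The third sample is rejected because the first separator is empty, so \2 can only match an empty string for the remaining separators.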

So I can use this regular expression with re-search.py on a test file:

You can see that the first 2 test credit card numbers are identical, except for the last digit: the check digit. So at most one of these 2 can be a valid credit card number.

This can be checked with the Luhn algorithm.

Here is a small Python script to calculate this Luhn check digit:

# 2020/02/06
# https://stackoverflow.com/questions/21079439/implementation-of-luhn-formula

import string

def luhn_checksum(card_number):
    def digits_of(n):
        return [int(d) for d in str(n)]
    digits = digits_of(card_number)
    odd_digits = digits[-1::-2]
    even_digits = digits[-2::-2]
    checksum = 0
    checksum += sum(odd_digits)
    for d in even_digits:
        checksum += sum(digits_of(d*2))
    return checksum % 10

def is_luhn_valid(card_number):
    return luhn_checksum(card_number) == 0

def CCNValidate(ccn):
    return is_luhn_valid(''.join(digit for digit in ccn if digit in string.digits))

Python function luhn_checksum calculates the Luhn checksum (modulo 10) over all the digits of its input, check digit included, and Python function is_luhn_valid returns True when this checksum is 0, i.e. when the check digit is correct.

To use this last function with the regular expression I created, I need another Python function: CCNValidate. This function receives the string matched by the regular expression, extracts the digits and validates the Luhn check digit.
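As a worked example, here is a compact, self-contained variant of the functions above, applied to the widely used test number 4111 1111 1111 1111 and to the same number with a different last digit. The helper luhn_check_digit is a hypothetical addition of mine (it is not in CCNValidate.py); it shows how the check digit can be derived from the first 15 digits.

```python
import string

def luhn_checksum(card_number):
    """Luhn sum modulo 10 over all digits, check digit included."""
    digits = [int(d) for d in str(card_number)]
    odd_digits = digits[-1::-2]    # the last digit and every second digit to its left
    even_digits = digits[-2::-2]   # the digits in between: these get doubled
    total = sum(odd_digits)
    for d in even_digits:
        total += sum(int(c) for c in str(d * 2))
    return total % 10

def CCNValidate(ccn):
    """Keep only the digits of the matched string and validate them."""
    digits = ''.join(c for c in ccn if c in string.digits)
    return luhn_checksum(digits) == 0

def luhn_check_digit(partial):
    """Hypothetical helper: check digit to append to a number that lacks one."""
    return (10 - luhn_checksum(partial + '0')) % 10

print(CCNValidate('4111 1111 1111 1111'))   # True
print(CCNValidate('4111 1111 1111 1116'))   # False
print(luhn_check_digit('411111111111111'))  # 1
```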

To let my tool re-search.py call this function CCNValidate when a credit card number is matched by the regular expression, I precede the regular expression with a comment, like this:

(?#extra=P:CCNValidate)\b(\d{4}( ?)\d{4}\2\d{4}\2\d{4})\b

(?#…) is a comment in the regular expression syntax. It is ignored by the parser (i.e. not used for matching). … is the comment itself, which can be anything.

re-search.py interprets this comment: when the comment starts with “extra=”, re-search.py is dealing with an augmented regular expression. P indicates that a Python function has to be called when the regular expression matches, and CCNValidate is the name of the Python function to call.
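To make the idea concrete, here is a small sketch of how such a comment could be interpreted. This is not the code of re-search.py: the names EXTRA_COMMENT, parse_augmented and search_with_validation are hypothetical, and only the P: form described above is handled. Because (?#…) is a regular comment, the pattern can be passed to Python's re module unchanged.

```python
import re

# Hypothetical parser: recognizes a leading (?#extra=P:FunctionName) comment.
EXTRA_COMMENT = re.compile(r'^\(\?#extra=P:(\w+)\)')

def parse_augmented(regex):
    """Return the name of the validation function, or None if there is no extra= comment."""
    found = EXTRA_COMMENT.match(regex)
    return found.group(1) if found else None

def search_with_validation(regex, text, functions):
    """Return the matches of regex in text that pass the named validation function."""
    name = parse_augmented(regex)
    validate = functions[name] if name else (lambda matched: True)
    return [m.group(0) for m in re.finditer(regex, text) if validate(m.group(0))]

# Toy validation function, for illustration only.
def EndsWithZero(matched):
    return matched.endswith('0')

print(search_with_validation(r'(?#extra=P:EndsWithZero)\b\d{3}\b',
                             'ids: 120 121 130',
                             {'EndsWithZero': EndsWithZero}))
```

In this sketch the validation functions are passed in a dictionary; a tool that loads them from a script file would have to look them up by name in that script's namespace instead.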

All this combined gives me the following command:

You can see that the first credit card number that was matched in the first example, no longer matches: that’s because 6 is not the correct Luhn number for this credit card number.

Besides providing re-search.py with this augmented regular expression, I also need to provide the Python script containing the validation functions: I do this with option --script CCNValidate.py



Sunday 21 February 2021

Update: re-search.py Version 0.0.16

Filed under: My Software,Update — Didier Stevens @ 12:20

This new version of re-search.py, my tool to search files with a builtin library of regular expressions, brings an update to the url and url-domain regexes to match hostnames with underscores (_) and a Python 3 fix.

re-search_V0_0_16.zip (https)
MD5: 21A7096116F50CCA051A152066B2DB50
SHA256: 4A3AC1B1BED68660316011F14EFC84B344BE3FF7E335CDFA8F1AAA2C0D2D06B0

Friday 12 February 2021

Quickpost: oledump.py plugin_biff.py: Remove Sheet Protection From Spreadsheets

Filed under: Malware,My Software,Quickpost — Didier Stevens @ 0:00

My new version of plugin_biff.py has a new option: --hexrecord.

Here I’ll show how I use this to remove the sheet protection from malicious spreadsheets.

If you want to open a malicious spreadsheet (for example with Excel 4 macros) in a sandbox, to inspect its content with Excel, chances are that it is protected.

I’m not talking about encryption (this is something that can be handled with my tool msoffcrypto-crack.py), but about sheet protection.

Enabling sheet protection can be done in Excel as follows:

Although you have to provide a password, that password is not used to derive an encryption key. An .xls file with sheet protection is not encrypted.

If you use my tool oledump.py together with plugin_biff.py, you can select all BIFF records that have the string “protect” in their name or description (-O protect). This will give you different records that govern sheet protection.

First, let’s take a look at an empty, unprotected (and unencrypted) .xls spreadsheet. With option -O protect I select the appropriate records, and with option -a I get a hex/ASCII dump of the record data:

We can see that there are several records, and that their data is all NULL (0x00) bytes.

When we do the same for a spreadsheet with sheet protection, we get a different view:

First of all we have 4 extra records, and their data isn’t zero: the flags are set to 1 (01 00 little-endian) and the Protection Password data is AB94. That is the hash of the password (P@ssw0rd) we typed to create this sheet.

To remove this sheet protection, we just need to set all data to 0x00. This is something that can be done with a hex editor.

First use option -R instead of option -a:

This will give you the complete records (type, length and data) in hexadecimal. Next you can search for each record using this hexadecimal data with a hex editor and set the data bytes to 0x00.

Searching for the first record 120002000100:

Setting the data to 0x00: 0100 -> 0000

Do this for the 4 records, and then save the spreadsheet under a different name (keep the original intact).
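The same patching can also be scripted instead of done by hand in a hex editor. The sketch below is an assumption of mine, not a feature of oledump.py or plugin_biff.py: the function zero_record_data is hypothetical, and it only works on a copy of the file content in memory. It relies on the record layout mentioned above: 2 bytes type, 2 bytes length (both little-endian), followed by the data.

```python
import struct

def zero_record_data(content, record_hex):
    """Return a copy of content with the data bytes of one record set to 0x00.

    record_hex is the complete record in hexadecimal:
    2 bytes type, 2 bytes length, then the data.
    """
    record = bytes.fromhex(record_hex)
    record_type, length = struct.unpack('<HH', record[:4])
    if length != len(record) - 4:
        raise ValueError('length field does not match the size of the data')
    if content.count(record) != 1:
        # Refuse to guess when the byte sequence is absent or ambiguous.
        raise ValueError('record must occur exactly once to be patched safely')
    patched = record[:4] + bytes(length)  # keep type and length, zero the data
    return content.replace(record, patched)

# Synthetic bytes for illustration only, this is not a real .xls file.
sample = b'\xAA\xBB' + bytes.fromhex('120002000100') + b'\xCC\xDD'
print(zero_record_data(sample, '120002000100').hex())  # aabb120002000000ccdd
```

The uniqueness check matters: a short sequence like 120002000100 could also occur elsewhere in a real file, and then it is better to locate the record manually.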

Now you can open the spreadsheet, and the sheet protection is gone. You can now unhide hidden sheets for example.


Wednesday 10 February 2021

Update: oledump.py Version 0.0.59

Filed under: My Software,Update — Didier Stevens @ 0:00

This new version of oledump.py has a small change in the XML detection logic, and adds options --hexrecord and --xordeobfuscate to plugin plugin_biff.py.


oledump_V0_0_59.zip (https)
MD5: 89CC85EDADA0BB6978A75BA37065A65D
SHA256: BE62B45AE20D3BF5B3C335742F08067297079F6B8431A5CC82401BF67BFA50F6

Monday 1 February 2021

Overview of Content Published in January

Filed under: Announcement — Didier Stevens @ 0:00

Here is an overview of content I published in January:

Blog posts:

YouTube videos:

Videoblog posts:

SANS ISC Diary entries:

Sunday 31 January 2021

New Tool: pdftool.py

Filed under: My Software,PDF — Didier Stevens @ 0:00

pdftool.py is a new tool I developed. This version has only one command: iu (incremental updates).

With this command, one can check if a PDF has incremental updates, and then select different versions of this PDF with incremental updates.

A PDF with incremental updates is a PDF that has been modified by appending changes at the end of the PDF file, without modifying the original content.
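Since each appended update normally ends with its own %%EOF marker, a rough way to spot incremental updates is to look for these markers. The sketch below is a simple heuristic under that assumption and not necessarily how pdftool.py works; the function names are hypothetical. It can over-count, for example when the marker bytes occur inside other data.

```python
EOF_MARKER = b'%%EOF'

def count_eof_markers(pdf_bytes):
    """Count the %%EOF markers: more than 1 hints at incremental updates."""
    return pdf_bytes.count(EOF_MARKER)

def split_versions(pdf_bytes):
    """Return one bytes object per version: the file truncated after each %%EOF marker."""
    versions = []
    position = 0
    while True:
        index = pdf_bytes.find(EOF_MARKER, position)
        if index == -1:
            break
        position = index + len(EOF_MARKER)
        versions.append(pdf_bytes[:position])
    return versions

# Synthetic content for illustration only, this is not a valid PDF.
sample = b'%PDF-1.7\noriginal\n%%EOF\nupdate\n%%EOF\n'
print(count_eof_markers(sample))                   # 2
print([len(v) for v in split_versions(sample)])
```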

Here is a video explaining incremental updates and the use of my tool.

I reference 2 blog posts in the video: “Solving a Little PDF Puzzle” and “Shoulder Surfing a Malicious PDF Author”.

pdftool_V0_0_1.zip (https)
MD5: ED2BBE886008C737CC06E22F4F0FE8A1
SHA256: 401E88FBFAEC4382A50FE59430D04FE6111F9911958AB09BA7530C26043FDA87

Thursday 28 January 2021

Update: XORSelection.1sc Version 6.0

Filed under: 010 Editor,Encryption,Malware,My Software — Didier Stevens @ 0:00

I released an update to my 010 Editor script XORSelection.1sc.

010 Editor is a binary editor with a scripting engine. XORSelection.1sc is a script I wrote years ago, that will XOR-encode a (partial) file open in the editor.

The first version just accepted a printable, arbitrary-length string as XOR-key. Later versions accepted a hexadecimal key too, and introduced various options.

With version 6.0, I add support for a dynamic XOR-key: a key that changes while it is being used. It can change, one byte at a time, before or after each byte-level XOR operation is executed.

Hence option cb means change before, and ca means change after. Watch this video to understand exactly how the key changes (if you want to skip the part explaining my script XORSelection, you can jump directly to the dynamic XOR-key explanation).


I made this update to my XORSelection script, because I had to “manually” decode a Cobalt Strike beacon that was XOR-encoded with a changing XOR key (it is part of a WebLogic server attack). Later I included this decoding in my Cobalt Strike beacon analysis tool 1768.py.

The decoding shellcode is in the first 62 bytes (0x3E) of the file:

After the shellcode comes the XOR-key, the size and the beacon:

We can decode the beacon size, that is XOR-encoded with key 0x3F0882FB, as follows. First we select the bytes to be decoded:

Then we launch 010 Editor script XORSelection.1sc:

Provide the XOR key (prefix 0x indicates that the key is provided as hexadecimal byte values):

And then, after pressing OK, the bytes that contain the beacon size are decoded by XOR-ing them with the provided key:

This beacon size (bytes 00 14 04 00) is a little-endian, 32-bit integer: 0x041400.
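This decoding of the size can be reproduced in Python with a static, repeating XOR key. It is only a sketch: the function xor_bytes is mine, and the encoded bytes are not taken from the sample but derived here from the decoded bytes 00 14 04 00 and the key, purely for illustration. It does not cover the dynamic key (options cb and ca).

```python
import struct
from itertools import cycle

def xor_bytes(data, key):
    """XOR each byte of data with the corresponding byte of the repeating key."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = bytes.fromhex('3F0882FB')

# Assumption: encoded bytes derived from the decoded size, not read from the file.
encoded_size = xor_bytes(bytes.fromhex('00140400'), key)

decoded = xor_bytes(encoded_size, key)
size = struct.unpack('<I', decoded)[0]  # little-endian, 32-bit unsigned integer
print(encoded_size.hex(), decoded.hex(), hex(size))  # 3f1c86fb 00140400 0x41400
```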

To decode the beacon, we select the encoded beacon and launch script XORSelection.1sc again:

This time, we need to provide an option to change the XOR-decoding process. We press OK without entering a value: this makes the next prompt appear, where we can provide options:

The option we need to use to decode this Cobalt Strike beacon, is cb: change before.

In the next prompt, we can provide the XOR-key:

And we end up with the decoded beacon (you can see parts of the PE file that is the beacon):

Note that you can enter “h” at the option prompt to get a help screen:

I made this video explaining how to use this new option, and also explaining how the XOR key is changed exactly when using option change before (cb) or change after (ca).


XORSelection_V6_0.zip (https)
MD5: C1872C275B59E236906D38B2302F3F4B
SHA256: 1970A506299878FAC2DDD193F9CE230FD717854AC1C85554610DDD95E04DE9E9


Sunday 24 January 2021

Update: strings.py Version 0.0.7

Filed under: My Software,Update — Didier Stevens @ 0:00

This new version brings an update to the Pascal feature of strings.py, my tool to extract strings from arbitrary files.

I had to analyze compiled Lua code (compiled with Lua 5.2): Lua 5.2 byte code stores strings like C strings and Pascal strings.

The strings are terminated by a NULL byte, like C strings, and they are prefixed with a length counter, like Pascal strings. Since the length includes the NULL byte, my strings.py tool didn’t match compiled Lua 5.2 strings:

I need to subtract 1 from the counter, so that it matches the length of the string without NULL byte. This can now be done as follows:
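Independently of strings.py itself, the string layout described above can be illustrated with a few lines of Python. This is a sketch under assumptions: the function read_counted_string is hypothetical, and I assume a 4-byte, little-endian counter (the size and byte order of the counter depend on how Lua was built).

```python
import struct

def read_counted_string(data, offset=0, prefix_size=4):
    """Read a length-prefixed string whose counter includes the trailing NULL byte.

    Returns the string without the NULL byte and the offset just past it.
    """
    fmt = {4: '<I', 8: '<Q'}[prefix_size]   # assumed little-endian counter
    (count,) = struct.unpack_from(fmt, data, offset)
    start = offset + prefix_size
    raw = data[start:start + count]
    if count == 0 or len(raw) != count or raw[-1] != 0:
        raise ValueError('not a NULL-terminated counted string')
    # Subtract 1 from the counter: drop the NULL byte.
    return raw[:-1], start + count

sample = struct.pack('<I', 6) + b'print\x00'
print(read_counted_string(sample))  # (b'print', 10)
```

The counter is 6 for the 5-character string print, which is why a plain Pascal-string match (counter equal to the number of characters) fails without the subtraction.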


strings_V0_0_7.zip (https)
MD5: 2533BF3E7CBD5526718CDE5E150039D2
SHA256: FFBE686A2E41B22858023898580419806A789349D408C24EF25E8BEBCD33A418

Saturday 23 January 2021

Update: re-search.py Version 0.0.15

Filed under: My Software,Update — Didier Stevens @ 0:00

This new version of my tool to search with regular expressions adds a -F (--filter) option to filter search results.

re-search_V0_0_15.zip (https)
MD5: E68D42F9F943335961C12BED7AD459A7
SHA256: 47F837C198CC3033B9C07086EA4FD0484BC40CE850723B4F6A849FB237D9A7E0

Friday 22 January 2021

Update: re-search.py Version 0.0.14

Filed under: My Software,Update — Didier Stevens @ 21:34

This new version of my tool to search with regular expressions adds a new regular expression to the embedded dictionary: detection of domain names that end with a valid TLD.

re-search_V0_0_14.zip (https)
MD5: 53CDB34174E6EFE211872D6BC64533CC
SHA256: 3F55E6EA7272BFC780E159BA886932F96DC055CF533B0B3C3A5CCBAF0229682E
