Didier Stevens

Thursday 12 March 2026

Update: zipdump.py Version 0.0.35

Filed under: My Software,Update — Didier Stevens @ 0:00

This update adds option forcedecompress when using options -f and -s.

More info: Analyzing “Zombie Zip” Files (CVE-2026-0866).

zipdump_v0_0_35.zip (http)
MD5: F4A48AE14C1B258D688BF61D9ACF5E54
SHA256: 8DF7B3EBA282A0391AD619AD33A5F77CD25CC0FDA760E116934DD953714A27C5
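The published SHA256 can be compared against a downloaded copy before running it. This is a minimal sketch, assuming the archive was saved in the current directory under the file name shown above; the helper name `file_sha256` is made up for this illustration.

```python
import hashlib
import os

# SHA256 value published in the post for zipdump_v0_0_35.zip.
EXPECTED_SHA256 = "8DF7B3EBA282A0391AD619AD33A5F77CD25CC0FDA760E116934DD953714A27C5"


def file_sha256(path, chunk_size=65536):
    """Return the uppercase hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


if __name__ == "__main__":
    path = "zipdump_v0_0_35.zip"  # assumed local download location
    if os.path.exists(path):
        print("match" if file_sha256(path) == EXPECTED_SHA256 else "MISMATCH")
    else:
        print("file not found:", path)
```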

9 Comments »

  1. Good afternoon. Great work on a great tool, I must say :) But one thing surprises me.

    We know that a ZIP archive is not strictly required to indicate the encoding of the file and folder names it stores. Because of this, many archivers work on the principle of “unpack according to the OS locale”, and in practice that is usually right. But when we analyze a file, we are trying to find out WHAT is broken in the archive so that it can be fixed, and for that we need to display information about its objects in the console ;). That is where your script falls short: it does not seem to be able to change the encoding of file and folder names when writing to the console. All output is in raw bytes.

    Raw output like that certainly has its place, but first of all I would like to have an option like --name-encoding CP1200 among the options your script supports. Would that be possible?

    Comment by Anonymous — Saturday 6 June 2026 @ 16:53

  2. Are you doing this on Windows and are you seeing squares with question marks inside? Then it means the shell’s font can’t render the glyphs. Try changing the font in cmd.exe, or use a shell that has more fonts, like Cmder.

    Comment by Didier Stevens — Sunday 7 June 2026 @ 11:59

  3. Thanks for the feedback! I appreciate it ;)

    1) Yes, I’m on a Windows workstation. 2) And NO: I redirect the output into a file, like “bla-bla > output.log”, and in it I see Python raw byte literals such as b'C:\\Users\\\xc4\xec\xe8\xf2\xf0\xe8\xe9\\AppData'. I see the same in a regular cmd.exe console window. So this is not a font problem.

    The problem is that the ZIP record for the name of a compressed file or folder DOES NOT have a corresponding bit that indicates the encoding. So your code directly, or Python’s code indirectly through calls to standard objects, or any command related to ZIP processing, all behave very simply IMHO: they either use the predefined CP437 table, or try the OS locale, or do nothing and output raw bytes. The programmer who uses these results then has to write additional logic to process such bytes. Right now there is NO such logic, which is why I suggest this new option: --name-encoding

    This option should apply always and everywhere in the code where the name of a file or folder stored in the processed archive is output to the user (I don’t think it is worth distinguishing where the output goes, the console screen or a redirected file). As I understand it, you currently have the “-t TRANSLATE” option. But I have to admit that using it to view file and folder names correctly did not help, and it was not clear to me where it took effect (it must have done something, because there were no errors). Can you clarify? Since the existing option did not help, I am proposing a new one.

    Comment by Anonymous — Sunday 7 June 2026 @ 14:35
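To make the discussion concrete, here is a small sketch that decodes the seven non-ASCII bytes from the literal quoted above under a few candidate encodings. That cp1251 is the right code page is an assumption about the commenter's locale, not something the archive states; `decode_with` is a helper invented for this example.

```python
# The seven non-ASCII bytes from the byte literal quoted in the comment.
raw_name = b"\xc4\xec\xe8\xf2\xf0\xe8\xe9"

# Candidate encodings; cp1251 is an assumption about the commenter's locale.
candidates = ["cp437", "cp1251", "utf-8"]


def decode_with(raw, encoding):
    """Decode raw bytes, returning None if they are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


results = {enc: decode_with(raw_name, enc) for enc in candidates}

# Print with ASCII escapes so the demo also works on a console or log file
# whose own encoding cannot represent the decoded characters.
for enc, text in results.items():
    print(f"{enc:8} {text!a}")
```

The same bytes give box-drawing and Greek characters under cp437, a readable Cyrillic name under cp1251, and are not valid UTF-8 at all, which is why a tool has to be told (or has to guess) the encoding.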

  4. The --translate option works for the content of the file, not for the filename.
    But I’m curious to know when you see raw byte literals?
    They should not appear when you use zipdump, because the zipfile module returns a string for fileinfo.filename.
    Are you maybe using option -f ? Because with option -f, the zipfile module is not used, and with that option, filenames are displayed as bytes.

    Comment by Didier Stevens — Sunday 7 June 2026 @ 19:44
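The reply above notes that Python's zipfile module hands back filenames as strings. When an entry does not carry the UTF-8 flag, zipfile decodes the name bytes as cp437, so the original bytes can be recovered and decoded again with another code page. The sketch below illustrates that round trip; it is not how zipdump.py works internally. The helper names and the choice of cp1251 are assumptions, and the test archive is produced by patching an ASCII placeholder name because zipfile itself does not write legacy-encoded names.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: name and comment are UTF-8


def build_legacy_archive():
    """Build an in-memory ZIP whose entry name is stored as raw cp1251 bytes.

    An ASCII placeholder of the same length is written first and then
    patched in the raw archive bytes (local header and central directory).
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        archive.writestr("XXXXXXX.txt", b"demo")
    legacy_name = b"\xc4\xec\xe8\xf2\xf0\xe8\xe9"  # 7 bytes, like the placeholder
    return buffer.getvalue().replace(b"XXXXXXX", legacy_name)


def redecode_name(info, encoding):
    """Re-interpret a name that zipfile decoded as cp437."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # flagged as UTF-8, nothing to re-decode
    return info.filename.encode("cp437").decode(encoding)


with zipfile.ZipFile(io.BytesIO(build_legacy_archive())) as archive:
    info = archive.infolist()[0]
    as_read = info.filename               # cp437 interpretation of the bytes
    fixed = redecode_name(info, "cp1251")  # assumed code page

print(ascii(as_read), "->", ascii(fixed))
```

On Python 3.11 and later, `zipfile.ZipFile` also accepts a `metadata_encoding` argument when reading, which should make the manual round trip unnecessary.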

  5. That’s right, this option was used. As I wrote above, it is important for us to understand WHERE (on which elements) the problem occurs during unpacking, and first of all we should at least be able to see the listing of the files and folders present in the archive. And yes, I understand that the current byte-by-byte output is useful here too. BUT, IN ADDITION to this “complex” output, a simple one is also needed so that a human can read the listing, and for this the bytes need to be converted. There is no option for this in the current version of the utility.

    So, going back to the initial request: can this be added to the utility?

    And let me make clear again that if the option is implemented, it must be applied in every place where file and folder names are processed and output. You answered above: “the zipfile module returns a string for fileinfo.filename”. So in all places with this code, you have to be sure that the module can convert the names correctly before it returns a string with the name. Right now, IMHO, the module has no external information about which name encoding it should work with. It seems that the --metadata-encoding <encoding> parameter is responsible for this when the zipfile module is called with the -l, -e and -t options, which seems to be what we need.

    Here is what I have found so far: gist[.]githubusercontent[.]com/ElusiveSpirit/d441aae1f52f2d63530bdb255da3f64e/raw/4c35ebaec6f18b562169aa6065bbd681f9a2ec22/windows_zipfile[.]py (just remove the [ ])

    By the way, another question came up while analyzing the existing listing of files and folders. Does the ZIP standard support absolute (full) paths to a file or folder stored in the archive? We have full Windows paths in the listing:

    C:\Users\<username>\AppData\Roaming\<folder>\<folder>\filename.ext

    Of course such a path cannot be used directly for unpacking, if only because of the forbidden “:” character in it. But I am confused by finding such a path in the archive data at all. Is that possible? And how should the unpacking itself take place? If the file locations are hard-coded, should filename.ext be unpacked to my C: drive, into my system folder?

    Comment by Anonymous — Monday 8 June 2026 @ 7:18
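For the question about full paths, an analysis script can at least flag member names that an extractor should not take literally. The following is a rough sketch with a made-up helper name, `is_suspicious_member_name`; it checks only for drive or UNC prefixes, leading separators, and ".." components, and makes no claim to cover every case.

```python
import ntpath


def is_suspicious_member_name(name):
    """Flag names with a drive/UNC prefix, a leading slash, or '..' parts."""
    drive, _ = ntpath.splitdrive(name)  # 'C:' or '\\\\server\\share' if present
    if drive:
        return True
    normalized = name.replace("\\", "/")
    if normalized.startswith("/"):
        return True
    return ".." in normalized.split("/")


# Example names; the first mirrors the kind of path shown in the comment.
samples = [
    "C:\\Users\\someone\\AppData\\Roaming\\folder\\filename.ext",
    "/etc/passwd",
    "../outside.txt",
    "docs/readme.txt",
]
flags = {name: is_suspicious_member_name(name) for name in samples}
for name, suspicious in flags.items():
    print(f"{'SUSPICIOUS' if suspicious else 'ok':10} {name}")
```

As far as I know, Python's own `ZipFile.extract` strips a drive or UNC prefix and leading slashes from member names before writing, so such an entry ends up under the chosen target directory rather than at the stored absolute location. Other tools may behave differently.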

  6. Ah OK, you are using option -f.
    I’ll implement a feature so that you can specify the string encoding.

    Regarding full paths: it all depends on the application.

    For example, 7-zip command-line has 2 commands to extract files:
    e : Extract files from archive (without using directory names)
    x : eXtract files with full paths

    Comment by Didier Stevens — Monday 8 June 2026 @ 10:59

  7. Sorry for my insistence, but I am NOT ONLY asking about the -f option. I repeat once again: EVERYWHERE the code outputs file and folder names for the user to read, the names read from the archive must be converted if the --name-encoding option is enabled. It doesn’t matter which of the utility’s options that output sits behind. Whenever names are handled, if the option is enabled, use it.

    The fact that programs can handle such full paths in a special way is only half the point. The main question is a different one: CAN they be stored in the archive itself? Are you saying that, having studied all the standards for this type of archive, you confirm that archives are allowed to store FULL paths that only have local meaning (and could even lead to fatal consequences when unpacked on another OS)?

    And one more clarifying question. Your -E option outputs additional information, including the bit flags (#flags:…#). Do we understand correctly that this is where the information about UTF-8 encoding is stored when the default defined by the standard, CP437, is not used? Could an encoding attribute be added to the -E option, so that the file listing immediately shows whether the archive under analysis was packed with proper UTF-8 names? And could this bit be described in human-readable form in the output, for example:

    C:\Demo>zipdump.py -f l -E encoding,version,crc double-suffix

    Comment by Anonymous — Monday 8 June 2026 @ 12:43
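On the flags question: in the ZIP format, bit 11 (0x800) of the general purpose bit flag marks an entry's name and comment as UTF-8. When it is clear, the specification's default is CP437, although in practice many tools write names in a local OEM or ANSI code page. The sketch below reads that bit through Python's zipfile module; `name_encoding` is a hypothetical helper and is not the proposed `encoding` field of zipdump's -E option.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11, the language encoding flag


def name_encoding(info):
    """Describe how the entry's name bytes are meant to be interpreted."""
    return "utf-8" if info.flag_bits & UTF8_FLAG else "cp437 (default)"


# Demo archive: one ASCII name and one name that zipfile must store as UTF-8.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("plain.txt", b"a")
    archive.writestr("\u0444\u0430\u0439\u043b.txt", b"b")

# Re-open for reading so the flags come from the central directory.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    report = {info.filename: name_encoding(info) for info in archive.infolist()}

# ASCII-escaped names keep the listing printable on any console.
for name, encoding in report.items():
    print(f"{name!a:40} {encoding}")
```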

  8. I don’t understand: I already posted a new comment yesterday and I still don’t see it. What kind of blog engine is this? HOW can I tell whether my text was sent at all? I am forced to repeat yesterday’s text, sorry. I apologize if this causes any inconvenience on your side. But as I see it, we are still discussing important changes.

    Again, sorry for my insistence, but I am NOT ONLY voting for the “-f” option. I repeat once again: EVERYWHERE the code outputs file and folder names for the user to read, the names read from the archive must be converted if the --name-encoding option is enabled. It doesn’t matter which of the utility’s options that output sits behind. Whenever names are handled, if the option is enabled ==> use it.

    The fact that programs can handle such full paths in a special way is only half the point. The main question is a different one: CAN they be stored in the archive itself? Are you saying that, having studied all the standards for this type of archive, you confirm that a ZIP is simply allowed to store FULL paths that only have local meaning (and could even lead to fatal consequences when unpacked on another OS)?

    And one more clarifying question. Your -E option outputs additional information, including the bit flags (flags:). Do we understand correctly that this is where the information about UTF-8 encoding is stored when the default defined by the standard, CP437, is not used? If yes, CAN an encoding attribute be added to the -E option, so that the file listing immediately shows whether the archive under analysis was packed with proper UTF-8 names? And could this bit be described in human-readable form in the output, for example:

    C:\Demo>zipdump.py -f l -E encoding,version,crc double-suffix

    Comment by Anonymous — Tuesday 9 June 2026 @ 9:17

  9. I will reply later, I need to schedule time to look into this.

    Comment by Didier Stevens — Tuesday 9 June 2026 @ 18:56

