Didier Stevens

Wednesday 17 December 2014

Introducing oledump.py

Filed under: Forensics,Malware,My Software — Didier Stevens @ 0:07

If you follow my video blog, you’ve seen my oledump videos and downloaded the preview version. Here is the “official” release.

oledump.py is a program to analyze OLE files (Compound File Binary Format). These files contain streams of data. oledump allows you to analyze these streams.

Many applications use this file format; the best known is MS Office. .doc, .xls, .ppt, … are OLE files (docx, xlsx, … use the newer file format: XML inside ZIP).
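
To tell the two families apart before running a tool, you can look at the first bytes of the file. The following is a minimal sketch, not part of oledump; the function name sniff_container is hypothetical. It relies on the Compound File Binary header signature (D0 CF 11 E0 A1 B1 1A E1) and on the ZIP local file header signature (PK\x03\x04).

```python
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary header signature
ZIP_MAGIC = b"PK\x03\x04"                      # local file header of a ZIP archive


def sniff_container(data):
    """Guess the container type from the first bytes of a file (hypothetical helper)."""
    if data.startswith(OLE_MAGIC):
        return "ole"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

Reading the first 8 bytes of a file in binary mode and passing them to sniff_container is enough. A result of "zip" for a .docx or .xlsx file means the OLE part, if any, sits inside the archive.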

Run oledump on an .xls file and it will show you the streams:

[Screenshot: 20141216-223150]

The letter M next to streams 7, 8, 9 and 10 indicates that these streams contain VBA macros.

You can select a stream to dump its content:

[Screenshot: 20141216-223233]

The source code of VBA macros is compressed when stored inside a stream. Use option -v to decompress the VBA macros:

[Screenshot: 20141216-223705]
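
For readers curious about what the decompression involves, here is a sketch of the run-length scheme described in the MS-OVBA specification. It is not the code oledump uses, and the function name decompress_vba is hypothetical. It assumes Python 3 and that you pass only the compressed container; in a real module stream the compressed source starts at an offset recorded in the dir stream, which this sketch does not look up.

```python
import struct


def decompress_vba(container):
    """Decompress an MS-OVBA CompressedContainer (bytes) and return bytes."""
    if not container or container[0] != 0x01:
        raise ValueError("missing 0x01 container signature")
    out = bytearray()
    pos = 1
    while pos < len(container):
        # 2-byte chunk header: 12 bits size, 3 bits signature, 1 bit "compressed" flag.
        (header,) = struct.unpack_from("<H", container, pos)
        chunk_end = min(len(container), pos + (header & 0x0FFF) + 3)
        if (header >> 12) & 0x07 != 0b011:
            raise ValueError("bad chunk signature")
        pos += 2
        if not header & 0x8000:
            # Raw chunk: 4096 bytes stored without compression.
            out += container[pos:pos + 4096]
            pos += 4096
            continue
        chunk_start = len(out)
        while pos < chunk_end:
            flags = container[pos]  # one flag bit per following token
            pos += 1
            for bit in range(8):
                if pos >= chunk_end:
                    break
                if not (flags >> bit) & 1:
                    out.append(container[pos])  # literal byte
                    pos += 1
                    continue
                # Copy token: offset/length split depends on how much
                # of the current chunk has been decompressed so far.
                (token,) = struct.unpack_from("<H", container, pos)
                pos += 2
                difference = len(out) - chunk_start
                bit_count = max((difference - 1).bit_length(), 4)
                length = (token & (0xFFFF >> bit_count)) + 3
                offset = (token >> (16 - bit_count)) + 1
                for _ in range(length):
                    out.append(out[-offset])
    return bytes(out)
```

The copy loop appends one byte at a time on purpose: a copy token may overlap the bytes it is producing, which is how long runs of the same character are encoded.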

You can write plugins (in Python) to analyze streams. I developed 3 plugins. Plugin plugin_http_heuristics.py uses a couple of tricks to extract URLs from malicious, obfuscated VBA macros, like this:

[Screenshot: 20141216-224228]
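
To give an idea of the kind of trick such a heuristic can use, here is a standalone sketch. It is not the code of plugin_http_heuristics.py and does not use the oledump plugin interface. The function name extract_urls and the choice of trick (resolving Chr(n) calls and string concatenation before searching for URLs) are assumptions for illustration.

```python
import re

# Chr(109), ChrW(109) or Chr$(109): a character given by its code.
CHR_CALL = re.compile(r'Chr[W$]?\(\s*(\d+)\s*\)', re.IGNORECASE)
# A closing quote, a concatenation operator and an opening quote.
STRING_JOIN = re.compile(r'"\s*[&+]\s*"')
URL = re.compile(r'https?://[^\s"\']+', re.IGNORECASE)


def extract_urls(vba_source):
    """Return URLs found after undoing simple Chr()/concatenation obfuscation."""
    # Turn every Chr(n) call into a one-character string literal.
    resolved = CHR_CALL.sub(lambda m: '"%s"' % chr(int(m.group(1))), vba_source)
    # Glue adjacent string literals together.
    joined = STRING_JOIN.sub('', resolved)
    return URL.findall(joined)
```

For example, the line u = "ht" & "tp://exa" & Chr(109) & "ple.com/a.exe" yields the single URL http://example.com/a.exe. Real malicious macros use many more obfuscation schemes than this one.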

You might have noticed that the file analyzed in the above screenshot is a zip file. Like many of my analysis programs, oledump.py can analyze a file inside a (password protected) zip file. This allows you to store your malware samples in password protected zip files (password infected), and then analyze them without having to extract them.
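
The same idea can be sketched with Python’s standard zipfile module: read the sample from the archive into memory, without writing it to disk. This is an illustration, not oledump’s own code; the function name read_first_member is hypothetical. Note that the standard zipfile module only handles the legacy ZIP encryption, not AES-encrypted archives.

```python
import io
import zipfile


def read_first_member(zip_source, password=b"infected"):
    """Return (name, content) of the first file in a zip, without extracting to disk.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        name = archive.namelist()[0]
        return name, archive.read(name, pwd=password)


# Demo with an in-memory, unencrypted archive (the password is then simply ignored).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.xls.vir", b"\xd0\xcf\x11\xe0 fake sample")
demo_name, demo_content = read_first_member(buffer)
```

The returned bytes can be wrapped in io.BytesIO and handed to any parser that accepts a file-like object.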

If you install the YARA Python module, you can scan the streams with YARA rules:

[Screenshot: 20141216-224952]

And if you suspect that the content of a stream is encoded, for example with XOR, you can try to brute-force the XOR key with a simple decoder I provide (or you can develop your own decoder in Python):

[Screenshot: 20141216-225911]
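
Brute-forcing a single-byte XOR key comes down to trying all 255 non-zero keys and checking whether the result contains something you expect. The sketch below is not the decoder shipped with oledump and does not use its decoder interface. The names xor_bytes and bruteforce_xor, and the default marker "http", are assumptions.

```python
def xor_bytes(data, key):
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def bruteforce_xor(data, marker=b"http"):
    """Try every non-zero single-byte key; return [(key, decoded)] where marker appears."""
    hits = []
    for key in range(1, 256):
        decoded = xor_bytes(data, key)
        if marker in decoded:
            hits.append((key, decoded))
    return hits
```

With stream content that was encoded with key 0x41 and contains a URL, bruteforce_xor returns that key together with the decoded bytes. A short marker can produce false positives on large streams, so a longer marker or a YARA rule is a better test in practice.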

This program requires Python module OleFileIO_PL: http://www.decalage.info/python/olefileio

oledump_V0_0_3.zip (https)
MD5: 9D5AA950C9BFDB16D63D394D622C6767
SHA256: 44D8C675881245D3336D6AB6F9D7DAF152B14D7313A77CB8F84A71B62E619A70

39 Comments »

  1. sys.stderr, 'Error - %s is not a valid OLE file.' % infile
    NameError: global name 'infile' is not defined

    Comment by Yogesh — Wednesday 17 December 2014 @ 5:54

  2. Awesome work, Didier. If you ever come to Madrid you have a ration of the best damn Iberian ham on me (with a nice red wine of course). This tool is going to be extremely handy to fight against malicious macro in pesky Excel files … 🙂

    Comment by Antonio — Wednesday 17 December 2014 @ 16:07

  3. Yogesh — It means the input file is not being detected as a valid OLE file. Chances are you have an Office Open XML structured document. Try renaming the doc to .zip and unzipping it. Look for the .bin file in the word directory that is extracted. You should be able to run the tool on it just fine, don’t forget to decompress it. To give you insight Steven, there was a recent “Phishing Campaign” that had a malicious macro in it. This tool took me to source code in minutes. Thanks again 4 the release!

    Comment by @SlyDGotcha — Friday 19 December 2014 @ 20:35

  4. Trying to get this working on windows. new to python but have installed it and got python in ok. I have installed olefileio and oledump 0.0.3 but i get this error when i try to run.

    File "C:\oledump.py", line 335
    exec open(plugin, 'r') in globals(), globals()
    ^
    SyntaxError: invalid syntax

    Comment by luke — Wednesday 7 January 2015 @ 7:44

  5. C:\>oledump.py
    File "C:\oledump.py", line 337
    exec open(plugin, 'r') in globals(), globals()
    ^
    SyntaxError: invalid syntax

    I get this with version 0.0.5

    Comment by luke — Wednesday 7 January 2015 @ 7:46

  6. Are you using Python 3? My tools are for Python 2, except when stated otherwise.

    Comment by Didier Stevens — Wednesday 7 January 2015 @ 7:49

  7. python 2.7

    Tried this on my desktop and a fresh VM with the same steps repeated above to try and rule out my machine.

    Odd.

    Thanks

    Comment by luke — Wednesday 7 January 2015 @ 8:56

  8. Great! – Many Thanks.

    Comment by iiiears — Friday 9 January 2015 @ 2:28

  9. Every time a YARA rule matches, I get this.

    Traceback (most recent call last):
    File "oledump.py", line 574, in <module>
    Main()
    File "oledump.py", line 571, in Main
    OLEDump(args[0], options)
    File "oledump.py", line 516, in OLEDump
    OLESub(ole, '', rules, options)
    File "oledump.py", line 456, in OLESub
    print(' YARA rule%s: %s' % (IFF(oDecoder.Name() == '', '', ' (stream decoder: %s)' % oDecoder.Name()), result.rule))
    AttributeError: 'str' object has no attribute 'rule'

    Comment by Josef — Friday 9 January 2015 @ 21:08

  10. @Josef I think this is because you don’t have the latest version of the YARA Python module installed. If I remember correctly the YARA Python API changed.

    Comment by Didier Stevens — Friday 9 January 2015 @ 21:12

  11. […] Kahu Security has a nice walk through that shows how to find malware embedded in a Microsoft Word document, using tools like OfficeMalScanner and OleDump. […]

    Pingback by Security News #0x83 | CyberOperations — Saturday 7 March 2015 @ 15:07

  12. […] Here we use OleDump, which does a good job of showing the document’s internal objects. […]

    Pingback by 记录一次OFFICE恶意宏分析-中国 X 黑客小组 — Tuesday 10 March 2015 @ 6:44

  13. Is there an easy way to decrypt and extract VBA and OLE streams from an OLE2/CDF2 document if that was encrypted by password? To clarify: To *read is not password protected*, only to modify it — yet OLE streams are encrypted. Further reading on this:

    https://blogs.mcafee.com/mcafee-labs/threat-actors-use-encrypted-office-binary-format-evade-detection

    Comment by Tamas — Wednesday 8 July 2015 @ 18:28

  14. @Tamas Yes, it’s very easy, VBA streams are not encrypted when a password is used to protect VBA macros. You can just decompress them. What is the MD5 of your sample?

    Comment by Didier Stevens — Wednesday 8 July 2015 @ 18:59

  15. It seems that the problem is that PowerPoint encrypts the stream ‘PowerPoint Document’ even if you only set a password for modifications. No password needed if you click on “ReadOnly” button in PowerPoint, however…

    Here is an MD5 mentioned in that blog i have cited: 2E63ED1CDCEBAC556F78F16E8E872786
    They claim that this evasion technique is used to keep their badness undetected, and it seems many security tools do not handle this, indeed leaving us blindsided.

    $ ../../../oledump.py Presentation1\ \(readonly\).ppt
    1: 72 '\x05DocumentSummaryInformation'
    2: 68 'Current User'
    3: 952 'EncryptedSummary'
    4: 38663 'PowerPoint Document'
    $ ../../../oledump.py Presentation1.ppt
    1: 508 '\x05DocumentSummaryInformation'
    2: 3340 '\x05SummaryInformation'
    3: 68 'Current User'
    4: 38435 'PowerPoint Document'

    $ ../../../oledump.py -s4 Presentation1\ \(readonly\).ppt | head -2
    00000000: EF 49 BE 5B 7A 90 4B D3 53 BA FB 52 6D 1F FD 77 .I.[z.K.S..Rm..w
    00000010: 18 19 A3 96 54 42 E0 5C 21 F7 38 8E 7A 6B C3 8E ....TB.\!.8.zk..
    $ ../../../oledump.py -s4 Presentation1.ppt | head -2
    00000000: 0F 00 E8 03 CF 0B 00 00 01 00 E9 03 28 00 00 00 ............(...
    00000010: 80 16 00 00 E0 10 00 00 E0 10 00 00 80 16 00 00 ................

    Comment by trudnai — Wednesday 8 July 2015 @ 23:52

  16. @trudnai The ppt file contains no macros, but an exploit.

    Comment by Didier Stevens — Thursday 9 July 2015 @ 22:24

  17. Hello, thanks for the great tools. I’ve successfully tried olevba, but oledump is not working; I get the same error as Yogesh ==> “sys.stderr, ‘Error – %s is not a valid OLE file.’ % infile”
    Any advice ?

    Comment by Romain — Wednesday 15 July 2015 @ 18:07

  18. @romain Yes, you’re using an old version. Download the latest version from its page.

    Comment by Didier Stevens — Wednesday 15 July 2015 @ 18:26

  19. Thanks for the quick reply, but I have the same issue with the latest version (0_0_17?): Error: 423867-782155.doc is not a valid OLE file
    File type is pretty weird: file 423867-782155.doc ==> 423867-782155.doc: HTML document

    Comment by Romain — Wednesday 15 July 2015 @ 18:51

  20. If it’s a MIME file, you can use my emldump tool, look for the video where I explain how to use emldump together with oledump.
    What is the MD5 hash of your sample?

    Comment by Didier Stevens — Wednesday 15 July 2015 @ 18:54

  21. MD5 hash: 9c419ba752b23d11782757e205e13031 I will check for emldump

    Comment by Romain — Wednesday 15 July 2015 @ 19:05

  22. @romain Yes, it’s a MIME file:

    emldump.py -s 3 -d 423867-782155.doc.vir | oledump.py
    1: 1079 'PROJECT'
    2: 470 'PROJECTwm'
    3: m 999 'VBA/Class1'
    4: m 1000 'VBA/Class10'
    5: m 1000 'VBA/Class11'
    6: m 1000 'VBA/Class12'
    7: m 1000 'VBA/Class13'
    8: m 1000 'VBA/Class14'
    9: m 1000 'VBA/Class15'
    10: m 1000 'VBA/Class16'
    11: m 1000 'VBA/Class17'
    12: m 1000 'VBA/Class18'
    13: m 999 'VBA/Class2'
    14: m 999 'VBA/Class3'
    15: m 999 'VBA/Class4'
    16: m 999 'VBA/Class5'
    17: m 999 'VBA/Class6'
    18: m 999 'VBA/Class7'
    19: m 999 'VBA/Class8'
    20: m 999 'VBA/Class9'
    21: M 7473 'VBA/Module1'
    22: M 1454 'VBA/ThisDocument'
    23: 5927 'VBA/_VBA_PROJECT'
    24: 989 'VBA/dir'

    Since I have a tool for MIME files, I did not add support for MIME files to oledump. You need to pipe emldump and oledump together.

    Comment by Didier Stevens — Wednesday 15 July 2015 @ 19:27

  23. Thanks, i found the option with http decrypt. Could you please tell me where you got file associated with the MD5 i gave you ?

    Comment by Romain — Wednesday 15 July 2015 @ 19:40

  24. VirusTotal Intelligence.

    Comment by Didier Stevens — Wednesday 15 July 2015 @ 19:49

  25. Hi Didier. I’ve successfully detected the VBA macro code using olevba.py inside a .doc malware. Now I’d like to decrypt its ActiveMime content. I’ve tried your suggestion, as in comment #22 (chaining emldump and oledump), but I’m getting this error:

    Traceback (most recent call last):
    File "oledump/oledump.py", line 522, in <module>
    Main()
    File "oledump/oledump.py", line 517, in Main
    OLEDump('', options)
    File "oledump/oledump.py", line 406, in OLEDump
    ole = OleFileIO_PL.OleFileIO(cStringIO.StringIO(sys.stdin.read()))
    File "build/bdist.macosx-10.10-intel/egg/olefile/olefile.py", line 1142, in __init__
    File "build/bdist.macosx-10.10-intel/egg/olefile/olefile.py", line 1247, in open
    File "build/bdist.macosx-10.10-intel/egg/olefile/olefile.py", line 1163, in _raise_defect
    IOError: not an OLE2 structured storage file

    Not sure what it does mean, since the olevba.py tool is detecting the macros inside it.
    emldump alone yields:

    1: M multipart/related
    2: 3672 text/html
    3: 10480 application/x-mso
    4: 159 text/xml

    (stream #3 is base64 encoded)

    Do you have any suggestion?

    Comment by Raffaele — Thursday 17 December 2015 @ 16:33

  26. @Raffaele What is the exact command you used?

    Comment by Didier Stevens — Thursday 17 December 2015 @ 16:36

  27. Hi, Didier.
    The command line I use is this:
    iMac:$ oledump/emldump.py -s 3 -d invoice55742665.doc | oledump/oledump.py

    Comment by Raffaele — Friday 18 December 2015 @ 16:29

  28. @Raffaele That command should work. What is the md5 of your sample (invoice55742665.doc)?

    Comment by Didier Stevens — Friday 18 December 2015 @ 16:32

  29. The md5 hash is 7c18297515cb65d55fc4864418659c98.
    I’m not an expert on OLE docs. Maybe the fact that I’ve slightly modified the file can help you understand the problem.
    This .doc sample is actually malware I’m trying to decrypt.
    The beginning of the file was like this:

    sssffsfffsdddfdfdfdsdsdsfdss

    MIME-Version: 22
    Content-Type: multipart/related; boundary="----=_NextPart_Jm9Ovypy.uUh6MCk"

    Error!

    ------=_NextPart_Jm9Ovypy.uUh6MCk

    As you can see the file contains an error, surely a trick for preventing antivirus software from easily detecting the malware inside it.
    In order to have olevba.py working, I’ve deleted the ssfsfffddfsfsds… line.
    I don’t know if this may modify the md5 hash.

    Comment by Raffaele — Saturday 19 December 2015 @ 11:40

  30. @Raffaele Yes, when you modify the file you change the hash. I need the hash of the original file to help you.

    Comment by Didier Stevens — Saturday 19 December 2015 @ 19:31

  31. Hi, Didier. Thanks for your help.
    The hash of the unmodified file is f67aa5a3ede3d31c5a68494c0678e2ee

    Comment by Raffaele — Sunday 20 December 2015 @ 12:33

  32. @Raffaele The MIME type file has one line preceding it (sssffsfffsdddfdfdfdsdsdsfdss). emldump.py can skip the first line by using option -H. So this command will extract the OLE file and analyze it: emldump.py -H -s 3 -d f67aa5a3ede3d31c5a68494c0678e2ee.vir | oledump.py

    Comment by Didier Stevens — Sunday 20 December 2015 @ 17:17

  33. Hi Didier. Thank you much for your efforts in helping me. I’ve read your recent post, but in my case it isn’t working.
    If I issue the command emldump/emldump.py -H -s 3 -d f67aa5a3ede3d31c5a68494c0678e2ee.vir, alone, emldump complains that the file doesn’t exist.
    If I issue 'emldump/emldump.py -H -s 3 -d invoice55742665.doc', again, alone, it *dumps* the content of the stream #3 on screen. I can see lines like this:

    Project.ThisDocument.GCxuoDtO4Project.ThisDocument.AutoOpen”Project.ThisDocument.Workbook_OpenProject.sasai.QKt3VjahPcUXQuePROJECT.SASAI.QKT3VJAHPCUXQUEPROJECT.THISDOCUMENT.AUTOOPENPROJECT.THISDOCUMENT.GCXUODTO4″PROJECT.THISDOCUMENT.WORKBOOK_OPEN@

    Furthermore, the first line of this dump looks like this: ActiveMime??????-&?x??} `ՙ??l+>?qB?0???K????G|۱?Ԗ?q?D?dIv?? ?]?hh?v

    But the rest is unreadable, like this: n??׹?6^?׻?:??d????]6???͎?Wu8=W)?p?m?8?h*?P????!lF?c2?p]E????`p n??l48Lf??RY?u;]??`t?V????[,z?`v?x???u?

    Now if I try to feed this output to oledump (using 'emldump/emldump.py -H -s 3 -d invoice55742665.doc | oledump/oledump.py') I always get the same error

    Traceback (most recent call last):
    File "oledump/oledump.py", line 522, in <module>
    Main()
    File "oledump/oledump.py", line 517, in Main
    OLEDump('', options)
    File "oledump/oledump.py", line 406, in OLEDump
    ole = OleFileIO_PL.OleFileIO(cStringIO.StringIO(sys.stdin.read()))
    File "build/bdist.macosx-10.10-intel/egg/olefile/olefile.py", line 1142, in __init__
    File "build/bdist.macosx-10.10-intel/egg/olefile/olefile.py", line 1247, in open
    File "build/bdist.macosx-10.10-intel/egg/olefile/olefile.py", line 1163, in _raise_defect
    IOError: not an OLE2 structured storage file

    Apparently oledump (or olefile?) is behaving differently on Win platforms vs. *nix ones?

    Comment by Raffaele — Tuesday 22 December 2015 @ 10:18

  34. @Raffaele Are you using the latest version of oledump?

    Comment by Didier Stevens — Tuesday 22 December 2015 @ 10:21

  35. Bingo! It’s working now! I was using an older version. I downloaded oledump from the link in your latest post, and it’s working.
    Thank you very much!

    Comment by Raffaele — Tuesday 22 December 2015 @ 14:43

  36. […] for extracting embedded OLE objects from Office documents is Didier Steven’s (ALL HAIL!!) Oledump Python tool, so that is what we will […]

    Pingback by Macros! – Malcat! Mew! — Monday 13 June 2016 @ 3:11

  37. Hi Didier,

    I am trying to set up the environment so that oledump.py will work. I have downloaded the latest version of oledump.py. I have also pip installed olefile. However when I try and run oledump.py using the syntax you have shown I get a message telling me that “This program requires module olefileIO_PL”. I have copied this specific file from the olefile folder (downloaded from the publishers website) and have copied it into the scripts folder of Python27 (I am running version 2.7.10) and I have also copied it into the oledump folder which contains the oledump.py file.

    If you could please let me know what I am doing wrong here… I have also run the setup file for oledump in case the pip install was not actually installing oledump.

    Thank you in advance.

    Comment by noman — Friday 17 June 2016 @ 19:15

  38. Are you using the latest version?

    Comment by Didier Stevens — Friday 30 September 2016 @ 14:50

  39. […] We can extract the macros’ source code from the excel file using oledump. […]

    Pingback by Static Analysis: Locky Osiris – Evil Code Analysis — Sunday 22 January 2017 @ 20:05

