If you follow my video blog, you’ve seen my oledump videos and downloaded the preview version. Here is the “official” release.
oledump.py is a program to analyze OLE files (Compound File Binary Format). These files contain streams of data. oledump allows you to analyze these streams.
Many applications use this file format; the best known is MS Office. .doc, .xls, .ppt, … are OLE files (docx, xlsx, … use the newer file format: XML inside ZIP).
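To make the distinction between the two formats concrete, here is a minimal sketch (not part of oledump) that tells an OLE file from a ZIP-based Office file by its first bytes. The function name classify_office_container is hypothetical; the two signatures are the standard Compound File and ZIP local-file-header magic values.

```python
# Sketch only: classify a file by its leading magic bytes.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary signature
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header (docx, xlsx, ...)


def classify_office_container(data: bytes) -> str:
    """Return 'ole', 'zip' or 'unknown' based on the first bytes of data."""
    if data.startswith(OLE_MAGIC):
        return "ole"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

Reading the first 8 bytes of a file and passing them to this function is enough. A result of 'zip' suggests the new XML-in-ZIP format rather than an OLE file.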
Run oledump on an .xls file and it will show you the streams:
The letter M next to streams 7, 8, 9 and 10 indicates that these streams contain VBA macros.
You can select a stream to dump its content:
The source code of VBA macros is compressed when stored inside a stream. Use option -v to decompress the VBA macros:
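For readers curious what this decompression involves, here is a rough sketch based on the compressed-container layout described in Microsoft's MS-OVBA specification. It is not oledump's own code, and the name decompress_vba is hypothetical. It also assumes the input starts at the compressed container itself; in a real module stream the compressed source sits at an offset recorded in the dir stream, which this sketch does not handle.

```python
def decompress_vba(container: bytes) -> bytes:
    """Decompress an MS-OVBA compressed container (sketch, minimal checks)."""
    if not container or container[0] != 0x01:
        raise ValueError("missing container signature byte 0x01")
    out = bytearray()
    pos = 1
    while pos + 2 <= len(container):
        header = int.from_bytes(container[pos:pos + 2], "little")
        pos += 2
        if (header >> 12) & 0x07 != 0b011:
            raise ValueError("bad chunk signature")
        # The low 12 bits hold the chunk size (header included) minus 3.
        chunk_end = min(pos + (header & 0x0FFF) + 1, len(container))
        if not header & 0x8000:
            # Flag bit clear: the chunk holds raw, uncompressed bytes.
            out += container[pos:chunk_end]
            pos = chunk_end
            continue
        chunk_start = len(out)
        while pos < chunk_end:
            flags = container[pos]
            pos += 1
            for bit in range(8):
                if pos >= chunk_end:
                    break
                if not (flags >> bit) & 1:
                    out.append(container[pos])  # literal byte
                    pos += 1
                    continue
                # Copy token: the offset/length bit split depends on how much
                # of the current chunk has been decompressed so far.
                token = int.from_bytes(container[pos:pos + 2], "little")
                pos += 2
                done = len(out) - chunk_start
                bit_count = min(max((done - 1).bit_length(), 4), 12)
                length = (token & (0xFFFF >> bit_count)) + 3
                offset = (token >> (16 - bit_count)) + 1
                if offset > done:
                    raise ValueError("copy token points before chunk start")
                for _ in range(length):
                    out.append(out[-offset])  # may overlap its own output
    return bytes(out)
```

As a small hand-built check, the 9 bytes 01 05 B0 08 61 62 63 00 20 hold three literals (abc) followed by one copy token, and decompress to abcabc.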
You can write plugins (in Python) to analyze streams. I developed 3 plugins. Plugin plugin_http_heuristics.py uses a couple of tricks to extract URLs from malicious, obfuscated VBA macros, like this:
You might have noticed that the file analyzed in the above screenshot is a zip file. Like many of my analysis programs, oledump.py can analyze a file inside a (password protected) zip file. This allows you to store your malware samples in password protected zip files (password infected), and then analyze them without having to extract them.
If you install the YARA Python module, you can scan the streams with YARA rules:
And if you suspect that the content of a stream is encoded, for example with XOR, you can try to brute-force the XOR key with a simple decoder I provide (or you can develop your own decoder in Python):
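A brute-force of a 1-byte XOR key can be sketched as follows. This is a simplified stand-in, not the decoder shipped with oledump: the names xor_bytes and bruteforce_xor are hypothetical, and searching for the string http is just one assumed way of recognizing a plausible decoding.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def bruteforce_xor(data: bytes, needle: bytes = b"http"):
    """Yield (key, decoded) for every 1-byte key whose output contains needle."""
    for key in range(1, 256):  # key 0 would leave the data unchanged
        decoded = xor_bytes(data, key)
        if needle in decoded:
            yield key, decoded
```

For example, if a stream was encoded with key 0x5A and contains a URL, list(bruteforce_xor(stream_bytes)) would include the pair with key 0x5A and the decoded content.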
This program requires Python module OleFileIO_PL: http://www.decalage.info/python/olefileio
oledump_V0_0_3.zip (https)
MD5: 9D5AA950C9BFDB16D63D394D622C6767
SHA256: 44D8C675881245D3336D6AB6F9D7DAF152B14D7313A77CB8F84A71B62E619A70
sys.stderr, ‘Error – %s is not a valid OLE file.’ % infile
NameError: global name ‘infile’ is not defined
Comment by Yogesh — Wednesday 17 December 2014 @ 5:54
Awesome work, Didier. If you ever come to Madrid you have a ration of the best damn Iberian ham on me (with a nice red wine of course). This tool is going to be extremely handy to fight against malicious macro in pesky Excel files … 🙂
Comment by Antonio — Wednesday 17 December 2014 @ 16:07
Yogesh — It means the input file is not being detected as a valid OLE file. Chances are you have an Office Open XML structured document. Try renaming the doc to .zip and unzipping it. Look for the .bin file in the word directory that is extracted. You should be able to run the tool on it just fine, don’t forget to decompress it. To give you insight Steven, there was a recent “Phishing Campaign” that had a malicious macro in it. This tool took me to source code in minutes. Thanks again 4 the release!
Comment by @SlyDGotcha — Friday 19 December 2014 @ 20:35
Trying to get this working on windows. new to python but have installed it and got python in ok. I have installed olefileio and oledump 0.0.3 but i get this error when i try to run.
  File "C:\oledump.py", line 335
    exec open(plugin, 'r') in globals(), globals()
    ^
SyntaxError: invalid syntax
Comment by luke — Wednesday 7 January 2015 @ 7:44
C:\>oledump.py
  File "C:\oledump.py", line 337
    exec open(plugin, 'r') in globals(), globals()
    ^
SyntaxError: invalid syntax
I get this with version 0.0.5
Comment by luke — Wednesday 7 January 2015 @ 7:46
Are you using Python 3? My tools are for Python 2, except when stated otherwise.
Comment by Didier Stevens — Wednesday 7 January 2015 @ 7:49
python 2.7
Tried this on my desktop and a fresh VM with the same steps repeated above to try and rule out my machine.
Odd.
Thanks
Comment by luke — Wednesday 7 January 2015 @ 8:56
Great! – Many Thanks.
Comment by iiiears — Friday 9 January 2015 @ 2:28
Every time a YARA rule matches, I get this:
Traceback (most recent call last):
  File "oledump.py", line 574, in <module>
    Main()
  File "oledump.py", line 571, in Main
    OLEDump(args[0], options)
  File "oledump.py", line 516, in OLEDump
    OLESub(ole, '', rules, options)
  File "oledump.py", line 456, in OLESub
    print(' YARA rule%s: %s' % (IFF(oDecoder.Name() == '', '', ' (stream decoder: %s)' % oDecoder.Name()), result.rule))
AttributeError: 'str' object has no attribute 'rule'
Comment by Josef — Friday 9 January 2015 @ 21:08
@Josef I think this is because you don’t have the latest version of the YARA Python module installed. If I remember correctly the YARA Python API changed.
Comment by Didier Stevens — Friday 9 January 2015 @ 21:12
[…] Kahu Security has a nice walk through that shows how to find malware embedded in a Microsoft Word document, using tools like OfficeMalScanner and OleDump. […]
Pingback by Security News #0x83 | CyberOperations — Saturday 7 March 2015 @ 15:07
[…] Here we use OleDump, which does a good job of displaying a document's internal objects. […]
Pingback by 记录一次OFFICE恶意宏分析-中国 X 黑客小组 — Tuesday 10 March 2015 @ 6:44
Is there an easy way to decrypt and extract VBA and OLE streams from an OLE2/CDF2 document if that was encrypted by password? To clarify: To *read is not password protected*, only to modify it — yet OLE streams are encrypted. Further reading on this:
https://blogs.mcafee.com/mcafee-labs/threat-actors-use-encrypted-office-binary-format-evade-detection
Comment by Tamas — Wednesday 8 July 2015 @ 18:28
@Tamas Yes, it’s very easy, VBA streams are not encrypted when a password is used to protect VBA macros. You can just decompress them. What is the MD5 of your sample?
Comment by Didier Stevens — Wednesday 8 July 2015 @ 18:59
It seems that the problem is that PowerPoint encrypts the stream ‘PowerPoint Document’ even if you only set a password for modifications. No password needed if you click on “ReadOnly” button in PowerPoint, however…
Here is an MD5 mentioned in the blog post I cited: 2E63ED1CDCEBAC556F78F16E8E872786
They claim that this evasion technique is used to keep the malicious content undetected, and it seems many security tools do not handle this, indeed leaving us blindsided.
$ ../../../oledump.py Presentation1\ \(readonly\).ppt
1: 72 ‘\x05DocumentSummaryInformation’
2: 68 ‘Current User’
3: 952 ‘EncryptedSummary’
4: 38663 ‘PowerPoint Document’
$ ../../../oledump.py Presentation1.ppt
1: 508 ‘\x05DocumentSummaryInformation’
2: 3340 ‘\x05SummaryInformation’
3: 68 ‘Current User’
4: 38435 ‘PowerPoint Document’
$ ../../../oledump.py -s4 Presentation1\ \(readonly\).ppt | head -2
00000000: EF 49 BE 5B 7A 90 4B D3 53 BA FB 52 6D 1F FD 77 .I.[z.K.S..Rm..w
00000010: 18 19 A3 96 54 42 E0 5C 21 F7 38 8E 7A 6B C3 8E ....TB.\!.8.zk..
$ ../../../oledump.py -s4 Presentation1.ppt | head -2
00000000: 0F 00 E8 03 CF 0B 00 00 01 00 E9 03 28 00 00 00 ............(...
00000010: 80 16 00 00 E0 10 00 00 E0 10 00 00 80 16 00 00 ................
Comment by trudnai — Wednesday 8 July 2015 @ 23:52
@trudnai The ppt file contains no macros, but an exploit.
Comment by Didier Stevens — Thursday 9 July 2015 @ 22:24
Hello, thanks for the great tools. I’ve successfully tried with olevba, but oledump not working, i’ve same error as Yogesh ==> “sys.stderr, ‘Error – %s is not a valid OLE file.’ % infile”
Any advice ?
Comment by Romain — Wednesday 15 July 2015 @ 18:07
@romain Yes, you’re using an old version. Download the latest version from its page.
Comment by Didier Stevens — Wednesday 15 July 2015 @ 18:26
Thanks for the quick reply, but I have the same issue with the latest version (0_0_17?): Error: 423867-782155.doc is not a valid OLE file
File type is pretty weird: file 423867-782155.doc ==> 423867-782155.doc: HTML document
Comment by Romain — Wednesday 15 July 2015 @ 18:51
If it’s a MIME file, you can use my emldump tool. Look for the video where I explain how to use emldump together with oledump.
What is the MD5 hash of your sample?
Comment by Didier Stevens — Wednesday 15 July 2015 @ 18:54
MD5 hash: 9c419ba752b23d11782757e205e13031 I will check for emldump
Comment by Romain — Wednesday 15 July 2015 @ 19:05
@romain Yes, it’s a MIME file:
emldump.py -s 3 -d 423867-782155.doc.vir | oledump.py
1: 1079 ‘PROJECT’
2: 470 ‘PROJECTwm’
3: m 999 ‘VBA/Class1’
4: m 1000 ‘VBA/Class10’
5: m 1000 ‘VBA/Class11’
6: m 1000 ‘VBA/Class12’
7: m 1000 ‘VBA/Class13’
8: m 1000 ‘VBA/Class14’
9: m 1000 ‘VBA/Class15’
10: m 1000 ‘VBA/Class16’
11: m 1000 ‘VBA/Class17’
12: m 1000 ‘VBA/Class18’
13: m 999 ‘VBA/Class2’
14: m 999 ‘VBA/Class3’
15: m 999 ‘VBA/Class4’
16: m 999 ‘VBA/Class5’
17: m 999 ‘VBA/Class6’
18: m 999 ‘VBA/Class7’
19: m 999 ‘VBA/Class8’
20: m 999 ‘VBA/Class9’
21: M 7473 ‘VBA/Module1’
22: M 1454 ‘VBA/ThisDocument’
23: 5927 ‘VBA/_VBA_PROJECT’
24: 989 ‘VBA/dir’
Since I have a tool for MIME files, I did not add support for MIME files to oledump. You need to pipe emldump and oledump together.
Comment by Didier Stevens — Wednesday 15 July 2015 @ 19:27
Thanks, i found the option with http decrypt. Could you please tell me where you got file associated with the MD5 i gave you ?
Comment by Romain — Wednesday 15 July 2015 @ 19:40
VirusTotal Intelligence.
Comment by Didier Stevens — Wednesday 15 July 2015 @ 19:49
Hi Didier. I’ve successfully detected the VBA macro code using olevba.py inside a .doc malware. Now I’d like to decrypt its ActiveMime content. I’ve tried your suggestion, as in comment #22 (chaining emldump and oledump), but I’m getting this error:
Traceback (most recent call last):
  File "oledump/oledump.py", line 522, in <module>
    Main()
  File "oledump/oledump.py", line 517, in Main
    OLEDump('', options)
  File "oledump/oledump.py", line 406, in OLEDump
    ole = OleFileIO_PL.OleFileIO(cStringIO.StringIO(sys.stdin.read()))
  File "build/bdist.macosx-10.10-intel/egg/olefile/olefile.py", line 1142, in __init__
  File "build/bdist.macosx-10.10-intel/egg/olefile/olefile.py", line 1247, in open
  File "build/bdist.macosx-10.10-intel/egg/olefile/olefile.py", line 1163, in _raise_defect
IOError: not an OLE2 structured storage file
Not sure what it means, since the olevba.py tool is detecting the macros inside it.
emldump alone yields:
1: M multipart/related
2: 3672 text/html
3: 10480 application/x-mso
4: 159 text/xml
(stream #3 is base64 encoded)
Do you have any suggestion?
Comment by Raffaele — Thursday 17 December 2015 @ 16:33
@Raffaele What is the exact command you used?
Comment by Didier Stevens — Thursday 17 December 2015 @ 16:36
Hi, Didier.
The command line I use is this:
iMac:$ oledump/emldump.py -s 3 -d invoice55742665.doc | oledump/oledump.py
Comment by Raffaele — Friday 18 December 2015 @ 16:29
@Raffaele That command should work. What is the md5 of your sample (invoice55742665.doc)?
Comment by Didier Stevens — Friday 18 December 2015 @ 16:32
The md5 hash is 7c18297515cb65d55fc4864418659c98.
I’m not an expert on OLE docs. Maybe the fact that I’ve slightly modified the file can help you understand the problem.
This .doc sample is actually malware I’m trying to decrypt.
The beginning of the file was like this:
sssffsfffsdddfdfdfdsdsdsfdss
MIME-Version: 22
Content-Type: multipart/related; boundary="----=_NextPart_Jm9Ovypy.uUh6MCk"
Error!
------=_NextPart_Jm9Ovypy.uUh6MCk
As you can see the file contains an error, surely a trick for preventing antivirus software from easily detecting the malware inside it.
In order to have olevba.py working, I’ve deleted the ssfsfffddfsfsds… line.
I don’t know if this may modify the md5 hash.
Comment by Raffaele — Saturday 19 December 2015 @ 11:40
@Raffaele Yes, when you modify the file you change the hash. I need the hash of the original file to help you.
Comment by Didier Stevens — Saturday 19 December 2015 @ 19:31
Hi, Didier. Thanks for your help.
The hash of the unmodified file is f67aa5a3ede3d31c5a68494c0678e2ee
Comment by Raffaele — Sunday 20 December 2015 @ 12:33
@Raffaele The MIME type file has one line preceding it (sssffsfffsdddfdfdfdsdsdsfdss). emldump.py can skip the first line by using option -H. So this command will extract the OLE file and analyze it: emldump.py -H -s 3 -d f67aa5a3ede3d31c5a68494c0678e2ee.vir | oledump.py
Comment by Didier Stevens — Sunday 20 December 2015 @ 17:17
Hi Didier. Thank you much for your efforts in helping me. I’ve read your recent post, but in my case it isn’t working.
If I issue the command emldump/emldump.py -H -s 3 -d f67aa5a3ede3d31c5a68494c0678e2ee.vir, alone, emldump complains that the file doesn’t exist.
If I issue 'emldump/emldump.py -H -s 3 -d invoice55742665.doc', again alone, it *dumps* the content of stream #3 on screen. I can see lines like this:
Project.ThisDocument.GCxuoDtO4Project.ThisDocument.AutoOpen”Project.ThisDocument.Workbook_OpenProject.sasai.QKt3VjahPcUXQuePROJECT.SASAI.QKT3VJAHPCUXQUEPROJECT.THISDOCUMENT.AUTOOPENPROJECT.THISDOCUMENT.GCXUODTO4″PROJECT.THISDOCUMENT.WORKBOOK_OPEN@
Furthermore, the first line of this dump looks like this: ActiveMime??????-&?x??} `ՙ??l+>?qB?0???K????G|۱?Ԗ?q?D?dIv?? ?]?hh?v
But the rest is unreadable, like this: n???6^??:??d????]6???͎?Wu8=W)?p?m?8?h*?P????!lF?c2?p]E????`p n??l48Lf??RY?u;]??`t?V????[,z?`v?x???u?
Now if I try to feed this output to oledump (using 'emldump/emldump.py -H -s 3 -d invoice55742665.doc | oledump/oledump.py'), I always get the same error:
Traceback (most recent call last):
  File "oledump/oledump.py", line 522, in <module>
    Main()
  File "oledump/oledump.py", line 517, in Main
    OLEDump('', options)
  File "oledump/oledump.py", line 406, in OLEDump
    ole = OleFileIO_PL.OleFileIO(cStringIO.StringIO(sys.stdin.read()))
  File "build/bdist.macosx-10.10-intel/egg/olefile/olefile.py", line 1142, in __init__
  File "build/bdist.macosx-10.10-intel/egg/olefile/olefile.py", line 1247, in open
  File "build/bdist.macosx-10.10-intel/egg/olefile/olefile.py", line 1163, in _raise_defect
IOError: not an OLE2 structured storage file
Apparently oledump (or olefile?) is behaving differently on Win platforms vs. *nix ones?
Comment by Raffaele — Tuesday 22 December 2015 @ 10:18
@Raffaele Are you using the latest version of oledump?
Comment by Didier Stevens — Tuesday 22 December 2015 @ 10:21
Bingo! It’s working now! I was using an older version. I downloaded oledump from the link in your latest post, and it’s working.
Thank you very much!
Comment by Raffaele — Tuesday 22 December 2015 @ 14:43
[…] for extracting embedded OLE objects from Office documents is Didier Steven’s (ALL HAIL!!) Oledump Python tool, so that is what we will […]
Pingback by Macros! – Malcat! Mew! — Monday 13 June 2016 @ 3:11
Hi Didier,
I am trying to set up the environment so that oledump.py will work. I have downloaded the latest version of oledump.py. I have also pip installed olefile. However when I try and run oledump.py using the syntax you have shown I get a message telling me that “This program requires module olefileIO_PL”. I have copied this specific file from the olefile folder (downloaded from the publishers website) and have copied it into the scripts folder of Python27 (I am running version 2.7.10) and I have also copied it into the oledump folder which contains the oledump.py file.
If you could please let me know what I am doing wrong here… I have also run the setup file for oledump in case the pip install was not actually installing oledump.
Thank you in advance.
Comment by noman — Friday 17 June 2016 @ 19:15
Are you using the latest version?
Comment by Didier Stevens — Friday 30 September 2016 @ 14:50
[…] We can extract the macros’ source code from the excel file using oledump. […]
Pingback by Static Analysis: Locky Osiris – Evil Code Analysis — Sunday 22 January 2017 @ 20:05
Hi Didier,
FYI, I’m experiencing a similar problem to the one that noman was having above. I’m running Python 2.7.14, but upon executing “oledump.py” version 0.0.33 I’m getting the following error message:
This program requires module olefile.
http://www.decalage.info/python/olefileio
However, I definitely have olefile version 0.45 installed, as I can successfully run commands from “python-oletools” v0.52 (e.g., oleid, olevba, mraptor, etc…) which requires olefile as a dependency.
Any ideas regarding why the olefile module is not being detected by oledump?
Thank you for your help!
Comment by Clint — Wednesday 23 May 2018 @ 22:32
Make sure you have only one Python interpreter installed, then run it and type import olefile. Do you get an error?
Comment by Didier Stevens — Wednesday 23 May 2018 @ 22:35
Yep, had to uninstall Python 3 and then had to “pip install olefile” again. FYI, I had previously followed all the guidance for how to run both Python 2 & 3 environments on the same computer, I tried using the Python Launcher to explicitly specify the Python version to use, and even setup and used a separate Python virtual environment using “virtualenv”, but none of it worked. I had to actually uninstall Python 3 and then install olefile again in order to get it working. Weird. Thanks for the assist.
Comment by Clint — Thursday 24 May 2018 @ 16:19
Did you run command “pip install olefile” exactly like that, e.g. without absolute path to pip?
Because I have different versions of Python on my machines (2 & 3, x86 & x64), but I always provide a full path when running pip, this way I know in what version of Python I’m installing.
Like this: c:\python27\scripts\pip.exe …
Comment by Didier Stevens — Thursday 24 May 2018 @ 16:24
Tried running the pip command explicitly using fully qualified path name from each Python version. Each version was claiming to have the olefile module already installed. Unsure where the issue was in my environment, but uninstalling Python 3 fixed it. FYI, I have since reinstalled Python 3 and everything still appears to be working normally now. Very odd.
Comment by Clint — Thursday 24 May 2018 @ 20:52
With the EOL of Python 2.7, is there a version of oledump.py that works with Python 3.8?
If so is there a step by step statement of the setup required?
Comment by Doug Goss — Saturday 11 January 2020 @ 1:56
Yes, the latest version works with Python 3: https://blog.didierstevens.com/programs/oledump-py/
I test with 3.7, so let me know should you have a problem with 3.8.
Comment by Didier Stevens — Sunday 12 January 2020 @ 9:46
Tried running oledump.py however it says that “This program requires module olefile.
http://www.decalage.info/python/olefileio” ….. I have installed olefile via install.bat…. Python version is 2.7
Comment by Anonymous — Wednesday 29 July 2020 @ 13:09
What version of oledump are you using? Did you download the latest version from here: https://blog.didierstevens.com/programs/oledump-py/
Comment by Didier Stevens — Wednesday 29 July 2020 @ 22:31