Didier Stevens

Wednesday 18 February 2015

Analyzing A Fraudulent Document With Error Level Analysis

Filed under: Forensics,My Software,PDF — Didier Stevens @ 0:00

Some time ago I had the chance to try out an image forensics method, Error Level Analysis (ELA), on a PDF. It was a fraudulent document (a form) with a special characteristic: the criminal had converted the original form (a PDF) to JPEG, edited the JPEG with a raster graphics editor, and then inserted the edited JPEG into a PDF document. This gave me the opportunity to try out ELA on a “text document”.

I can’t share the PDF, but I recreated one to use in this blog post.

First I search for images in the PDF document:

pdf-parser.py -s image example-edited.pdf


obj 4 0
 Referencing: 6 0 R

        /Im4 6 0 R
    /ProcSet [/PDF/Text/ImageC/ImageI/ImageB]

obj 6 0
 Type: /XObject
 Contains stream

    /Type /XObject
    /Subtype /Image
    /Width 680
    /Height 965
    /BitsPerComponent 8
    /ColorSpace /DeviceRGB
    /Filter /DCTDecode
    /Length 233133

The image is in object 6. I extract the image:

pdf-parser.py -o 6 -d example-edited.jpeg example-edited.pdf

Here it is:
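As a quick sanity check on the extracted file: the /DCTDecode filter in object 6 means the stream is JPEG-compressed, so the dumped stream should already be a complete JPEG file. The sketch below is only an illustration and is not part of pdf-parser.py; the helper name looks_like_jpeg is hypothetical. It simply tests for the JPEG start-of-image marker.

```python
def looks_like_jpeg(data: bytes) -> bool:
    """Return True when data begins like a JPEG stream.

    Hypothetical helper for illustration: a JPEG starts with the
    start-of-image marker FF D8, immediately followed by another
    marker, which itself begins with FF.
    """
    return len(data) >= 3 and data[:2] == b"\xff\xd8" and data[2] == 0xFF
```

To use it, read example-edited.jpeg in binary mode and pass the bytes to looks_like_jpeg. A False result would suggest that the stream was dumped with the wrong options or that the image uses a different filter.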


If you Google for Error Level Analysis, you’ll find a couple of websites that provide online image forensics. But that was not an option for me: I could not share the document.

I found this C program for ELA, and later I wrote my own Python program (what else?), which I’ll use for this example:

image-forensics-ela.py example-edited.jpeg example-edited-ela.png
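For readers who want to see the idea behind ELA in code, here is a minimal sketch. It assumes the Pillow library is installed, and it is not the code of image-forensics-ela.py; the function name, the re-save quality of 95 and the brightness scaling are all assumptions. The principle: re-save the image as JPEG at a known quality and look at the per-pixel difference. Regions that were edited after the last compression tend to show a different error level than the rest of the image.

```python
import io

from PIL import Image, ImageChops


def error_level_analysis(image, quality=95):
    """Return an amplified difference between image and a JPEG re-save.

    Illustrative sketch only: quality=95 and the scaling to the full
    0..255 range are assumptions, not settings from the original tool.
    """
    original = image.convert("RGB")

    # Re-compress the image in memory at a known JPEG quality.
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer).convert("RGB")

    # Absolute per-channel difference between the two versions.
    diff = ImageChops.difference(original, resaved)

    # The differences are usually small, so stretch them to be visible.
    max_diff = max(high for _low, high in diff.getextrema())
    if max_diff == 0:
        return diff
    factor = 255.0 / max_diff
    return diff.point(lambda value: min(255, int(value * factor)))
```

Under these assumptions, something like error_level_analysis(Image.open("example-edited.jpeg")).save("example-edited-ela.png") would produce an ELA image comparable in spirit to the one above, though not pixel-identical to the tool's output.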


The colored pixels reveal the word I edited. You can see it better when I overlay the two images:

image-overlay.py -a 0.6 example-edited.jpeg example-edited-ela.png example-edited-overlay.png
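An overlay like this can be approximated with a simple alpha blend. The sketch below assumes Pillow is installed and assumes that the 0.6 passed with -a is the weight given to the ELA image; how image-overlay.py actually interprets its option is not shown here, so treat the function name and the meaning of alpha as hypothetical.

```python
from PIL import Image


def overlay(base, ela, alpha=0.6):
    """Blend the ELA image over the original image.

    Illustrative sketch: alpha is assumed to be the weight of the ELA
    image, so the result is base * (1 - alpha) + ela * alpha.
    """
    base_rgb = base.convert("RGB")
    ela_rgb = ela.convert("RGB")

    # Image.blend needs both images to have the same size and mode.
    if ela_rgb.size != base_rgb.size:
        ela_rgb = ela_rgb.resize(base_rgb.size)

    return Image.blend(base_rgb, ela_rgb, alpha)
```

With a higher alpha the colored ELA pixels dominate; with a lower alpha the form text stays more readable.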


FYI: there is also a GIMP plugin for ELA.

You can download the examples and programs here:

blogpost-ela-files.zip (https)
MD5: 4F3071A9162C5CA8B7B10A41F662093A
SHA256: CBA786368D7BAF65E1E9F854C315BFB60FF89910429106513A0C41C180D8FCAB
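The published hashes can be checked after downloading. This is a minimal sketch using only Python's standard hashlib module; the helper name matches_published is hypothetical, and it reads the whole file into memory, which is fine for a small ZIP file.

```python
import hashlib


def matches_published(data: bytes, md5_hex: str, sha256_hex: str) -> bool:
    """Return True when data has both of the given hex digests.

    Hypothetical helper for illustration; the comparison ignores case
    because the hashes above are printed in uppercase.
    """
    md5_ok = hashlib.md5(data).hexdigest() == md5_hex.lower()
    sha256_ok = hashlib.sha256(data).hexdigest() == sha256_hex.lower()
    return md5_ok and sha256_ok
```

To use it, read blogpost-ela-files.zip in binary mode and pass its bytes together with the MD5 and SHA256 values listed above.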
