Some time ago I had the chance to try out an image forensics method (Error Level Analysis) on a PDF. It was a fraudulent document (a form), but with a special characteristic: the criminal converted the original form (a PDF) to JPEG, edited the JPEG with a raster graphics editor, and then inserted the edited JPEG into a PDF document. This gave me the opportunity to try out Error Level Analysis (ELA) on a “text document”.
I can’t share the PDF, but I recreated one to use in this blogpost.
First I search for images in the PDF document:
pdf-parser.py -s image example-edited.pdf
Result:
obj 4 0
 Type:
 Referencing: 6 0 R
  <<
    /Font
    /XObject
      <<
        /Im4 6 0 R
      >>
    /ProcSet [/PDF/Text/ImageC/ImageI/ImageB]
  >>

obj 6 0
 Type: /XObject
 Referencing:
 Contains stream
  <<
    /Type /XObject
    /Subtype /Image
    /Width 680
    /Height 965
    /BitsPerComponent 8
    /ColorSpace /DeviceRGB
    /Filter /DCTDecode
    /Length 233133
  >>
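The search also matched object 4, presumably because of the /ImageC, /ImageI and /ImageB names in its /ProcSet. To illustrate the narrower idea of "find the image objects", here is a minimal Python sketch (not pdf-parser's code) that scans the raw bytes of a PDF for indirect objects whose dictionary contains /Subtype /Image. The function name find_image_objects is hypothetical, and the sketch assumes the objects are not stored inside compressed object streams.

```python
import re

# Matches "N G obj", the header of an indirect object (does not match "endobj").
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")


def find_image_objects(pdf_bytes):
    """Return (object number, dictionary text) for each /Subtype /Image object.

    Simplified sketch: it only scans uncompressed bytes, so objects inside
    compressed object streams are not found.
    """
    results = []
    headers = list(OBJ_RE.finditer(pdf_bytes))
    for i, match in enumerate(headers):
        start = match.end()
        # An object's bytes run until the next object header (or end of file).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(pdf_bytes)
        body = pdf_bytes[start:end]
        # Only inspect the dictionary, i.e. everything before the stream data.
        dictionary = body.split(b"stream", 1)[0]
        if re.search(rb"/Subtype\s*/Image\b", dictionary):
            results.append((int(match.group(1)), dictionary.strip().decode("latin-1")))
    return results
```

Unlike a plain string search, this skips object 4 and reports only object 6.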
The image is in object 6. I extract the image:
pdf-parser.py -o 6 -d example-edited.jpeg example-edited.pdf
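Because object 6 uses /Filter /DCTDecode, its raw stream bytes are JPEG data, so they can be written straight to a .jpeg file. A minimal Python sketch of that extraction step, assuming an uncompressed object that actually has a stream (the helper name extract_stream is hypothetical, not part of pdf-parser):

```python
import re


def extract_stream(pdf_bytes, obj_number):
    """Return the raw (undecoded) stream bytes of indirect object obj_number."""
    header = re.search(rb"(?<!\d)%d\s+\d+\s+obj\b" % obj_number, pdf_bytes)
    if header is None:
        raise ValueError("object %d not found" % obj_number)
    keyword = pdf_bytes.index(b"stream", header.end())
    # Stream data starts after the end-of-line (CRLF or LF) following "stream".
    data_start = keyword + len(b"stream")
    if pdf_bytes[data_start:data_start + 2] == b"\r\n":
        data_start += 2
    elif pdf_bytes[data_start:data_start + 1] == b"\n":
        data_start += 1
    dictionary = pdf_bytes[header.end():keyword]
    # Use /Length only when it is a direct integer, not an "N G R" reference.
    length = re.search(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)", dictionary)
    if length:
        return pdf_bytes[data_start:data_start + int(length.group(1))]
    # Fallback: stop at "endstream" and drop the end-of-line before it.
    data_end = pdf_bytes.index(b"endstream", data_start)
    return pdf_bytes[data_start:data_end].rstrip(b"\r\n")
```

Usage: read example-edited.pdf in binary mode into pdf_bytes, then write extract_stream(pdf_bytes, 6) to example-edited.jpeg.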
Here it is:
If you Google for Error Level Analysis, you’ll find a couple of websites that provide online image forensics. But that was not an option for me: I could not share the document.
I found this C program for ELA, and later I wrote my own Python program (what else?), which I’ll use for this example:
image-forensics-ela.py example-edited.jpeg example-edited-ela.png
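The usual ELA recipe is short: re-save the JPEG once at a known quality, then amplify the per-pixel difference between the image and its re-saved copy, since edited regions tend to respond differently to recompression than the rest. A minimal sketch with the Pillow library follows; it is not the code of image-forensics-ela.py, and the function name error_level_analysis and the re-save quality of 95 are assumptions.

```python
import io

from PIL import Image, ImageChops, ImageEnhance


def error_level_analysis(image, quality=95):
    """Return an amplified difference image between image and a re-saved JPEG copy.

    quality=95 is an assumed default, not taken from image-forensics-ela.py.
    """
    original = image.convert("RGB")
    # Re-compress once at a known JPEG quality, entirely in memory.
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer).convert("RGB")
    # Absolute per-channel difference between the two versions.
    diff = ImageChops.difference(original, resaved)
    # Stretch the (usually tiny) differences so the largest one maps to about 255.
    max_diff = max(band_max for _, band_max in diff.getextrema()) or 1
    return ImageEnhance.Brightness(diff).enhance(255.0 / max_diff)
```

Usage: error_level_analysis(Image.open("example-edited.jpeg")).save("example-edited-ela.png").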
The colored pixels reveal the word I edited. You can see it better when I overlay the two images:
image-overlay.py -a 0.6 example-edited.jpeg example-edited-ela.png example-edited-overlay.png
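The overlay is a weighted blend of the two images. A minimal Pillow sketch, assuming -a 0.6 is the weight given to the ELA image (which image the alpha applies to in image-overlay.py is an assumption, and the function name overlay is hypothetical):

```python
from PIL import Image


def overlay(base, ela, alpha=0.6):
    """Blend the ELA image over the base image; alpha is the ELA image's weight."""
    base = base.convert("RGB")
    # Image.blend requires both images to have the same mode and size.
    ela = ela.convert("RGB").resize(base.size)
    # Result = base * (1 - alpha) + ela * alpha
    return Image.blend(base, ela, alpha)
```

Usage: overlay(Image.open("example-edited.jpeg"), Image.open("example-edited-ela.png")).save("example-edited-overlay.png").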
FYI: there is also a GIMP plugin for ELA.
You can download the examples and programs here:
blogpost-ela-files.zip (https)
MD5: 4F3071A9162C5CA8B7B10A41F662093A
SHA256: CBA786368D7BAF65E1E9F854C315BFB60FF89910429106513A0C41C180D8FCAB
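To check the download against the hashes above, a short Python sketch using hashlib (the helper names file_digests and verify are hypothetical):

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return the (MD5, SHA256) digests of a file as uppercase hex strings."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest().upper(), sha256.hexdigest().upper()


def verify(path, expected_md5, expected_sha256):
    """True if both digests match the expected values (case-insensitive)."""
    md5, sha256 = file_digests(path)
    return md5 == expected_md5.upper() and sha256 == expected_sha256.upper()
```

Usage: verify("blogpost-ela-files.zip", "4F3071A9162C5CA8B7B10A41F662093A", "CBA786368D7BAF65E1E9F854C315BFB60FF89910429106513A0C41C180D8FCAB").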
Thank you for your post.
I’ve been playing with ELA and JPEG, and I’ve also been “playing” with your script. How do you know when something is forged? Sometimes it seems to be only a change of colour in the document. ELA works OK in some cases, but in other cases it returns false positives. How can I deal with this?
Thanks in advance,
Alex.
Comment by Alejandro — Tuesday 5 May 2015 @ 15:00
I’ve only done ELA on black & white text. Every piece of text that was added or changed showed up as colored pixels. The other detections, the false positives, had gray pixels.
Comment by Didier Stevens — Wednesday 6 May 2015 @ 4:29
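The color-versus-gray observation in the reply above can be turned into a rough check. A hypothetical Python helper, assuming an RGB ELA image; the tolerance and min_level thresholds are illustrative guesses, not values from this post, and the observation itself comes only from black & white text documents:

```python
def split_gray_and_color(pixels, tolerance=8, min_level=16):
    """Count ELA pixels that are (near) gray versus clearly colored.

    pixels: iterable of (R, G, B) tuples from an ELA image.
    tolerance and min_level are illustrative assumptions.
    """
    gray = color = 0
    for r, g, b in pixels:
        if max(r, g, b) < min_level:
            continue  # too dark to count as a detection at all
        # A pixel whose channels are (almost) equal is treated as gray.
        if max(r, g, b) - min(r, g, b) <= tolerance:
            gray += 1
        else:
            color += 1
    return gray, color
```

With Pillow, the pixels can come from ela_image.convert("RGB").getdata(); a high color count would point at regions worth a closer look, not prove a forgery.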
Could you explain how to run this Python program and analyze ELA? I am asking because I am not an expert in programming, but I want to analyze the image.
Thank you in advance.
Comment by ARJUN — Monday 28 December 2015 @ 14:10
@ARJUN If the command-line is a problem for you, do it online: there are a couple of websites that will analyze images.
Comment by Didier Stevens — Monday 28 December 2015 @ 15:38