Some time ago I had the chance to try out an image forensics method (Error Level Analysis) on a PDF. It was a fraudulent document (a form), but with a special characteristic: the criminal converted the original form (a PDF) to JPEG, edited the JPEG with a raster graphics editor, and then inserted the edited JPEG into a PDF document. This gave me the opportunity to try out Error Level Analysis (ELA) on a “text document”.
I can’t share the PDF, but I recreated one to use in this blogpost.
First I search for images in the PDF document:
pdf-parser.py -s image example-edited.pdf
Result:
obj 4 0
 Type:
 Referencing: 6 0 R
  <<
    /Font
    /XObject
      <<
        /Im4 6 0 R
      >>
    /ProcSet [/PDF/Text/ImageC/ImageI/ImageB]
  >>

obj 6 0
 Type: /XObject
 Referencing:
 Contains stream
  <<
    /Type /XObject
    /Subtype /Image
    /Width 680
    /Height 965
    /BitsPerComponent 8
    /ColorSpace /DeviceRGB
    /Filter /DCTDecode
    /Length 233133
  >>
The image is in object 6. I extract the image:
pdf-parser.py -o 6 -d example-edited.jpeg example-edited.pdf
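Because the stream uses the /DCTDecode filter, the bytes stored in object 6 are the JPEG file itself, so no decompression is needed to extract it. The following is a minimal sketch of that idea in plain Python, not a replacement for pdf-parser.py. The function name extract_jpegs is my own invention for this illustration, and the sketch assumes a simple PDF: objects that are not stored inside object streams, no encryption, and DCTDecode as the only filter on the stream.

```python
import re

# "N G obj << ... >> stream<EOL>", without running past the end of the object.
OBJ_RE = re.compile(
    rb"(\d+)\s+(\d+)\s+obj\s*(<<(?:(?!endobj).)*?>>)\s*stream\r?\n",
    re.DOTALL,
)
# A direct /Length value (an indirect reference such as "12 0 R" is skipped).
LENGTH_RE = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)")


def extract_jpegs(pdf_bytes):
    """Return {object number: raw JPEG bytes} for streams using /DCTDecode.

    Hypothetical helper: a sketch that only handles simple PDFs.
    """
    images = {}
    for match in OBJ_RE.finditer(pdf_bytes):
        dictionary = match.group(3)
        if not re.search(rb"/Filter\s*/DCTDecode\b", dictionary):
            continue
        start = match.end()
        length = LENGTH_RE.search(dictionary)
        if length:
            end = start + int(length.group(1))
        else:
            # Fallback: cut at "endstream" (may keep a trailing end-of-line).
            end = pdf_bytes.find(b"endstream", start)
            if end < 0:
                continue
        images[int(match.group(1))] = pdf_bytes[start:end]
    return images


# Usage (file names taken from the commands in this post):
#   with open("example-edited.pdf", "rb") as handle:
#       jpegs = extract_jpegs(handle.read())
#   with open("example-edited.jpeg", "wb") as handle:
#       handle.write(jpegs[6])
```

For real-world documents a proper PDF parser remains the safer choice, since regular expressions can be fooled by binary stream data that happens to look like an object header.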
Here is the extracted image:
If you Google for Error Level Analysis, you’ll find a couple of websites that provide online image forensics. But that was not an option for me, because I could not share the document.
I found this C program for ELA, and later I wrote my own Python program (what else?), which I’ll use for this example:
image-forensics-ela.py example-edited.jpeg example-edited-ela.png
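For readers who want to see the principle behind ELA: the image is re-saved as JPEG at a known quality, and the pixel-by-pixel difference between the image and the re-saved copy is amplified. Regions that were already compressed change little, while recently edited regions tend to show a larger error. The sketch below illustrates this with the Pillow library. It is not the code of image-forensics-ela.py: the function name error_level_analysis, the quality of 90 and the brightness scaling are assumptions I chose for the illustration.

```python
import io

from PIL import Image, ImageChops


def error_level_analysis(image, quality=90):
    """Return the amplified difference between image and a re-saved JPEG copy.

    Hypothetical helper; quality=90 is an assumed default, not a value
    taken from any particular ELA tool.
    """
    original = image.convert("RGB")

    # Re-save the image as JPEG in memory at the chosen quality.
    buffer = io.BytesIO()
    original.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer).convert("RGB")

    # Absolute per-channel difference between the two versions.
    diff = ImageChops.difference(original, resaved)

    # Stretch the difference so that the largest error becomes 255.
    max_diff = max(high for _, high in diff.getextrema())
    if max_diff == 0:
        return diff
    factor = 255.0 / max_diff
    return diff.point(lambda value: min(255, int(value * factor)))


# Usage (file names taken from the command above):
#   ela = error_level_analysis(Image.open("example-edited.jpeg"))
#   ela.save("example-edited-ela.png")
```

The result is saved as PNG so that the ELA image itself is not altered by another round of lossy compression.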
In the resulting ELA image, the colored pixels reveal the word I edited. You can see it better when I overlay the two images:
image-overlay.py -a 0.6 example-edited.jpeg example-edited-ela.png example-edited-overlay.png
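An overlay like this can be approximated with a simple alpha blend. The sketch below uses Pillow's Image.blend and is only an illustration: the function name overlay is hypothetical, and I am assuming that the alpha value is the weight given to the ELA image, which may not be how image-overlay.py interprets its -a option.

```python
from PIL import Image


def overlay(base, ela, alpha=0.6):
    """Blend the ELA image over the base image.

    Hypothetical helper. Assumption: alpha is the weight of the ELA image,
    so the result is base * (1 - alpha) + ela * alpha.
    """
    base_rgb = base.convert("RGB")
    ela_rgb = ela.convert("RGB")
    # Image.blend requires both images to have the same size and mode.
    if ela_rgb.size != base_rgb.size:
        ela_rgb = ela_rgb.resize(base_rgb.size)
    return Image.blend(base_rgb, ela_rgb, alpha)


# Usage (file names taken from the command above):
#   result = overlay(Image.open("example-edited.jpeg"),
#                    Image.open("example-edited-ela.png"), alpha=0.6)
#   result.save("example-edited-overlay.png")
```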
FYI: there is also a GIMP plugin for ELA.
You can download the examples and programs here:
blogpost-ela-files.zip (https)
MD5: 4F3071A9162C5CA8B7B10A41F662093A
SHA256: CBA786368D7BAF65E1E9F854C315BFB60FF89910429106513A0C41C180D8FCAB
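To check a downloaded copy against the hashes above, something like the following sketch can be used. It relies only on Python's standard hashlib module; the function name file_digests is my own, and the file name in the usage comment is the one listed above.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return the (MD5, SHA256) hex digests of a file, in uppercase."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest().upper(), sha256.hexdigest().upper()


# Usage:
#   md5, sha256 = file_digests("blogpost-ela-files.zip")
#   print("MD5:", md5)
#   print("SHA256:", sha256)
```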