No breaking news in this post, but I’ve found enough applications vulnerable to XML-bombs, and too little awareness of the problem, that I feel it justifies another web page documenting the principles behind them, together with suggestions to protect your applications from them.
The XML-bomb is a small XML document designed to expand to a gigantic size when parsed by an (unprotected) XML-parser. The huge amount of resources (memory) consumed when parsing the XML-bomb can cause a denial of service (DoS) or a buffer overflow (BoF).
Take this simple XML document:
And take this Document Type Declaration defining an entity e0 with value A:
Including this DOCTYPE in our simple XML document enables us to reference entity e0 in our document, for example like this:
When this document is parsed by an XML-parser supporting DTDs, the entity reference is replaced by its value. Here is Internet Explorer rendering our XML document:
Notice that &e0; has been replaced by A.
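The same substitution can be reproduced outside a browser. Here is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the root element name `data` and the variable names are assumptions chosen for illustration, not necessarily the exact document rendered above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element name "data" is an assumption.
# The internal DTD subset declares entity e0 with the value "A",
# and the document body references it as &e0;.
DOCUMENT = """<!DOCTYPE data [
  <!ENTITY e0 "A">
]>
<data>&e0;</data>"""

root = ET.fromstring(DOCUMENT)

# The parser has already replaced the entity reference by its value.
print(root.text)  # prints: A
```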
This mechanism of defining and referencing entities is one essential ingredient of an XML-bomb.
The second ingredient is an expression that will grow exponentially and consume huge amounts of resources when evaluated.
We define a second entity, e1, referring twice to our first entity e0:
Include this definition in our XML document:
And this is how it is parsed:
e0 evaluates to A
e1 evaluates to AA
Now define e2 referencing e1 twice, e3 referencing e2 twice, …, and then we get
e2 evaluates to AAAA
e3 evaluates to AAAAAAAA
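The doubling can be checked with a deliberately tiny chain. The sketch below assumes Python's standard `xml.etree.ElementTree`; the helper `build_chain` and the root element name `data` are hypothetical names introduced for this example. It stops at e3 on purpose, so it is harmless to run.

```python
import xml.etree.ElementTree as ET

def build_chain(levels: int) -> str:
    """Return a document that declares e0..e<levels> and references the last one."""
    declarations = ['<!ENTITY e0 "A">']
    for i in range(1, levels + 1):
        # Each entity references the previous one twice.
        declarations.append(f'<!ENTITY e{i} "&e{i - 1};&e{i - 1};">')
    dtd = "<!DOCTYPE data [\n" + "\n".join(declarations) + "\n]>\n"
    return dtd + f"<data>&e{levels};</data>"

# Keep the chain short: every extra level doubles the expanded size.
results = {}
for levels in range(4):
    results[levels] = ET.fromstring(build_chain(levels)).text
    print(f"e{levels} evaluates to {results[levels]}")
```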
We have achieved exponential growth! Each entity is twice the size of the previous one, so an XML-bomb with 31 entities (e0 to e30) is less than 1 KB in size, but entity e30 is 1 GB (2^30 bytes) in size when it gets evaluated by the XML-parser!
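These numbers can be verified without ever parsing the bomb. The sketch below only builds the text of a 31-entity document and computes the expanded size arithmetically; the helper name `build_bomb_text` and the element name `data` are assumptions made for this example, and the exact byte count depends on the layout chosen here.

```python
def build_bomb_text(entity_count: int) -> str:
    """Build the text of a bomb declaring e0..e<entity_count - 1>. Never parsed here."""
    declarations = ['<!ENTITY e0 "A">']
    for i in range(1, entity_count):
        declarations.append(f'<!ENTITY e{i} "&e{i - 1};&e{i - 1};">')
    dtd = "<!DOCTYPE data [\n" + "\n".join(declarations) + "\n]>\n"
    return dtd + f"<data>&e{entity_count - 1};</data>"

bomb_text = build_bomb_text(31)
document_size = len(bomb_text.encode("utf-8"))

# e0 is 1 byte and each following entity doubles the previous one,
# so the size of e30 is obtained by doubling 30 times.
expanded_size = 1
for _ in range(30):
    expanded_size *= 2

print(f"document size: {document_size} bytes")
print(f"size of e30 when expanded: {expanded_size} bytes")
```

Do not feed `bomb_text` to a parser you care about; the point of the sketch is that the arithmetic alone shows the imbalance.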
How do you protect your application from an exploding XML-bomb?
If you don’t need support for DTDs, just disable DTDs or use a parser without DTD support.
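How you disable DTDs depends on your parser. As one possible illustration, here is a sketch that rejects any document carrying a DOCTYPE declaration before it reaches the real parser, using the `StartDoctypeDeclHandler` hook of Python's `xml.parsers.expat`. The names `DtdForbidden` and `parse_without_dtd` are hypothetical, and the two-pass design (check first, then parse) is a simplification for clarity, not a recommendation for production code.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

class DtdForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""

def parse_without_dtd(text: str) -> ET.Element:
    """Parse text into an element tree, refusing any document with a DTD."""
    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called by expat as soon as it sees "<!DOCTYPE", which comes
        # before any entity reference in the document body.
        raise DtdForbidden(f"DOCTYPE {name!r} is not allowed")

    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(text, True)  # the handler's exception propagates from here

    # Only documents without a DOCTYPE get this far.
    return ET.fromstring(text)

print(parse_without_dtd("<data>A</data>").text)  # prints: A
```

A document such as `<!DOCTYPE data [<!ENTITY e0 "A">]><data>&e0;</data>` makes `parse_without_dtd` raise `DtdForbidden` instead of returning a tree.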
If you need support for DTDs, you have two options. Either try to prevent XML-bombs from entering your XML-parser by known-pattern scanning (like classic antivirus software does), for example in an application firewall, or limit the impact of an expanding XML-bomb by hardening your XML-parser so that it restricts its consumption of resources.
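To make the second option concrete, here is one possible way to cap resource consumption: count the characters the parser delivers and abort once a budget is exceeded. The sketch uses the `CharacterDataHandler` hook of Python's `xml.parsers.expat`; the names `ExpansionLimitExceeded` and `parse_with_budget` and the default budget of 10,000 characters are assumptions for illustration. It only covers entity references in element content, not in attribute values, so treat it as a starting point rather than complete protection.

```python
from xml.parsers import expat

class ExpansionLimitExceeded(ValueError):
    """Raised when a document delivers more character data than allowed."""

def parse_with_budget(text: str, max_chars: int = 10_000) -> str:
    """Return the document's character data, aborting if it exceeds max_chars."""
    seen = 0
    chunks = []

    def on_chars(data):
        nonlocal seen
        # expat delivers expanded entity text in chunks, so the budget is
        # checked while the expansion is still in progress.
        seen += len(data)
        if seen > max_chars:
            raise ExpansionLimitExceeded(f"more than {max_chars} characters")
        chunks.append(data)

    parser = expat.ParserCreate()
    parser.CharacterDataHandler = on_chars
    parser.Parse(text, True)  # the handler's exception propagates from here
    return "".join(chunks)

small = '<!DOCTYPE data [<!ENTITY e0 "A"><!ENTITY e1 "&e0;&e0;">]><data>&e1;</data>'
print(parse_with_budget(small))  # prints: AA
```

The budget bounds the work done for a single document; a sensible value depends on the largest legitimate document your application expects.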
You’re probably aware of the limitations of known-pattern scanning. The example above is a textbook XML-bomb, whose exponential growth originates in its binary tree structure. But there are many other data structures …