No breaking news in this post, but I’ve found enough applications vulnerable to XML-bombs, and too little awareness of the problem, that I feel it justifies another web page documenting the principles behind them, together with suggestions for protecting your applications.
The XML-bomb is a small XML document designed to expand to a gigantic size when parsed by an (unprotected) XML-parser. The huge amount of resources (mainly memory) consumed while parsing the XML-bomb can cause a denial of service (DoS) or a buffer overflow (BoF).
Take this simple XML document:
And take this Document Type Declaration defining an entity e0 with value A:
Including this DOCTYPE in our simple XML document enables us to reference entity e0 in our document, for example like this:
When this document is parsed by an XML-parser supporting DTDs, the entity reference is replaced by its value. Here is Internet Explorer rendering our XML document:
Notice that &e0; has been replaced by A.
This mechanism of defining and referencing entities is one essential ingredient of an XML-bomb.
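To reproduce the substitution without a browser, here is a minimal sketch in Python. The element name `data` and the layout of the document are assumptions for illustration; only the entity e0 with value A comes from the text. It uses the standard library's `xml.etree.ElementTree`, which expands entities declared in the internal DTD subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element name "data" is an assumption,
# the entity e0 with value A is the one described in the text.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE data [
<!ENTITY e0 "A">
]>
<data>&e0;</data>"""


def parsed_text(xml_text):
    """Parse the document and return the text of its root element."""
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    # The parser has already replaced &e0; by its value here.
    print(parsed_text(DOCUMENT))
```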
The second ingredient is an expression that will grow exponentially and consume huge amounts of resources when evaluated.
We define a second entity, e1, referring twice to our first entity e0:
Include this definition in our XML document:
And this is how it is parsed:
e0 evaluates to A
e1 evaluates to AA
Now define e2 referencing e1 twice, e3 referencing e2 twice, and so on. Then we get:
e2 evaluates to AAAA
e3 evaluates to AAAAAAAA
…
We have achieved exponential growth! An XML-bomb with 31 entities is less than 1K in size, but entity e30 is 1 GB (2^30 bytes) in size when it gets evaluated by the XML-parser!
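The arithmetic can be checked with a small Python sketch. The helper names `build_bomb` and `expanded_size` and the element name `data` are made up for this illustration. Only a tiny four-entity instance is actually parsed; the 31-entity document is merely measured.

```python
import xml.etree.ElementTree as ET


def build_bomb(levels):
    """Return an XML document declaring entities e0 .. e(levels-1).

    e0 is "A"; every later entity references its predecessor twice.
    The element name "data" is an assumption made for this sketch.
    """
    lines = ["<!DOCTYPE data [", '<!ENTITY e0 "A">']
    for i in range(1, levels):
        lines.append(f'<!ENTITY e{i} "&e{i - 1};&e{i - 1};">')
    lines.append("]>")
    lines.append(f"<data>&e{levels - 1};</data>")
    return "\n".join(lines)


def expanded_size(levels):
    """Size in bytes of the last entity once it is fully expanded."""
    return 2 ** (levels - 1)


if __name__ == "__main__":
    # Small, harmless instance: e3 expands to eight characters.
    print(ET.fromstring(build_bomb(4)).text)
    # The 31-entity document is only measured, never parsed.
    bomb = build_bomb(31)
    print(len(bomb), "bytes as text,", expanded_size(31), "bytes expanded")
```

Each extra entity adds one short line to the document but doubles the size of the expanded result.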
How do you protect your application from an exploding XML-bomb?
If you don’t need support for DTDs, just disable DTDs or use a parser without DTD support.
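How you disable DTDs depends on your parser. As one possible sketch, assuming Python and its standard `xml.parsers.expat` module, a DOCTYPE handler that raises an exception makes the parser refuse any document carrying a DTD. The exception class `DTDForbidden` and the helper `parse_text_without_dtd` are hypothetical names introduced here.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def parse_text_without_dtd(xml_text):
    """Return the character data of xml_text, refusing any DOCTYPE.

    Hypothetical helper: a real application would build its own
    data structure in the handlers instead of collecting text.
    """
    parser = xml.parsers.expat.ParserCreate()
    chunks = []

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Called when the parser starts the DOCTYPE declaration, so
        # parsing stops before any entity reference is expanded.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = reject_doctype
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return "".join(chunks)


if __name__ == "__main__":
    print(parse_text_without_dtd("<data>hello</data>"))
```

Documents without a DOCTYPE parse as usual. Anything with a DTD, harmless or not, is rejected, which is acceptable only if your application never needs one.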
If you need support for DTDs, you have two options. Either try to keep XML-bombs from reaching your XML-parser with known-pattern scanning (as classic antivirus software does), for example in an application firewall. Or limit the impact of an expanding XML-bomb by hardening your XML-parser so that its consumption of resources is restricted.
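Restricting resource consumption can take many forms. The following is a minimal sketch of one, again assuming Python's `xml.parsers.expat`: count the characters the parser delivers and abort once a cap is exceeded. The names `ExpansionLimitExceeded`, `parse_text_with_limit` and the default of 10,000 characters are arbitrary choices for illustration. It only counts character data, so entity references inside attribute values are not covered by it.

```python
import xml.parsers.expat


class ExpansionLimitExceeded(ValueError):
    """Raised when the parsed text grows past the configured cap."""


def parse_text_with_limit(xml_text, max_chars=10_000):
    """Return the character data of xml_text, aborting past max_chars."""
    parser = xml.parsers.expat.ParserCreate()
    chunks = []
    total = 0

    def on_text(data):
        # Expanded entity text arrives here piece by piece, so the
        # cap is hit long before the full expansion is in memory.
        nonlocal total
        total += len(data)
        if total > max_chars:
            raise ExpansionLimitExceeded(
                f"more than {max_chars} characters of text"
            )
        chunks.append(data)

    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)
    return "".join(chunks)


if __name__ == "__main__":
    print(parse_text_with_limit("<data>hello</data>"))
```

The cap has to be chosen per application: large enough for the biggest legitimate document, small enough that hitting it does not hurt.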
You’re aware of the limitations of known-pattern scanning. This is a textbook XML-bomb, whose exponential growth originates in its binary tree structure. But there are many other data structures …