Didier Stevens

Monday 19 May 2008

PDF Stream Objects

Filed under: Malware, PDF — Didier Stevens @ 6:09

A PDF stream object is a sequence of bytes. There is a virtually unlimited number of ways to represent the same byte sequence. After Names and Strings obfuscation, let’s take a look at streams.

A PDF stream object is composed of a dictionary (<< >>), the keyword stream, a sequence of bytes and the keyword endstream. All streams must be indirect objects. Here is an example:

This stream is indirect object 5, generation 0. The stream dictionary must have a /Length entry, which records the length of the (encoded) byte sequence. The stream keyword is followed by an EOL marker, and an EOL marker should precede the endstream keyword; neither is counted in /Length. In this example, the byte sequence is a set of instructions for the PDF reader to render the string Hello World with a given font at a precise position. It’s precisely 42 bytes long.
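As a rough sketch of how such an object is put together, the Python below serializes an unfiltered stream object and derives /Length from the data. The helper name `make_stream_object` is hypothetical, and the content stream is an assumed stand-in chosen to be 42 bytes long; it is not necessarily byte-for-byte the stream from the original example.

```python
def make_stream_object(obj_num: int, generation: int, data: bytes) -> bytes:
    """Serialize `data` as an unfiltered PDF indirect stream object."""
    return (
        b"%d %d obj\n" % (obj_num, generation)
        + b"<< /Length %d >>\n" % len(data)  # /Length counts only the stream bytes
        + b"stream\n"                         # keyword followed by an EOL
        + data
        + b"\nendstream\n"                    # this EOL is not part of /Length
        + b"endobj\n"
    )


# Assumed content stream: show "Hello World" with font /F1, size 24, at (100, 700).
content = b"BT /F1 24 Tf 100 700 Td (Hello World)Tj ET"
stream_object = make_stream_object(5, 0, content)
print(stream_object.decode("ascii"))
```

Computing /Length from the bytes, rather than typing it by hand, keeps the dictionary and the data in step when the stream is later re-encoded.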

In this example, the byte sequence is represented literally, but it’s possible (and usual) to encode the byte sequence. This is done with a stream filter. A stream filter specifies how the sequence of bytes has to be decoded. Let’s take the same example, but with an ASCII85 encoding:
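A minimal Python sketch of that encoding step, assuming the same stand-in content stream as before (an illustration, not the original figure). The helper name `ascii85_stream_object` is hypothetical. PDF's ASCII85 data ends with the `~>` end-of-data marker, so the sketch appends it to the output of `base64.a85encode`.

```python
import base64


def ascii85_stream_object(obj_num: int, generation: int, data: bytes) -> bytes:
    """Serialize `data` as a stream object encoded with /ASCII85Decode."""
    encoded = base64.a85encode(data) + b"~>"  # "~>" is the end-of-data marker
    return (
        b"%d %d obj\n" % (obj_num, generation)
        # /Length now describes the encoded bytes, not the original ones
        + b"<< /Length %d /Filter /ASCII85Decode >>\n" % len(encoded)
        + b"stream\n" + encoded + b"\nendstream\nendobj\n"
    )


# Assumed content stream, 42 bytes long before encoding.
content = b"BT /F1 24 Tf 100 700 Td (Hello World)Tj ET"
encoded = base64.a85encode(content) + b"~>"
print(len(content), len(encoded))
print(ascii85_stream_object(5, 0, content).decode("ascii"))
```

ASCII85 turns every 4 bytes into 5 characters, so the encoded form is longer than the literal one.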

The /Filter entry instructs the PDF reader how to decode the byte sequence (/ASCII85Decode). Notice that the length value has changed. There are many encoding schemes (ASCII filters and decompression filters). Here is the list:

  • ASCIIHexDecode
  • ASCII85Decode
  • LZWDecode
  • FlateDecode
  • RunLengthDecode
  • CCITTFaxDecode
  • JBIG2Decode
  • DCTDecode
  • JPXDecode
  • Crypt

This list is not that long, so why do I claim that there is an almost limitless number of ways to encode a stream? I have two reasons:

  1. Many filters, like /FlateDecode, take parameters, and the encoder has settings of its own (in this case, the compression level), which influence the encoding too
  2. Filters can be cascaded, meaning that the stream has to be decoded by more than one filter
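To illustrate the first reason, here is a small Python sketch, assuming the standard `zlib` module as the Flate encoder and a stand-in content stream. It compresses the same bytes at several compression levels and checks that every result decodes back to the same data.

```python
import zlib

# Assumed content stream; any byte sequence would do.
content = b"BT /F1 24 Tf 100 700 Td (Hello World)Tj ET"

# Encode the same bytes at several compression levels (0 = stored, 9 = best).
encodings = {level: zlib.compress(content, level) for level in (0, 1, 6, 9)}

for level, encoded in encodings.items():
    # Every variant is a valid /FlateDecode payload for the same data.
    assert zlib.decompress(encoded) == content
    print(level, len(encoded), encoded.hex())

distinct = len(set(encodings.values()))
print("distinct encodings:", distinct)
```

A reader that decodes the stream sees identical data in every case, while a scanner that matches on the raw encoded bytes sees different input each time.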

Here is our example, where the stream is encoded twice, first with ASCII85 and then with plain HEX (I know, this is rather pointless, but it yields simple and readable examples):
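A Python sketch of this cascade, assuming a stand-in content stream (not the original figure); the function names are hypothetical. Filters listed in a /Filter array are applied in order when decoding, so data that was encoded with ASCII85 first and hex second is declared as [/ASCIIHexDecode /ASCII85Decode].

```python
import base64
import binascii


def encode_cascade(data: bytes) -> bytes:
    """Encode with ASCII85 first, then with ASCIIHex."""
    a85 = base64.a85encode(data) + b"~>"       # "~>" ends ASCII85 data
    return binascii.hexlify(a85).upper() + b">"  # ">" ends ASCIIHex data


def decode_cascade(encoded: bytes) -> bytes:
    """Decode in /Filter array order: ASCIIHexDecode, then ASCII85Decode."""
    a85 = binascii.unhexlify(encoded[:-1])  # drop the ">" marker
    return base64.a85decode(a85[:-2])       # drop the "~>" marker


# Assumed content stream.
content = b"BT /F1 24 Tf 100 700 Td (Hello World)Tj ET"
encoded = encode_cascade(content)
dictionary = b"<< /Length %d /Filter [/ASCIIHexDecode /ASCII85Decode] >>" % len(encoded)
print(dictionary.decode("ascii"))
print(encoded.decode("ascii"))
```

Each extra layer changes the bytes on disk again without changing what the reader finally decodes.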

Cascading filters also inspired me to create a couple of test PDF documents. For example, I’ve created a small PDF document, only 2642 bytes in size, that contains a 1 GB stream (a ZIP bomb of sorts). Some PDF readers will choke on this document.
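The effect of cascading a compression filter can be sketched in Python as follows. This is not the actual test document: it assumes a payload of zero bytes, uses `zlib` as the Flate encoder, and is scaled down to 16 MiB so that it runs quickly. The helper name `nested_flate` is hypothetical.

```python
import zlib


def nested_flate(data: bytes, layers: int) -> bytes:
    """Apply Flate compression `layers` times in a row."""
    for _ in range(layers):
        data = zlib.compress(data, 9)
    return data


original_size = 16 * 1024 * 1024   # 16 MiB here, far below the 1 GB in the text
payload = bytes(original_size)     # all zero bytes, which compress extremely well

one_layer = nested_flate(payload, 1)
two_layers = nested_flate(payload, 2)

# A stream carrying `two_layers` would declare /Filter [/FlateDecode /FlateDecode].
print(original_size, len(one_layer), len(two_layers))
```

A single Flate pass over highly repetitive data is itself repetitive, which is why a second pass shrinks it further and a tiny file can expand into a very large stream.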

4 Comments »

  1. [...] security professional Didier Stevens has highlighted a potential exploit in PDF Stream Objects which could be used to cause a PDF file to balloon in size, prompting Computerworld to label it [...]

    Pingback by PDF Bomb - PDFalerts — Tuesday 27 May 2008 @ 20:09

  2. Some of these filters cannot be used to hide scripts with exploits, because they do lossy compression and are suitable only for images. I think (but am not 100% sure) that CCITTFaxDecode, JBIG2Decode, DCTDecode and JPXDecode all fall in this category. They might be usable for a denial-of-service attack (the equivalent of the ZIP bomb), although I have my doubts about that too.

    Comment by Vesselin Bontchev — Wednesday 28 May 2008 @ 18:03

  3. It’s true that these filters are lossy, but the first 3 of them take parameters, and I believe it’s possible to parameterize a lossless compression. But I have not tested this.

    Comment by Didier Stevens — Wednesday 28 May 2008 @ 20:40

  4. [...] security professional Didier Stevens has highlighted a potential exploit in PDF Stream Objects which could be used to cause a PDF file to balloon in size, prompting Computerworld to label it the [...]

    Pingback by PDF Bomb — Thursday 14 August 2008 @ 8:28
