Didier Stevens

Monday 19 May 2008

PDF Stream Objects

Filed under: Malware,PDF — Didier Stevens @ 6:09

A PDF stream object is a sequence of bytes. There is a virtually unlimited number of ways to represent the same byte sequence. After Names and Strings obfuscation, let’s take a look at streams.

A PDF stream object is composed of a dictionary (<< >>), the keyword stream, a sequence of bytes and the keyword endstream. All streams must be indirect objects. Here is an example:

This stream is indirect object 5 version 0. The stream dictionary must have a /Length entry, to document the length of the (encoded) byte sequence. The stream and endstream keywords are terminated with the EOL character(s). In this example, the byte sequence is a set of instructions for the PDF reader to render the string Hello World with a given font at a precise position. It’s precisely 42 bytes long.

In this example, the byte sequence is represented literally, but it’s possible (and usual) to encode the byte sequence. This is done with a stream filter. A stream filter specifies how the sequence of bytes has to be decoded. Let’s take the same example, but with an ASCII85 encoding:

The /Filter entry instructs the PDF reader how to decode the byte sequence (/ASCII85Decode). Notice the change of the length value. There are many encoding schemes (ASCII filters and decompression filters), here is a list:

  • ASCIIHexDecode
  • ASCII85Decode
  • LZWDecode
  • FlateDecode
  • RunLengthDecode
  • CCITTFaxDecode
  • JBIG2Decode
  • DCTDecode
  • JPXDecode
  • Crypt

This list is not so long, so why do I claim an almost limitless number of ways to encode a stream? I have 2 reasons:

  1. Many filters, like /FlateDecode, take parameters (in this case, the compression level), which influence the encoding too
  2. Filters can be cascaded, meaning that the stream has to be decoded by more than one filter

Here is our example, where the stream is encoded twice, first with ASCII85 and then with plain HEX (I know, this is rather pointless, but it yields simple and readable examples):

Cascading filters also inspired me to create a couple of test PDF documents. For example, I’ve created a 2642 bytes small PDF document that contains a 1GB large stream (a ZIP bomb of sorts). Some PDF readers will choke on this document.

Blog at WordPress.com.