Chapter 1. PDF Syntax
We’ll begin our exploration of PDF by diving right into the building blocks of the PDF file format. You’ll see how these blocks are combined to produce the page-based format that you are familiar with.
PDF Objects
The core part of a PDF file is a collection of “things” that the PDF standard (ISO 32000) refers to as objects, or sometimes COS objects.
Note
COS stands for Carousel Object System; Carousel was the original code name for Adobe’s Acrobat product.
These aren’t objects in the “object-oriented programming” sense of the word; instead, they are the building blocks on which PDF stands. There are nine types of objects: null, Boolean, integer, real, name, string, array, dictionary, and stream.
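As a quick reference, the nine object types listed above can be written down as a small enumeration. This is only an illustrative sketch: the class name PdfObjectType and its member names are hypothetical and are not defined by ISO 32000 or by any particular PDF library.

```python
from enum import Enum


class PdfObjectType(Enum):
    """The nine PDF object types named in the text (hypothetical names)."""

    NULL = "null"
    BOOLEAN = "Boolean"
    INTEGER = "integer"
    REAL = "real"
    NAME = "name"
    STRING = "string"
    ARRAY = "array"
    DICTIONARY = "dictionary"
    STREAM = "stream"


# List the types in the order the text introduces them.
for object_type in PdfObjectType:
    print(object_type.value)
```

A reader or writer of PDF files would typically tag each parsed value with one of these kinds, though how that is modeled is a design choice of the individual tool.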
Let’s look at each of these object types and how they are serialized into a PDF file. From there, you’ll see how these object types are used to build higher-level constructs and the PDF format itself.
Null Objects
The null object, if actually written to a file, is simply the four characters null. It is synonymous with a missing value, which is why it’s extremely rare to see one in a PDF. If you have reason to work with the null value, be sure to consult ISO 32000 carefully about the subtleties involving its handling.
Boolean Objects
Boolean objects represent the logical values of true and false and are represented accordingly in the PDF, either as true or false.
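To make the serialization of the two object types covered so far concrete, here is a minimal sketch of a writer for them. It assumes, purely for illustration, that Python's None stands in for the PDF null object and that Python's True and False stand in for PDF Booleans; the function name serialize_simple is hypothetical.

```python
def serialize_simple(value):
    """Return the PDF keyword for a null or Boolean value as bytes.

    Assumption for this sketch: None models the PDF null object, and
    Python bools model PDF Boolean objects.
    """
    if value is None:
        return b"null"   # the four characters n-u-l-l
    if value is True:
        return b"true"   # writers emit the lowercase keyword
    if value is False:
        return b"false"
    raise TypeError(f"not a null or Boolean value: {value!r}")


print(serialize_simple(None))
print(serialize_simple(True))
print(serialize_simple(False))
```

The identity checks (is True, is False) are deliberate: they keep integers such as 1 and 0 from being written as Booleans, since those belong to a different PDF object type.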
Note
When writing a PDF, you will always use true or false. However, if you are reading/parsing ...