Preface

The Portable Document Format (PDF) is the world’s leading page description language, and the first format equally useful for print and online use.

PDF documents are now almost ubiquitous in the printing industry, in document interchange, and in the online distribution of paginated content. They are, however, widely viewed as opaque and delicate and are poorly understood, even by those of a technical disposition.

This is partly due to a lack of approachable documentation; the file format reference is freely available, but its size and complexity demand more time than most people working with PDF can afford to invest.

This book aims to be an approachable introduction. It is suitable both for the technically minded, and for those who just want to understand a little of the PDF format to give context to their work with tools which produce or process PDF documents.

Who Should Read This Book

We’ve tried to write a book which serves as a general introduction, with some optional technical interludes, giving you the chance to type in example PDF files and see how they display.

This book is suitable for:

  • Adobe Acrobat users who want to understand the reasons behind the facilities it provides (such as encryption options, trim and crop boxes, and page labels), rather than just how to use them.

  • Power users who want to use command-line software to process PDF documents in batches by merging, splitting, and optimizing them.

  • Programmers writing code to read, edit, or create PDF files.

  • Industry professionals in search, electronic publishing, and printing who want to understand how to use PDF’s metadata and workflow features to build coherent systems.

Organization of Contents

Chapter 1, Introduction

In this chapter, we give a history of the PDF format and put it into context. We look at the advantages PDF has over similar technologies, introduce specialized kinds of PDF files such as PDF/X and PDF/A, and take a brief tour of the elements which make up a typical PDF document. We conclude by looking at how PDF is used in industry.

Chapter 2, Building a Simple PDF

We begin in earnest, building a simple PDF file from scratch in a text editor. We show how to process this into a fully valid PDF and open it in a PDF viewer. We explain each component of the file, taking our first look at various parts of the PDF syntax.

Chapter 3, File Structure

In this chapter, we describe the layout and content of a PDF file, and the syntax of the objects from which it is built. We describe how a PDF document is read from a flat file into a structured format and, conversely, written from that structured format to a flat file.

Chapter 4, Document Structure

In this chapter, we leave behind the bits and bytes of the PDF file, and consider the logical structure of its objects, describing how pages and their resources are arranged into a document.

Chapter 5, Graphics

We describe how to create vector graphics and raster images in PDF, and how to deal with transparency, color spaces, and patterns. We illustrate with examples, showing the code and the result in a PDF viewer.

Chapter 6, Text and Fonts

In this chapter, we look at the PDF operators for building and showing text strings using different fonts and sizes, and how to build lines and paragraphs. We describe the different types of fonts and encodings in PDF documents, and how they are defined and used. We look at the process of text extraction from a PDF document.

Chapter 7, Document Metadata and Navigation

Here, we discuss topics not directly related to the visual appearance of the document, but to ancillary data: bookmarks, metadata, hyperlinks, annotations, and file attachments. For each, we describe how they are defined in PDF and give examples.

Chapter 8, Encrypted Documents

We look at how encryption and document permissions work in PDF, and see how to inspect encryption information in Adobe Reader. We describe how programs which process PDF files read, write, and edit encrypted documents.

Chapter 9, Working with Pdftk

In this chapter, we show how to use the popular pdftk program for the command-line processing of PDF files, looking at common usage scenarios. We describe what a program such as pdftk has to do internally to achieve certain tasks (for example, merging or splitting documents).

Chapter 10, PDF Software and Documentation

Here, we describe both Adobe and open-source software for viewing, converting, editing, and programming with PDF files. We give sources of further documentation and other resources such as support and discussion forums.

Content Updates

May 22, 2012

  • Added an index

  • Corrections and clarifications to history of PDF in Chapter 1

  • Changed references to PDF 1.0 throughout to PDF 1.1, and to PDF 1.5 for transparency-related files

  • Fixed an incorrect line ending for OS X/Unix

  • Clarified language about string encodings

  • Clarified that ID strings should match for newly created PDFs

  • Added a comment about cross-reference invalidation

In addition, a few small errors have been corrected throughout the text.

Acknowledgments

I should like to thank my editor, Simon St.Laurent, who was enthusiastic about this project from the beginning. Leonard Rosenthol at Adobe provided valuable comments. Thanks are also due to those readers who spotted mistakes in the first release and took the time to contact me.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

Caution

This icon indicates a warning or caution.

Obtaining Code Examples

All the example PDF files in this book are available for download in a zip archive from the O’Reilly website. The text of the book contains enough information to reconstruct these examples (with the exception of encrypted documents, which are not suitable for typing in manually).

The examples include the PDF source for the figures in this book.

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “PDF Explained by John Whitington (O’Reilly). Copyright 2012 John Whitington, 978-1-449-31002-8.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us.

Safari® Books Online

Note

Safari Books Online is an on-demand digital library that lets you easily search over 7,500 technology and creative reference books and videos to find the answers you need quickly.

With a subscription, you can read any page and watch any video from our library online. Read books on your cell phone and mobile devices. Access new titles before they are available for print, and get exclusive access to manuscripts in development and post feedback for the authors. Copy and paste code samples, organize your favorites, download chapters, bookmark key sections, create notes, print out pages, and benefit from tons of other time-saving features.

O’Reilly Media has uploaded this book to the Safari Books Online service. To have full digital access to this book and others on similar topics from O’Reilly and other publishers, sign up for free at http://my.safaribooksonline.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:

http://oreil.ly/pdf-explained

To comment or ask technical questions about this book, send email to:

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia
