Chapter 8. Embedded Files

This chapter explains how a PDF can be used as a container for other files, much as a ZIP file can, while still providing rich page content to accompany them.

In most cases, files in other formats (such as .docx or .xlsx) are converted to PDF for distribution. However, sometimes it is useful to have the original file as well. Unfortunately, there is a good chance that the two files will become separated, so being able to embed or attach the original inside the PDF is a useful capability. Additionally, you might choose to embed other files that are related to the PDF but aren’t its actual content, such as XML data.

For these reasons and more, PDF supports embedding other files inside itself and having them presented in the UI of the PDF viewer.

File Specifications

At the heart of embedding files is the file specification dictionary. This dictionary supports both embedded and referenced files, but we will focus strictly on the embedded form (see Figure 8-1). So that the dictionary can be identified, it must contain a Type key whose value is Filespec. Additionally, three other keys must be present in the dictionary: F, UF, and EF (see Example 8-1 for a sample).
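To make the shape of this dictionary concrete, here is a minimal Python sketch that writes one out as PDF syntax. It is an illustration under stated assumptions rather than the book's own example: the helper names (`pdf_literal_string`, `pdf_utf16_hex_string`, `build_filespec`) and the object number are hypothetical, the filename is assumed to be plain ASCII so that it can be written directly as the F value, and the embedded file stream that EF points to is assumed to exist elsewhere in the file.

```python
def pdf_literal_string(text: str) -> str:
    """Wrap text as a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def pdf_utf16_hex_string(text: str) -> str:
    """Write text as a PDF hex string holding UTF-16BE with a byte order mark."""
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return f"<{data.hex().upper()}>"


def build_filespec(filename: str, stream_obj_num: int) -> str:
    """Return a file specification dictionary for an embedded file.

    stream_obj_num is the (hypothetical) object number of the embedded
    file stream; generation number 0 is assumed.
    """
    stream_ref = f"{stream_obj_num} 0 R"
    lines = [
        "<<",
        "  /Type /Filespec",
        f"  /F {pdf_literal_string(filename)}",     # assumes an ASCII filename
        f"  /UF {pdf_utf16_hex_string(filename)}",  # Unicode form of the name
        f"  /EF << /F {stream_ref} >>",             # points at the embedded stream
        ">>",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_filespec("report.xlsx", 12))
```

Running the script prints a dictionary containing the Type, F, UF, and EF keys described above. A real writer would also have to emit the embedded file stream itself and a cross-reference entry for it, which this sketch leaves out.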

The F key contains the name of the file in a special encoding specific to file specification strings (ISO 32000-1:2008, 7.11.2), which is the “standard encoding for the platform on which the document is being viewed.” For most modern operating systems, that’s ...
