PDF Hacks 100 Industrial-Strength Tips & Tools

By Sid Steward
Book Price: $24.95 USD
£17.50 GBP
PDF Price: $19.99



Table of Contents

Chapter 1: Consuming PDF
Introduction: Hacks #1-14
Most people experience PDF as a document they must read or print. Adobe Reader and Adobe Acrobat are the most common tools for consuming PDF, but other tools provide their own distinctive features. First, we will look into the most popular PDF readers, and then we will discuss ways you can improve your PDF reading experience.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Read PDFs with the Adobe Reader
Use Adobe's Acrobat Reader, renamed Adobe Reader in its latest release, to read PDF files on the Web and elsewhere.
Lots of web sites that use PDF files include a Get Adobe Reader icon along with the PDF files. Whether you're running Windows, Mac OS X, Mac OS 7.5.3 or later, Linux, Solaris, AIX, HP-UX, OS/2, Symbian OS, Palm OS, or a Pocket PC, Adobe has a reader for your platform. (Different platforms are frequently at different versions and have different capabilities, but they all can provide basic PDF-reading functionality.)
To get your free reader, visit http://www.adobe.com/products/acrobat/readstep2.html. You'll need to choose a language, platform, and connection speed, and then a second field showing your download options will appear. Each version has slightly different installation instructions, but when you're done you'll have either the Adobe Reader or Adobe Acrobat Reader installed. The installer will also integrate Reader with your web browser or browsers, if appropriate.
Depending on your needs, newer isn't always better. If you want an older version of Acrobat Reader, visit http://www.adobe.com/products/acrobat/reader_archive.html.
Once Reader is installed, clicking web site links to PDFs will bring up a reader that enables you to view the PDFs, typically inside the browser window itself. You can also open PDFs on your local filesystem by selecting File → Open..., or by opening them through your GUI environment as usual, typically by double-clicking. Figure 1-1 shows a document as seen through Acrobat Reader running in a web browser, and Figure 1-2 shows the same document through Acrobat Reader running as a separate application.
Figure 1-1: Viewing a PDF document through Acrobat Reader in the browser
Figure 1-2: Viewing a PDF document through Acrobat Reader running as a separate application
As with any GUI application, you can scroll around the document, and Acrobat provides zoom options (the magnifying glass and the zoom percentage box in the toolbar), print options (the printer), search options (the binoculars), and navigation options (the arrows in the toolbar, as well as the Show/Hide Navigation Pane button to the left of the arrows that enables you to see bookmarks, if any are provided by the document's creator). Unlike the commercial Acrobat applications, Reader doesn't provide means for creating or modifying PDF documents.
Read PDFs with Mac OS X's Preview
If you have a Macintosh running OS X, the operating system includes a Preview application that enables you to look at PDFs without downloading Acrobat Reader.
Apple's latest operating system, Mac OS X, uses PDF all over. Icons and other pieces of applications are PDFs, the rendering system is tied closely to the data model used by PDFs, and any application that can print can also produce PDFs. Given this fondness for PDF, it makes sense that the Preview application Apple provides for examining the contents of many different file types also supports PDF.
The Preview application is installed on Macs at Macintosh HD:Applications:Preview. It reads a variety of graphics formats, including JPEG, TIFF, and GIF, as well as (of course) PDF. You can open PDFs in Preview by selecting File → Open..., by dragging their icons to the Preview application, or (if Acrobat isn't installed) by double-clicking. An open PDF in Preview looks like Figure 1-3.
Figure 1-3: Viewing a PDF document through Mac OS X's Preview application
Preview's overall interface is much simpler than the Acrobat Reader's interface, and its options are friendly and clear. Preview creates thumbnail images of pages, which is convenient for quick navigation, and it supports the PDF-creation functionality built into Mac OS X [Hack #40].
Also, Preview's File → Export... command enables you to save the PDFs or graphics you're examining in any of a variety of formats. If you need to convert a JPEG to a PDF file, or a PDF to a TIFF file, it's a convenient option. (It's also worth noting that screenshots taken using Mac OS X's Command-Shift-3 or Command-Shift-4 options are saved to the desktop as PDFs. Those PDFs contain bitmaps, much as if they were created as TIFFs and exported to PDF through Preview.)
Read PDFs with Ghostscript's GSview
The Ghostscript toolkit for working with PostScript and PDF supports a number of simple viewers, including GSview.
The Ghostscript set of tools (http://www.cs.wisc.edu/~ghost/) is an alternative to a number of Adobe products. At its heart is a PostScript processor, which also works on PDF files.
PostScript is both an ancestor of PDF and a complement to it. PostScript is a programming language focused on describing how pages should be printed, while PDF is more descriptive. You can convert from PostScript to PDF and back. Many printers and typesetting systems handle PostScript, while PDF is more commonly used as a format for exchange between computers.
Although typically you run Ghostscript from the command line or you integrate it with other processes, you can also use it as the rendering engine inside a number of viewers. Ghostview and GV support Unix and VMS, while MacGSview is a viewer for the Macintosh and GSview supports Windows, OS/2, and Linux. You'll need to install Ghostscript [Hack #39] before you install GSview. Once GSview is installed, it can open PostScript, Encapsulated PostScript (EPS), and, of course, PDF, as shown in Figure 1-4.
Figure 1-4: Viewing a PDF document through GSview
GSview doesn't provide a lot of bells and whistles. The toolbar across the top offers basic navigation, zoom, and search (the eyes). If you explore the menus, however, you'll find lots of PostScript-oriented utility functions. GSview is a useful tool if you need to work with PostScript and EPS files generally, because it lets you explore these files just as if they were PDFs. GSview is also a useful tool if you have a file that's misbehaving, because it provides a fair amount of detail about errors in PostScript and PDF handling. For many users, it's too stripped down to be useful, but what it lacks in chrome it has in power.
Speed Up Acrobat Startup
Move the plug-ins you don't need out of your way.
Both Adobe Acrobat and Reader implement several standard features as modular application plug-ins. These plug-ins are loaded when Acrobat starts up. You can speed up Acrobat startup and clean up its menus by telling Acrobat to load only the features you desire.
One simple technique is to hold down the Shift key when launching Acrobat; this prevents all plug-ins from loading. A longer-term solution is to move unwanted plug-ins to another, inert directory where the startup loader won't find them. Another solution is to create plug-in profiles [Hack #5] that are switched using a batch file gateway. This latter solution becomes really useful when combined with context menu hacks [Hack #6].
Keep in mind that omitting plug-ins will alter how some PDFs interact with you. If a PDF seems to be malfunctioning, try viewing it with the full complement of Adobe's stock plug-ins installed.
Acrobat (or Reader) loads its plug-ins only once, when the application starts. On Windows, it scans a specific directory and tries to interface with specific files, recursing into subdirectories as it goes. This directory is named plug_ins and it usually lives someplace such as:
C:\Program Files\Adobe\Acrobat 6.0\Acrobat\plug_ins\
or:
C:\Program Files\Adobe\Acrobat 6.0\Reader\plug_ins\
On Windows, plug-in files are named *.api, but they are really DLLs [Hack #97].
On the Macintosh, plug-ins are stored inside the Acrobat package. Control-click (or right-click, if you have a two-button mouse) the icon for Acrobat, and choose Show Package Contents from the menu. A window with a folder named Contents will appear. Inside that folder is another folder called Plug-ins, which contains the Macintosh version of the same plug-ins. These have names like Checkers.acroplugin.
Create a directory called plug_ins.unplugged in the same directory or folder where plug_ins (or Plug-ins) lives so that they are siblings. To prevent a plug-in from being loaded, simply move it from
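
If you would rather script the move than drag files around, the idea can be sketched in a few lines of Python. This is only an illustration, not part of the original hack: the function name unplug and its parameters are made up for the example, the folder layout is the one described above, and you should run it only while Acrobat is closed.

```python
import shutil
from pathlib import Path


def unplug(app_dir, plugin_name, active="plug_ins", inert="plug_ins.unplugged"):
    """Move one plug-in out of the active folder into its inert sibling.

    app_dir is the folder that contains plug_ins (assumed layout);
    plugin_name is a plug-in file or folder inside it.
    Returns the plug-in's new location.
    """
    app_dir = Path(app_dir)
    source = app_dir / active / plugin_name
    if not source.exists():
        raise FileNotFoundError(f"no such plug-in: {source}")
    target_dir = app_dir / inert
    target_dir.mkdir(exist_ok=True)  # create the sibling folder on first use
    target = target_dir / plugin_name
    if target.exists():
        raise FileExistsError(f"already unplugged: {target}")
    shutil.move(str(source), str(target))
    return target
```

Moving the plug-in back is the same call with the active and inert arguments swapped.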
Manage Acrobat Plug-Ins with Profiles on Windows
If you use Acrobat for several purposes, create several profiles.
If you use Acrobat for many different tasks, you probably need different plug-ins at different times. If you use third-party or custom plug-ins [Hack #97], it can become essential to distinguish the "production workflow" Acrobat from the "plug-in beta testing" Acrobat from the "on-screen reading" Acrobat. We can do that.
In the same directory as your plug_ins folder [Hack #4], create one folder for each profile, naming it like this: plug_ins.profile_name. For example, a production profile might have the folder name plug_ins.production. Copy the desired plug-ins into each profile folder; you can copy a plug-in into one or more folders. The plug_ins folder will be your default profile.
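
Building a profile folder by hand is easy enough, but if you maintain several profiles a small script can help. The Python sketch below is an assumption-laden illustration rather than anything from the original hack: create_profile and its arguments are hypothetical names, and it simply copies the plug-ins you list from plug_ins into plug_ins.profile_name.

```python
import shutil
from pathlib import Path


def create_profile(app_dir, profile_name, plugin_names):
    """Copy the named plug-ins from plug_ins into plug_ins.<profile_name>.

    app_dir is the folder that holds plug_ins (assumed layout).
    Returns the profile folder.
    """
    app_dir = Path(app_dir)
    default_dir = app_dir / "plug_ins"
    profile_dir = app_dir / f"plug_ins.{profile_name}"
    profile_dir.mkdir(exist_ok=True)
    for name in plugin_names:
        source = default_dir / name
        target = profile_dir / name
        if source.is_dir():
            # Some plug-ins live in their own folder, so copy the whole tree.
            shutil.copytree(source, target, dirs_exist_ok=True)
        else:
            shutil.copy2(source, target)
    return profile_dir
```

Because the plug-ins are copied rather than moved, the default plug_ins folder stays intact and the same plug-in can appear in several profiles.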
Copy the following code into a text file called C:\switchboard.bat. Edit its path to Acrobat.exe to suit your configuration. This batch file takes two arguments: the name of the desired profile and, optionally, a PDF filename. Following our previous example, launch Acrobat under the production profile by invoking C:\switchboard.bat production.
:: switchboard.bat, version 1.0
:: visit: http://www.pdfhacks.com/switchboard/
::
:: switch the Acrobat plug_ins directory according to the first argument;
:: the second argument can be a PDF filename to open; we assume that the
:: second argument has been quoted for us, if necessary
::
@echo off
echo Acrobat Plug-In Switchboard Activated
echo  ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ 
echo Do not close this command session;
echo it will close automatically after Acrobat is closed.
:: change into the directory with Acrobat.exe and plug_ins
cd /D "c:\program files\adobe\acrobat 6.0\acrobat\"
if exist plug_ins.on_hold goto BUSY
if not exist "plug_ins.%1" goto NOSUCHNUMBER
:: make the switch
rename plug_ins plug_ins.on_hold
rename "plug_ins.%1" plug_ins
Acrobat.exe %2
:: switch back
rename plug_ins "plug_ins.%1"
rename plug_ins.on_hold plug_ins
goto DONE
:BUSY
@echo off
echo NOTE-
echo Acrobat is already running with a switched plug_ins directory.
Acrobat.exe %2
goto DONE
:NOSUCHNUMBER
@echo off
echo ERROR-
echo The argument you passed to switchboard.bat does not match
echo a custom plug_ins directory, at least not where I am looking.
Acrobat.exe %2
:DONE
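
For readers who prefer a scripting language to batch files, the same swap-run-restore pattern can be sketched in Python. Treat this as an illustration under stated assumptions, not a port of switchboard.bat: run_with_profile is a hypothetical name, the command to launch is passed in as an argument list, and unlike the batch file it raises an error instead of launching Acrobat when the profile is missing or a switch is already in progress. The nested try/finally blocks put the folders back even if the launched program fails.

```python
import subprocess
from pathlib import Path


def run_with_profile(app_dir, profile, command):
    """Swap plug_ins.<profile> into place, run command, then swap back.

    command is an argument list, for example the path to Acrobat.exe
    followed by a PDF filename. Returns the command's exit code.
    """
    app_dir = Path(app_dir)
    active = app_dir / "plug_ins"
    on_hold = app_dir / "plug_ins.on_hold"
    chosen = app_dir / f"plug_ins.{profile}"

    if on_hold.exists():
        raise RuntimeError("a switched plug_ins directory is already in use")
    if not chosen.is_dir():
        raise FileNotFoundError(f"no such profile folder: {chosen}")

    active.rename(on_hold)  # park the default profile
    try:
        chosen.rename(active)  # make the switch
        try:
            return subprocess.run(command).returncode
        finally:
            active.rename(chosen)  # switch back
    finally:
        on_hold.rename(active)  # restore the default profile
```

As with the batch file, the call blocks until the launched program exits, because the folders can be renamed back only after Acrobat has closed.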
Open PDF Files Your Way on Windows
Multipurpose PDF defies the double-click, so make a right-click compromise. Impose your will on Internet Explorer or Mozilla with a registry hack.
In Windows, you double-click to open a PDF in the default viewer. But what if you have a couple of different Acrobat profiles from [Hack #5]? Or maybe you want a quick way to open a PDF inside your web browser [Hack #9]? Add these file-open options to the context menu that appears when you right-click a PDF file. You can even configure Windows to use one of these options when double-clicking a PDF file. Convincing web browsers to open PDFs your way takes a little more work.
Windows XP and 2000 offer a convenient way to open a PDF file using an alternative application. Right-click your PDF file and select Open With from the context menu. A submenu will open with a variety of alternatives. Your options might include Illustrator and Photoshop, for example.
In [Hack #5] we used a batch program to switch between named Acrobat profiles. You can add these profiles to your PDF context menu, too. In the steps that follow, substitute your profile's name for production.
Windows XP and 2000:
  1. In the Windows File Explorer menu, select Tools → Folder Options... and click the File Types tab. Select the Adobe Acrobat Document (PDF) file type and click the Advanced button.
  2. Click the New... button and a New Action dialog appears. Give the new action a name: Acrobat: production.
  3. Give the action an application to open by clicking the Browse... button and selecting cmd.exe, which lives somewhere such as C:\windows\system32\ or C:\winnt\system32\.
  4. Add these arguments after cmd.exe, changing the paths to suit, so it looks like this:
    C:\windows\system32\cmd.exe /C c:\switchboard.bat production "%1"
  5. Click OK, OK, OK and you are done.
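
If you keep several profiles, typing one action per profile gets tedious. The following Python sketch is a hypothetical helper, not part of the original hack: it scans the Acrobat folder for plug_ins.profile_name folders and builds the action name and command line for each one, in the form used in steps 2 and 4. The default paths to cmd.exe and switchboard.bat are the examples from the steps above and are assumptions about your system; you would still enter the results in the New Action dialog yourself.

```python
from pathlib import Path

# Working folders used by the plug-in hacks; they are not profiles.
RESERVED = {"on_hold", "unplugged"}


def profile_actions(app_dir, switchboard=r"c:\switchboard.bat",
                    cmd_exe=r"C:\windows\system32\cmd.exe"):
    """Return {action name: command line} for each plug_ins.<profile> folder."""
    actions = {}
    for folder in sorted(Path(app_dir).glob("plug_ins.*")):
        profile = folder.name[len("plug_ins."):]
        if not folder.is_dir() or profile in RESERVED:
            continue
        # "%1" is the placeholder Windows replaces with the clicked PDF's path.
        actions[f"Acrobat: {profile}"] = f'{cmd_exe} /C {switchboard} {profile} "%1"'
    return actions
```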
Windows 98:
  1. In the Windows File Explorer menu, select Tools → Folder Options... and click the File Types tab. Select the Adobe Acrobat Document (PDF) file type and click the Edit... button.
Copy Data from PDF Pages
Extract data from PDF files and use it in your own documents or spreadsheets.
Copying data from one electronic document to paste into another should be painless and predictable, such as the process depicted in Figure 1-7. Trying to copy data from a PDF, however, can be frustrating. The solution for Acrobat 6 and Adobe Reader users (on Windows, anyway) comes from an unlikely source: Acrobat 5.
Figure 1-7: TAPS faithfully copying formatted text and tables using Acrobat or Reader
Acrobat 5 includes the excellent TAPS text/table selection plug-in. Acrobat 6 does not. Because Acrobat plug-ins are modular, you can copy the TAPS folder (named Table) from the Acrobat 5 plug_ins folder [Hack #4] and paste it into the Acrobat 6 plug_ins folder. Voilà! Don't have Acrobat 5? The TAPS license permits liberal distribution, so visit http://www.pdfhacks.com/TAPS/ to view the license and download a copy. Don't have Acrobat 6, either? Use Adobe Reader instead. TAPS works in both Acrobat and Reader. Who would have guessed?
Adobe Reader gives you a single, simple Text Select tool that works well on single lines of text but not on tables or paragraphs. Sometimes it selects more text than you want. For greater control, hold down the Alt key (Version 6) or the Ctrl key (Version 5) and drag out a selection rectangle. Multiline paragraphs copied with this tool do not preserve their flow. Pasted into Word, each line is a single paragraph. Yuck!
You need the TAPS plug-in, which copies paragraphs and tables with fidelity. Copy the entire Table folder from your Acrobat 5 plug-ins directory (e.g., C:\Program Files\Adobe\Acrobat 5.0\Acrobat\plug_ins\Table) into your Reader plug-ins directory (e.g., C:\Program Files\Adobe\Acrobat 6.0\Reader\plug_ins). Restart Reader.
If you don't have Acrobat 5, visit http://www.pdfhacks.com/TAPS/ and download Acrobat_5_TAPS.zip. Unzip, and then move the resulting TAPS folder into your Reader plug_ins directory. Restart Reader. You'll now have the Table/Formatted Text Select Tool, as shown in Figure 1-8.
Convert PDF Documents to Word
Automatically scrape clipboard data into a new Word document.
In general, PDFs aren't as smart as they appear. Unless they are tagged [Hack #34], they have no concept of paragraph, table, or column. This becomes a problem only when you must create a new document using material from an old document. Ideally, you would use the old document's source file, or maybe even its HTML edition. This isn't always possible, however. Sometimes you have only a PDF to work with.
Adobe Acrobat 6 enables you to convert your PDF to many different formats with the Save As... dialog. These filters work best when the PDF is tagged. Try one to see if it suits your requirements. Adobe Reader's File → Save As Text... command converts your PDF to text.
If your PDF is not tagged, Acrobat uses an inference engine to assemble the letters into words and the words into paragraphs. It tries to detect and create tables. It works best on documents with very simple formatting. Tables and formatted pages generally don't survive.
Fully automatic conversion of PDF to a structured format such as Word's DOC is not generally possible because the problem is too big. One workaround is to break the problem down to the point where the automation has a chance. The TAPS tool [Hack #7] works well because you meet the automation halfway. You tell it where the table is and it creates a table from the given data. This approach can be scaled to fit the larger problem of converting entire documents.
Copy/Paste works fine for a few items, but it grows cumbersome when processing several pages of data. AutoPasteLoop is a Word macro that watches the clipboard for new data and then immediately pastes it into your new document. Instead of copy/paste, copy/paste, copy/paste, you can just copy, copy, copy. Word automatically pastes, pastes, pastes.
Scott Tupaj has ported AutoPasteLoop to OpenOffice. Download the code from http://www.pdfhacks.com/autopaste/.
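
The idea behind AutoPasteLoop, polling the clipboard and pasting whatever is new, is easy to see in a language-neutral form. The Python sketch below is not the Word macro and makes no claim to match its implementation: auto_paste_loop is a hypothetical name, and read_clipboard and paste are stand-ins you would supply for a real clipboard reader and a real document writer.

```python
import time


def auto_paste_loop(read_clipboard, paste, polls, delay=0.5):
    """Poll the clipboard and paste each new, non-empty value exactly once.

    read_clipboard() returns the current clipboard text and paste(text)
    appends it to the target document; both are caller-supplied.
    """
    last_seen = read_clipboard()  # ignore whatever was copied before we started
    for _ in range(polls):
        time.sleep(delay)
        current = read_clipboard()
        if current and current != last_seen:
            paste(current)
            last_seen = current
```

One limitation of comparing against the last value: copying the same text twice in a row is pasted only once.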
Create a new Word macro named
Browse One PDF in Multiple Windows
Tear off pages and leave them on your desktop for reference as you continue reading.
Both Adobe Reader and Acrobat confine us to a linear view of documents. Often, for instance, page 17 of a file contains a table I would like to consult as I read page 19, and Acrobat makes this difficult. Here are a couple ways to open one PDF document in many windows, as shown in Figure 1-9. These tricks work with both Acrobat and the free Reader.
Figure 1-9: Using your favorite web browser to open one PDF document in many windows
One quick solution is to read the PDF from within your web browser. When you open a new browser window (or Mozilla tab), it will duplicate your current PDF view, giving you two views of the same document.
This works in Internet Explorer by default. Mozilla requires a little configuration. In Mozilla, select Edit → Preferences... → Navigator. On the right, find the Display On section and note its adjacent drop-down box. Set Display on New Window and Display on New Tab to Last Page Visited, as shown in Figure 1-10. Click OK. You must restart Mozilla before these changes take effect.
Figure 1-10: Configuring Mozilla to show the current document in newly opened windows and tabs
Drag-and-drop a PDF into your browser to open the PDF. Acrobat/Reader should display the PDF inside the browser. Select File → New... → Window (or File → New... → Tab) from the browser menu and you'll have two views into your one PDF.
While you're viewing a PDF file in your browser, the browser hot keys won't work if Acrobat has the input focus. You will need to create new windows or tabs using the browser menu.
If trying to open a PDF inside your browser causes it to open inside of Acrobat/Reader instead, check these settings (Windows only):
Acrobat/Reader 6
Select Edit → Preferences... → Internet. Under Web Browser Options, check the Display PDF in Browser checkbox.
Acrobat/Reader 5
Select Edit → Preferences... → General... → Options. Under Web Browser Options, check the Display PDF in Browser checkbox.
Pace Your Reading or Present a Slideshow in Acrobat or Reader
You can make Acrobat or Reader advance a document at a preset interval, making it easy to maintain a given reading pace or to present slides.
If you are sitting down for a long, on-screen read, consider adding this "cruise control" feature to Acrobat/Reader. It turns PDF pages at an adjustable pace. Acrobat and Reader already have a similar "slideshow" feature, but it works only when viewing PDFs in Full Screen mode.
In Acrobat or Reader 6.0, also try the View → Automatically Scroll feature. It smoothly scrolls the pages across the screen.
If you have a PDF photo album [Hack #48] or slideshow presentation, you can configure Acrobat/Reader to automatically advance through the pages at a timed pace. Select Edit → Preferences... → General... → Full Screen (Acrobat/Reader 6 Windows) or Edit → Preferences... → Full Screen (Acrobat/Reader 5 Windows) or Acrobat → Preferences... → Full Screen (Acrobat/Reader 6 Macintosh). Set the page advance, looping, and navigation options as shown in Figure 1-11, and click OK. Open your PDF, select Window → Full Screen View (Acrobat/Reader 6 for Windows or Macintosh) or View → Full Screen (Acrobat/Reader 5), and the slideshow begins. To exit Full Screen mode, press Ctrl-L (Windows) or Command-L (Mac).
Figure 1-11: Configuring Acrobat/Reader's Full Screen mode to show slides
You can also use this slideshow feature as a "cruise control" for on-screen reading. However, the Full Screen mode hides document bookmarks and application menus, and adjusting its timing is a multistep burden.
The following JavaScript for Acrobat and Reader provides a more flexible page turner. You can run it outside of Full Screen mode, and its timing is easier to adjust.
Visit http://www.pdfhacks.com/page_turner/ to download the JavaScript in Example 1-3. Unzip it, and then copy it into your Acrobat or Reader JavaScripts directory. [Hack #96] explains where to find this directory on your platform. Restart Acrobat/Reader, and
Pace Your Reading or Present a Slideshow in Mac OS X Preview
Turn your Mac into a big, beautiful e-book reader, thanks to the wonders of Preview.
It likely comes as no big news to you that you can open images of various flavors and PDFs in Preview (Applications → Preview). But it never fails to surprise people that they've somehow managed to overlook the fact that you can hop into Full Screen mode (View → Full Screen) and view these images and pages without all the clutter of anything else you happen to have open to distract you from their stunning Quartz-rendered visage.
Just as iDVD's Full Screen mode transforms a Mac into a little movie theater, so too does Preview's Full Screen view turn your 23-inch Apple Cinema Display—or, more likely, your iBook's 12-inch screen—into a rather nice e-book, as shown in Figure 1-12.
Figure 1-12: Cory Doctorow's Eastern Standard Tribe (available from http://craphound.com/est/ under a Creative Commons License), viewed in Full Screen mode in Preview
Flip forward page by page with a click of your mouse or rap on your spacebar. The Page Up, Page Down, and arrow keys move you forward and backward, while Home takes you to the first page and End to (surprise!) the end of the document.
If you switch to another application by using the basic Application Switcher (Command-Tab) and then switch back to Preview, you'll be right back in Full Screen mode. Hit the Esc key to return to normal, fully cluttered view.
It gets even better for iBook and PowerBook owners. This newfound ability to use your Mac as an electronic book means being able to tote about the Library of Alexandria—or at least what's available in Project Gutenberg (http://www.gutenberg.net)—without adding an ounce to your load.
If your PDF is formatted (as most are) in standard page layout, rotate it left or right (View → Rotate Left or View → Rotate Right) just before going full screen and hold your laptop on its side as if it were actually a book (a book with a keyboard, admittedly). Sit back, take a sip of tea, and catch up with Ms. Austen and life at Mansfield Park.
Unpack PDF Attachments (Even Without Acrobat)
Save attachments to your disk, where you can use them.
Authors sometimes supplement their documents with additional electronic resources. For example, a document that displays large tables of data might also provide the reader with a matching Excel spreadsheet to work with. PDF's file attachment feature is an open-ended mechanism for packing any electronic file into a PDF like this. As discussed in [Hack #54], these attachments can be associated with the overall document or with individual pages. You can unpack PDF attachments to your disk using Acrobat, Reader, or our pdftk [Hack #79]. After unpacking an attachment, you can view and manipulate it independently from the PDF document.
In Acrobat/Reader 6, select Document → File Attachments... to view and access all PDF attachments. Select the desired attachment and click Export... to save it to disk.
In Acrobat 5, you can view and access a document's page attachments using the Comments tab. Open this tab by selecting Window → Comments. Select the attachments you desire to unpack, click the Comments button, and choose Export Selected... from the drop-down menu. In Acrobat 5, select File → Document Properties → Embedded Data Objects... to view and access document attachments.
Reader 5 and earlier versions do not enable you to unpack attachments.
pdftk simply unpacks all PDF attachments into the current directory. Future versions might introduce more control. For now, invoke it like this:
    pdftk <input PDF filename> unpack_files
If the PDF is encrypted, you must supply a password, too:
    pdftk <input PDF filename> input_pw <password> unpack_files
Unpacking a PDF's attachments does not remove them from the PDF. You can always unpack them again later.
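
If you unpack attachments from many PDFs, you might wrap the pdftk call in a script. The Python sketch below is one way to do it, offered as an assumption-based illustration: the function names are hypothetical, it assumes pdftk is on your PATH, and it assumes the input PDF filename comes directly after pdftk on the command line. Because pdftk unpacks into the current directory, the sketch chooses the destination by setting the working directory for the call.

```python
import os
import subprocess


def unpack_command(pdf_path, password=None):
    """Build the pdftk argument list for unpacking a PDF's attachments."""
    args = ["pdftk", pdf_path]
    if password is not None:
        args += ["input_pw", password]  # needed only for encrypted PDFs
    args.append("unpack_files")
    return args


def unpack_attachments(pdf_path, target_dir=".", password=None):
    """Run pdftk with target_dir as its current directory.

    The PDF path is made absolute first so that changing the working
    directory does not break a relative path. Raises
    subprocess.CalledProcessError if pdftk reports failure.
    """
    command = unpack_command(os.path.abspath(pdf_path), password)
    subprocess.run(command, cwd=target_dir, check=True)
```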
Dispense with the command line [Hack #56] to create a quick right-click action for unpacking a PDF with pdftk on Windows.
Jump to the Next or Previous Heading
Use PDF bookmark information to stride from section to section in Acrobat on Windows.
PDF bookmarks greatly improve document navigation, but they also have their annoyances. When I click a bookmark in Acrobat, shown in Figure 1-13, the document loses input focus. Pressing arrow keys or Page Up and Page Down has no effect on the document until I click the document page. That makes two clicks, and clicking two times to visit one bookmarked page is annoying.
Figure 1-13: A no-click solution to annoying bookmark behavior
So, I created a "no-click" solution for navigating bookmarks. After installing this Acrobat plug-in, you can jump from bookmarked page to bookmarked page by holding down the Shift key and pressing the left and right arrow keys.
Visit http://www.pdfhacks.com/jumpsection/ and download jumpsection-1.0.zip. Unzip, and then move jumpsection.api to your Adobe Acrobat plug-ins directory. This directory is located somewhere such as C:\Program Files\Adobe\Acrobat 5.0\Acrobat\plug_ins\.
Restart Acrobat, open a bookmarked PDF, and give it a try. Hold down the Shift key and press the right and left arrow keys to jump forward and back.
[Hack #97] uses jumpsection as an example of customizing Acrobat with plug-ins. jumpsection does not work with the free Adobe Reader.
Navigate and Manipulate PDF Using Page Thumbnails
Acrobat's thumbnail view pane has some useful, unexpected features for reorganizing or jumping through your documents.
At first glance, the Acrobat Pages (Acrobat 6) or Thumbnails (Acrobat 5) pane might seem like a cute but unnecessary view into your PDF files. In fact, it is not a passive view, but an interactive easel with features not available anywhere else.
As you widen this pane, more thumbnails become visible and they organize themselves into rows and columns. The nearby Options (Acrobat 6) or Thumbnail (Acrobat 5) button opens a menu where you can change the thumbnail size. Acrobat 6 enables you to enlarge or reduce thumbnails as you desire. Acrobat 5 enables you to choose between small and large thumbnail sizes.
If the Acrobat 6 thumbnails appear grainy as you enlarge them, choose Remove Embedded Thumbnails from the Options menu. This forces Acrobat to render pages on the fly, as shown in Figure 1-14.
Figure 1-14: Large thumbnails showing more detail
If the thumbnails seem to display too slowly, try selecting Embed All Page Thumbnails from the Options (or Thumbnail) menu. Acrobat will store the thumbnail images in the PDF file. You can always undo this by selecting Remove Embedded Thumbnails.
Your current PDF page view, on the right, is represented by a red box in the thumbnail pane. You can resize this box or grab its edge to move it around. Manipulate this box to manipulate the current PDF page view. Click any thumbnail to view that page.
Invoked from the menu, most Acrobat features operate on one page or a contiguous range of pages. In the thumbnail pane, you can select the exact pages you want to print or modify. Click and drag out a rectangle to select a group of pages. Hold down the Ctrl key (Shift on the Macintosh) while clicking single pages to add them to or remove them from your selection. When your selection is complete, right-click one of your selected pages to see a menu of possible page operations.
To select all pages in the thumbnail view, you must first select one page, then click Select All.
Chapter 2: Managing a Collection
Introduction: Hacks #15-23
While you'll often work with individual PDF files, documents have a way of accumulating. As your collection of PDF files grows, finding things in that collection often becomes more difficult. These hacks will show you ways to work with groups of documents, adding features and creating supporting frameworks for managing multiple documents.
Bookmark PDF Pages in Reader
Create and maintain a list of PDF pages for rapid access.
Web browsers enable you to bookmark HTML pages, so why doesn't Adobe Reader enable you to bookmark PDF pages? Here is a JavaScript that extends Reader so that it can create bookmarks to specific PDF pages. It works on Windows, Mac, and Linux.
The bookmarks created by this JavaScript aren't PDF bookmarks that get saved with the document. They behave more like web browser bookmarks in that they enable you to quickly return to a specific PDF page.
Visit http://www.pdfhacks.com/bookmark_page/ to download the JavaScript in Example 2-1. Unzip it, and then copy it into your Acrobat or Reader JavaScripts directory. [Hack #96] explains where to find this directory on your platform. Restart Acrobat/Reader, and bookmark_page.js will add new items to your View menu.
Example 2-1. Adding bookmark functionality to Acrobat and Adobe Reader
// bookmark_page.js, ver. 1.0
// visit: http://www.pdfhacks.com/bookmark_page/

// use this delimiter for serializing our array
var bp_delim= '%#%#';

function SaveData( data ) {
  // data is an array of arrays that needs
  // to be serialized and stored into a persistent
  // global string
  var ds= '';
  for( var ii= 0; ii< data.length; ++ii ) {
    for( var jj= 0; jj< 3; ++jj ) {
      if( ii!= 0 || jj!= 0 )
        ds+= bp_delim;
      ds+= data[ii][jj];
    }
  }
  global.pdf_hacks_js_bookmarks= ds;
  global.setPersistent( "pdf_hacks_js_bookmarks", true );
}

function GetData( ) {
  // reverse of SaveData; return an array of arrays
  if( global.pdf_hacks_js_bookmarks== null ) {
    return new Array(0);
  }

  var flat= global.pdf_hacks_js_bookmarks.split( bp_delim );
  var data= new Array( );
  for( var ii= 0; ii< flat.length; ) {
    var record= new Array( );
    for( var jj= 0; jj< 3 && ii< flat.length; ++ii, ++jj ) {
      record.push( flat[ii] );
    }
    if( record.length== 3 ) {
      data.push( record );
    }
  }
  return data;
}

function AddBookmark( ) {
  // query the user for a name, and then combine it with
  // the current PDF page to create a record; store this record
  var label= 
    app.response( "Bookmark Name:",
                  "Bookmark Name",
                  "",
                  false );
  if( label!= null ) {
    var record= new Array(3);
    record[0]= label;
    record[1]= this.path;
    record[2]= this.pageNum;

    var data= GetData( );
    data.push( record );
    SaveData( data );
  }
}

function ShowBookmarks( ) {
  // show a pop-up menu; this seems to work only when
  // a PDF is already in the viewer;
  var data= GetData( );
  var items= '';
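  for( var ii= 0; ii< data.length; ++ii ) {
    if( ii!= 0 )
      items+= ', ';
    // escape quotes and backslashes so that a label cannot break the
    // command string passed to eval( )
    items+= '"'+ ii+ ': '+ data[ii][0].replace( /(["\\])/g, '\\$1' )+ '"';
  }
  // assemble the command and then execute it with eval( )
  var command= 'app.popUpMenu( '+ items+ ' );';
  var selection= eval( command );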
  if( selection== null ) {
    return; // exit
  }

  // the user made a selection; parse out its index and use it
  // to access the bookmark record
  // parseInt( ) converts the digits before the colon to a number
  var index= parseInt( selection.substring( 0, selection.indexOf(':') ), 10 );
  if( index< data.length ) {
    try {
      // the document must be 'disclosed' for us to have any access
      // to its properties, so we use these FirstPage NextPage calls
      //
      app.openDoc( data[index][1] );
      app.execMenuItem( "FirstPage" );
      for( var ii= 0; ii< data[index][2]; ++ii ) {
        app.execMenuItem( "NextPage" );
      }
    }
    catch( ee ) {
      var response=
        app.alert("Error trying to open the requested document.\nShould I remove this bookmark?", 2, 2);
      if( response== 4 && index< data.length ) {
        data.splice( index, 1 );
        SaveData( data );
      }
    }
  }
}

function DropBookmark( ) {
  // modeled after ShowBookmarks( )
  var data= GetData( );
  var items= '';
  for( var ii= 0; ii< data.length; ++ii ) {
    if( ii!= 0 )
      items+= ', ';
    // escape quotes and backslashes, as in ShowBookmarks( )
    items+= '"'+ ii+ ': '+ data[ii][0].replace( /(["\\])/g, '\\$1' )+ '"';
  }
  var command= 'app.popUpMenu( '+ items+ ' );';
  var selection= eval( command );
  if( selection== null ) {
    return; // exit
  }

  var index= parseInt( selection.substring( 0, selection.indexOf(':') ), 10 );
  if( index< data.length ) {
    data.splice( index, 1 );
    SaveData( data );
  }
}

function ClearBookmarks( ) {
  if( app.alert("Are you sure you want to erase all bookmarks?", 2, 2 )== 4 ) {
    SaveData( new Array(0) );
  }
}

app.addMenuItem( {
cName: "-",              // menu divider
cParent: "View",         // append to the View menu
cExec: "void(0);" } );

app.addMenuItem( {
cName: "Bookmark This Page &5",
cParent: "View",
cExec: "AddBookmark( );",
cEnable: "event.rc= (event.target != null);" } );

app.addMenuItem( {
cName: "Go To Bookmark &6",
cParent: "View",
cExec: "ShowBookmarks( );",
cEnable: "event.rc= (event.target != null);" } );

app.addMenuItem( {
cName: "Remove a Bookmark",
cParent: "View",
cExec: "DropBookmark( );",
cEnable: "event.rc= (event.target != null);" } );

app.addMenuItem( {
cName: "Clear Bookmarks",
cParent: "View",
cExec: "ClearBookmarks( );",
cEnable: "event.rc= true;" } );
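The storage scheme in Example 2-1 can be tried outside Acrobat. The following is a minimal sketch, not part of bookmark_page.js: it reuses the same delimiter and three-field record layout, but it swaps Acrobat's persistent global object for a plain string. The names serializeBookmarks and parseBookmarks, and the sample paths, are hypothetical.

```javascript
// Standalone sketch of the flat, delimiter-joined storage used in Example 2-1.
// Assumption: runs in any JavaScript engine; no Acrobat objects are used.
var DELIM = '%#%#';   // same delimiter as bp_delim
var FIELDS = 3;       // label, document path, page number

// Flatten an array of [label, path, page] records into one string.
function serializeBookmarks(records) {
  var flat = [];
  for (var i = 0; i < records.length; ++i) {
    for (var j = 0; j < FIELDS; ++j) {
      flat.push(String(records[i][j]));
    }
  }
  return flat.join(DELIM);
}

// Rebuild the records; an incomplete trailing record is dropped.
function parseBookmarks(text) {
  var records = [];
  if (!text) {
    return records;
  }
  var flat = text.split(DELIM);
  for (var i = 0; i + FIELDS <= flat.length; i += FIELDS) {
    // the page number is converted back from text to a number
    records.push([flat[i], flat[i + 1], parseInt(flat[i + 2], 10)]);
  }
  return records;
}

// made-up sample records
var stored = serializeBookmarks([
  ['Intro', '/C/docs/manual.pdf', 0],
  ['Install steps', '/C/docs/manual.pdf', 41]
]);
var restored = parseBookmarks(stored);
```

Because the fields are simply joined with a delimiter, a label that itself contains %#%# would corrupt the stored list. That limitation applies to this sketch and to the delimiter approach in general.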
Create Windows Shortcuts to Online PDF Pages with Acrobat
Quickly return to a particular page of an online PDF, and manage these shortcuts with your other Favorites.
Web browsers don't enable you to bookmark online PDF pages as precisely as you can bookmark HTML web pages. Sure, you can bookmark the PDF document, but if that document is 300 pages long, your bookmark isn't helping you very much. The problem is that the browser doesn't know which PDF page you are viewing; it leaves those details to Acrobat or Reader. The solution is to have Acrobat/Reader create the shortcut for you. This little plug-in for Acrobat does the trick by creating page-specific Internet shortcuts in your Favorites folder, as shown in Figure 2-1.
Our Shortcuts plug-in does not work with Reader. Visit http://www.pdfhacks.com/shortcuts/ to see the status of Reader support.
Figure 2-1: Creating a PDF page shortcut that you can manage with your other Favorites
Visit http://www.pdfhacks.com/shortcuts/ and download shortcuts-1.0.zip. Unzip, and then copy shortcuts.api to your Acrobat plug_ins folder. This folder is usually located somewhere such as C:\Program Files\Adobe\Acrobat 5.0\Acrobat\plug_ins\.
Restart Acrobat. Our Shortcuts plug-in adds a PDF Hacks Shortcuts submenu to the Acrobat Plug-Ins menu. It also adds a Create Shortcut to This Page button to the navigation toolbar.
When viewing an online PDF, click this button and an Internet shortcut will appear in your personal Favorites folder. This shortcut is visible immediately from the Favorites menu in Internet Explorer. You can organize shortcuts into subfolders, rename them, or move them. When you activate one of these shortcuts, your default browser opens to the given URL, in this case to the PDF page you were viewing.
You can convert the shortcuts in your Favorites folder into Mozilla bookmarks by using the Internet Explorer Import/Export Wizard. Start the wizard from Internet Explorer by selecting File → Import and Export . . . .
Examine one of these shortcut URLs and you will see our trick for opening an online PDF to a specific page. It is simply a matter of appending information to the PDF's URL.
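As a hedged illustration of appending page information to a URL: Adobe documents a set of PDF open parameters, among them page=N, written after a # at the end of the URL. The sketch below assumes that parameter and the usual layout of a Windows Internet shortcut (.url) file. Whether the Shortcuts plug-in emits exactly this form is not shown in this excerpt, and the function names and the example.com address are hypothetical.

```javascript
// Sketch: build a page-specific PDF URL and the text of a Windows
// Internet shortcut (.url) file that points at it.
// Assumptions: the page=N open parameter with 1-based page numbers;
// function names are hypothetical, not taken from the plug-in.

function pdfPageUrl(pdfUrl, pageNumber) {
  // drop any existing fragment before appending our own
  var base = pdfUrl.split('#')[0];
  return base + '#page=' + pageNumber;
}

function internetShortcutText(url) {
  // .url files are small INI-style text files
  return '[InternetShortcut]\r\nURL=' + url + '\r\n';
}

var url = pdfPageUrl('http://www.example.com/docs/manual.pdf', 42);
var shortcut = internetShortcutText(url);
```

Saving the shortcut text to a file with a .url extension in the Favorites folder would make it appear alongside other Favorites; writing the file is left out here because it depends on the host environment.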
Create Windows Shortcuts to Local PDF Pages
Pinpoint and organize the essential data in your local PDF collection.
PDF files can hold so much information, yet Acrobat provides no convenient way to reference an individual PDF page outside of Acrobat. This makes it harder to organize a collection. To solve this problem, I developed an Acrobat plug-in that can create Windows shortcuts that open specific PDF pages. However, it works only after you add some special Dynamic Data Exchange (DDE) messages to the PDF Open action. Use this plug-in to create Windows shortcuts to the PDF pages, sections, or chapters most useful to you. Name these shortcuts and organize them in folders just like Internet shortcuts.
Adobe Reader users should use [Hack #15] instead of this hack.
First, we must have Acrobat open PDF files to a particular page, when a page number is given. The Windows shell is responsible for opening Acrobat when you double-click a PDF file or shortcut. You can view and edit this association from the Windows Explorer File Manager.
In the Windows File Explorer menu, select Tools → Folder Options . . . and click the File Types tab. Select the Adobe Acrobat Document (PDF) file type and click the Advanced button (Windows XP and 2000) or the Edit . . . button (Windows 98). Double-click the Open action to change its configuration.
Now you should be looking at the Edit Action dialog for the Adobe Acrobat Document file type. Check the Use DDE checkbox and then add/change the DDE fields like so:
Field name                     Field value
DDE Message                    [DocOpen("%1")] [DocGoTo("%1",%2=0)]
Application                    acroview
DDE Application Not Running    (empty)
Topic                          control
Turn PDF Bookmarks into Windows Shortcuts
Turn your PDF inside out.
We've talked about creating a shortcut to a single PDF page [Hack #17]. Now let's create a complete tree of shortcuts, each representing a PDF bookmark, such as those in Figure 2-3. Organize them in folders and use the Windows File Explorer to navigate your PDF collection. It works with local and online PDF files.
Install the Shortcuts plug-in [Hack #16] and then configure your computer for local shortcuts [Hack #17]. Open a bookmarked PDF and press the plug-in's button, or select Plug-Ins → PDF Hacks Shortcuts → Create Shortcuts to All Document Bookmarks. A set of shortcuts will appear in your Favorites folder. Create a new folder and move the new shortcuts to a convenient location.
Figure 2-3: Converting a PDF's bookmarks into desktop shortcuts, which can then be organized using the File Explorer
Generate Document Keywords
Complement your search strategy with document keywords.
Lost information is no use to anybody, and the difference between lost and found is a good collection search strategy. Keywords can play a valuable role in your strategy by giving you insight into a document's topics. Of course, a document's headings, listed in its Table of Contents, provide an outline of its topics. Keywords are different. Derived from the document's full text, they fill in the gaps between the formal, outlined topics and their actual treatments. This hack explains how to find a PDF's keywords using our kw_catcher program.
Finding keywords automatically is a hard problem. To simplify the problem, we are going to make a couple of assumptions. First, the document in question is large—50 pages or longer. Second, the document title is known—i.e., we aren't trying to discover the document's global topic, represented by its title. Rather, we are trying to discover subtopics that emerge throughout the document.

Section 2.6.1.1: Stopwords, noise, and signal

Stopwords are the words that appear most frequently in almost any document, such as the, of, and, to, and so on. Stopwords do not help us identify topics because they are used in all topics. Words that are used with uniform frequency throughout a document are called noise. Stopwords are the best example of noise. For any given document, dozens of other words add to the noise.
We are trying to find a document's signal, which is the set of words that communicate a topic. Automatically separating signal from noise is tricky.
Recall our assumption that the document title, or global topic, is known. We make it because a book's global topic tends to come up consistently throughout the document. For example, the word PDF occurs so regularly throughout this book that it looks like noise.
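To make the distinction between noise and signal concrete, here is a small sketch. It is not kw_catcher's algorithm, which this excerpt does not show. It simply assumes that a document can be cut into equal-sized windows of words, and that a word is used more uniformly when its per-window counts vary little. The function names, the window size, and the sample text are all hypothetical.

```javascript
// Count how often `word` appears in each consecutive window of `size` words.
function windowCounts(words, word, size) {
  var counts = [];
  for (var start = 0; start < words.length; start += size) {
    var n = 0;
    var end = Math.min(start + size, words.length);
    for (var i = start; i < end; ++i) {
      if (words[i] === word) {
        ++n;
      }
    }
    counts.push(n);
  }
  return counts;
}

// Spread of the counts (max minus min); a small spread means the word
// is used uniformly, which is the noise-like pattern.
function spread(counts) {
  var lo = counts[0];
  var hi = counts[0];
  for (var i = 1; i < counts.length; ++i) {
    if (counts[i] < lo) { lo = counts[i]; }
    if (counts[i] > hi) { hi = counts[i]; }
  }
  return hi - lo;
}

// made-up sample: three windows of five words each
var words = ('the viewer opens a file ' +
             'the bookmark pane shows bookmark ' +
             'the printer prints a page').split(' ');

var theCounts = windowCounts(words, 'the', 5);            // one per window
var bookmarkCounts = windowCounts(words, 'bookmark', 5);  // clustered in one window
```

In this sample, the appears once in every window, so its spread is zero, while bookmark is concentrated in the middle window and has a larger spread. A real document would need much larger windows and a more careful measure than a simple range.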

Section 2.6.1.2: Identifying local topics

Document word frequency is the number of times a word occurs in a document. By itself, it does not help us because noise words and signal words can occur with any frequency.
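Document word frequency, as defined above, is simple to compute. A minimal sketch follows. The tokenizer (lowercase the text, then split on anything that is not a letter) and the tiny stopword list are assumptions for illustration only, and the function name is hypothetical.

```javascript
// Hypothetical, deliberately tiny stopword list for illustration.
var STOPWORDS = { 'the': true, 'of': true, 'and': true, 'to': true };

function has(obj, key) {
  // safe own-property test, even for words such as "constructor"
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Return an object mapping each non-stopword to its number of occurrences.
function wordFrequency(text) {
  var words = text.toLowerCase().split(/[^a-z]+/);
  var freq = {};
  for (var i = 0; i < words.length; ++i) {
    var w = words[i];
    if (w === '' || has(STOPWORDS, w)) {
      continue;
    }
    freq[w] = (has(freq, w) ? freq[w] : 0) + 1;
  }
  return freq;
}

var freq = wordFrequency('The title of the PDF and the size of the PDF.');
// freq.pdf is 2; freq.title and freq.size are 1; stopwords are skipped
```

A count like this cannot separate signal from noise on its own: pdf scores highest in the sample even though, as a global topic, it behaves like noise.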
Index and Search Local PDF Collections on Windows
Teach Windows XP or 2000 how to search the full text of your PDF along with your other documents. Or, use Adobe Reader to search PDF only.
Search is essential for utilizing document archives. Search can also find things where you might not have thought to look. The problem is that, by default, Windows search doesn't know how to read PDF files. We present a couple of solutions.
The free Adobe Reader 6.0 provides the easiest solution. It enables you to perform searches across your entire PDF collection (Edit → Search). Its detailed query results include links to individual PDF pages and snippets of the text surrounding your query, as shown in Figure 2-5. Its Fast Find setting, enabled by default, caches the results of your searches, so subsequent sear