July 2017
Beginner to intermediate
715 pages
17h 3m
English
The Portable Document Format (PDF) is not tied to a specific platform or software application. A PDF document can hold formatted text and images. PDF is an open standard (ISO 32000), so documents can be produced and read by many different tools on any operating system.
Because a large number of documents are stored as PDF, the format is a valuable source of data. Several Java APIs provide access to PDF documents, including Apache PDFBox and iText. Techniques for extracting information from a PDF document are illustrated in Chapter 3, Data Cleaning.
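Before handing a file to a PDF library, it can help to confirm that it really is a PDF. The following is a minimal sketch, not taken from the book, that uses only the Java standard library (Java 11 or later) to check for the `%PDF-` marker that opens a PDF file and to read the version number that follows it. The class name `PdfSniffer` and its method names are hypothetical, chosen for illustration. The check assumes the marker sits at the very first byte, which is stricter than some PDF readers require.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper: detects a PDF by its header, without parsing the document.
final class PdfSniffer {
    // A PDF file starts with this marker, followed by a version such as "1.7".
    private static final String MAGIC = "%PDF-";

    // Returns the header version (for example "1.7"), or empty if the bytes
    // do not start with a well-formed PDF header.
    static Optional<String> headerVersion(byte[] head) {
        // ISO-8859-1 maps every byte to one char, so binary data cannot fail to decode.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        if (!text.startsWith(MAGIC)) {
            return Optional.empty();
        }
        int end = MAGIC.length();
        while (end < text.length()) {
            char c = text.charAt(end);
            boolean versionChar = (c >= '0' && c <= '9') || c == '.';
            if (!versionChar) {
                break;
            }
            end++;
        }
        String version = text.substring(MAGIC.length(), end);
        return version.isEmpty() ? Optional.empty() : Optional.of(version);
    }

    // Reads only the first few bytes of the file, so large documents are cheap to test.
    static Optional<String> headerVersion(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return headerVersion(in.readNBytes(16));
        }
    }
}
```

A caller might filter a directory of mixed files with `PdfSniffer.headerVersion(path).isPresent()` and pass only the matches on to a library such as PDFBox for actual text extraction.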