Extracting data from PDF files

PDF (Portable Document Format) is a file format for storing documents independently of application software, hardware, and operating systems (hence the name, portable). PDF documents are fixed-layout flat files that include text and graphics, along with the information needed to display that content. This recipe shows you how to extract information from PDF files using a reader object.

Getting ready

To step through this recipe, you will need Python v2.7 installed. To work with PDF files, we will use PyPDF2, a module that can be installed with the following command:

sudo pip install PyPDF2

Once the module is installed, let's get started!
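As a rough preview of what working with a reader object can look like, here is a minimal sketch. It assumes the PyPDF2 1.x API that matches this recipe's Python 2.7 setup (PdfFileReader, getNumPages, getPage, and extractText). PyPDF2 3.x removed these names in favor of PdfReader, reader.pages, and extract_text, so adjust the calls if you are on a newer release. The helper names describe_reader and describe_pdf are hypothetical and are not part of the recipe.

```python
def describe_reader(reader):
    """Summarize a PyPDF2 1.x reader object (assumed: a PdfFileReader)."""
    num_pages = reader.getNumPages()
    # extractText() can return an empty string for scanned or image-only
    # pages, so treat the result as best-effort text.
    first_page_text = reader.getPage(0).extractText() if num_pages else ""
    return {"pages": num_pages, "first_page_text": first_page_text}


def describe_pdf(path):
    """Open the PDF at `path` (a hypothetical argument) and summarize it."""
    # Imported here so describe_reader can be exercised without PyPDF2.
    from PyPDF2 import PdfFileReader

    # PDFs are binary files, so open in 'rb' mode and keep the file open
    # for as long as the reader is in use.
    with open(path, "rb") as pdf_file:
        return describe_reader(PdfFileReader(pdf_file))
```

Calling describe_pdf with the path to a PDF on your machine should return a dictionary holding the page count and whatever text PyPDF2 can recover from the first page. Splitting the logic into two functions is only a design choice, made here so that the summarizing step can be tested with a stand-in reader.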

How to do it...

  1. On your Linux/Mac computer, go to Terminal and use ...
