
Automate it! - Recipes to upskill your business by Chetan Giridhar

Extracting data from PDF files

PDF (Portable Document Format) is a file format for storing documents independently of application software, hardware, and operating system (hence the name "portable"). PDF documents are fixed-layout flat files that contain text and graphics along with the information needed to display them. This recipe shows you how to extract information from PDF files using the reader object provided by the PyPDF2 module.
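As a rough preview of what working with a reader object can look like, here is a minimal sketch. It is not the recipe's own listing. It assumes the PyPDF2 1.x API that runs on Python 2.7 (`PdfFileReader`, `getNumPages()`, `getPage()`, `extractText()`, `getDocumentInfo()`); PyPDF2 3.x renamed these to `PdfReader`, `reader.pages`, `extract_text()`, and `reader.metadata`. The helper names `summarize_reader` and `summarize_pdf` are hypothetical and chosen only for illustration.

```python
def summarize_reader(reader):
    """Collect basic facts from a PyPDF2 1.x-style reader object."""
    # getDocumentInfo() may return None when the PDF carries no metadata.
    info = reader.getDocumentInfo()
    num_pages = reader.getNumPages()
    # Pages are zero-indexed; only read page 0 if the document has pages.
    first_text = reader.getPage(0).extractText() if num_pages > 0 else ""
    return {
        "pages": num_pages,
        "title": info.title if info is not None else None,
        "first_page_text": first_text,
    }


def summarize_pdf(path):
    """Open the PDF at `path` and summarize it (hypothetical helper)."""
    # Imported here so summarize_reader can be used without PyPDF2 present.
    from PyPDF2 import PdfFileReader

    # PDFs are binary files, so open in "rb" mode and keep the file open
    # while the reader is in use, because pages are read lazily.
    with open(path, "rb") as handle:
        return summarize_reader(PdfFileReader(handle))
```

Calling `summarize_pdf` with the path to a PDF of your own would return a dictionary holding the page count, the title from the document metadata (if any), and the text of the first page. Text extraction quality depends on how the PDF was produced, so the extracted string may be empty or imperfect for scanned documents.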

Getting ready

To step through this recipe, you will need Python v2.7 installed. To work with PDF files, we will use PyPDF2, a module that can be installed with the following command:

sudo pip install PyPDF2
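If you are unsure whether the install succeeded, one way to check is to try importing the module from Python. The sketch below uses only the standard library; the helper name `module_available` is hypothetical and not part of PyPDF2 or of the recipe.

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True
```

With this in place, `module_available("PyPDF2")` should return `True` once the install command above has completed successfully.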

With the module installed, let's get started!

How to do it...

  1. On your Linux/Mac computer, go to Terminal and use ...
