Python for Secret Agents - Volume II by Steven Lott

Extracting PDF content

In Chapter 1, New Missions – New Tools, we installed PDF Miner 3K to parse PDF files. Now it's time to see how this tool works. The documentation for this package is at http://www.unixuser.org/~euske/python/pdfminer/index.html. This link isn't obvious from either the PyPI page or the BitBucket site that hosts the software; an agent who scans the package's docs/index.html file will find the reference there.

To see how this package is used, visit http://www.unixuser.org/~euske/python/pdfminer/programming.html. This page has an important diagram showing how the various classes interact to represent the complex internal details of a PDF document. For some helpful insight, visit http://denis.papathanasiou.org/2010/08/04/extracting-text-images-from-pdf-files/ ...
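The class interactions in that diagram can be sketched roughly as follows. This is a minimal sketch, not the book's own code, and it assumes the older pdfminer API that PDF Miner 3K ports: `PDFParser` and `PDFDocument` in `pdfminer.pdfparser`, `PDFResourceManager` and `PDFPageInterpreter` in `pdfminer.pdfinterp`, and `PDFPageAggregator` as the device. The function names `walk_text()` and `pdf_pages_text()` are hypothetical helpers introduced only for illustration. Newer forks such as pdfminer.six moved some of these classes, so check the imports against the version you actually installed.

```python
"""Hypothetical sketch: pull the text out of each page of a PDF.

Assumes the older pdfminer API ported by pdfminer3k (PDFParser and
PDFDocument both in pdfminer.pdfparser). Other forks differ.
"""
from typing import Any, Iterable, Iterator, List


def walk_text(layout: Iterable[Any]) -> Iterator[str]:
    """Depth-first walk of a layout tree, yielding text from text objects.

    Any object with a get_text() method is treated as a text container;
    any other non-string iterable (such as a figure) is searched
    recursively. Everything else (lines, rectangles) is skipped.
    """
    for obj in layout:
        if hasattr(obj, "get_text"):
            yield obj.get_text()
        elif hasattr(obj, "__iter__") and not isinstance(obj, (str, bytes)):
            yield from walk_text(obj)


def pdf_pages_text(path: str) -> List[str]:
    """Return one string of extracted text per page of the PDF at path."""
    # Imported here so walk_text() can be used without pdfminer installed.
    # These module paths are an assumption based on the pdfminer3k API.
    from pdfminer.pdfparser import PDFParser, PDFDocument
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
    from pdfminer.converter import PDFPageAggregator
    from pdfminer.layout import LAParams

    pages = []
    with open(path, "rb") as fp:
        # The parser reads the raw file; the document holds its structure.
        # In this older API the two must be linked to each other explicitly.
        parser = PDFParser(fp)
        doc = PDFDocument()
        parser.set_document(doc)
        doc.set_parser(parser)
        doc.initialize("")  # empty password for an unencrypted file
        # The resource manager caches shared resources such as fonts; the
        # aggregator device collects each page's layout objects.
        rsrcmgr = PDFResourceManager()
        device = PDFPageAggregator(rsrcmgr, laparams=LAParams())
        interpreter = PDFPageInterpreter(rsrcmgr, device)
        for page in doc.get_pages():
            interpreter.process_page(page)
            layout = device.get_result()
            pages.append("".join(walk_text(layout)))
    return pages
```

Separating `walk_text()` from the pdfminer plumbing keeps the tree-walking logic independent of any particular pdfminer release, so only `pdf_pages_text()` needs adjusting if the class locations differ on your system.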