Python数据处理

by Jacqueline Kazil, Katharine Jarmul
July 2017
Intermediate to advanced
398 pages
11h 54m
Chinese
Posts & Telecom Press
Content preview from Python数据处理
Working with PDF Files and Solving Problems with Python
Figure 5-1. PDF on the PyPI website
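Browsing these Python packages and reading the details of each library does not reveal which one is the best choice for parsing PDFs. Further searches, such as "parse pdf", turn up more libraries to choose from, but still no obvious winner (see the search results at https://pypi.python.org/pypi?:action=search&term=parse+pdf&submit=search). So we turned to a search engine to see which libraries people are actually using to parse PDFs.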
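When searching for libraries or answers, pay attention to the publication date of what you find. The older a post or question is, the more likely it is to be outdated and no longer applicable. Start by limiting your search to the past two years, and widen the time range only when you need to.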
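After reading many tutorials, documentation pages, blog posts, and a few helpful articles (such as this one: http://www.binpress.com/tutorial/manipulating-pdfs-with-python/167), we decided to use the slate library (https://pypi.python.org/pypi/slate). slate met our needs, but that will not always be the case. It is fine to abandon a library and start over.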
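If there are many libraries to choose from, pick the one that seems most suitable to you, even if someone tells you it is not the "best" tool. Which tool is best is a matter of opinion. While you are learning to program, the "best" tool is the one your intuition leads you to.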
5.2.1 Opening and Reading a PDF with slate
We decided to use ...
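The preview cuts off here, so the following is only a minimal sketch of what opening and reading a PDF with slate can look like, not the book's own listing. It assumes the slate package is installed and relies on `slate.PDF`, which takes an open binary file and behaves like a list of page texts. The function names `read_pdf_pages` and `preview` are hypothetical helpers introduced for illustration. The original slate package targets Python 2, so a Python 3 environment may need a fork.

```python
def read_pdf_pages(path):
    """Return the text of each page of the PDF at `path` as a list of strings.

    Hypothetical helper: assumes the third-party slate library is installed.
    """
    # Imported lazily so this module still loads where slate is missing.
    import slate  # https://pypi.python.org/pypi/slate

    # slate parses the raw PDF bytes, so open the file in binary mode.
    with open(path, "rb") as f:
        doc = slate.PDF(f)  # list-like: one text string per page

    return list(doc)


def preview(pages, width=60):
    """Return one short, whitespace-normalized preview line per page."""
    return [" ".join(page.split())[:width] for page in pages]
```

With a real file, `preview(read_pdf_pages(path))` gives a quick look at what the parser extracted from each page, which is a reasonable first check before committing to a library.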
ISBN: 9787115459190