Python数据处理

by Jacqueline Kazil, Katharine Jarmul
July 2017
Intermediate to advanced
398 pages
Chinese
Posts & Telecom Press
Processing PDF Files and Solving Problems with Python
5.5 Uncommon File Types
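So far, this book has covered CSV, JSON, XML, Excel, and PDF files. Data in PDFs is hard to parse, and you might think the world of data parsing couldn't get any worse. Unfortunately, it can.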
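The good news is that you are unlikely to face a problem nobody has solved before. Remember that asking the Python community, or the wider open source community, for help and advice is always a good approach, even if what you come away with is the realization that you should look for a dataset that is easier to parse.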
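You may run into trouble if your data has any of the following characteristics:
- The file was generated by a legacy system and uses an uncommon file type.
- The file was generated by a proprietary system.
- None of the programs you have can open the file.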
Problems with uncommon file types can be solved with nothing more than what you have already learned:
(1) Determine the file type. If it is not obvious from the file extension, use the python-magic library (https://pypi.python.org/pypi/python-magic/0.4.6).
(2) Search the web for "how to parse <file extension> in Python", replacing "<file extension>" with the file's actual extension.
(3) If you cannot find an obvious solution, try opening the file in a text editor, or reading it with Python's open function.
(4) If the characters look strange, read up on encodings in Python. If you are new to character encodings in Python, you can watch the PyCon ...
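Step (1) can be approximated with the standard library alone. The minimal sketch below assumes that a file's first few bytes (its "magic number") identify the format, which is the idea libmagic, and therefore python-magic, is built on. The helper `sniff_file_type` and its short signature table are hypothetical illustrations, not part of python-magic; for real work, python-magic's `magic.from_file(path, mime=True)` checks against a far larger signature database.

```python
import os
import tempfile

# Hypothetical, deliberately small signature table: (leading bytes, label).
# Real tools such as libmagic know thousands of these.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip container (e.g. .xlsx, .docx)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy .xls)"),
    (b"\x1f\x8b", "gzip"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"<?xml", "xml"),
]


def sniff_file_type(path):
    """Guess a file's type from its leading bytes, ignoring its extension."""
    with open(path, "rb") as f:
        head = f.read(16)  # long enough for every signature in the table
    for signature, label in SIGNATURES:
        if head.startswith(signature):
            return label
    return "unknown"


if __name__ == "__main__":
    # Demo: PDF bytes saved under a misleading ".dat" extension.
    with tempfile.NamedTemporaryFile(suffix=".dat", delete=False) as tmp:
        tmp.write(b"%PDF-1.4\n% demo bytes")
    print(sniff_file_type(tmp.name))  # prints: pdf
    os.remove(tmp.name)
```

Checking bytes rather than the extension is the point: a renamed or extensionless file still carries its signature. Plain-text formats such as CSV or JSON usually have no signature, so "unknown" is a common and expected answer for them.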
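For step (3), a safe way to look inside an unfamiliar file from Python is to read it in binary mode first, so no decoding can fail, and only then view it as text with undecodable bytes replaced. The sketch below assumes UTF-8 as a first guess; the function names are hypothetical, and the NUL-byte test is a common rule of thumb rather than a guarantee (UTF-16 text, for example, also contains NUL bytes).

```python
import os
import tempfile


def peek_bytes(path, n=64):
    """Return the first n raw bytes; binary mode never raises a decode error."""
    with open(path, "rb") as f:
        return f.read(n)


def looks_binary(sample):
    """Rule of thumb: a NUL byte in the sample suggests a binary format."""
    return b"\x00" in sample


def preview_text(path, n_chars=200, encoding="utf-8"):
    """Read up to n_chars characters as text, substituting U+FFFD for bytes
    that are invalid in the guessed encoding instead of raising."""
    with open(path, encoding=encoding, errors="replace") as f:
        return f.read(n_chars)


if __name__ == "__main__":
    # Demo file: semicolon-separated data saved in a non-UTF-8 encoding.
    with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
        tmp.write("id;name\n1;café\n".encode("cp1252"))
    raw = peek_bytes(tmp.name)
    print(repr(raw))               # raw bytes, including b'\xe9' for "é"
    print(looks_binary(raw))       # prints: False
    print(preview_text(tmp.name))  # "é" shows up as the replacement character
    os.remove(tmp.name)
```

Seeing the raw bytes next to the decoded preview makes it obvious when the guessed encoding is wrong, which leads directly into step (4).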
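Step (4) usually comes down to finding which encoding turns the raw bytes into sensible text. The sketch below, a hypothetical helper rather than a library API, tries a short list of candidate encodings and keeps every one that decodes without error. The candidate list is an assumption to adapt to your data; note that latin-1 maps every byte to some character, so it always "succeeds" and you still have to judge whether the result reads correctly.

```python
# Candidate encodings, tried in order; adjust for where your data came from.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def try_decodings(raw, encodings=CANDIDATE_ENCODINGS):
    """Return {encoding: text} for each candidate that decodes raw cleanly."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            pass  # not valid in this encoding; try the next one
    return results


if __name__ == "__main__":
    raw = "café".encode("cp1252")  # b"caf\xe9": not valid UTF-8
    for enc, text in try_decodings(raw).items():
        print(f"{enc:>8}: {text!r}")
    # utf-8 is rejected; cp1252 and latin-1 both decode to 'café'
```

Decoding UTF-8 bytes as cp1252 also succeeds but produces mojibake ("café" becomes "cafÃ©"), which is exactly the kind of strange-looking output step (4) describes.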
ISBN: 9787115459190