
Python数据处理

by Jacqueline Kazil, Katharine Jarmul
July 2017
Intermediate to advanced
398 pages
11h 54m
Chinese
Posts & Telecom Press
Data Acquisition and Storage
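We have also compiled several datasets that will be used in the following chapters. We have put all of them in the data repository so you can use them later.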
Now that we have covered how to identify a question and search for sources, let's look at data storage.
6.6 Data Storage
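Once you have found your data, you need to save it! Sometimes the data you get is already in a clean, easily accessible, machine-readable format. Other times, you may want to store it some other way. We'll cover several data storage tools you can use when you first extract your data from a CSV or PDF; alternatively, you can wait and store your data once it has been fully processed and cleaned (we cover data cleaning in Chapter 7).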
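As a minimal sketch of the two timings described above (storing right after extraction versus storing only after cleaning), the following uses Python's standard `csv` module to save extracted rows to a flat CSV file. The sample data, the file names `raw.csv` and `clean.csv`, and the helpers `extract_rows`, `store_rows` and `clean_rows` are hypothetical; the toy `clean_rows` step (strip whitespace, drop rows with an empty value) only stands in for the fuller cleaning covered in Chapter 7.

```python
import csv
import io
import os
import tempfile

# Hypothetical sample of a freshly extracted CSV (note the stray spaces and gaps).
RAW_CSV = """name,city,score
 Alice , Beijing ,90
Bob,Shanghai, 85

Carol , Guangzhou,
"""


def extract_rows(text):
    """Parse CSV text into a list of dicts, exactly as extracted (blank lines are skipped)."""
    return list(csv.DictReader(io.StringIO(text)))


def store_rows(rows, path):
    """Write a list of dicts to a flat CSV file, using the first row's keys as the header."""
    if not rows:
        return  # nothing to store
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def clean_rows(rows):
    """Toy cleaning step (hypothetical): strip whitespace, drop rows with any empty value."""
    cleaned = []
    for row in rows:
        stripped = {key: (value or "").strip() for key, value in row.items()}
        if all(stripped.values()):
            cleaned.append(stripped)
    return cleaned


out_dir = tempfile.mkdtemp()
raw = extract_rows(RAW_CSV)
store_rows(raw, os.path.join(out_dir, "raw.csv"))      # option 1: store right after extraction
clean = clean_rows(raw)
store_rows(clean, os.path.join(out_dir, "clean.csv"))  # option 2: store once cleaned
```

Keeping `raw.csv` next to `clean.csv` lets you rerun a different cleaning step later without extracting the data again; storing only the cleaned file saves space but gives up that option.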
Where Should I Store My Data?
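The first question is whether to save your data somewhere else or leave it in the file you originally extracted it from. The following questions can help you decide: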
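- Can you open the dataset in a simple document reader (such as Microsoft Word) without crashing your computer?
- Does the data appear to be properly labeled and structured, so that you can easily extract each piece of information?
- If you need more than one computer to work with the data, is it easy to store and move?
- Can you access the data in real time through an API, so that you can fetch what you need online?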
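For the first two questions, you do not necessarily have to open a large file in a document reader to find out. A few lines of Python can report the file's size and show its first few lines (including any header row) while reading only those lines. This is a minimal sketch; the helper name `inspect_file` and the demo file are hypothetical.

```python
import itertools
import os
import tempfile


def inspect_file(path, n_lines=3):
    """Return a file's size in bytes and its first n_lines, without loading the whole file."""
    size_bytes = os.path.getsize(path)
    with open(path, encoding="utf-8", errors="replace") as f:
        # islice stops after n_lines, so a very large file is never read in full.
        head = [line.rstrip("\n") for line in itertools.islice(f, n_lines)]
    return size_bytes, head


# Hypothetical sample file created just for this demo.
demo_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("id,name,score\n1,Alice,90\n2,Bob,85\n3,Carol,77\n")

size, head = inspect_file(demo_path)
print(size, "bytes")
for line in head:
    print(line)
```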
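If the answer to every question is yes, you probably don't need to worry about storing your data. If your answers are a mix of yes and no, you may want to store the data in a database or a flat file. If every answer is no, read on, my friend: we have solutions for you!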
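If your answers point toward a database, a minimal sketch using Python's built-in `sqlite3` module might look like the following. SQLite keeps a database in an ordinary file on disk, which also makes it easy to copy between computers. The records, the table name `measurements` and the helper `save_records` are hypothetical. Table and column names are interpolated into the SQL string, so they must come from your own code and never from untrusted input; the values go through `?` placeholders.

```python
import sqlite3

# Hypothetical records standing in for a dataset you have already extracted.
records = [
    {"country": "China", "year": 2015, "amount": 12.5},
    {"country": "India", "year": 2015, "amount": 9.8},
    {"country": "China", "year": 2016, "amount": 13.1},
]


def save_records(conn, table, rows):
    """Create `table` from the first record's keys (if needed) and insert every record."""
    columns = list(rows[0].keys())
    col_sql = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    # Table/column names are trusted identifiers from this script; values use placeholders.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_sql})")
    conn.executemany(
        f"INSERT INTO {table} ({col_sql}) VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.commit()


# ":memory:" keeps the demo self-contained; pass a path such as "data.db" instead
# to keep the database in a file on disk.
conn = sqlite3.connect(":memory:")
save_records(conn, "measurements", records)

china = conn.execute(
    "SELECT year, amount FROM measurements WHERE country = ? ORDER BY year",
    ("China",),
).fetchall()
print(china)  # [(2015, 12.5), (2016, 13.1)]
```

Compared with a flat file, the database lets you filter and sort with queries like the `SELECT` above instead of loading and scanning every row yourself.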
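Suppose your datasets are all different: a file here, a report there. Some are easy to download and access, but others you may need to copy or scrape from the web. Chapters 7 and 9 will cover how to clean and combine ...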

ISBN: 9787115459190