Python数据处理 (Chinese edition of Data Wrangling with Python)
by Jacqueline Kazil, Katharine Jarmul
July 2017
Intermediate to advanced
398 pages
Chinese
Posts & Telecom Press
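...can help you tell good data from bad, and can also help you assess how usable the data is. Chapters 7 and 8 cover data cleanup and data exploration with Python, and Chapter 14 covers automation; in those chapters we introduce these tools in more detail.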
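When you first get new data, we recommend giving it a data smell test: check whether it is a reliable source of information and decide whether to trust it. Ask yourself the following questions.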
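If I have questions or concerns, can I contact the author directly?
Is the data regularly checked for errors and updated?
Does the data include information about how it was collected and what types of samples were used during collection?
Are there other data sources that can verify this dataset?
Given everything I know about this topic, does the data seem believable?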
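If you answered “yes” to at least three of these questions, you are on the right track! If you answered “no” to at least two of them, you may need to spend more time finding reliable data.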
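The tally above can be expressed as a small checklist function. This is a minimal sketch, not the book's code: the function name `smell_test`, the question wording in `SMELL_TEST_QUESTIONS`, and the choice of a list of booleans as input are all assumptions made for illustration.

```python
# Hypothetical helper (not from the book): tally answers to the five
# smell-test questions and apply the two thresholds from the text.
SMELL_TEST_QUESTIONS = [
    "Can I contact the author if I have questions or concerns?",
    "Is the data regularly checked for errors and updated?",
    "Does the data document how it was collected and what samples were used?",
    "Are there other sources that can verify this dataset?",
    "Given what I know about the topic, does the data seem believable?",
]


def smell_test(answers):
    """Summarize a list of True/False answers, one per question above."""
    if len(answers) != len(SMELL_TEST_QUESTIONS):
        raise ValueError("expected one answer per question")
    yes = sum(1 for answer in answers if answer)
    no = len(answers) - yes
    return {
        "yes": yes,
        "no": no,
        "on_right_track": yes >= 3,  # at least three "yes" answers
        "keep_looking": no >= 2,     # at least two "no" answers
    }


print(smell_test([True, True, False, True, False]))
# {'yes': 3, 'no': 2, 'on_right_track': True, 'keep_looking': True}
```

With five questions, exactly three “yes” answers (and therefore two “no” answers) meets both thresholds, so the sketch reports both flags instead of forcing a single verdict.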
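You may need to contact the author or organization that originally collected and published the data to get more information. Often a phone call or email to the right person can answer at least one of the questions above and help you verify the reliability of the data source.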
6.2 Fact-Checking
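Fact-checking your data is essential to keeping your reporting credible, even though it can sometimes be both tedious and tiring. Depending on the dataset, fact-checking may include the following: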
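Contacting the data source to verify the latest methodology and release version.
Finding other good data sources for comparison.
Contacting experts to discuss good data sources and factual information.
Researching your chosen topic further to check whether your data sources and/or dataset are credible.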
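Comparing your data against another good source can start as a simple key-by-key check. The sketch below is a hypothetical helper, not from the book: `compare_sources`, the dict-of-numbers input format, and the 5% relative tolerance are assumptions, and the region names and numbers in the example are placeholders, not real data.

```python
import math


def compare_sources(primary, reference, rel_tol=0.05):
    """Cross-check numeric values that two sources report for the same keys.

    primary, reference: dicts mapping an identifier (e.g. a region name)
    to a number. Returns (mismatches, missing): mismatches lists
    (key, primary_value, reference_value) where the values differ by more
    than rel_tol; missing lists keys the reference source does not cover.
    """
    mismatches = []
    missing = []
    for key, value in primary.items():
        if key not in reference:
            missing.append(key)
        elif not math.isclose(value, reference[key], rel_tol=rel_tol):
            mismatches.append((key, value, reference[key]))
    return mismatches, missing


# Placeholder values for illustration only.
primary = {"region_a": 100.0, "region_b": 250.0, "region_c": 80.0}
reference = {"region_a": 102.0, "region_b": 310.0}
mismatches, missing = compare_sources(primary, reference)
print(mismatches)  # [('region_b', 250.0, 310.0)]
print(missing)     # ['region_c']
```

A flagged mismatch or gap does not prove either source wrong; it marks the places worth raising when you contact the data source or an expert.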
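Some libraries and universities have access to subscription-only publications and educational archives, which are important resources for fact-checking. If you have access to a service like LexisNexis ...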
ISBN: 9787115459190