Python数据处理

by Jacqueline Kazil, Katharine Jarmul
July 2017
Intermediate to advanced
398 pages
11h 54m
Chinese
Posts & Telecom Press
Data Cleanup: Investigation, Matching, and Formatting
At the end of the function, return the new data dictionary.
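Now run the function on the data rows and assign the newly created dictionary to a variable for later use. This line of code names the new dictionary mn_dict. We can use it to see how many unique households there are and how many surveys each household completed.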
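The grouping step can be sketched as follows. This is a minimal sketch under stated assumptions, not the book's code. It assumes each data row is a list whose first two fields together identify a household (for example, a cluster number and a household number). The helper name group_by_household and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_by_household(data_rows):
    """Group survey rows by household (hypothetical helper).

    Assumes row[0] and row[1] together identify a household.
    """
    household_dict = defaultdict(list)
    for row in data_rows:
        key = (row[0], row[1])  # assumed household identifier fields
        household_dict[key].append(row)
    # Return the new dictionary; without this line the caller would get None.
    return dict(household_dict)


# Hypothetical sample rows: cluster, household, respondent line number
mn_rows = [
    ["1", "17", "1"],
    ["1", "17", "2"],
    ["1", "20", "1"],
    ["2", "5", "1"],
]

mn_dict = group_by_household(mn_rows)
print(len(mn_dict))  # number of unique households: 3
for key, rows in mn_dict.items():
    print(key, len(rows))  # number of surveys per household
```

A defaultdict avoids the "is this key already present?" check on every row. Converting it back to a plain dict on return means later lookups of missing keys raise KeyError instead of silently creating empty entries.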
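If a function has no return statement at the end, it returns None. When you start writing your own functions, watch carefully for errors caused by missing or incorrect return values.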
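A quick way to see this pitfall: the two hypothetical functions below build the same dictionary, but only the second one returns it.

```python
def build_counts_without_return(items):
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    # No return statement: Python implicitly returns None


def build_counts(items):
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts  # explicitly hand the result back to the caller


print(build_counts_without_return(["a", "b", "a"]))  # None
print(build_counts(["a", "b", "a"]))  # {'a': 2, 'b': 1}
```

The first function does all its work and then discards the result. Any caller that tries to use the result (for example, calling len() on it) fails with a TypeError far from the actual mistake.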
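We found about 7,000 unique households, which means that more than 2,000 of the men interviewed belong to the same household as another interviewed man. On average, each household in this survey had 1.3 men. Simple calculations like this give us a deeper understanding of the data. They also help us think about what the data means and discover which questions we can answer with the data we have.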
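The arithmetic behind these figures is straightforward. With about 7,000 households and an average of roughly 1.3 men each, there were on the order of 9,100 interviews, so about 2,100 interviews went beyond one per household. The sketch below repeats the calculation on a small hypothetical dictionary with the same shape assumed for mn_dict: one key per household, and a list of interviewed men as the value. Its numbers are illustrative, not the MICS results.

```python
# Hypothetical grouped data: household key -> list of interviewed men
households = {
    ("1", "17"): ["man_a", "man_b"],
    ("1", "20"): ["man_c"],
    ("2", "5"): ["man_d", "man_e", "man_f"],
    ("2", "9"): ["man_g"],
}

total_men = sum(len(men) for men in households.values())
unique_households = len(households)

# Interviews beyond the first one in each household
extra_men = total_men - unique_households
average_per_household = total_men / unique_households

print(total_men, unique_households, extra_men, average_per_household)
```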
7.3 Summary
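In this chapter you learned the basics of data cleanup and why cleanup matters when wrangling data. You worked with some raw MICS data and interacted with it directly. You can now inspect data and judge which cleanup problems you are likely to run into. You can also find and remove erroneous and duplicate data.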
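One minimal way to find and drop exact duplicate rows in plain Python is sketched below. It assumes each row is a list of field values and uses hypothetical sample data. It is one possible approach, not necessarily the one used earlier in the chapter.

```python
from collections import Counter

rows = [
    ["1", "17", "1", "yes"],
    ["1", "17", "1", "yes"],  # exact duplicate of the previous row
    ["1", "20", "1", "no"],
]

# Lists are unhashable, so convert each row to a tuple before counting
counts = Counter(tuple(row) for row in rows)
duplicates = [row for row, n in counts.items() if n > 1]

# Keep the first occurrence of each row, preserving the original order
seen = set()
deduped = []
for row in rows:
    key = tuple(row)
    if key not in seen:
        seen.add(key)
        deduped.append(row)

print(duplicates)    # [('1', '17', '1', 'yes')]
print(len(deduped))  # 2
```

This only catches rows that match field for field. Near-duplicates (differing whitespace, capitalization, or a single mistyped value) need normalization or fuzzy matching first.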
Table 7-2 describes the new concepts and libraries covered in this chapter.
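Table 7-2. New Python programming concepts and libraries

Concept/library | Purpose
List comprehension | Uses iterators, functions, and/or if statements to build lists quickly and conveniently for further data cleanup and processing
Dictionary values method | Returns the dictionary's values (a list in Python 2, a view object in Python 3). Useful for membership tests
in and not in operators | Test whether an element is contained in an object. Commonly used with strings or lists. Return a Boolean
List remove method | Takes an element and removes the first matching element from the list. For a list that has already been created, if you know exactly which element you want to remove, this method ...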
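The four table entries can each be illustrated in a few lines. The sample values below are hypothetical, and the code targets Python 3, where dict.values() returns a view rather than a list.

```python
# List comprehension: build a new list from an expression plus an optional if filter
raw_values = [" 12 ", "", "7", "  ", "30"]
cleaned = [int(v.strip()) for v in raw_values if v.strip()]
print(cleaned)  # [12, 7, 30]

# Dictionary values(): iterate over or test the dict's values
codes = {"yes": 1, "no": 2, "missing": 9}
print(9 in codes.values())  # True
values_list = list(codes.values())  # convert the view to a list if needed

# in / not in: membership tests that return a Boolean
print("es" in "yes")       # True (substring test on a string)
print(5 not in [1, 2, 3])  # True

# list.remove(): delete the first matching element (raises ValueError if absent)
answers = ["yes", "no", "yes"]
answers.remove("yes")
print(answers)  # ['no', 'yes']
```

Because remove raises ValueError when the element is missing, a guard such as `if "yes" in answers:` is a common pattern before calling it on data you have not inspected.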

ISBN: 9787115459190