Python数据处理
by Jacqueline Kazil, Katharine Jarmul
July 2017
Intermediate to advanced
398 pages
11h 54m
Chinese
Posts & Telecom Press
Data Cleanup: Standardizing and Scripting
8.4 Scripting Your Data Cleanup
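As your Python knowledge deepens, the code you write will grow more complex. You can now write functions, parse files, import and use multiple Python libraries, and even store data. It is time to start scripting your code. Scripting means deciding on a structure for your code so that it can be used, learned from, and shared later.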
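As a rough illustration of what "deciding on a structure" can look like, here is a minimal sketch of a cleanup script, assuming Python 3. The function names (`load_rows`, `strip_whitespace`, `main`) and the in-memory sample values are hypothetical and chosen only for this example; they are not taken from the book's own scripts.

```python
"""Hypothetical skeleton of a small, reusable cleanup script."""
import csv
import io


def load_rows(csv_file):
    """Return every row of an open CSV file as a list of lists."""
    return list(csv.reader(csv_file))


def strip_whitespace(rows):
    """Return a copy of rows with surrounding whitespace removed from each cell."""
    return [[cell.strip() for cell in row] for row in rows]


def main():
    # Made-up in-memory sample standing in for a real data file.
    sample = io.StringIO("country, value \n Exampleland , 1 \n")
    cleaned = strip_whitespace(load_rows(sample))
    print(cleaned)
    return cleaned


if __name__ == "__main__":
    main()
```

Keeping each step in its own small function, with `main` tying them together, means the same functions can be imported and reused when a similar dataset arrives later.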
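Take the UNICEF data as an example. We know that UNICEF releases these datasets every few years and that much of the data stays the same. The survey is unlikely to change substantially, since it is built on years of experience. Given these facts, we can trust the datasets to be fairly consistent. If we need to use UNICEF data again, we can probably reuse at least part of the code from the script we wrote the first time.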
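At present our code has a fairly simple structure and lacks documentation. Besides being hard to read, code like this is hard to reuse. We can follow our own functions now, but will we still read and understand them accurately a year from now? If we send these functions to a colleague, will they understand our notes? Until we can answer yes to these questions, we had better not write another line of code. If we cannot read our own code a year from now, it is of no use, and when a new report is released someone (most likely us) will have to write it all over again.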
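The Zen of Python applies not only to writing code but also to organizing it, to naming functions, variables, and classes, and so on. It is worth spending some time on naming and deciding which names will be immediately clear to you and to others. Comments and documentation aid understanding, but the code itself should also be highly readable.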
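To make the point about naming concrete, the sketch below shows the same logic written twice, assuming Python 3. Both functions and their names are hypothetical, invented for this comparison; the second version is one possible way to apply clearer names and a docstring, not a prescribed style.

```python
def f(d, k):
    # Works, but the names say nothing about what d and k are for.
    return [r for r in d if r.get(k) not in (None, "")]


def remove_rows_missing_value(rows, column_name):
    """Return only the rows (dicts) that have a non-empty value for column_name."""
    return [row for row in rows if row.get(column_name) not in (None, "")]
```

Both versions behave identically; only the second tells a reader a year from now what goes in and what comes out without tracing the body.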
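Python is often praised as one of the easiest languages to read, even for people who cannot read code! Keep your code's syntax clean and readable, and the documentation explaining what it does will not need to be long. ...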
ISBN: 9787115459190