Python数据处理

by Jacqueline Kazil, Katharine Jarmul
July 2017
Intermediate to advanced
398 pages
11h 54m
Chinese
Posts & Telecom Press
... If you look at the corresponding column of data in the PDF file, you will see that the pattern in the PDF text file starts at line 1251:
1250
1251 total
1252 –
1253 5 x
1254 26
1255 –
1266 –
A closer look shows that the heading of the birth registration column ends in total:
266 Birth
267 registration
268 (%)+
269 2005–2012*
270 total
271 37
272 99
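The function that currently collects the child labor totals looks for the word total, so it picks up this column before it reaches the next country line. We also found that the Violent discipline (%) column has a total label with a blank line above it. That is exactly the same pattern as the total we want to collect.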
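The ambiguity can be reproduced in a few lines. The following is only a minimal sketch: `sample_lines` and `find_total_labels` are hypothetical names, the first two groups of sample lines are modeled on the excerpts above, and the last two lines (a second total with a blank line above it) are invented to stand in for the Violent discipline column.

```python
# Hypothetical sample, not the real converted PDF text.
sample_lines = [
    "",            # blank line above the label we actually want
    "total",
    "–",
    "5 x",
    "26",
    "Birth",       # heading of the birth registration column
    "registration",
    "(%)+",
    "2005–2012*",
    "total",       # this 'total' is just the end of a heading
    "37",
    "99",
    "",            # invented: another blank line followed by 'total'
    "total",
    "90",
]


def find_total_labels(lines):
    """Return (index, blank_above) for every line that is exactly 'total'."""
    hits = []
    for i, line in enumerate(lines):
        if line.strip() == "total":
            blank_above = i > 0 and lines[i - 1].strip() == ""
            hits.append((i, blank_above))
    return hits


hits = find_total_labels(sample_lines)
for index, blank_above in hits:
    print(index, blank_above)
```

Matching on the word alone returns all three labels. Requiring a blank line above filters out the birth registration heading, but it still leaves two candidates, which is why a quick fix at this level cannot tell the columns apart.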
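Running into one bug after another suggests that the problem lies in the logic of your code. Our script was built from the start around the [...]/[...] function, so fixing the problem at its root means reworking the logic there. We want to know how to find the right data column, perhaps by collecting the column names and sorting them. We may also need a way to check whether the "page" has changed. If we keep patching symptoms one at a time, we will very likely run into more errors.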
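One way to start collecting column names is to attach to each total label whatever heading text sits directly above it. This is a rough sketch, not the book's solution: `heading_before` and `label_totals` are hypothetical helpers, the sample `lines` are modeled on the birth registration excerpt, and it assumes that a heading is a run of non-blank lines ending at the total label.

```python
def heading_before(lines, total_index):
    """Join the contiguous non-blank lines directly above a 'total' label."""
    words = []
    i = total_index - 1
    while i >= 0 and lines[i].strip():
        words.append(lines[i].strip())
        i -= 1
    # Lines were collected bottom-up, so restore reading order.
    return " ".join(reversed(words))


def label_totals(lines):
    """Return (index, heading) for every line that is exactly 'total'."""
    return [
        (i, heading_before(lines, i))
        for i, line in enumerate(lines)
        if line.strip() == "total"
    ]


# Hypothetical sample modeled on the excerpt above.
lines = [
    "", "Birth", "registration", "(%)+", "2005–2012*", "total",
    "37", "99", "", "total", "–",
]

labelled = label_totals(lines)
# Keep only the labels that are not the tail of some column heading.
candidate_totals = [i for i, heading in labelled if not heading]
print(labelled)
print(candidate_totals)
```

Labels that come back with an empty heading are still ambiguous, so a real rewrite would also need some notion of position or page, as the paragraph above suggests.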
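Only spend as much time on a script as you think is necessary. If you want to build a sustainable process that can be run many times on large datasets over a long period, you need to take the time to think carefully through every step.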
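This is the process of programming: write code, debug, write code, debug. No matter how experienced a programmer is, there are times when they run into errors in their code. While you are learning to program, running into errors can be very frustrating. You might think, "Why won't it run? I must be bad at programming." But that is not true ...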
Publisher Resources

ISBN: 9787115459190