Python数据处理

by Jacqueline Kazil, Katharine Jarmul
July 2017
Intermediate to advanced
398 pages
11h 54m
Chinese
Posts & Telecom Press
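        continue
    elif row[2] == '':
        # Third column is empty: keep this row's first cell for the next row.
        first_name = row[0]
        continue
    if first_name:
        # Prepend the saved first part to this row's country name.
        row[0] = u'{} {}'.format(first_name, row[0])
        first_name = False
    final_data.append(dict(zip(headers, row)))
    if row[0] == 'Zimbabwe':
        break
pprint.pprint(final_data)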
If this row has a first_name, merge it with the country name in the row.
Reset first_name to False so that the next iteration runs correctly.
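The carry-over logic described above can be tried without pdftables. The following is a minimal sketch, assuming each row is a list of strings whose first cell holds the country name and whose third cell is empty when a name has spilled onto its own line. The `headers` and `rows` values, the helper name `merge_split_rows`, and the blank-row skip rule are all assumptions made for illustration; none of them comes from the PDF or from the excerpt.

```python
import pprint

# Hypothetical headers and rows, invented for illustration only.
headers = ['Country', 'Total', 'Male']
rows = [
    ['Afghanistan', '10', '11'],
    ['Bolivia (Plurinational', '', ''],  # first half of a split name
    ['State of)', '26', '28'],           # second half, carrying the data
    ['Zimbabwe', '5', '6'],
    ['SUMMARY', '1', '2'],               # follows the last country
]


def merge_split_rows(rows, headers, last_country='Zimbabwe'):
    """Join country names split across two rows; return a list of dicts."""
    first_name = False
    final_data = []
    for row in rows:
        row = list(row)  # copy so the caller's rows are left untouched
        if not row[0]:
            continue  # assumed skip rule: ignore rows with no name cell
        elif row[2] == '':
            first_name = row[0]  # remember the partial name
            continue
        if first_name:
            row[0] = '{} {}'.format(first_name, row[0])
            first_name = False  # reset for the next iteration
        final_data.append(dict(zip(headers, row)))
        if row[0] == last_country:
            break  # stop once the last country has been stored
    return final_data


final_data = merge_split_rows(rows, headers)
pprint.pprint(final_data)
```

Copying each row before editing it keeps the sample input unchanged, which makes the function easier to rerun while experimenting.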
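The data import is now complete. If you want the data structure to match the Excel import exactly, you will need to process the data further, but we have already managed to save the data from the PDF as rows.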
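One possible form of that further processing is to re-key the row dictionaries by country. This is only a sketch: it assumes the Excel-derived structure is a dictionary keyed by country name, which this excerpt does not show, and both the sample `final_data` and the helper name `rows_to_country_dict` are invented for illustration.

```python
# Hypothetical sample of the row data; the values are invented.
final_data = [
    {'Country': 'Afghanistan', 'Total': '10', 'Male': '11'},
    {'Country': 'Zimbabwe', 'Total': '5', 'Male': '6'},
]


def rows_to_country_dict(rows, key='Country'):
    """Return {country: {remaining columns}} from a list of row dicts."""
    by_country = {}
    for row in rows:
        # Keep every column except the one used as the dictionary key.
        by_country[row[key]] = {k: v for k, v in row.items() if k != key}
    return by_country


by_country = rows_to_country_dict(final_data)
print(by_country['Zimbabwe'])
```

If two rows shared a country name, the later one would overwrite the earlier one, so duplicates are worth checking for before relying on this shape.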
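pdftables is no longer actively supported. Its developers now offer a replacement product, but it is a paid service (https://pdftables.com/). Relying on unsupported code is risky, and we cannot assume that pdftables will always be available. However, part of belonging to the open source community is giving back, so we encourage you to find good projects, contribute to them, and help publicize them, in the hope that projects like pdftables can stay open source and continue to grow and thrive.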
Next, we'll look at some other ways to parse PDF data, including cleaning the data manually.
5.4.2 Exercise: Clean the Data Manually
Let's talk about the elephant in the room. Throughout this chapter, you may have been wondering ...
ISBN: 9787115459190