Python数据处理

by Jacqueline Kazil, Katharine Jarmul
July 2017
Intermediate to advanced
398 pages
11h 54m
Chinese
Posts & Telecom Press
are all between 0.3 and 0.4, then you know that any score outside this range may be an outlier).
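What if you want to standardize the same data? For example, you could standardize it by computing each team's average score per minute. You could then plot those averages and look at the distribution. Which teams score more per minute? Are there any outliers?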
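The per-minute calculation described above can be sketched as follows. This is a minimal illustration, not the book's own code: the team names, point totals, and minutes played are hypothetical, and `points_per_minute` is a helper name chosen for this example.

```python
# Hypothetical data: (team, total points scored, total minutes played).
games = [
    ("Team A", 90, 240),
    ("Team B", 72, 240),
    ("Team C", 110, 265),
]


def points_per_minute(points, minutes):
    """Return points scored per minute of play."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return points / minutes


# Put every team on the same per-minute scale so totals from games of
# different lengths can be compared.
rates = {team: points_per_minute(points, minutes)
         for team, points, minutes in games}

# List teams from the highest to the lowest scoring rate.
for team, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{team}: {rate:.3f} points per minute")
```

Sorting the rates is a quick stand-in for plotting them: a team sitting far from the rest of the list is a candidate outlier worth a closer look.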
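You could also compute the standard deviation to examine the distribution. Chapter 9 covers standardization more thoroughly, but the main questions are these: What is the normal range of the data? Which data points fall outside that range? Does the data show any patterns?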
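One way to put a number on "the normal range" is to use the mean and standard deviation. The sketch below relies on Python's standard `statistics` module. The rates are hypothetical, and the cutoff of two standard deviations is an assumed rule of thumb rather than something the text prescribes.

```python
import statistics

# Hypothetical per-minute scoring rates for a handful of teams.
rates = [0.31, 0.35, 0.33, 0.38, 0.36, 0.34, 0.72]

mean = statistics.mean(rates)
stdev = statistics.stdev(rates)  # sample standard deviation

# Assumed rule of thumb: values more than 2 standard deviations from
# the mean fall outside the "normal" range.
threshold = 2
low = mean - threshold * stdev
high = mean + threshold * stdev

# Collect the values that fall outside the range.
outside = [rate for rate in rates if rate < low or rate > high]

print(f"normal range: {low:.3f} to {high:.3f}")
print(f"outside the range: {outside}")
```

With so few data points a single extreme value inflates the standard deviation itself, so the cutoff is best treated as a prompt for investigation rather than a verdict.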
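As you can see, normalization and standardization are not the same thing. Both, however, usually let researchers or investigators determine the distribution of the data and understand what that distribution means for later research or calculations.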
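Standardizing and normalizing data sometimes also requires removing outliers so that you can see the patterns and distribution of the data more clearly. Going back to the earlier team example: if you removed the top-scoring players' points from the entire league, would the teams' results change dramatically? If a single player scores half of a team's points, the answer is yes; that team's results would change dramatically.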
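Similarly, if one team always wins by a large margin, removing that team from the league data could substantially change the average score and its distribution. Depending on the problem you are trying to solve, you can use normalization, standardization, and outlier removal to help find the answer.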
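The effect of dropping a dominant team can be checked by comparing summary statistics with and without it. The sketch below uses hypothetical per-game averages, and `summarize` is a helper name introduced only for this illustration.

```python
import statistics

# Hypothetical average points per game for each team in a league.
league = {
    "Team A": 68,
    "Team B": 71,
    "Team C": 70,
    "Team D": 69,
    "Team E": 112,  # a team that routinely wins by a large margin
}


def summarize(values):
    """Return (mean, sample standard deviation) for a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


# Identify the highest-scoring team and build a copy of the data without it.
top_team = max(league, key=league.get)
without_top = [score for team, score in league.items() if team != top_team]

mean_all, stdev_all = summarize(list(league.values()))
mean_trimmed, stdev_trimmed = summarize(without_top)

print(f"all teams:        mean={mean_all:.1f}, stdev={stdev_all:.1f}")
print(f"without {top_team}:  mean={mean_trimmed:.1f}, stdev={stdev_trimmed:.1f}")
```

Whether the trimmed or the full figures are the right ones to report depends on the question being asked; the comparison only shows how sensitive the summary is to that one team.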
8.2 Data Storage
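We have already covered several ways to store data. Now that we have usable data, let's review them. If you are using a database, know the expected table format, and want to save data you have already cleaned, you should keep using the Python libraries covered in Chapter 6 to connect to the database and save the data. With most of these libraries, you can use a cursor to commit directly to the database. ...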
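As a rough illustration of the cursor-based workflow, the sketch below uses Python's standard `sqlite3` module as a stand-in; the libraries from Chapter 6 may differ. The table name `team_rates`, its columns, and the rows are all hypothetical.

```python
import sqlite3

# Hypothetical cleaned records: (team, points per minute).
cleaned_rows = [("Team A", 0.375), ("Team B", 0.3)]

# An in-memory database keeps the sketch self-contained; a real project
# would pass a file path or use the driver for its own database.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

# Create the table in the expected format, then insert the cleaned rows
# with parameter placeholders rather than string formatting.
cursor.execute(
    "CREATE TABLE team_rates (team TEXT PRIMARY KEY, points_per_minute REAL)"
)
cursor.executemany(
    "INSERT INTO team_rates (team, points_per_minute) VALUES (?, ?)",
    cleaned_rows,
)

# In sqlite3 the commit is issued on the connection, not the cursor.
connection.commit()

# Read the rows back to confirm they were saved.
saved = cursor.execute(
    "SELECT team, points_per_minute FROM team_rates ORDER BY team"
).fetchall()
connection.close()

print(saved)
```

Other database drivers follow a similar connect, execute, commit pattern, though the placeholder style and connection arguments vary from one library to another.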
ISBN: 9787115459190