Practical Statistics for Data Scientists (面向数据科学家的实用统计学)

by Peter Bruce, Andrew Bruce
October 2018
Beginner to intermediate
238 pages
6h 32m
Chinese
Posts & Telecom Press
Exploratory Data Analysis
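The variance, the standard deviation, the mean absolute deviation, and the median absolute deviation are not equivalent estimates, even when the data come from a normal distribution. The standard deviation is never smaller than the mean absolute deviation, and for normally distributed data the mean absolute deviation is in turn larger than the median absolute deviation. The median absolute deviation is sometimes multiplied by a constant scaling factor (commonly 1.4826) so that, for a normal distribution, it is on the same scale as the standard deviation.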
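As a rough illustration of the comparison above, the following Python sketch computes the four measures for the eight-value data set that appears later in this section. The helper names `mean_absolute_deviation` and `median_absolute_deviation` are hypothetical, chosen for this example only, and the code relies solely on the standard library.

```python
import statistics

# The small data set used later in this section.
data = [3, 1, 5, 3, 6, 7, 2, 9]

def mean_absolute_deviation(values):
    """Average absolute distance of the values from their mean."""
    center = statistics.fmean(values)
    return statistics.fmean([abs(v - center) for v in values])

def median_absolute_deviation(values, scale=1.0):
    """Median absolute distance from the median, optionally rescaled."""
    center = statistics.median(values)
    return scale * statistics.median([abs(v - center) for v in values])

variance = statistics.variance(data)   # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)       # square root of the sample variance
mean_abs_dev = mean_absolute_deviation(data)
mad = median_absolute_deviation(data)
scaled_mad = median_absolute_deviation(data, scale=1.4826)

print(f"variance                  = {variance:.4f}")
print(f"standard deviation        = {std_dev:.4f}")
print(f"mean absolute deviation   = {mean_abs_dev:.4f}")
print(f"median absolute deviation = {mad:.4f}")
print(f"scaled MAD (x 1.4826)     = {scaled_mad:.4f}")
```

For this sample the ordering described above holds: the standard deviation is the largest of the three, followed by the mean absolute deviation and then the unscaled median absolute deviation. With only eight values the scaled MAD is not expected to match the standard deviation; the 1.4826 factor is calibrated for normally distributed data.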
1.4.2 Estimates Based on Percentiles
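A different approach to estimating dispersion is based on looking at the spread of the sorted data. Statistics based on sorted (ranked) data are called order statistics. The most basic of these is the range: the difference between the largest and smallest values. The minimum and maximum values themselves are useful to know and help in identifying outliers, but the range is extremely sensitive to outliers and is not very useful as a general measure of dispersion.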
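The sensitivity of the range to outliers can be seen in a short Python sketch. It uses the eight-value data set from later in this section; the extra value 100 is a made-up outlier added purely for illustration, and `value_range` is a hypothetical helper name.

```python
# The small data set used later in this section.
data = [3, 1, 5, 3, 6, 7, 2, 9]

def value_range(values):
    """Range: difference between the largest and smallest values."""
    return max(values) - min(values)

# Hypothetical outlier appended for illustration only.
data_with_outlier = data + [100]

range_clean = value_range(data)
range_outlier = value_range(data_with_outlier)

print("range without outlier:", range_clean)
print("range with outlier:   ", range_outlier)
```

A single extreme value changes the range from 8 to 99, even though the other eight values are unchanged.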
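To avoid this sensitivity to outliers, we can look at the range of the data after dropping values from each end of the sorted data. Formally, estimates of this type are based on differences between percentiles. In a data set, the Pth percentile is a value such that at least P% of the values are less than or equal to it and at least (100 - P)% of the values are greater than or equal to it. For example, to find the 80th percentile, first sort the data and then, starting from the smallest value, proceed 80% of the way toward the largest value. Note that the median is the same as the 50th percentile. A percentile is essentially the same as a quantile, except that quantiles are indexed by fractions, so the 0.8 quantile is the same as the 80th percentile.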
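The percentile definition above can be sketched in Python as follows. This is one of several conventions in use (often called the nearest-rank method), not necessarily the one the authors or any particular library apply; the function name `percentile_nearest_rank` is hypothetical. It returns the smallest data value for which at least P% of the values are less than or equal to it.

```python
import math

def percentile_nearest_rank(values, p):
    """Smallest value with at least p% of the data at or below it.

    Nearest-rank convention; p is a percentage between 0 and 100.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    # Rank (1-based) of the first value that covers at least p% of the data.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# The small data set used later in this section.
data = [3, 1, 5, 3, 6, 7, 2, 9]

p80 = percentile_nearest_rank(data, 80)
share_at_or_below = sum(v <= p80 for v in data) / len(data)
share_at_or_above = sum(v >= p80 for v in data) / len(data)

print("80th percentile:", p80)
print("share of values <= it:", share_at_or_below)
print("share of values >= it:", share_at_or_above)
```

Because the data set is small, several values can satisfy the definition for a given P. Conventions that interpolate between neighboring values, as many statistical packages do, may return a slightly different result.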
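A common measure of variability is the difference between the 25th percentile and the 75th percentile, called the interquartile range (IQR). Here is an example: for the data set {3, 1, 5, 3, 6, 7, 2, 9}, sorting gives {1, 2, 3, 3, 5, 6, ...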
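The preview is cut off before the worked example finishes, so the following Python sketch does not reproduce the book's result. It shows one way the interquartile range might be computed with the standard library's `statistics.quantiles`, and how the answer depends on the interpolation convention. The helper name `interquartile_range` is hypothetical.

```python
import statistics

# The data set from the example above.
data = [3, 1, 5, 3, 6, 7, 2, 9]

def interquartile_range(values, method="exclusive"):
    """Difference between the 75th and 25th percentiles.

    `method` is passed to statistics.quantiles ("exclusive" or "inclusive");
    the two conventions interpolate differently on small samples.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method=method)
    return q3 - q1

iqr_exclusive = interquartile_range(data, method="exclusive")
iqr_inclusive = interquartile_range(data, method="inclusive")

print("IQR (exclusive method):", iqr_exclusive)
print("IQR (inclusive method):", iqr_inclusive)
```

The two methods give different values for this eight-value sample, which is typical for small data sets. On larger data sets the choice of convention usually matters much less.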
Publisher Resources

ISBN: 9787115493668