数据科学中的实用统计学(第2版) (Practical Statistics for Data Scientists, 2nd Edition)
by Peter Bruce, Andrew Bruce, Peter Gedeck
October 2021
Intermediate to advanced
289 pages
8h 31m
Chinese
Posts & Telecom Press
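
As with univariate analysis, bivariate analysis involves both computing summary statistics and producing visualizations. The appropriate type of bivariate or multivariate analysis depends on the nature of the data: whether it is numeric or categorical.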
1.8.1 Hexagonal Binning and Contour Plots: Plotting Relationships Between Numeric Variables
Scatterplots work well when there are relatively few data values; the stock returns plot in Figure 1-7, for example, has only about 750 points.
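For data sets with thousands or even millions of records, a scatterplot becomes too dense, so we need a different way to visualize the relationship. To illustrate, consider the kc_tax data set, which contains the tax-assessed values of residential properties in King County, Washington. In R, we use the subset function to remove very expensive homes as well as homes that are unusually large or small, so that we focus on the main body of the data.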
# Keep homes assessed below $750,000 with more than 100 and fewer than 3,500 sq ft of living space
kc_tax0 <- subset(kc_tax, TaxAssessedValue < 750000 &
                          SqFtTotLiving > 100 &
                          SqFtTotLiving < 3500)
nrow(kc_tax0)
432693
In pandas, we filter the data with the following code:
# Combine boolean masks with &; each comparison must be wrapped in parentheses
kc_tax0 = kc_tax.loc[(kc_tax.TaxAssessedValue < 750000) &
                     (kc_tax.SqFtTotLiving > 100) &
                     (kc_tax.SqFtTotLiving < 3500), :]
kc_tax0.shape
(432693, 3)
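The effect of this filter is easy to check on a few rows. The sketch below wraps the same boolean mask in a helper, `filter_main_body` (a hypothetical name), and applies it to a handful of made-up rows chosen to sit exactly on the boundaries. The rows are not real kc_tax records, and using `ZipCode` as the third column is an assumption.

```python
import pandas as pd


def filter_main_body(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows with TaxAssessedValue < 750,000 and 100 < SqFtTotLiving < 3,500 (all bounds strict)."""
    mask = (
        (df["TaxAssessedValue"] < 750_000)
        & (df["SqFtTotLiving"] > 100)
        & (df["SqFtTotLiving"] < 3_500)
    )
    return df.loc[mask, :]


# Hypothetical toy rows (not real kc_tax data); "ZipCode" as the third column is assumed
toy = pd.DataFrame({
    "TaxAssessedValue": [300_000, 750_000, 500_000, 400_000, 200_000, float("nan")],
    "SqFtTotLiving":    [1_800, 2_000, 100, 3_500, 3_499, 1_500],
    "ZipCode":          [98101, 98102, 98103, 98104, 98105, 98106],
})
kept = filter_main_body(toy)
print(kept.shape)           # (2, 3)
print(kept.index.tolist())  # [0, 4]
```

Rows sitting exactly at 750,000, 100, or 3,500 are dropped because all three comparisons are strict. A missing value (NaN) compares as False, so that row is dropped too; R's `subset` likewise treats an NA condition as FALSE.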
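Figure 1-8 is a hexagonal binning plot showing the relationship between home size in King County (in square feet; one square foot is about 0.09 square meters) and tax-assessed value. The plot does not draw individual data points, because doing so would only show a dark ...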
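In a hexagonal binning plot, records are grouped into hexagonal cells and each hexagon is colored by the number of records it contains. A minimal from-scratch sketch of the counting step follows, assuming pointy-top hexagons and a simple rescaling of both axes to [0, gridsize] so that square feet and dollars are comparable. Assigning a point to its hexagon uses the standard axial-coordinate method with cube rounding. The names `hexbin_counts` and `_axial_round` are hypothetical, the sample points are made up, and this is not the plotting libraries' own implementation.

```python
import math
from collections import Counter

SQRT3 = math.sqrt(3)


def _axial_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon using cube rounding."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the coordinate with the largest rounding error so that q + r + s == 0
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def hexbin_counts(xs, ys, gridsize=30):
    """Count points per pointy-top hexagon, with roughly `gridsize` hexagons across the x range."""
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1.0  # avoid dividing by zero for a constant column
    y_span = (max(ys) - y_min) or 1.0
    size = 1 / SQRT3  # hexagon circumradius; makes each hexagon 1 rescaled unit wide
    counts = Counter()
    for x, y in zip(xs, ys):
        # Rescale both axes to [0, gridsize]
        u = (x - x_min) / x_span * gridsize
        v = (y - y_min) / y_span * gridsize
        # Convert the point to fractional axial hex coordinates, then round to a hexagon
        q = (SQRT3 / 3 * u - v / 3) / size
        r = (2 / 3 * v) / size
        counts[_axial_round(q, r)] += 1
    return counts


# Hypothetical points (square feet, assessed value), not real kc_tax data
sqft = [1200, 1210, 1190, 3000]
value = [300_000, 305_000, 298_000, 700_000]
counts = hexbin_counts(sqft, value, gridsize=10)
print(sum(counts.values()))  # 4
print(counts[(0, 0)])        # 3
```

The three nearby homes land in the same hexagon, which is why dense regions show up as high counts instead of an overplotted blob. For actual plots, pandas provides `DataFrame.plot.hexbin`, for example `kc_tax0.plot.hexbin(x='SqFtTotLiving', y='TaxAssessedValue', gridsize=30)`, which draws the colored hexagons with matplotlib; its `gridsize` follows matplotlib's definition, which need not match the rescaling used in this sketch.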
ISBN: 9787115569028