Tableau Prep即学即用

by Carl Allchin
August 2022
Beginner to intermediate
463 pages
9h 22m
Chinese
China Electric Power Press Ltd.
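Duplicate records: Did the duplicates exist before data preparation began, or were they created in the first few steps of the process?
Missing data: Are there values that should appear in the data set but do not? Are there nulls where you expect values?
If you have many data sets, checking all of these factors can be quite time-consuming, but there are ways to make the task easier and more intuitive.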
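The two checks above can be sketched outside Tableau Prep in a few lines of Python. This is a minimal illustration, not the book's method: the sample rows, the field names (`order_id`, `customer`, `amount`), and the `profile` helper are all hypothetical, and treating an empty string as missing is an assumption you may not want for your own data.

```python
from collections import Counter

# Hypothetical sample rows; the field names are made up for illustration.
rows = [
    {"order_id": 1, "customer": "Ana", "amount": 20.0},
    {"order_id": 2, "customer": "Ben", "amount": None},
    {"order_id": 2, "customer": "Ben", "amount": None},  # exact copy of the row above
    {"order_id": 3, "customer": "", "amount": 15.5},
]


def profile(rows):
    """Return (number of extra copies of identical rows, missing count per field)."""
    # Turn each row into a hashable key so identical rows can be counted.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    extra_copies = sum(n - 1 for n in counts.values())

    # Assumption: None and the empty string both count as missing.
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None or value == "":
                missing[field] += 1
    return extra_copies, dict(missing)


extra_copies, missing = profile(rows)
print("extra copies of duplicated rows:", extra_copies)
print("missing values per field:", missing)
```

Running the same check on the raw input and again after the first few preparation steps is one way to tell whether duplicates were already present or were introduced along the way.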
11.2 Why Visualizing the Data Set Matters
One of the most important strategies for profiling a data set is to visualize it.
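11.2.1 Anscombe's Quartet
If you have read any book on data visualization, you have very likely seen Anscombe's Quartet, the best demonstration of why descriptive statistics (minimum, maximum, mean, and so on) are not enough to understand what is really happening in a data set. In 1973, Francis Anscombe constructed four data sets, each made up of x-y value pairs (see Figure 11-1).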
Figure 11-1. Anscombe's data sets
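As Figure 11-1 shows, the individual values vary widely, yet the descriptive statistics of the four data sets are essentially identical:
Mean: x = 9, y = 7.5
Sample variance: x = 11, y ≈ 4.125
The correlation, the linear regression line, and the coefficient of determination also agree to two or three decimal places.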
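These figures are easy to verify. The sketch below recomputes them with Python's standard library, assuming the commonly published values of Anscombe's four data sets (the numbers are not reproduced in this excerpt, so they are typed in here); the `summarize` helper and its output keys are illustrative names.

```python
from math import sqrt
from statistics import mean, variance

# Assumption: the commonly published values of Anscombe's four data sets.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Descriptive statistics plus the least-squares line y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(xs),  # sample variance (n - 1 in the denominator)
        "var_y": variance(ys),
        "r": sxy / sqrt(sxx * syy),  # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimal places, the four rows of output are nearly indistinguishable, which is exactly why summary statistics alone cannot tell these data sets apart.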
However, once the individual data points in each data set are plotted, we can see that the data sets are in fact very different (see Figure 11-2).
Figure 11-2. Anscombe's Quartet visualized
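Even a crude plot exposes the difference. The following sketch draws two of the four data sets as text scatter plots using only the standard library; it is a rough stand-in for a proper chart, not a reproduction of Figure 11-2. The data values are the commonly published ones (an assumption, since this excerpt does not list them), and `ascii_scatter` with its fixed axis ranges is a hypothetical helper.

```python
def ascii_scatter(xs, ys, width=40, height=10, x_range=(0, 20), y_range=(0, 14)):
    """Render points as '*' on a character grid; both plots share the same axes."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_lo) / (x_hi - x_lo) * (width - 1))
        row = round((y - y_lo) / (y_hi - y_lo) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 of the grid is the top of the plot
    return "\n".join("|" + "".join(line) for line in grid)


# Assumption: commonly published values for data sets II and IV of the quartet.
x_ii = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_ii = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
x_iv = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y_iv = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

plot_ii = ascii_scatter(x_ii, y_ii)
plot_iv = ascii_scatter(x_iv, y_iv)
print("Data set II\n" + plot_ii)
print("Data set IV\n" + plot_iv)
```

Data set II traces a curve, while data set IV stacks all but one point at a single x value with one far outlier, even though their means, variances, and correlations match.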
So, to prepare data properly, you must visualize it in some basic way; at this stage that does not yet ...
ISBN: 9787519864439