Tableau Prep即学即用

by Carl Allchin
August 2022
Beginner to intermediate
463 pages
9h 22m
Chinese
China Electric Power Press Ltd.
30 Removing Duplicate Data
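Understanding the level of granularity of your data is key to preparing it for analysis. When you investigate granularity, however, you may find that the answers are unclear, and the cause is often duplicate data. This chapter covers how to identify duplicates in a data set and what you can do about them.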
30.1 How to Identify Duplicate Data
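Unless you deliberately look for duplicates, you are relying on the people who "know the numbers" to tell you that something is wrong. It is therefore important to work proactively to avoid duplicates in your data sets and to know how to remove them when needed. Removing duplicates makes aggregation easier, because you can simply sum the records to find a total, and that in turn makes the resulting data set easier to analyze.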
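The effect on totals can be illustrated outside Prep Builder with a minimal Python sketch. The rows, the order values, and the variable names below are all hypothetical and are not taken from the book's data set; the sketch only assumes that a duplicate is an exact repeat of a whole row.

```python
# Hypothetical order rows as (case_id, order_value) tuples.
# Case 1002 appears twice, as if the same record had been loaded twice.
rows = [
    (1001, 50),
    (1002, 30),
    (1002, 30),  # exact duplicate of the previous row
    (1003, 20),
]

# Summing the raw rows counts the duplicated order twice.
naive_total = sum(value for _, value in rows)

# dict.fromkeys keeps the first occurrence of each distinct row
# and preserves the original order.
deduplicated = list(dict.fromkeys(rows))

# With one row per order, a plain sum gives the real total.
clean_total = sum(value for _, value in deduplicated)

print(naive_total, clean_total)
```

Once each order occupies exactly one row, the simple sum is the correct total and no special handling is needed at analysis time.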
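Let's look at an example using orders coming in to Chin & Beard Suds Co. When analyzing orders, we want a data set in which each order has its own row. When you load the data set into Prep Builder, you can easily determine:
• How many rows each order has (represented here by Case ID).
• Whether the rows are evenly distributed, as shown in the Profile pane.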
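Clicking a single Case ID shows, in the Data pane, that each ID has multiple rows and that different IDs have different numbers of rows in the data set (see Figure 30-1).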
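The same two checks (rows per Case ID, and whether the rows are evenly spread across IDs) can be sketched in plain Python for readers who want to verify them outside Prep Builder. Only the field name Case ID comes from the text; the sample rows, the Product field, and the helper rows_per_case are hypothetical.

```python
from collections import Counter

# Hypothetical order rows; only the "Case ID" field name comes from the text.
orders = [
    {"Case ID": "A1", "Product": "Soap Bar"},
    {"Case ID": "A1", "Product": "Soap Bar"},
    {"Case ID": "A2", "Product": "Liquid Soap"},
    {"Case ID": "A3", "Product": "Soap Bar"},
    {"Case ID": "A3", "Product": "Liquid Soap"},
    {"Case ID": "A3", "Product": "Soap Bar"},
]


def rows_per_case(rows, key="Case ID"):
    """Count how many rows share each value of the key field."""
    return Counter(row[key] for row in rows)


counts = rows_per_case(orders)

# IDs that occupy more than one row are candidates for duplication.
ids_with_multiple_rows = sorted(cid for cid, n in counts.items() if n > 1)

# The rows are evenly distributed only if every ID has the same row count.
evenly_distributed = len(set(counts.values())) == 1

print(dict(counts))
print(ids_with_multiple_rows, evenly_distributed)
```

An ID with several rows is not necessarily duplicated: the extra rows may describe different items on the same order. The count only shows where to look more closely.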
Publisher Resources

ISBN: 9787519864439