Tableau Prep即学即用
by Carl Allchin
August 2022
Beginner to intermediate
463 pages
9h 22m
Chinese
China Electric Power Press Ltd.
Chapter 11 Data Profiling
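The art of data preparation lies in understanding a dataset well enough to decide what preparation it may need before analysis. Knowing the profile of your data is the key to forming a complete picture of it. Without profiling, it is easy to miss an obvious preparation step or to create unnecessary work. This chapter explores what profiling is, why data profiling matters, and how Prep profiles your data.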
11.1 What Is a Data Profile?
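A data profile is the set of characteristics that describe a dataset. As discussed in earlier chapters, knowing the data types in a dataset is essential to analysis. Equally important is understanding how many distinct values the dataset's categorical fields contain and how much those values vary. Determining the dataset's level of granularity helps you work out how many unique records it holds, or whether it contains duplicate records that must be removed during data preparation. Together these factors form the basis of a dataset's profile, which includes the following: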
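Minimum, maximum, and range of values: Is the range between the minimum and maximum values reasonable?
Data beyond boundary values: Does the data have natural limits, such as 100% or the current date, that should not be exceeded but have been?
Outliers: Apart from one or a few out-of-range values, do the values fall within a consistent range?
Irregular record counts: Is the number of rows consistent across certain dimensions, or does it change suddenly? For example, do you expect a fixed number of records for each date in the dataset?
Inconsistent spelling: Can you determine the correct spelling of the names and words in the dataset?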
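
The profile checks listed above can also be sketched in code, which may help make each question concrete. The following is a minimal Python illustration, not a description of how Prep works internally. The sample rows, the field layout (order date, region, discount percentage), the `profile` function, the 0 to 100 discount limit, the three-times-median-absolute-deviation outlier rule, and the 0.7 similarity cutoff are all assumptions chosen for the example.

```python
from collections import Counter
from datetime import date
from difflib import get_close_matches
from statistics import median

# Hypothetical sample rows: (order_date, region, discount_pct)
rows = [
    (date(2022, 1, 1), "North", 10.0),
    (date(2022, 1, 1), "South", 15.0),
    (date(2022, 1, 2), "North", 12.0),
    (date(2022, 1, 2), "Suoth", 120.0),  # misspelled region, discount above 100%
    (date(2022, 1, 3), "North", 11.0),
    (date(2022, 1, 3), "North", 11.0),   # exact duplicate of the previous row
    (date(2022, 1, 3), "South", 14.0),
]


def profile(rows, today):
    """Return a small dictionary of profile facts for the sample layout."""
    discounts = [r[2] for r in rows]
    lo, hi = min(discounts), max(discounts)

    # Boundary check: assumed natural limits are 0-100% and "not after today".
    out_of_bounds = [r for r in rows if not 0 <= r[2] <= 100 or r[0] > today]

    # Outliers: values more than 3 median absolute deviations from the median
    # (one simple rule of thumb among many).
    med = median(discounts)
    mad = median(abs(x - med) for x in discounts)
    outliers = [x for x in discounts if mad and abs(x - med) > 3 * mad]

    # Record counts per date: flag dates whose count differs from the usual one.
    per_date = Counter(r[0] for r in rows)
    typical = Counter(per_date.values()).most_common(1)[0][0]
    irregular_dates = sorted(d for d, c in per_date.items() if c != typical)

    # Granularity: rows that appear more than once are duplicates.
    duplicates = [r for r, c in Counter(rows).items() if c > 1]

    # Spelling: a rarer category that closely resembles a more common one
    # is a candidate misspelling of it.
    freq = Counter(r[1] for r in rows)
    suspects = {}
    for value, count in freq.items():
        commoner = [v for v, c in freq.items() if c > count]
        match = get_close_matches(value, commoner, n=1, cutoff=0.7)
        if match:
            suspects[value] = match[0]

    return {
        "min": lo,
        "max": hi,
        "range": hi - lo,
        "out_of_bounds": out_of_bounds,
        "outliers": outliers,
        "irregular_dates": irregular_dates,
        "duplicates": duplicates,
        "unique_records": len(set(rows)),
        "spelling_suspects": suspects,
    }


result = profile(rows, today=date(2022, 1, 31))
for name, value in result.items():
    print(f"{name}: {value}")
```

Each entry in the returned dictionary corresponds to one of the questions in the list. The thresholds are deliberately simple; in practice you would tune them to the dataset, and a flagged value is a prompt to investigate rather than proof of an error.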
Publisher Resources

ISBN: 9787519864439