Tableau Prep即学即用

by Carl Allchin
August 2022
Beginner to intermediate
463 pages
9h 22m
Chinese
China Electric Power Press Ltd.
12 Sampling Datasets
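To sample or not to sample? That is the question. In a world where data volumes keep growing, storage keeps getting cheaper, and creating data is easier than ever, data preparers must decide whether to work with a sampled subset of their data and must understand what that choice implies. This chapter explores why sampling should be used with care, when sampling may be necessary, and which techniques Prep Builder offers for sampling data.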
12.1 A Simple Rule: Use It All If Possible
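We use data to understand situations, trends, and outliers so that we can make better decisions in our daily and working lives. So why not aim to use all of the data and information available?
However, because of the sheer size of some datasets, working with the full dataset every time is not always feasible. Preppin' Data exists because data often has to be prepared before it can be analyzed. To do that, we need to know which parts of the data can be cleaned completely and which cannot. If a dataset cannot be cleaned completely, it makes sense to remove the parts that cannot be cleaned, but that is not the purpose of data sampling. Data sampling means using a subset of the full dataset, not because the data cannot be cleaned, but for a number of other reasons.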
12.2 Sampling to Work Around Technical Limitations
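Data sampling can "freeze" the data that needs cleaning at a point in time, which helps with two major technical challenges of data preparation: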
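Data volume
Sampling lets you build your analytical model on a subset and then run the full dataset through that same logic.
Data velocity
Sampling limits the amount of ongoing change, letting you build your logic before more frequent updates arrive. ...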
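The data-volume point can be sketched outside Prep Builder. The following is a minimal Python illustration, not Prep Builder's own mechanism: the field names, the `clean_row` logic, and the `take_sample` helper are all hypothetical and exist only for this example. It assumes the cleaning logic is developed against a small, repeatable sample and then applied unchanged to every row.

```python
import random


def clean_row(row):
    """Hypothetical cleaning logic, developed while looking only at the sample."""
    return {
        "customer": row["customer"].strip().title(),
        "amount": round(float(row["amount"]), 2),
    }


def take_sample(rows, k, seed=0):
    """Return k rows chosen at random; the fixed seed makes the sample repeatable."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


# Illustrative stand-in for a dataset too large to iterate on comfortably.
full_dataset = [
    {"customer": f"  customer {i} ", "amount": str(i * 1.005)}
    for i in range(10_000)
]

# 1. Build and check the logic against a small, fixed sample.
sample = take_sample(full_dataset, k=100)
cleaned_sample = [clean_row(r) for r in sample]

# 2. Once the logic looks right, run the complete dataset through it.
cleaned_full = [clean_row(r) for r in full_dataset]
```

Fixing the seed means the same rows come back on every run, so the sample stays stable while the logic is being refined. A random sample can still miss rare values, so the full run in step 2 may surface cases the sample never showed.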
Publisher Resources

ISBN: 9787519864439