Tableau Prep即学即用

by Carl Allchin
August 2022
Beginner to intermediate
463 pages
9h 22m
Chinese
China Electric Power Press Ltd.
Chapter 26: Group-Based Data Cleaning
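If someone always curated a perfect dataset for us, data preparation would be unnecessary. In practice we can clean the data ourselves, and unfortunately we often have to. As mentioned in Chapter 9, one of the most common challenges you will face during data preparation is cleaning string data: for example, standardizing string values so that every instance of a value is counted together even when some entries contain typos. One technique makes this work especially convenient: grouping. This chapter explains what grouping is and how to use the grouping tools built into Prep Builder.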
26.1 What Is Grouping?
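Grouping means applying logic to (mostly) string data fields to identify what they have in common, such as their meaning or intended value. For example, we might want to group the following data items: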
Edinburgh
Edenburgh
Edinborough
3d!nburgh
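As humans, we can recognize that these different names probably all refer to Edinburgh, Scotland (especially if the column holds city names). Data software does not see these values the same way, so we have to give it some direction on how to handle these different sets of characters.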
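
To make the idea concrete outside Prep Builder, here is a minimal Python sketch that maps each raw value to a canonical spelling when the two are similar enough. It is an illustration only and not how Prep Builder groups values internally: the function names `similarity` and `group_values`, the 0.75 threshold, and the extra value "Glasgow" are all assumptions introduced for this example.

```python
from difflib import SequenceMatcher


def similarity(canonical: str, value: str) -> float:
    """Return a 0..1 similarity score, ignoring letter case."""
    return SequenceMatcher(None, canonical.lower(), value.lower()).ratio()


def group_values(values, canonical, threshold=0.75):
    """Map each raw value to its closest canonical value.

    A value is left unchanged when no canonical value scores at least
    `threshold` (0.75 is an arbitrary choice for this sketch).
    """
    grouped = {}
    for value in values:
        # Pick the canonical spelling that is most similar to this value.
        best = max(canonical, key=lambda c: similarity(c, value))
        grouped[value] = best if similarity(best, value) >= threshold else value
    return grouped


# The four spellings from the text, plus "Glasgow" (added here only to
# show a value that should NOT be pulled into the Edinburgh group).
raw_cities = ["Edinburgh", "Edenburgh", "Edinborough", "3d!nburgh", "Glasgow"]
cleaned = group_values(raw_cities, canonical=["Edinburgh"])

for raw, clean in cleaned.items():
    print(f"{raw!r} -> {clean!r}")
```

The threshold is the "direction" we give the software: set it too low and unrelated cities get merged, set it too high and heavily misspelled values such as 3d!nburgh are left ungrouped.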
ISBN: 9787519864439