Skip to Content
R在数据科学中的应用,第2版
book

R在数据科学中的应用,第2版

by Hadley Wickham, Mine Cetinkaya-Rundel, Garrett Grolemund
May 2025
Intermediate to advanced
578 pages
8h 9m
Chinese
O'Reilly Media, Inc.
Content preview from R在数据科学中的应用,第2版

第 5 章 数据整理 数据 Tidy

本作品已使用人工智能进行翻译。欢迎您提供反馈和意见:translation-feedback@oreilly.com

导言

"幸福的家庭都是相似的,不幸的家庭各有各的不幸"。 列奥-托尔斯泰

"Tidy数据集都是相似的,但每一个凌乱的数据集都有其凌乱之处"。 哈德利-维克汉姆

在本章中,你将学习使用一种名为 "tidy data"的系统在 R 中组织数据的一致方法。将数据转换成这种格式需要一些前期工作,但这些工作会带来长期回报。一旦你拥有了整洁数据和由 Tidyverse 中的软件包提供的整洁工具,你就可以花更少的时间将数据从一种表示法混杂到另一种表示法,从而将更多的时间花在你关心的数据问题上。

在本章中,你将首先学习整理数据的定义,并将其应用于一个简单的玩具数据集。然后,我们将深入学习整理数据的主要工具:透视。透视功能允许你在不改变任何值的情况下改变数据的形式。

先决条件

在本章中,我们将重点介绍 Tidyr,它是一个提供大量工具帮助整理凌乱数据集的软件包。

library(tidyverse)

从本章开始,我们将抑制来自 library(tidyverse).

Tidy 数据

您可以用多种方式表示相同的基础数据。下面的示例显示了以三种不同方式组织的相同数据。每个数据集都显示了四个变量的相同值:国家年份人口和记录在案的肺结核(TB)病例数,但每个数据集以不同的方式组织这些值。

table1
#> # A tibble: 6 × 4
#>   country      year  cases population
#>   <chr>       <dbl>  <dbl>      <dbl>
#> 1 Afghanistan  1999    745   19987071
#> 2 Afghanistan  2000   2666   20595360
#> 3 Brazil       1999  37737  172006362
#> 4 Brazil       2000  80488  174504898
#> 5 China        1999 212258 1272915272
#> 6 China        2000 213766 1280428583

table2
#> # A tibble: 12 × 4
#>   country      year type           count
#>   <chr>       <dbl> <chr>          <dbl>
#> 1 Afghanistan  1999 cases            745
#> 2 Afghanistan  1999 population  19987071
#> 3 Afghanistan  2000 cases           2666
#> 4 Afghanistan  2000 population  20595360
#> 5 Brazil       1999 cases          37737
#> 6 Brazil       1999 population 172006362
#> # … with 6 more rows

table3
#> # A tibble: 6 × 3
#>   country      year rate             
#>   <chr>     <dbl>     <chr>         
#> 1 Afghanistan  1999 745/19987071     
#> 2 Afghanistan  2000 2666/20595360    
#> 3 Brazil       1999 37737/172006362  
#> 4 Brazil       2000 80488/174504898  
#> 5 China        1999 212258/1272915272
#> ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.

Read now

Unlock full access

More than 5,000 organizations count on O’Reilly

AirBnbBlueOriginElectronic ArtsHomeDepotNasdaqRakutenTata Consultancy Services

QuotationMarkO’Reilly covers everything we've got, with content to help us build a world-class technology community, upgrade the capabilities and competencies of our teams, and improve overall team performance as well as their engagement.
Julian F.
Head of Cybersecurity
QuotationMarkI wanted to learn C and C++, but it didn't click for me until I picked up an O'Reilly book. When I went on the O’Reilly platform, I was astonished to find all the books there, plus live events and sandboxes so you could play around with the technology.
Addison B.
Field Engineer
QuotationMarkI’ve been on the O’Reilly platform for more than eight years. I use a couple of learning platforms, but I'm on O'Reilly more than anybody else. When you're there, you start learning. I'm never disappointed.
Amir M.
Data Platform Tech Lead
QuotationMarkI'm always learning. So when I got on to O'Reilly, I was like a kid in a candy store. There are playlists. There are answers. There's on-demand training. It's worth its weight in gold, in terms of what it allows me to do.
Mark W.
Embedded Software Engineer

You might also like

R深度学习权威指南

R深度学习权威指南

Posts & Telecom Press, Joshua F. Wiley
AI工程

AI工程

Chip Huyen
Raku学习手册

Raku学习手册

brian d foy

Publisher Resources

ISBN: 9798341657304