R for Data Science, 2nd Edition

by Hadley Wickham, Mine Cetinkaya-Rundel, Garrett Grolemund
May 2025
Intermediate to advanced
578 pages
8h 9m
Chinese
O'Reilly Media, Inc.
Chapter 19. Joins

This work was translated with the help of AI. Your feedback and comments are welcome: translation-feedback@oreilly.com

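Introduction

It's rare that a data analysis involves only a single data frame. Typically you have many data frames, and you must join them together to answer the questions you're interested in. This chapter introduces two important types of joins:

  • Mutating joins, which add new variables to one data frame from matching observations in another.
  • Filtering joins, which filter observations from one data frame based on whether or not they match an observation in another.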
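
To make the difference between the two join types concrete, here is a minimal sketch on toy data. The tibbles flights_toy and airlines_toy are hypothetical stand-ins invented for illustration, not the nycflights13 tables; left_join() stands in for the mutating family and semi_join() for the filtering family.

```r
library(dplyr)

# Hypothetical toy data: each flight refers to an airline by its carrier code
flights_toy <- tibble(
  flight  = c(1, 2, 3),
  carrier = c("AA", "DL", "ZZ")  # "ZZ" has no match in airlines_toy
)
airlines_toy <- tibble(
  carrier = c("AA", "DL"),
  name    = c("American Airlines Inc.", "Delta Air Lines Inc.")
)

# Mutating join: keeps every row of flights_toy and adds the `name` column;
# the unmatched "ZZ" row gets NA for `name`
mutated <- flights_toy |> left_join(airlines_toy, join_by(carrier))

# Filtering join: keeps only the rows of flights_toy that have a match,
# and adds no columns
filtered <- flights_toy |> semi_join(airlines_toy, join_by(carrier))

mutated
filtered
```

The mutating join changes the columns, while the filtering join changes only which rows are kept.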

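We'll begin by discussing keys, the variables used to connect a pair of data frames in a join. We'll cement the theory by examining the keys in the datasets from the nycflights13 package, and then use that knowledge to start joining data frames together. Next we'll discuss how joins work, focusing on their action on the rows. Finally, we'll discuss non-equi joins, a family of joins that provide a more flexible way of matching keys than the default equality relationship.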
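
As a rough preview of what "more flexible than equality" can mean, the sketch below matches rows with >= instead of ==. The tibbles orders and tiers and their columns are hypothetical, made up for illustration, and the example assumes dplyr 1.1.0 or later, where join_by() accepts inequality conditions.

```r
library(dplyr)

# Hypothetical data: order amounts and the minimum amount for each price tier
orders <- tibble(order = c("a", "b"), amount = c(5, 20))
tiers  <- tibble(tier = c("small", "large"), min_amount = c(0, 10))

# Non-equi join: an order matches every tier whose minimum it meets or exceeds,
# so one order can match several tiers
matched <- orders |> inner_join(tiers, join_by(amount >= min_amount))

matched
```

Order "a" qualifies only for the "small" tier, while order "b" qualifies for both, so a single row on the left can produce more than one row in the result.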

Prerequisites

In this chapter, we'll explore the five related datasets from nycflights13 using the join functions from dplyr.

library(tidyverse)
library(nycflights13)

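Keys

To understand joins, you first need to understand how two tables can be connected through a pair of keys, one within each table. In this section, you'll learn about the two types of keys and see examples of both in the datasets of the nycflights13 package. You'll also learn how to check that your keys are valid, and what to do if your table lacks a key.

Primary and Foreign Keys

Every join involves a pair of keys: a primary key and a foreign key. A primary key is a variable or set of variables that uniquely identifies each observation. When more than one variable is needed, the key is called a compound key. For example, in nycflights13:

  • airlines records two pieces of data about each airline: its carrier code and its full name. You can identify an airline with its two-letter carrier code, making carrier the primary key.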

    airlines
    #> # A tibble: 16 × 2
    #>   carrier name                    
    #>   <chr>   <chr>                   
    #> 1 9E      Endeavor Air Inc.       
    #> 2 AA      American Airlines Inc.  
    #> 3 AS      Alaska Airlines Inc.    
    #> 4 B6      JetBlue Airways         
    #> 5 DL      Delta Air Lines Inc.    
    #> 6 EV      ExpressJet Airlines Inc.
    #> # … with 10 more rows
  • airports records data about each airport. You can identify each airport by its three-letter airport code, making faa the primary key.

    airports
    #> # A tibble: 1,458 × 8
    #>   faa   name                            lat   lon   alt    tz dst  
    #>   <chr> <chr>                         <dbl> <dbl> <dbl> <dbl> <chr>
    #> 1 04G   Lansdowne Airport              41.1 -80.6  1044    -5 A    
    #> 2 06A   Moton Field Municipal Airport  32.5 -85.7   264    -6 A    
    #> 3 06C   Schaumburg Regional            42.0 -88.1   801    -6 A    
    #> 4 06N   Randall Airport                41.4 -74.4   523    -5 A    
    #> 5 09J   Jekyll Island Airport          31.1 -81.4    11    -5 A    
    #> 6 0A9   Elizabethton Municipal Airpo…  36.4 -82.2  1593    -5 ...
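
One way to check that a variable such as carrier or faa really is a primary key is to count each key value and look for any that occur more than once, and also to look for missing values. The sketch below uses a hypothetical toy tibble, airlines_toy, built from the first rows of the airlines output shown earlier, so that it runs without nycflights13; the same pipeline should work on the real tables.

```r
library(dplyr)

# Toy copy of the first rows of `airlines` (hypothetical stand-in for the real table)
airlines_toy <- tibble(
  carrier = c("9E", "AA", "AS", "B6"),
  name    = c("Endeavor Air Inc.", "American Airlines Inc.",
              "Alaska Airlines Inc.", "JetBlue Airways")
)

# A valid primary key has no value that occurs more than once ...
duplicates <- airlines_toy |>
  count(carrier) |>
  filter(n > 1)

# ... and no missing values, since a missing key cannot identify an observation
missing_keys <- airlines_toy |>
  filter(is.na(carrier))

# For contrast, a table with a repeated carrier code fails the check
bad_toy <- bind_rows(airlines_toy, tibble(carrier = "AA", name = "Duplicate entry"))
bad_duplicates <- bad_toy |>
  count(carrier) |>
  filter(n > 1)

nrow(duplicates)      # zero rows means no duplicated keys
nrow(missing_keys)    # zero rows means no missing keys
bad_duplicates        # lists the offending key value and its count
```

If either of the first two results has rows, the variable does not uniquely identify observations and cannot serve as a primary key on its own.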
Publisher Resources

ISBN: 9798341657304