R for Data Science, 2nd Edition

by Hadley Wickham, Mine Cetinkaya-Rundel, Garrett Grolemund
May 2025
Intermediate to advanced
578 pages
8h 9m
Chinese
O'Reilly Media, Inc.

Chapter 16. Factors

This work has been translated using AI. Your feedback and comments are welcome: translation-feedback@oreilly.com

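Introduction

Factors are used for categorical variables, that is, variables that have a fixed and known set of possible values. They are also useful when you want to display character vectors in a non-alphabetical order.

We'll start by showing why factors are needed for data analysis and how you can create them with factor(). We'll then introduce you to the gss_cat dataset, which contains a bunch of categorical variables to experiment with. You'll then use that dataset to practice modifying the order and values of factors, before we finish up with a discussion of ordered factors.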

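Prerequisites

Base R provides some basic tools for creating and manipulating factors. We'll supplement these with the forcats package, which is part of the core tidyverse. It provides tools for dealing with categorical variables (and it's an anagram of factors!), with a wide range of helpers for working with factors.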

library(tidyverse)

Factor basics

Imagine that you have a variable that records month:

x1 <- c("Dec", "Apr", "Jan", "Mar")

Using a string to record this variable has two problems:

  1. There are only twelve possible months, and there's nothing saving you from typos:

    x2 <- c("Dec", "Apr", "Jam", "Mar")
  2. It doesn't sort in a useful way:

    sort(x1)
    #> [1] "Apr" "Dec" "Jan" "Mar"
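
Both problems can be checked directly in base R. The following is a minimal sketch, assuming the built-in constant month.abb (the English three-letter month abbreviations) as the set of valid values; the names valid, typos, and sorted are ours, chosen for illustration.

```r
# Month strings from the text; x2 contains the typo "Jam".
x1 <- c("Dec", "Apr", "Jan", "Mar")
x2 <- c("Dec", "Apr", "Jam", "Mar")

# Assumption: month.abb (built into base R) is the set of valid values.
valid <- month.abb

# Problem 1: nothing stops a typo, so we have to look for it ourselves.
typos <- setdiff(x2, valid)
typos
#> [1] "Jam"

# Problem 2: sort() on strings is alphabetical, not calendar order.
sorted <- sort(x1)
identical(sorted, valid[valid %in% x1])
#> [1] FALSE
```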

You can fix both of these problems with a factor. To create a factor, you must start by creating a list of the valid levels:

month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun", 
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)

Now you can create a factor:

y1 <- factor(x1, levels = month_levels)
y1
#> [1] Dec Apr Jan Mar
#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

sort(y1)
#> [1] Jan Mar Apr Dec
#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
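
One way to see why sort() now follows calendar order: a factor stores each value as an integer position within its levels, and sorting works on those positions rather than on the labels. A small sketch using only base R (the name codes is ours):

```r
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
x1 <- c("Dec", "Apr", "Jan", "Mar")
y1 <- factor(x1, levels = month_levels)

# Each value is stored as the index of its level in month_levels.
codes <- as.integer(y1)
codes
#> [1] 12  4  1  3

# Sorting the factor orders by those integer codes, not by the labels.
as.character(sort(y1))
#> [1] "Jan" "Mar" "Apr" "Dec"
```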

Any values not in the levels will be silently converted to NA:

y2 <- factor(x2, levels = month_levels)
y2
#> [1] Dec  Apr  <NA> Mar 
#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
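
Because the conversion is silent, it can help to check afterwards which inputs were lost. A minimal base R sketch, comparing the original strings with the resulting factor (the name lost is ours):

```r
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
x2 <- c("Dec", "Apr", "Jam", "Mar")
y2 <- factor(x2, levels = month_levels)

# Values that were present in the input but became NA in the factor.
lost <- x2[is.na(y2) & !is.na(x2)]
lost
#> [1] "Jam"
```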

This seems risky, so you might want to use forcats::fct() instead:

y2 <- fct(x2, levels = month_levels)
#> Error in `fct()`:
#> ! All values of `x` must appear in `levels` or `na`
#> ℹ Missing level: "Jam"
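
If forcats is not available, a similar check can be written in base R. The helper below, strict_factor(), is a hypothetical name used for illustration; it is not part of base R or forcats, and its error message is our own wording rather than the one fct() produces.

```r
# Hypothetical helper (not from base R or forcats): like factor(), but
# stops instead of silently producing NA for values outside `levels`.
strict_factor <- function(x, levels) {
  unknown <- setdiff(x[!is.na(x)], levels)
  if (length(unknown) > 0) {
    stop("Values not in `levels`: ", paste(unknown, collapse = ", "))
  }
  factor(x, levels = levels)
}

month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
x2 <- c("Dec", "Apr", "Jam", "Mar")

# Capture the error message rather than halting the script.
result <- tryCatch(
  strict_factor(x2, month_levels),
  error = function(e) conditionMessage(e)
)
result
#> [1] "Values not in `levels`: Jam"
```

Genuine missing values (NA) in the input are passed through unchanged; only non-missing values outside the levels trigger the error.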

If you omit the levels, they'll be taken from the data in alphabetical order:

factor(x1)
#> [1] Dec Apr Jan Mar
#> Levels: Apr Dec Jan Mar

Sorting alphabetically is slightly risky because not every computer will sort strings in the same way. So forcats::fct() orders by first appearance:

fct(x1)
#> [1] Dec Apr Jan Mar
#> Levels: Dec Apr Jan Mar
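
The same first-appearance ordering can be reproduced in base R by passing unique(x1) as the levels. This is a sketch of the idea only, not a claim about how fct() is implemented; the name f is ours.

```r
x1 <- c("Dec", "Apr", "Jan", "Mar")

# unique() keeps values in order of first appearance, so the resulting
# levels do not depend on how the current locale sorts strings.
f <- factor(x1, levels = unique(x1))
levels(f)
#> [1] "Dec" "Apr" "Jan" "Mar"
```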

If you ever need to access the set of valid levels directly, you can do so with ...

Publisher Resources

ISBN: 9798341657304