R for Data Science, 2nd Edition (Chinese edition: R在数据科学中的应用,第2版)

by Hadley Wickham, Mine Cetinkaya-Rundel, Garrett Grolemund
May 2025
Intermediate to advanced
578 pages
8h 9m
Chinese
O'Reilly Media, Inc.

Chapter 13. Numbers

This work has been translated using AI. Feedback and comments are welcome: translation-feedback@oreilly.com

Introduction

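Numeric vectors are the backbone of data science, and you've already used them many times earlier in this book. Now it's time to survey systematically what you can do with them in R, so that you're well prepared for any future problem involving numeric vectors.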

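We'll start by giving you a couple of tools for turning strings into numbers, and then go into a little more detail about count(). Then we'll dive into the numeric transformations that pair well with mutate(), including more general transformations that can be applied to other types of vectors but are often used with numeric vectors. Finally, we'll cover the summary functions that pair well with summarize() and show you how they can also be used with mutate().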

Prerequisites

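This chapter mostly uses functions from base R, which are available without loading any packages. But we still need the tidyverse, because we'll use these base R functions inside tidyverse functions like mutate() and filter(). As in the previous chapter, we'll use real examples from nycflights13, as well as toy examples made with c() and tribble().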

library(tidyverse)
library(nycflights13)

Making numbers

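In most cases, you'll get numbers already recorded in one of R's numeric types: integer or double. In some cases, however, you'll encounter them as strings, possibly because you created them by pivoting from column headers or because something went wrong in your data import process.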
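
As a quick illustration (a base R sketch, not taken from the original text), typeof() reports which of the two numeric types a value uses. A number that arrives as a string reports "character" until it is parsed:

```r
# Integer literals carry an L suffix; plain numeric literals are doubles
typeof(1L)      # "integer"
typeof(1.5)     # "double"
typeof(10)      # "double": no L suffix, so R stores it as a double

# A number that arrives as text is a character vector until it is parsed
typeof("1.5")   # "character"
```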

readr provides two useful functions for parsing strings into numbers: parse_double() and parse_number(). Use parse_double() when you have numbers that have been written as strings:

x <- c("1.2", "5.6", "1e3")
parse_double(x)
#> [1]    1.2    5.6 1000.0
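
For clean numeric strings like these, base R's as.numeric() returns the same values. The sketch below is an illustration added here, not part of the original chapter. It also shows where plain coercion falls short: extra characters such as "$" or "," make it return NA with a coercion warning.

```r
x <- c("1.2", "5.6", "1e3")
as.numeric(x)
#> [1]    1.2    5.6 1000.0

# Extra non-numeric characters defeat plain coercion:
# the result is NA, with the warning "NAs introduced by coercion"
as.numeric("$1,234")
```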

Use parse_number() when the string contains non-numeric text that you want to ignore. This is particularly useful for currency data and percentages:

x <- c("$1,234", "USD 3,513", "59%")
parse_number(x)
#> [1] 1234 3513   59
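
To see roughly what parse_number() saves you from, here is a deliberately naive base R stand-in. The helper name strip_to_number() is hypothetical. Its approach, which deletes every character that isn't a digit or a decimal point and then coerces the rest, is an assumption made for illustration and is not how readr implements parse_number(). It handles the three strings above, but it would mangle negative numbers, scientific notation such as "1e3" (which becomes 13), and strings that contain more than one number.

```r
# Hypothetical helper: a crude approximation of parse_number(), for illustration only.
# It deletes every character that is not a digit or ".", then coerces to double.
strip_to_number <- function(x) {
  as.numeric(gsub("[^0-9.]", "", x))
}

x <- c("$1,234", "USD 3,513", "59%")
strip_to_number(x)
#> [1] 1234 3513   59

# Limitation: the minus sign is stripped too, so "-5" becomes 5
strip_to_number("-5")
```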

Counts

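It's surprising how much data science you can do with just counts and a little basic arithmetic, and dplyr makes counting as easy as possible with count(). This function is great for quick exploration and checks during analysis: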

flights |> count(dest)
#> # A tibble: 105 × 2
#>   dest      n
#>   <chr> <int>
#> 1 ABQ     254
#> 2 ACK     265
#> 3 ALB     439
#> 4 ANC       8
#> 5 ATL   17215
#> 6 AUS    2439
#> # … with 99 more rows
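
count() is a shortcut. For the example above, it gives the same result as flights |> group_by(dest) |> summarize(n = n()). To see the same idea without the tidyverse, base R's table() also tallies how often each distinct value occurs. In the sketch below, the toy vector is made up for illustration and stands in for flights$dest:

```r
# Toy destination codes (made up for illustration)
dest <- c("ATL", "ORD", "ATL", "BOS", "ORD", "ATL")

# table() counts occurrences of each distinct value, ordered alphabetically:
# ATL = 3, BOS = 1, ORD = 2
counts <- table(dest)
counts

# as.data.frame() turns the tally into a two-column data frame (dest, Freq)
as.data.frame(counts)
```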

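(Despite the advice in Chapter 4, we usually put count() on a single line, because it's usually used at the console for a quick check that a calculation is working as expected.)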

If you want to see the most common values, add sort = TRUE:

flights |> count(dest, sort = TRUE)
#> # A tibble: 105 × 2
#>   dest      n
#>   <chr> <int>
#> 1 ORD   17283
#> 2 ATL   17215
#> 3 LAX   16174
#> 4 BOS   15508
#> 5 MCO   14082
#> 6 CLT   14064
#> # … with 99 more rows
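
In base R, the counterpart of sort = TRUE is to sort the tally in decreasing order. This sketch redefines the same made-up toy vector so that it runs on its own. head() then keeps only the most common values:

```r
# Toy destination codes (made up for illustration)
dest <- c("ATL", "ORD", "ATL", "BOS", "ORD", "ATL")

# Sort the counts from most to least common, like count(dest, sort = TRUE)
sorted_counts <- sort(table(dest), decreasing = TRUE)
names(sorted_counts)
#> [1] "ATL" "ORD" "BOS"

# Keep only the two most common destinations
head(sorted_counts, 2)
```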

Remember that if you want to see all the values, you can use |> View() or |> print(n ...


ISBN: 9798341657304