R for Data Science, 2nd Edition

by Hadley Wickham, Mine Cetinkaya-Rundel, Garrett Grolemund
May 2025
Intermediate to advanced
578 pages
8h 9m
Chinese
O'Reilly Media, Inc.

Chapter 7. Data Import

This work was translated using AI. Feedback and comments are welcome: translation-feedback@oreilly.com

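Introduction

Working with data provided by R packages is a great way to learn data science tools, but at some point you will want to apply what you have learned to your own data. In this chapter, you'll learn the basics of reading data files into R.

Specifically, this chapter focuses on reading plain-text rectangular files. We'll start with practical advice for handling features such as column names, types, and missing data. You will then learn how to read data from multiple files at once and how to write data from R to a file. Finally, you'll learn how to handcraft data frames in R.

Prerequisites

In this chapter, you'll learn how to load flat files in R with the readr package, which is part of the core tidyverse: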

library(tidyverse)

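Reading Data from a File

To begin, we'll focus on the most common rectangular data file type: CSV, which is short for "comma-separated values." Here is a simple CSV file. The first row, commonly called the header row, gives the column names, and the following six rows provide the data. The columns are separated, or delimited, by commas.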

Student ID,Full Name,favourite.food,mealPlan,AGE
1,Sunil Huffmann,Strawberry yoghurt,Lunch only,4
2,Barclay Lynn,French fries,Lunch only,5
3,Jayendra Lyne,N/A,Breakfast and lunch,7
4,Leon Rossini,Anchovies,Lunch only,
5,Chidiegwu Dunkel,Pizza,Breakfast and lunch,five
6,Güvenç Attila,Ice cream,Lunch only,6

Table 7-1 shows the same data as a table.

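Table 7-1. Data from the students.csv file as a table
Student ID | Full Name        | favourite.food     | mealPlan            | AGE
1          | Sunil Huffmann   | Strawberry yoghurt | Lunch only          | 4
2          | Barclay Lynn     | French fries       | Lunch only          | 5
3          | Jayendra Lyne    | N/A                | Breakfast and lunch | 7
4          | Leon Rossini     | Anchovies          | Lunch only          | NA
5          | Chidiegwu Dunkel | Pizza              | Breakfast and lunch | five
6          | Güvenç Attila    | Ice cream          | Lunch only          | 6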
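
To make the roles of the header row and the comma delimiter concrete, here is a minimal base R sketch that splits two lines of the file by hand. The variable names (csv_lines, fields, header, first_row) are illustrative and do not come from the chapter. This is not how readr parses files; it only shows how one line of text maps onto named fields.

```r
# Two lines copied from the CSV file shown earlier in this section.
csv_lines <- c(
  "Student ID,Full Name,favourite.food,mealPlan,AGE",
  "1,Sunil Huffmann,Strawberry yoghurt,Lunch only,4"
)

# Split each line on the comma delimiter (fixed = TRUE: a literal comma,
# not a regular expression).
fields <- strsplit(csv_lines, ",", fixed = TRUE)

header    <- fields[[1]]  # column names from the header row
first_row <- fields[[2]]  # values from the first data row

# Pair each value with its column name.
setNames(first_row, header)
```

Naive splitting like this breaks as soon as a field contains a quoted comma, which is one reason to rely on a real CSV parser for actual files.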

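We can read this file into R using read_csv(). The first argument is the most important: the path to the file. You can think of the path as the address of the file: the file is called students.csv and it lives in the data folder.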

students <- read_csv("data/students.csv")
#> Rows: 6 Columns: 5
#> ── Column specification ─────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (4): Full Name, favourite.food, mealPlan, AGE
#> dbl (1): Student ID
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. ...
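
The message printed above mentions two follow-ups: `spec()` and `show_col_types = FALSE`. The sketch below tries both on a copy of the data. It assumes readr is installed, and it writes the first five rows to a temporary file so that it runs without a data folder; the `path` variable and the temporary file are illustrative, not part of the chapter.

```r
library(readr)

# Write a copy of the first five example rows to a temporary file so the
# sketch is self-contained (the chapter itself reads "data/students.csv").
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Student ID,Full Name,favourite.food,mealPlan,AGE",
  "1,Sunil Huffmann,Strawberry yoghurt,Lunch only,4",
  "2,Barclay Lynn,French fries,Lunch only,5",
  "3,Jayendra Lyne,N/A,Breakfast and lunch,7",
  "4,Leon Rossini,Anchovies,Lunch only,",
  "5,Chidiegwu Dunkel,Pizza,Breakfast and lunch,five"
), path)

# show_col_types = FALSE suppresses the column specification message.
students <- read_csv(path, show_col_types = FALSE)

# spec() retrieves the column specification that readr guessed.
spec(students)

# AGE is read as character because one value is the word "five";
# the empty AGE field in row 4 becomes NA.
students$AGE
```

Note that the "N/A" in favourite.food stays a plain string here: by default readr treats only empty fields and "NA" as missing.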

ISBN: 9798341657304