Efficient R Programming (高效R语言编程)

by Colin Gillespie, Robin Lovelace
August 2018
Intermediate to advanced
227 pages
4h 16m
Chinese
China Electric Power Press Ltd.
Guess, then test, the result of each of the following commands:
print(df_base)
df_base$colA
df_base$col
df_base$colB
Then create a tibble and repeat the commands above.
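The definition of df_base is not shown in this excerpt. As a minimal sketch, assuming it is a one-column base data frame such as data.frame(colA = "A") (an assumption, not taken from the text above), the following contrasts how `$` behaves on a base data frame and on a tibble. It also assumes the tibble package is installed.

```r
# Assumption: df_base is a one-column base data frame; its real definition
# is not shown in this excerpt.
df_base <- data.frame(colA = "A")

print(df_base)
df_base$colA   # exact column name
df_base$col    # base data frames partially match column names with `$`
df_base$colB   # no such column: NULL, without a warning

# The same checks on a tibble (assumes the tibble package is installed).
library("tibble")
df_tbl <- tibble(colA = "A")

print(df_tbl)
df_tbl$colA
df_tbl$col     # tibbles do not partially match: NULL, with a warning
df_tbl$colB    # NULL, with a warning
```

The practical difference is that a mistyped or abbreviated column name fails quietly on a base data frame but raises a warning on a tibble, which tends to surface such mistakes earlier.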
Tidying data with tidyr and regular expressions
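A key skill in data analysis is understanding the structure of datasets and being able to "reshape" them. This matters for workflow efficiency: more than half of the time spent on data analysis can go into reorganizing data (Wickham 2014b), so getting data into a suitable form early can save a great deal of time later. Converting data into a "tidy" form also helps computational efficiency, because analysis and plotting commands usually run faster on tidy data.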
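Data tidying includes data cleaning and data reshaping. Data cleaning is the process of reformatting and labeling messy data. The stringi and stringr packages can help update messy character strings using regular expressions; the assertive and assertr packages can check data integrity at the outset of a data analysis project. A common data cleaning task is converting non-standard text strings into the date formats described in the lubridate vignette [vignette("lubridate")]. Tidying is a broader concept, however, and also includes reshaping data so that it is better suited to analysis and modeling. Table 6-1 and Table 6-2 illustrate the reshaping process; the data was provided by Hadley Wickham and can be loaded with the following code: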
library("efficient")
data(pew) # see ?pew - dataset from the efficient package
pew[1:3, 1:4] # take a look at the data
#> #
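The pew output is cut off in this excerpt, so the sketch below does not reproduce it. Instead it builds a small hypothetical wide table of the same general shape (one identifier column plus one column per income band; the names and counts are made up for illustration) and reshapes it to long form with tidyr's pivot_longer(). It assumes tidyr 1.0.0 or later is installed, and the book itself may use a different function for this step.

```r
library("tidyr")

# Hypothetical wide table: one row per group, one column per income band.
# Names and counts are made up for illustration; they are not the pew data.
wide <- data.frame(
  group = c("g1", "g2"),
  low   = c(10, 20),
  mid   = c(30, 40),
  high  = c(50, 60)
)

# Gather the three income columns into name/value pairs, so that each row
# holds one observation: a group, an income band, and a count.
long <- pivot_longer(
  wide,
  cols      = -group,
  names_to  = "income",
  values_to = "count"
)

dim(wide)  # 2 rows, 4 columns
dim(long)  # 6 rows, 3 columns
print(long)
```

In the long form the income band becomes a variable in its own right, so it can be filtered, grouped, or mapped to a plot aesthetic without naming each column individually.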
ISBN: 9787519820855