
高效R语言编程 (Efficient R Programming)

ISBN: 9787519820855

by Colin Gillespie, Robin Lovelace

August 2018

Intermediate to advanced

227 pages

4h 16m

Chinese

China Electric Power Press Ltd.


Efficient Data Carpentry

There are many words for data processing. Before moving on to the next stage, you can clean, hack, manipulate, munge, refine and tidy your dataset.

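Each word says something about how people perceive the process. Data processing is seen as dirty work: an unpleasant stage that must be endured before the real, interesting and important work begins.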

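This perception is wrong. Getting your data into shape is a respectable and, in some cases, vital skill. For this reason we use a more admirable term, data carpentry, to describe the work.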

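The metaphor is not accidental. Carpentry is the process of taking rough pieces of wood and creating a finished product through careful, diligent and precise work. A carpenter does not hack at wood at random. They inspect the raw material and select the right tools for the job. In the same way, data carpentry is the process of turning rough, raw and, to some extent, randomly arranged input data into neat, well-organised data. Learning the skills of data carpentry early will pay off in later work. As the Chinese proverb puts it: "Sharpening the axe does not delay the cutting of firewood."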

Data processing is a critical stage of any project that uses datasets from external sources, which describes most real-world applications.

Technical debt, as discussed in Chapter …, can cripple a workflow. In the same way, working with messy data can lead to project management hell.

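Fortunately, the data processing stage can be highly rewarding. This requires processing your data efficiently and choosing the right tools at the start of a project, rather than halfway through, when it may be too late.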

More importantly, from an efficiency perspective, every subsequent stage of your project will benefit from neat, well-organised data.

So, for data-intensive applications, this may be the most important chapter in the book.

In this chapter we will cover the following:
