R in a Nutshell

Name: R in a Nutshell
Author: Joseph Adler
ISBN: 9780596801700

January 2010

Beginner

634 pages

19h 50m

English

O'Reilly Media, Inc.

Content preview from R in a Nutshell

Clustering

Another important data mining technique is clustering. Clustering is a way to find similar sets of observations in a data set; groups of similar observations are called clusters. There are several functions available for clustering in R.

Distance Measures

To effectively use clustering algorithms, you need to begin by measuring the distance between observations. A convenient way to do this in R is through the function dist in the stats package:

dist(x, method = "euclidean", diag = FALSE, upper = FALSE, p = 2)

The dist function computes the distance between each pair of observations (rows) in an object such as a matrix or a data frame. It returns a distance matrix (an object of class “dist”) containing the computed distances. Here is a description of the arguments to dist.
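As a minimal sketch of the call described above, the following computes Euclidean distances between the rows of a small matrix. The matrix `x` and its row and column names are made up for illustration; only `dist`, `class`, and `as.matrix` come from base R and the stats package.

```r
# Three hypothetical observations (rows) measured on two variables (columns)
x <- matrix(c(0, 0,
              3, 4,
              6, 8),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("u", "v")))

d <- dist(x)             # method = "euclidean" is the default
class(d)                 # "dist"

# A "dist" object stores only the lower triangle; as.matrix() expands it
# into a full symmetric matrix that can be indexed by row name.
d_full <- as.matrix(d)
d_full["a", "b"]         # sqrt(3^2 + 4^2) = 5
```

Converting with `as.matrix` is convenient for looking up a single pair, while the compact “dist” form is what clustering functions such as `hclust` expect as input.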

Argument	Description	Default
x	The object on which to compute distances. Must be a data frame, matrix, or “dist” object.
method	The method for computing distances. Specify `method="euclidean"` for Euclidean distances (2-norm), `method="maximum"` for the maximum distance between observations (supremum norm), `method="manhattan"` for the absolute distance between two vectors (1-norm), `method="canberra"` for Canberra distances (see the help file), `method="binary"` to regard nonzero values as 1 and zeros as 0, or `method="minkowski"` to use the p-norm (the pth root of the sum of the pth powers of the differences of the components).	“euclidean”
diag	A logical value specifying whether the diagonal of the distance matrix should be printed by ...
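To make the `method` options in the table concrete, here is a small sketch that computes the distance between two made-up points under four of the methods and checks each result against the formula written out by hand. The points `p` and `q` and the choice `p = 3` for the Minkowski distance are illustrative assumptions, not values from the book.

```r
# Two hypothetical observations in three dimensions
x <- rbind(p = c(1, 0, 2),
           q = c(4, 4, 0))
diffs <- abs(x["p", ] - x["q", ])   # component-wise differences: 3, 4, 2

d_euc <- dist(x, method = "euclidean")          # 2-norm
d_max <- dist(x, method = "maximum")            # supremum norm
d_man <- dist(x, method = "manhattan")          # 1-norm
d_mnk <- dist(x, method = "minkowski", p = 3)   # p-norm with p = 3

# Each distance should match the corresponding norm of the differences
all.equal(as.numeric(d_euc), sqrt(sum(diffs^2)))     # sqrt(29)
all.equal(as.numeric(d_max), max(diffs))             # 4
all.equal(as.numeric(d_man), sum(diffs))             # 9
all.equal(as.numeric(d_mnk), sum(diffs^3)^(1/3))     # 99^(1/3)
```

The `p` argument only affects the result when `method = "minkowski"`; with `p = 2` the Minkowski distance coincides with the Euclidean one.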
