Machine Learning Algorithms for Classification
Just as with regression, there are classification problems where linear methods don’t work well. This section describes some machine learning algorithms for classification problems.
k Nearest Neighbors
One of the simplest techniques for classification problems is k nearest neighbors. Here’s how the algorithm works:
1. The analyst specifies a “training” data set.
2. To predict the class of a new value, the algorithm looks for the k observations in the training set that are closest to the new value.
3. The prediction for the new value is the class of the majority of the k nearest neighbors.
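The steps above can be sketched in a few lines of base R. The function name `knn_predict` and the toy data are illustrative, not part of any package:

```r
# A minimal sketch of the k nearest neighbors steps above, in base R.
knn_predict <- function(train, cl, new_point, k = 3) {
  # Euclidean distance from the new point to each training observation
  dists <- sqrt(rowSums((t(t(train) - new_point))^2))
  # Classes of the k closest training observations
  nearest <- cl[order(dists)[1:k]]
  # Majority vote among the k neighbors
  names(which.max(table(nearest)))
}

# Two well-separated clusters of training points
train <- rbind(matrix(c(1, 1, 1, 2, 2, 1), ncol = 2, byrow = TRUE),
               matrix(c(8, 8, 8, 9, 9, 8), ncol = 2, byrow = TRUE))
cl <- c("a", "a", "a", "b", "b", "b")
knn_predict(train, cl, c(8.5, 8.5), k = 3)  # "b"
```

A new point near the second cluster is assigned class `"b"` because all three of its nearest neighbors belong to that class.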
To use k nearest neighbors in R, use the function knn in the class package:

```r
library(class)
knn(train, test, cl, k = 1, l = 0, prob = FALSE, use.all = TRUE)
```
Here are descriptions of the arguments to the knn function:
| Argument | Description | Default |
|---|---|---|
| train | A matrix or data frame containing the training data. | |
| test | A matrix or data frame containing the test data. | |
| cl | A factor specifying the classification of observations in the training set. | |
| k | A numeric value specifying the number of neighbors to consider. | 1 |
| l | When l > 0, specifies the minimum vote needed for a definite decision. (If there aren’t enough votes, the value doubt is returned.) | 0 |
| prob | If prob=TRUE, then the proportion of votes for the winning class is returned as attribute prob. | FALSE |
| use.all | Controls the handling of ties when selecting nearest neighbors. If use.all=TRUE, then all distances equal to the kth largest are included. If use.all=FALSE, then a random selection of distances equal to the kth largest is chosen, so that exactly k neighbors are used. | TRUE |