Book description
Over 100 hands-on recipes to effectively solve real-world data problems using the most popular R packages and techniques
About This Book
 Gain insight into how data scientists collect, process, analyze, and visualize data using some of the most popular R packages
 Understand how to apply useful data analysis techniques in R for real-world applications
 An easy-to-follow guide that makes a data scientist's life easier when tackling the problems faced during data analysis
Who This Book Is For
This book is for those who are already familiar with the basic operation of R, but want to learn how to efficiently and effectively analyze real-world data problems using practical R packages.
What You Will Learn
 Get to know the functional characteristics of the R language
 Extract, transform, and load data from heterogeneous sources
 Understand how easily R can handle probability and statistics problems
 Get simple R instructions to quickly organize and manipulate large datasets
 Create professional data visualizations and interactive reports
 Predict user purchase behavior by adopting a classification approach
 Implement data mining techniques to discover items that are frequently purchased together
 Group similar text documents by using various clustering methods
In Detail
This cookbook offers a range of data analysis samples in simple and straightforward R code, providing step-by-step resources and time-saving methods to help you solve data problems efficiently.
The first section deals with how to create R functions to avoid the unnecessary duplication of code. You will learn how to prepare, process, and perform sophisticated ETL for heterogeneous data sources with R packages. An example of data manipulation is provided, illustrating how to use the dplyr and data.table packages to efficiently process larger data structures. We also focus on ggplot2 and show you how to create advanced figures for data exploration.
In addition, you will learn how to build an interactive report using the ggvis package. Later chapters offer insight into time series analysis on financial data, while there is detailed information on the hot topic of machine learning, including data classification, regression, clustering, association rule mining, and dimension reduction.
By the end of this book, you will understand how to resolve issues and will be able to comfortably offer solutions to problems encountered while performing data analysis.
Style and approach
This easy-to-follow guide is full of hands-on examples of data analysis with R. Each topic is fully explained, beginning with the core concept, followed by step-by-step practical examples, and concluding with detailed explanations of each concept used.
Table of contents

R for Data Science Cookbook
 Table of Contents
 R for Data Science Cookbook
 Credits
 About the Author
 About the Reviewer
 www.PacktPub.com
 Preface
 1. Functions in R
 2. Data Extracting, Transforming, and Loading
 3. Data Preprocessing and Preparation

4. Data Manipulation
 Introduction
 Enhancing a data.frame with a data.table
 Managing data with a data.table
 Performing fast aggregation with a data.table
 Merging large datasets with a data.table
 Subsetting and slicing data with dplyr
 Sampling data with dplyr
 Selecting columns with dplyr
 Chaining operations in dplyr
 Arranging rows with dplyr
 Eliminating duplicated rows with dplyr
 Adding new columns with dplyr
 Summarizing data with dplyr
 Merging data with dplyr
 5. Visualizing Data with ggplot2

6. Making Interactive Reports
 Introduction
 Creating R Markdown reports
 Learning the markdown syntax
 Embedding R code chunks
 Creating interactive graphics with ggvis
 Understanding basic syntax and grammar
 Controlling axes and legends
 Using scales
 Adding interactivity to a ggvis plot
 Creating an R Shiny document
 Publishing an R Shiny report

7. Simulation from Probability Distributions
 Introduction
 Generating random samples
 Understanding uniform distributions
 Generating binomial random variates
 Generating Poisson random variates
 Sampling from a normal distribution
 Sampling from a chi-squared distribution
 Understanding Student's t-distribution
 Sampling from a dataset
 Simulating the stochastic process

8. Statistical Inference in R
 Introduction
 Getting confidence intervals
 Performing Z-tests
 Performing Student's t-tests
 Conducting exact binomial tests
 Performing Kolmogorov-Smirnov tests
 Working with the Pearson's chi-squared tests
 Understanding the Wilcoxon Rank Sum and Signed Rank tests
 Conducting one-way ANOVA
 Performing two-way ANOVA

9. Rule and Pattern Mining with R
 Introduction
 Transforming data into transactions
 Displaying transactions and associations
 Mining associations with the Apriori rule
 Pruning redundant rules
 Visualizing association rules
 Mining frequent itemsets with Eclat
 Creating transactions with temporal information
 Mining frequent sequential patterns with cSPADE
 10. Time Series Mining with R

11. Supervised Machine Learning
 Introduction
 Fitting a linear regression model with lm
 Summarizing linear model fits
 Using linear regression to predict unknown values
 Measuring the performance of the regression model
 Performing a multiple regression analysis
 Selecting the best-fitted regression model with stepwise regression
 Applying the Gaussian model for generalized linear regression
 Performing a logistic regression analysis
 Building a classification model with recursive partitioning trees
 Visualizing a recursive partitioning tree
 Measuring model performance with a confusion matrix
 Measuring prediction performance using ROCR

12. Unsupervised Machine Learning
 Introduction
 Clustering data with hierarchical clustering
 Cutting tree into clusters
 Clustering data with the k-means method
 Clustering data with the density-based method
 Extracting silhouette information from clustering
 Comparing clustering methods
 Recognizing digits using the density-based clustering method
 Grouping similar text documents with k-means clustering methods
 Performing dimension reduction with Principal Component Analysis (PCA)
 Determining the number of principal components using a scree plot
 Determining the number of principal components using the Kaiser method
 Visualizing multivariate data using a biplot
 Index
Product information
 Title: R for Data Science Cookbook
 Author(s):
 Release date: July 2016
 Publisher(s): Packt Publishing
 ISBN: 9781784390815