Book description
Over 80 recipes to help you breeze through your data analysis projects using R
About This Book
- Analyze your data using popular R packages such as ggplot2, with ready-to-use and customizable recipes
- Find meaningful insights from your data and generate dynamic reports
- A practical guide to help you apply your data analysis skills in R
Who This Book Is For
This book is for data scientists, analysts, and even enthusiasts who want to learn and apply various data analysis techniques in R in a practical way. Those looking for quick, handy solutions to common data analysis tasks and challenges will find this book very useful. Basic knowledge of statistics and R programming is assumed.
What You Will Learn
- Acquire, format, and visualize your data using R
- Use R to perform exploratory data analysis
- Get an introduction to machine learning algorithms for classification and regression
- Get started with social network analysis
- Generate dynamic reports with Shiny
- Get started with geospatial analysis
- Handle large datasets in R using Spark and MongoDB
- Build recommendation systems: collaborative filtering, content-based, and hybrid
- Work through real-world dataset examples: fraud detection and image recognition
In Detail
Data analytics with R has emerged as a very important focus for organizations of all kinds. R enables even those with only an intuitive grasp of the underlying concepts, and without a deep mathematical background, to carry out powerful and detailed analyses of their data.
This book will show you how to put your data analysis skills in R to practical use, with recipes covering both basic and advanced data analysis tasks. From acquiring your data and preparing it for analysis to applying more complex techniques, the book shows you how to implement each technique effectively. You will also visualize your data using popular R packages such as ggplot2 and uncover hidden insights in it. Starting with basic concepts such as handling your data and creating simple plots, you will master more advanced techniques such as performing cluster analysis and generating effective analysis reports and visualizations. Throughout the book, you will learn about the common problems and obstacles you might encounter when implementing each technique in R, along with straightforward ways to overcome them.
By the end of this book, you will have all the knowledge you need to become an expert in data analysis with R and to put your skills to the test in real-world scenarios.
Style and Approach
- Hands-on recipes to walk through data science challenges using R
- Your one-stop solution for common and not-so-common pain points encountered when tackling real-world data analysis problems
- Addressing your common and not-so-common pain points, this is a book that you must have on the shelf
Table of contents
- Preface
Acquire and Prepare the Ingredients - Your Data
- Introduction
- Working with data
- Reading data from CSV files
- Reading XML data
- Reading JSON data
- Reading data from fixed-width formatted files
- Reading data from R files and R libraries
- Removing cases with missing values
- Replacing missing values with the mean
- Removing duplicate cases
- Rescaling a variable to a specified min-max range
- Normalizing or standardizing data in a data frame
- Binning numerical data
- Creating dummies for categorical variables
- Handling missing data
- Correcting data
- Imputing data
- Detecting outliers
What's in There - Exploratory Data Analysis
- Introduction
- Creating standard data summaries
- Extracting a subset of a dataset
- Splitting a dataset
- Creating random data partitions
- Generating standard plots, such as histograms, boxplots, and scatterplots
- Generating multiple plots on a grid
- Creating plots with the lattice package
- Creating charts that facilitate comparisons
- Creating charts that help to visualize possible causality
Where Does It Belong? Classification
- Introduction
- Generating error/classification confusion matrices
- Principal Component Analysis
- Generating receiver operating characteristic charts
- Building, plotting, and evaluating with classification trees
- Using random forest models for classification
- Classifying using the support vector machine approach
- Classifying using the Naive Bayes approach
- Classifying using the KNN approach
- Using neural networks for classification
- Classifying using linear discriminant function analysis
- Classifying using logistic regression
- Text classification for sentiment analysis
Give Me a Number - Regression
- Introduction
- Computing the root-mean-square error
- Building KNN models for regression
- Performing linear regression
- Performing variable selection in linear regression
- Building regression trees
- Building random forest models for regression
- Using neural networks for regression
- Performing k-fold cross-validation
- Performing leave-one-out cross-validation to limit overfitting
Can You Simplify That? Data Reduction Techniques
- Introduction
- Performing cluster analysis using hierarchical clustering
- Performing cluster analysis using partitioning clustering
- Image segmentation using mini-batch K-means
- Partitioning around medoids
- Clustering large applications
- Performing cluster validation
- Performing advanced clustering
- Model-based clustering with the EM algorithm
- Reducing dimensionality with principal component analysis
Lessons from History - Time Series Analysis
- Introduction
- Exploring finance datasets
- Creating and examining date objects
- Operating on date objects
- Performing preliminary analyses on time series data
- Using time series objects
- Decomposing time series
- Filtering time series data
- Smoothing and forecasting using the Holt-Winters method
- Building an automated ARIMA model
How Does It Look? - Advanced Data Visualization
- Introduction
- Creating scatter plots
- Creating line graphs
- Creating bar graphs
- Making distribution plots
- Creating mosaic graphs
- Making treemaps
- Plotting a correlation matrix
- Creating heatmaps
- Plotting network graphs
- Labeling and legends
- Coloring and themes
- Creating multivariate plots
- Creating 3D graphs and animation
- Selecting a graphics device
This May Also Interest You - Building Recommendations
It's All About Your Connections - Social Network Analysis
- Introduction
- Downloading social network data using public APIs
- Creating adjacency matrices and edge lists
- Plotting social network data
  - Getting ready
  - How to do it...
  - How it works...
  - There's more...
- Specifying plotting preferences
- Plotting directed graphs
- Creating a graph object with weights
- Extracting the network as an adjacency matrix from the graph object
- Extracting an adjacency matrix with weights
- Extracting an edge list from a graph object
- Creating a bipartite network graph
- Generating projections of a bipartite network
- Computing important network metrics
- Cluster analysis
- Force layout
- Yifan Hu layout
Put Your Best Foot Forward - Document and Present Your Analysis
Work Smarter, Not Harder - Efficient and Elegant R Code
- Introduction
- Exploiting vectorized operations
- Processing entire rows or columns using the apply function
- Applying a function to all elements of a collection with lapply and sapply
- Applying functions to subsets of a vector
- Using the split-apply-combine strategy with plyr
- Slicing, dicing, and combining data with data tables
Where in the World? Geospatial Analysis
- Introduction
- Downloading and plotting a Google map of an area
- Overlaying data on the downloaded Google map
- Importing ESRI shapefiles into R
- Using the sp package to plot geographic data
- Getting maps from the maps package
- Creating spatial data frames from regular data frames containing spatial and other data
- Creating spatial data frames by combining regular data frames with spatial objects
- Adding variables to an existing spatial data frame
- Spatial data analysis with R and QGIS
Playing Nice - Connecting to Other Systems
- Introduction
- Using Java objects in R
- Using JRI to call R functions from Java
- Using Rserve to call R functions from Java
- Executing R scripts from Java
- Using the xlsx package to connect to Excel
- Reading data from relational databases - MySQL
- Reading data from NoSQL databases - MongoDB
- Performing in-memory data processing with Apache Spark
Product information
- Title: R Data Analysis Cookbook - Second Edition
- Author(s):
- Release date: September 2017
- Publisher(s): Packt Publishing
- ISBN: 9781787124479