R Data Science Essentials

Book Description

Learn the essence of data science and visualization using R in no time at all

About This Book

  • Become a pro at making stunning visualizations and dashboards quickly and without hassle
  • Apply the R programming language, together with useful statistical techniques, to make better business decisions
  • From seasoned authors comes a book that offers you a plethora of fast-paced techniques to detect and analyze data patterns

Who This Book Is For

If you are an aspiring data scientist or analyst who has a basic understanding of data science and has basic hands-on experience in R or any other analytics tool, then R Data Science Essentials is the book for you.

What You Will Learn

  • Perform data preprocessing and basic operations on data
  • Implement visual and non-visual data exploration techniques
  • Mine patterns from data using affinity and sequential analysis
  • Use different clustering algorithms and visualize them
  • Implement logistic and linear regression and find out how to evaluate and improve the performance of an algorithm
  • Extract patterns through visualization and build a forecasting algorithm
  • Build a recommendation engine using different collaborative filtering algorithms
  • Build stunning visualizations and dashboards using ggplot and Shiny

In Detail

With organizations increasingly embedding data science across their enterprise and management becoming more data-driven, analysts and managers urgently need to understand the key concepts of data science. The data science concepts discussed in this book will help you make key decisions and solve the complex problems you will inevitably face in this new world.

R Data Science Essentials will introduce you to various important concepts in the field of data science using R. We start by reading data from multiple sources, then move on to processing the data, extracting hidden patterns, building predictive and forecasting models, building a recommendation engine, and communicating to the user through stunning visualizations and dashboards.

By the end of this book, you will understand some very important techniques in data science, be able to implement them using R, interpret their outcomes, and know how they help businesses make decisions.

Style and approach

This easy-to-follow guide contains hands-on examples of the concepts of data science using R.

Table of Contents

  1. R Data Science Essentials
    1. Table of Contents
    2. R Data Science Essentials
    3. Credits
    4. About the Authors
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    8. 1. Getting Started with R
      1. Reading data from different sources
      2. Reading data from a database
      3. Data types in R
        1. Variable data types
      4. Data preprocessing techniques
      5. Performing data operations
        1. Arithmetic operations on the data
        2. String operations on the data
        3. Aggregation operations on the data
          1. Mean
          2. Median
          3. Sum
          4. Maximum and minimum
          5. Standard deviation
      6. Control structures in R
        1. Control structures – if and else
        2. Control structures – for
        3. Control structures – while
        4. Control structures – repeat and break
        5. Control structures – next and return
      7. Bringing data to a usable format
      8. Summary
    9. 2. Exploratory Data Analysis
      1. The Titanic dataset
      2. Descriptive statistics
        1. Box plot
        2. Exercise
      3. Inferential statistics
      4. Univariate analysis
      5. Bivariate analysis
      6. Multivariate analysis
        1. Cross-tabulation analysis
        2. Graphical analysis
      7. Summary
    10. 3. Pattern Discovery
      1. Transactional datasets
        1. Using the built-in dataset
        2. Building the dataset
      2. Apriori analysis
      3. Support, confidence, and lift
        1. Support
        2. Confidence
        3. Lift
      4. Generating filtering rules
      5. Plotting
        1. Dataset
        2. Rules
      6. Sequential dataset
      7. Apriori sequence analysis
      8. Understanding the results
        1. Reference
      9. Business cases
      10. Summary
    11. 4. Segmentation Using Clustering
      1. Datasets
        1. Reading and formatting the dataset in R
      2. Centroid-based clustering and an ideal number of clusters
      3. Implementation using K-means
      4. Visualizing the clusters
      5. Connectivity-based clustering
      6. Visualizing the connectivity
      7. Business use cases
      8. Summary
    12. 5. Developing Regression Models
      1. Datasets
      2. Sampling the dataset
      3. Logistic regression
      4. Evaluating logistic regression
      5. Linear regression
      6. Evaluating linear regression
      7. Methods to improve the accuracy
      8. Ensemble models
        1. Replacing NA with mean or median
        2. Removing the highly correlated values
        3. Removing outliers
      9. Summary
    13. 6. Time Series Forecasting
      1. Datasets
      2. Extracting patterns
      3. Forecasting using ARIMA
      4. Forecasting using Holt-Winters
      5. Methods to improve accuracy
      6. Summary
    14. 7. Recommendation Engine
      1. Dataset and transformation
      2. Recommendations using user-based CF
      3. Recommendations using item-based CF
      4. Challenges and enhancements
      5. Summary
    15. 8. Communicating Data Analysis
      1. Dataset
      2. Plotting using the googleVis package
      3. Creating an interactive dashboard using Shiny
      4. Summary
    16. Index