Hands-On Exploratory Data Analysis with R


Learn exploratory data analysis concepts using powerful R packages to enhance your R data analysis skills

Key Features

  • Speed up your data analysis projects using powerful R packages and techniques
  • Create multiple hands-on data analysis projects using real-world data
  • Discover and practice graphical exploratory analysis techniques across domains

Book Description

Hands-On Exploratory Data Analysis with R will help you build a strong foundation in data analysis and become well-versed in elementary ways of analyzing data. You will learn how to understand your data and summarize its characteristics. You'll also study the structure of your data and explore graphical and numerical techniques using the R language.

This book covers the entire exploratory data analysis (EDA) process: data collection, generating statistics, examining distributions, and invalidating hypotheses. As you progress through the book, you will set up a data analysis environment with tools such as ggplot2, knitr, and R Markdown, and then work with DOE Scatter Plot and SML2010 to tackle multifactor, optimization, and regression data problems.

By the end of this book, you will be able to carry out a preliminary investigation of any dataset, uncover hidden insights, and present your results in a business context.

What you will learn

  • Learn effective R techniques that can accelerate your data analysis projects
  • Import, clean, and explore data using powerful R packages
  • Practice graphical exploratory analysis techniques
  • Create informative data analysis reports using ggplot2, knitr, and R Markdown
  • Identify and clean missing and erroneous data
  • Explore data analysis techniques to analyze multi-factor datasets

Who this book is for

Hands-On Exploratory Data Analysis with R is for data enthusiasts who want to build a strong foundation in data analysis. If you are a data analyst, data engineer, software engineer, or product manager, this book will sharpen your skills in the complete exploratory data analysis workflow.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Exploratory Data Analysis with R
  3. Dedication
  4. About Packt
    1. Why subscribe?
    2. Packt.com
  5. Contributors
    1. About the authors
    2. About the reviewer
    3. Packt is searching for authors like you
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
    4. Code in Action
      1. Conventions used
    5. Get in touch
      1. Reviews
  7. Section 1: Setting Up Data Analysis Environment
  8. Setting Up Our Data Analysis Environment
    1. Technical requirements
    2. The benefits of EDA across vertical markets
    3. Manipulating data
      1. Examining, cleaning, and filtering data
      2. Visualizing data
      3. Creating data reports
    4. Installing the required R packages and tools
      1. Installing R packages from the Terminal
      2. Installing R packages from inside RStudio
    5. Summary
  9. Importing Diverse Datasets
    1. Technical requirements
    2. Converting rectangular data into R with the readr R package
      1. readr read functions
        1. read_tsv method
        2. read_delim method
        3. read_fwf method
        4. read_table method
        5. read_log method
    3. Reading in Excel data with the readxl R package
    4. Reading in JSON data with the jsonlite R package
      1. Loading the jsonlite package
    5. Getting data into R from web APIs using the httr R package
    6. Getting data into R by scraping the web using the rvest package
    7. Importing data into R from relational databases using the DBI R package
    8. Summary
  10. Examining, Cleaning, and Filtering
    1. Technical requirements
    2. About the dataset
    3. Reshaping and tidying up erroneous data
      1. The gather() function
      2. The unite() function
      3. The separate() function
      4. The spread() function
    4. Manipulating and mutating data
      1. The mutate() function
      2. The group_by() function
      3. The summarize() function
      4. The arrange() function
      5. The glimpse() function
    5. Selecting and filtering data
      1. The select() function
      2. The filter() function
    6. Cleaning and manipulating time series data
    7. Summary
  11. Visualizing Data Graphically with ggplot2
    1. Technical requirements
    2. Advanced graphics grammar of ggplot2
      1. Data
      2. Layers
      3. Scales
      4. The coordinate system
      5. Faceting
      6. Theme
    3. Installing ggplot2
    4. Scatter plots
    5. Histogram plots
    6. Density plots
    7. Probability plots
      1. dnorm()
      2. pnorm()
      3. rnorm()
    8. Box plots
    9. Residual plots
    10. Summary
  12. Creating Aesthetically Pleasing Reports with knitr and R Markdown
    1. Technical requirements
    2. Installing R Markdown
      1. Working with R Markdown
    3. Reproducible data analysis reports with knitr
    4. Exporting and customizing reports
    5. Summary
  13. Section 2: Univariate, Time Series, and Multivariate Data
  14. Univariate and Control Datasets
    1. Technical requirements
    2. Reading the dataset
    3. Cleaning and tidying up the data
    4. Understanding the structure of the data
    5. Hypothesis tests
      1. Statistical hypothesis in R
        1. The t-test in R
        2. Directional hypothesis in R
        3. Correlation in R
    6. Tietjen-Moore test
    7. Parsimonious models
    8. Probability plots
    9. The Shapiro-Wilk test
    10. Summary
  15. Time Series Datasets
    1. Technical requirements
    2. Introducing and reading the dataset
    3. Cleaning the dataset
    4. Mapping and understanding structure
    5. Hypothesis test
      1. t-test in R
      2. Directional hypothesis in R
    6. Grubbs' test and checking outliers
    7. Parsimonious models
    8. Bartlett's test
    9. Data visualization
      1. Autocorrelation plots
      2. Spectrum plots
      3. Phase plots
    10. Summary
  16. Multivariate Datasets
    1. Technical requirements
    2. Introducing and reading a dataset
    3. Cleaning the data
    4. Mapping and understanding the structure
    5. Hypothesis test
      1. t-test in R
      2. Directional hypothesis in R
    6. Parsimonious model
    7. Levene's test
    8. Data visualization
      1. Principal Component Regression
      2. Partial Least Squares Regression
    9. Summary
  17. Section 3: Multifactor, Optimization, and Regression Data Problems
  18. Multi-Factor Datasets
    1. Technical requirements
    2. Introducing and reading the dataset
    3. Cleaning the dataset
    4. Mapping and understanding data structure
    5. Hypothesis test
      1. t-test in R
      2. Directional hypothesis in R
    6. Grubbs' test and checking outliers
    7. Parsimonious model
    8. Multi-factor variance analysis
    9. Exploring the dataset graphically
    10. Summary
  19. Handling Optimization and Regression Data Problems
    1. Technical requirements
    2. Introducing and reading a dataset
    3. Cleaning the dataset
    4. Mapping and understanding the data structure
    5. Hypothesis test
      1. t-test in R
      2. Directional hypothesis in R
    6. Grubbs' test and checking outliers
    7. Parsimonious model
    8. Exploration using graphics
    9. Summary
  20. Section 4: Conclusions
  21. Next Steps
    1. Technical requirements
    2. What to learn next
    3. Why R?
      1. Environmental setup
      2. R syntax
      3. R packages
      4. Understanding the help system
    4. The data analysis workflow
      1. Data import
      2. Manipulating data
      3. Visualizing data
      4. Reporting results
      5. Stand out as an R wizard
    5. Building a data science portfolio
    6. Datasets in R
    7. Getting help with exploratory data analysis
    8. Summary
  22. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Hands-On Exploratory Data Analysis with R
  • Author(s): Radhika Datar, Harish Garg
  • Release date: May 2019
  • Publisher(s): Packt Publishing
  • ISBN: 9781789804379