Hands-On Data Science with R

Book description

A hands-on guide for professionals to perform various data science tasks in R

Key Features

  • Explore the popular R packages for data science
  • Use R for efficient data mining, text analytics, and feature engineering
  • Become a well-rounded data science professional with the help of hands-on examples and use cases in R

Book Description

R is one of the most widely used programming languages for statistical computing, and applying it to data science helps you tackle the complexities of unstructured, real-world datasets. This book covers the entire data science ecosystem for aspiring data scientists, taking you from zero to a level where you are confident enough to get hands-on with real-world data science problems.

The book starts with an introduction to data science and introduces popular R libraries for carrying out routine data science tasks. It covers the important processes in data science, such as gathering data, cleaning it, and then uncovering patterns in it. You will explore machine learning algorithms, predictive analytical models, and finally deep learning algorithms. You will also learn to use the most powerful visualization packages available in R so that you can easily derive insights from your data.

Towards the end, you will also learn how to integrate R with Spark and Hadoop and perform large-scale data analytics without much complexity.

What you will learn

  • Understand the R programming language and its ecosystem of packages for data science
  • Obtain and clean your data before processing
  • Master essential exploratory techniques for summarizing data
  • Examine various machine learning prediction models
  • Explore the H2O analytics platform in R for deep learning
  • Apply data mining techniques to available datasets
  • Work with interactive visualization packages in R
  • Integrate R with Spark and Hadoop for large-scale data analytics

Who this book is for

If you are a budding data scientist keen to learn about the popular R packages for data science, or a developer looking to step into the world of data analysis, this book is the ideal resource you need to get started. Some prior programming experience will be helpful to get the most out of this book.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Data Science with R
  3. About Packt
    1. Why subscribe?
    2. Packt.com
  4. Contributors
    1. About the authors
    2. About the reviewer
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. Getting Started with Data Science and R
    1. Introduction to data science
      1. Key components of data science
        1. Computer science
        2. Predictive analytics (machine learning)
        3. Domain knowledge
    2. Active domains of data science
      1. Finance
      2. Healthcare
      3. Pharmaceuticals
      4. Government
      5. Manufacturing and retail
      6. Web industry
      7. Other industries
    3. Solving problems with data science
    4. Using R for data science
      1. Key features of R
    5. Our first R program
      1. UN development index
    6. Summary
    7. Quiz
  7. Descriptive and Inferential Statistics
    1. Measures of central tendency and dispersion
      1. Measures of central tendency
        1. Calculating mean, median, and mode with base R
      2. Measures of dispersion
        1. Useful functions to draw automated summaries
    2. Statistical hypothesis testing
      1. Running t-tests with R
        1. Decision rule – a brief overview of the p-value approach
        2. Be careful
      2. Running z-tests with R
        1. Elaborating a little longer
      3. A/B testing – a brief introduction and a practical example with R
    3. Summary
    4. Quiz
  8. Data Wrangling with R
    1. Introduction to data wrangling with R
      1. Data types, formats, and sources
    2. Data extraction, transformation, and load
      1. Basic tools of data wrangling
      2. Using base R for data manipulation and analysis
      3. Applying families of functions 
        1. Aggregation functions
        2. Merging DataFrames
        3. Using tibble and dplyr for data manipulation
        4. Basic dplyr usage
          1. Using select
          2. Filtering with filter
          3. Using arrange for sorting
          4. Summarise
          5. Sampling data
          6. The tidyr package
        5. Converting wide tables into long tables
        6. Converting long tables into wide tables
        7. Joining tables
        8. dbplyr – databases and dplyr
      4. Using data.table for data manipulation
        1. Grouping operations
        2. Adding a column
        3. Ordering columns
        4. What is the advantage of searching using key by?
        5. Creating new columns in data.table
        6. Deleting a column
        7. Pivots on data.table
        8. The melt functionality
      5. Reading and writing files with data.table
      6. A special note on dates and/or time
    3. Miscellaneous topics
      1. Checking data quality
        1. Reading other file formats – Excel, SAS, and other data sources
        2. On-disk formats
        3. Working with web data
        4. Web APIs
    4. Tutorial – looking at airline flight times data
    5. Summary
    6. Quiz
  9. KDD, Data Mining, and Text Mining
    1. Good practices of KDD and data mining
      1. Stages of KDD
    2. Scraping a dwarf name
    3. Retrieving text from the web
      1. Legality of web scraping
      2. Web scraping made easy with rvest
    4. Retrieving tweets from R community 
      1. Creating your Twitter application 
      2. Fetching the number of tweets
    5. Cleaning and transforming data
    6. Looking for patterns – peeking, visualizing, and clustering data
      1. Peeking data
      2. Visualizing data
      3. Cluster analysis
    7. Summary
    8. Quiz
  10. Data Analysis with R
    1. Preparing data for analysis
      1. Data categories
      2. Data types in R
      3. Reading data
      4. Managing data issues
        1. Mixed data types
      5. Missing data
      6. Handling strings and dates
        1. Handling dates using POSIXct or POSIXlt
        2. Handling strings in R
          1. Reading data
          2. Combining strings
      7. Simple pattern matching and replacement with R
        1. Printing results
    2. Data visualisation
      1. Types of charts – basic primer
        1. Histograms
        2. Line plots
        3. Scatter plots
        4. Boxplots
      2. Bar charts
        1. Heatmaps
        2. Summarizing data
    3. Saving analysis for future work
      1. Packrat
      2. Checkpoint
      3. Rocker
    4. Summary
    5. Quiz
  11. Machine Learning with R
    1. What is machine learning?
      1. Machine learning everywhere
      2. Machine learning vocabulary
      3. Generic problems solved by machine learning
    2. Linear regression with R
      1. Tricks for lm
    3. Tree models
      1. Strengths and weaknesses
        1. The Chilean plebiscite data
      2. Starting with decision trees
        1. Growing trees with tree and rpart
    4. Random forests – a collection of trees
    5. Support vector machines
    6. What about regressions?
    7. Hierarchical and k-means clustering
    8. Neural networks
      1. Introduction to feedforward neural networks with R
    9. Summary
    10. Quiz
  12. Forecasting and ML App with R
    1. The UI and server
    2. Forecasting machine learning application
      1. Application details
    3. Summary
    4. Quiz
  13. Neural Networks and Deep Learning
    1. Daily neural nets
    2. Overview – NNs and deep learning
      1. Neuroscience inspiration
      2. ANN nodes
      3. Activation functions
      4. Layers
      5. Training algorithms
    3. NNs with Keras
      1. Getting things ready for Keras
      2. Getting practical with Keras
      3. Further tips
    4. Summary
    5. Quiz
  14. Markovian in R
    1. Markovian-type models
      1. Markovian models – real-world applications
      2. The Markov chain
    2. Programming an HMM with R
    3. Summary
    4. Quiz
  15. Visualizing Data
    1. Retrieving and cleaning data
    2. Crafting visualizations
    3. Summary
    4. Quiz
  16. Going to Production with R
    1. What is R Shiny?
    2. How to build a Shiny app
    3. Building an application inside R
      1. The reactive and isolate functions
      2. The observeEvent and eventReactive functions
    4. Approach for creating a data product from statistical modeling and web UI
    5. Some advice about Shiny
    6. Summary
    7. Quiz
  17. Large Scale Data Analytics with Hadoop
    1. Installing the package and Spark
    2. Manipulating Spark data using both dplyr and SQL
    3. Filtering and aggregating Spark datasets 
    4. Using Spark machine learning or H2O Sparkling Water
    5. Providing interfaces to Spark packages
    6. Spark DataFrames within the RStudio IDE
    7. Summary
    8. Quiz
  18. R on Cloud
    1. Cloud computing
      1. Cloud types
      2. Things to look for
      3. Why Azure?
    2. Azure registration
    3. Azure Machine Learning Studio
      1. How modules work
      2. Building an experiment that uses R
    4. Summary 
    5. Quiz 
  19. The Road Ahead
    1. Growing your skills
      1. Gathering data
      2. Content to stay tuned to
    2. Meeting Stack Overflow
  20. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Hands-On Data Science with R
  • Author(s): Vitor Bianchi Lanzetta, Nataraj Dasgupta, Ricardo Anjoleto Farias
  • Release date: November 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781789139402