R Data Analysis Projects

Book description

Get valuable insights from your data by building data analysis systems from scratch with R.

About This Book

  • A handy guide to take your understanding of data analysis with R to the next level
  • Real-world projects that focus on problems in finance, network analysis, social media, and more
  • Covers data manipulation, analysis, and visualization, and teaches you how to build end-to-end data analysis pipelines in R

Who This Book Is For

If you are looking for a book that takes you all the way through the practical application of advanced and effective analytics methodologies in R, then this is the book for you. A fundamental understanding of R and the basic concepts of data analysis is all you need to get started with this book.

What You Will Learn

  • Build end-to-end predictive analytics systems in R
  • Build an experimental design to gather your own data and conduct analysis
  • Build a recommender system from scratch using different approaches
  • Use RShiny to build reactive applications
  • Build systems for varied domains including market research, network analysis, social media analysis, and more
  • Explore R packages such as RShiny, ggplot2, recommenderlab, and dplyr, and find out how to use them effectively
  • Communicate modeling results using Shiny Dashboards
  • Perform multivariate time-series analysis and prediction, supplemented with sensitivity analysis and risk modeling

In Detail

R offers a large variety of packages and libraries for fast and accurate data analysis and visualization. As a result, it is one of the most widely used languages among data scientists and analysts. This book demonstrates how to apply your existing knowledge of data analysis in R to build efficient, end-to-end data analysis pipelines.

You'll start by building a content-based recommendation system, followed by a project on sentiment analysis with tweets. You'll implement time-series modeling for anomaly detection and learn cluster analysis of streaming data. You'll work through projects on market data research, recommendation systems, and network analysis, all provided with easy-to-follow code.

With the help of these real-world projects, you'll get a better understanding of the challenges faced when building data analysis pipelines, and see how you can overcome them without compromising the efficiency or accuracy of your systems. The book covers widely used R packages such as dplyr, ggplot2, and RShiny, and includes tips on using them effectively.

By the end of this book, you'll have a better understanding of data analysis with R and be able to put your knowledge to practical use.

Style and approach

This book takes a learn-as-you-do approach: you build your understanding of data analysis progressively with each project. Each project is designed to give you a distinct skill set that lets you implement the next project more confidently.

Table of contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Errata
      3. Piracy
      4. Questions
  2. Association Rule Mining
    1. Understanding the recommender systems
      1. Transactions
        1. Weighted transactions
        2. Our web application
    2. Retailer use case and data
    3. Association rule mining
      1. Support and confidence thresholds
    4. The cross-selling campaign
      1. Leverage
      2. Conviction
    5. Weighted association rule mining
    6. Hyperlink-induced topic search (HITS)
    7. Negative association rules
    8. Rules visualization
    9. Wrapping up
    10. Summary
  3. Fuzzy Logic Induced Content-Based Recommendation
    1. Introducing content-based recommendation
    2. News aggregator use case and data
    3. Designing the content-based recommendation engine
      1. Building a similarity index
        1. Bag-of-words
        2. Term frequency
        3. Document frequency
        4. Inverse document frequency (IDF)
        5. TFIDF
        6. Why cosine similarity?
      2. Searching
        1. Polarity scores
        2. Jaccard's distance
        3. Jaccard's distance/index
        4. Ranking search results
        5. Fuzzy logic
        6. Fuzzification
        7. Defining the rules
        8. Evaluating the rules
        9. Defuzzification
    4. Complete R Code
    5. Summary
  4. Collaborative Filtering
    1. Collaborative filtering
      1. Memory-based approach
      2. Model-based approach
      3. Latent factor approach
    2. Recommenderlab package
      1. Popular approach
    3. Use case and data
    4. Designing and implementing collaborative filtering
      1. Ratings matrix
      2. Normalization
      3. Train test split
      4. Train model
        1. User-based models
        2. Item-based models
        3. Factor-based models
    5. Complete R Code
    6. Summary
  5. Taming Time Series Data Using Deep Neural Networks
    1. Time series data
      1. Non-seasonal time series
      2. Seasonal time series
      3. Time series as a regression problem
    2. Deep neural networks
      1. Forward cycle
        1. Backward cycle
    3. Introduction to the MXNet R package
    4. Symbolic programming in MXNet
      1. Softmax activation
        1. Use case and data
          1. Deep networks for time series prediction
    5. Training test split
    6. Complete R code
    7. Summary
  6. Twitter Text Sentiment Classification Using Kernel Density Estimates
    1. Kernel density estimation
    2. Twitter text
    3. Sentiment classification
      1. Dictionary methods
      2. Machine learning methods
      3. Our approach
    4. Dictionary-based scoring
    5. Text pre-processing
      1. Term-frequency inverse document frequency (TFIDF)
      2. Delta TFIDF
    6. Building a sentiment classifier
    7. Assembling an RShiny application
    8. Complete R code
    9. Summary
  7. Record Linkage - Stochastic and Machine Learning Approaches
    1. Introducing our use case
    2. Demonstrating the use of the RecordLinkage package
      1. Feature generation
        1. String features
        2. Phonetic features
    3. Stochastic record linkage
      1. Expectation maximization method
      2. Weights-based method
    4. Machine learning-based record linkage
      1. Unsupervised learning
      2. Supervised learning
    5. Building an RShiny application
    6. Complete R code
      1. Feature generation
      2. Expectation maximization method
      3. Weights-based method
      4. Machine learning method
      5. RShiny application
    7. Summary
  8. Streaming Data Clustering Analysis in R
    1. Streaming data and its challenges
      1. Bounded problems
      2. Drift
      3. Single pass
      4. Real time
    2. Introducing stream clustering
      1. Macro-cluster
    3. Introducing the stream package
      1. Data stream data
      2. DSD as a static simulator
        1. DSD as a simulator with drift
      3. DSD connecting to memory, file, or database
      4. Inflight operation
      5. Can we connect this DSD to an actual data stream?
      6. Data stream task
    4. Use case and data
      1. Speed layer
      2. Batch layer
      3. Reservoir sampling
    5. Complete R code
    6. Summary
  9. Analyze and Understand Networks Using R
    1. Graphs in R
      1. Degree of a vertex
      2. Strength of a vertex
      3. Adjacency Matrix
      4. More networks in R
      5. Centrality of a vertex
      6. Farness and Closeness of a node
      7. Finding the shortest path between nodes
      8. Random walk on a graph
    2. Use case and data
    3. Data preparation
    4. Product network analysis
    5. Building an RShiny application
    6. The complete R script
    7. Summary

Product information

  • Title: R Data Analysis Projects
  • Author(s): Gopi Subramanian
  • Release date: November 2017
  • Publisher(s): Packt Publishing
  • ISBN: 9781788621878