Unsupervised Learning with R

Book Description

Work with over 40 packages to draw inferences from complex datasets and find hidden patterns in raw unstructured data

About This Book

  • Discover how to tackle clusters of raw data through practical examples in R
  • Explore your data and create your own models from scratch
  • Analyze the main aspects of unsupervised learning with this comprehensive, practical step-by-step guide

Who This Book Is For

This book is intended for professionals who are interested in data analysis using unsupervised learning techniques, as well as data analysts, statisticians, and data scientists seeking to learn to use R to apply data mining techniques. Knowledge of R, machine learning, and mathematics would help, but is not strictly required.

What You Will Learn

  • Load, manipulate, and explore your data in R using techniques for exploratory data analysis such as summarization, manipulation, correlation, and data visualization
  • Transform your data using approaches such as scaling, recentering, scaling to [0-1], median/MAD, natural log, and imputation of missing data
  • Build and interpret clustering models using K-Means algorithms in R
  • Build and interpret clustering models using hierarchical clustering algorithms in R
  • Understand and apply dimensionality reduction techniques
  • Create and use association rule learning models, such as recommendation algorithms
  • Learn about and use feature selection techniques
  • Install and use end-user tools as an alternative to programming directly in the R console

In Detail

The R Project for Statistical Computing provides an excellent platform to tackle data processing, data manipulation, modeling, and presentation. The capabilities of this language, its freedom of use, and a very active community of users make R one of the best tools to learn and implement unsupervised learning.

If you are new to R or want to learn about unsupervised learning, this book is for you. Packed with critical information, this book will guide you through a conceptual explanation and practical examples programmed directly into the R console.

Starting from the beginning, this book introduces you to unsupervised learning and provides a high-level introduction to the topic. We quickly move on to discuss the application of key concepts and techniques for exploratory data analysis. The book then teaches you to identify groups with the help of clustering methods or by building association rules. Finally, it provides alternatives for the treatment of high-dimensional datasets, using dimensionality reduction and feature selection techniques.

By the end of this book, you will be able to implement unsupervised learning and various approaches associated with it in real-world projects.

Style and approach

This book takes a step-by-step approach to unsupervised learning concepts and tools, explained in a conversational and easy-to-follow style. Each topic is covered in sequence, first presenting the theory and then putting it into practice with specialized R packages.

Table of Contents

  1. Unsupervised Learning with R
    1. Table of Contents
    2. Unsupervised Learning with R
    3. Credits
    4. About the Author
    5. Acknowledgments
    6. About the Reviewer
    7. www.PacktPub.com
      1. Support files, eBooks, discount offers, and more
        1. Why subscribe?
        2. Free access for Packt account holders
    8. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Downloading the color images of this book
        3. Errata
        4. Piracy
        5. Questions
    9. 1. Welcome to the Age of Information Technology
      1. The information age
        1. Data mining
        2. Machine learning
          1. Supervised learning
          2. Unsupervised learning
      2. Information theory
        1. Entropy
        2. Information gain
      3. Data mining methodology and software tools
        1. CRISP-DM
      4. Benefits of using R
      5. Summary
    10. 2. Working with Data – Exploratory Data Analysis
      1. Exploratory data analysis
      2. Loading a dataset
      3. Basic exploration of the dataset
      4. Exploring data by basic visualization
        1. Histograms
        2. Barplots
        3. Boxplots
        4. Special visualizations
      5. Exploring relations in data
      6. Exploration by end-user interfaces
        1. Loading data into Rattle
        2. Basic exploration of dataset in Rattle
        3. Exploring data by graphs in Rattle
        4. Exploring relations in data using Rattle
      7. Summary
    11. 3. Identifying and Understanding Groups – Clustering Algorithms
      1. Transforming data
        1. Rescaling data
          1. Recenter
          2. Scale [0-1]
          3. Median/MAD
          4. Natural log
        2. Imputation of missing data
          1. Zero/Missing
          2. Mean imputation
      2. Fundamentals of clustering techniques
        1. The K-Means clustering
          1. Defining the number of clusters
          2. Defining the cluster K-Mean algorithm
          3. Alternatives for plotting clusters
        2. Hierarchical clustering
          1. Clustering distance metric
          2. Linkage methods
          3. Hierarchical clustering in R
          4. Hierarchical clustering with factors
          5. Tips for choosing a hierarchical clustering algorithm
          6. Plotting alternatives for hierarchical clustering
      3. Clustering by end-user interfaces
      4. Summary
    12. 4. Association Rules
      1. Fundamentals of association rules
        1. Representation
      2. Exploring the association rules model
      3. Plotting alternatives for association rules
      4. Association rules by end-user tool
      5. Summary
    13. 5. Dimensionality Reduction
      1. The curse of dimensionality
      2. Feature extraction
        1. Principal component analysis
        2. Additional visual support for PCA
        3. Advanced tools for plotting PCA
        4. Hierarchical clustering on principal components
        5. Principal components analysis by user interfaces
      3. Summary
    14. 6. Feature Selection Methods
      1. Feature selection techniques
        1. Expert knowledge-based techniques
        2. Feature ranking
        3. Subset selection techniques
          1. Embedded methods
          2. Wrapper methods
          3. Filter methods
      2. Summary
    15. A. References
      1. Chapter 1, Welcome to the Age of Information Technology
      2. Chapter 2, Working with Data – Exploratory Data Analysis
      3. Chapter 3, Identifying and Understanding Groups – Clustering Algorithms
      4. Chapter 4, Association Rules
      5. Chapter 5, Dimensionality Reduction
      6. Chapter 6, Feature Selection Methods
    16. Index

Product Information

  • Title: Unsupervised Learning with R
  • Author(s): Erik Rodríguez Pacheco
  • Release date: December 2015
  • Publisher(s): Packt Publishing
  • ISBN: 9781785887093