Book description
Design clever algorithms that discover hidden patterns and draw responses from unstructured, unlabeled data.
Key Features
- Build state-of-the-art algorithms that can solve your business problems
- Learn how to find hidden patterns in your data
- Revise key concepts with hands-on exercises using real-world datasets
Book Description
Starting with the basics, Applied Unsupervised Learning with R explains clustering methods, distribution analysis, data encoders, and features of R that enable you to understand your data better and get answers to your most pressing business questions.
This book begins with the most important and commonly used method for unsupervised learning - clustering - and explains the three main clustering algorithms - k-means, divisive, and agglomerative. Following this, you'll study market basket analysis, kernel density estimation, principal component analysis, and anomaly detection. You'll be introduced to these methods using code written in R, with further instructions on how to work with, edit, and improve R code. To help you gain a practical understanding, the book also features useful tips on applying these methods to real business problems, including market segmentation and fraud detection. By working through interesting activities, you'll explore data encoders and latent variable models.
By the end of this book, you will have a better understanding of different anomaly detection methods, such as outlier detection, Mahalanobis distances, and contextual and collective anomaly detection.
What you will learn
- Implement clustering methods such as k-means, agglomerative, and divisive
- Write code in R to analyze market segmentation and consumer behavior
- Estimate distribution and probabilities of different outcomes
- Implement dimension reduction using principal component analysis
- Apply anomaly detection methods to identify fraud
- Design algorithms with R and learn how to edit or improve code
Who this book is for
Applied Unsupervised Learning with R is designed for business professionals who want to learn methods for understanding their data better, and for developers with an interest in unsupervised learning. Although the book is for beginners, some basic familiarity with R will help: you should know how to open the R console, how to read data, and how to create a loop. To follow the concepts in this book easily, you should also know basic mathematical concepts, including exponents, square roots, means, and medians.
Table of contents
- Preface
- Chapter 1
- Introduction to Clustering Methods
- Introduction
- Introduction to Clustering
- Introduction to the Iris Dataset
- Introduction to k-means Clustering
- Introduction to k-means Clustering with Built-In Functions
- Introduction to Market Segmentation
- Introduction to k-medoids Clustering
- The k-medoids Clustering Algorithm
- k-medoids Clustering Code
- Exercise 5: Implementing k-medoid Clustering
- k-means Clustering versus k-medoids Clustering
- Activity 3: Performing Customer Segmentation with k-medoids Clustering
- Deciding the Optimal Number of Clusters
- Types of Clustering Metrics
- Silhouette Score
- Exercise 6: Calculating the Silhouette Score
- Exercise 7: Identifying the Optimum Number of Clusters
- WSS/Elbow Method
- Exercise 8: Using WSS to Determine the Number of Clusters
- The Gap Statistic
- Exercise 9: Calculating the Ideal Number of Clusters with the Gap Statistic
- Activity 4: Finding the Ideal Number of Market Segments
- Summary
- Chapter 2
- Advanced Clustering Methods
- Introduction
- Introduction to k-modes Clustering
- Introduction to Density-Based Clustering (DBSCAN)
- Steps for DBSCAN
- Exercise 11: Implementing DBSCAN
- Uses of DBSCAN
- Activity 6: Implementing DBSCAN and Visualizing the Results
- Introduction to Hierarchical Clustering
- Types of Similarity Metrics
- Steps to Perform Agglomerative Hierarchical Clustering
- Exercise 12: Agglomerative Clustering with Different Similarity Measures
- Divisive Clustering
- Steps to Perform Divisive Clustering
- Exercise 13: Performing DIANA Clustering
- Activity 7: Performing Hierarchical Cluster Analysis on the Seeds Dataset
- Summary
- Chapter 3
- Probability Distributions
- Introduction
- Basic Terminology of Probability Distributions
- Uniform Distribution
- Exercise 14: Generating and Plotting Uniform Samples in R
- Normal Distribution
- Exercise 15: Generating and Plotting a Normal Distribution in R
- Skew and Kurtosis
- Log-Normal Distributions
- Exercise 16: Generating a Log-Normal Distribution from a Normal Distribution
- The Binomial Distribution
- Exercise 17: Generating a Binomial Distribution
- The Poisson Distribution
- The Pareto Distribution
- Introduction to Kernel Density Estimation
- Introduction to the Kolmogorov-Smirnov Test
- Summary
- Chapter 4
- Dimension Reduction
- Introduction
- Market Basket Analysis
- Exercise 22: Data Preparation for the Apriori Algorithm
- Exercise 23: Passing through the Data to Find the Most Common Baskets
- Exercise 24: More Passes through the Data
- Exercise 25: Generating Associative Rules as the Final Step of the Apriori Algorithm
- Principal Component Analysis
- Linear Algebra Refresher
- Matrices
- Variance
- Covariance
- Exercise 26: Examining Variance and Covariance on the Wine Dataset
- Eigenvectors and Eigenvalues
- The Idea of PCA
- Exercise 27: Performing PCA
- Exercise 28: Performing Dimension Reduction with PCA
- Activity 10: Performing PCA and Market Basket Analysis on a New Dataset
- Summary
- Chapter 5
- Data Comparison Methods
- Introduction
- Analytic Signatures
- Exercise 31: Perform the Data Preparation for Creating an Analytic Signature for an Image
- Exercise 32: Creating a Brightness Comparison Function
- Exercise 33: Creating a Function to Compare Image Sections to All of the Neighboring Sections
- Exercise 34: Creating a Function that Generates an Analytic Signature for an Image
- Activity 11: Creating an Image Signature for a Photograph of a Person
- Comparison of Signatures
- Latent Variable Models - Factor Analysis
- Summary
- Chapter 6
- Anomaly Detection
- Introduction
- Univariate Outlier Detection
- Exercise 37: Performing an Exploratory Visual Check for Outliers Using R's boxplot Function
- Exercise 38: Transforming a Fat-Tailed Dataset to Improve Outlier Classification
- Exercise 39: Finding Outliers without Using R's Built-In boxplot Function
- Exercise 40: Detecting Outliers Using a Parametric Method
- Multivariate Outlier Detection
- Exercise 41: Calculating Mahalanobis Distance
- Detecting Anomalies in Clusters
- Other Methods for Multivariate Outlier Detection
- Exercise 42: Classifying Outliers based on Comparisons of Mahalanobis Distances
- Detecting Outliers in Seasonal Data
- Exercise 43: Performing Seasonality Modeling
- Exercise 44: Finding Anomalies in Seasonal Data Using a Parametric Method
- Contextual and Collective Anomalies
- Exercise 45: Detecting Contextual Anomalies
- Exercise 46: Detecting Collective Anomalies
- Kernel Density
- Summary
- Appendix
Product information
- Title: Applied Unsupervised Learning with R
- Author(s):
- Release date: March 2019
- Publisher(s): Packt Publishing
- ISBN: 9781789956399