Book Description
Design clever algorithms that discover hidden patterns and draw responses from unstructured, unlabeled data.
Key Features
 Build state-of-the-art algorithms that can solve your business' problems
 Learn how to find hidden patterns in your data
 Revise key concepts with hands-on exercises using real-world datasets
Book Description
Starting with the basics, Applied Unsupervised Learning with R explains clustering methods, distribution analysis, data encoders, and features of R that enable you to understand your data better and get answers to your most pressing business questions.
This book begins with the most important and commonly used method for unsupervised learning, clustering, and explains the three main clustering algorithms: k-means, divisive, and agglomerative. Following this, you'll study market basket analysis, kernel density estimation, principal component analysis, and anomaly detection. You'll be introduced to these methods using code written in R, with further instructions on how to work with, edit, and improve R code. To help you gain a practical understanding, the book also features useful tips on applying these methods to real business problems, including market segmentation and fraud detection. By working through interesting activities, you'll explore data encoders and latent variable models.
By the end of this book, you will have a better understanding of different anomaly detection methods, such as outlier detection, Mahalanobis distances, and contextual and collective anomaly detection.
What you will learn
 Implement clustering methods such as k-means, agglomerative, and divisive
 Write code in R to analyze market segmentation and consumer behavior
 Estimate distribution and probabilities of different outcomes
 Implement dimension reduction using principal component analysis
 Apply anomaly detection methods to identify fraud
 Design algorithms with R and learn how to edit or improve code
Who this book is for
Applied Unsupervised Learning with R is designed for business professionals who want to learn about methods to understand their data better, and developers who have an interest in unsupervised learning. Although the book is for beginners, it will be beneficial to have some basic, beginner-level familiarity with R. This includes an understanding of how to open the R console, how to read data, and how to create a loop. To easily understand the concepts of this book, you should also know basic mathematical concepts, including exponents, square roots, means, and medians.
Table of Contents
 Preface
 Chapter 1

Introduction to Clustering Methods
 Introduction
 Introduction to Clustering
 Introduction to the Iris Dataset
 Introduction to k-means Clustering
 Introduction to k-means Clustering with Built-In Functions
 Introduction to Market Segmentation

Introduction to k-medoids Clustering
 The k-medoids Clustering Algorithm
 k-medoids Clustering Code
 Exercise 5: Implementing k-medoid Clustering
 k-means Clustering versus k-medoids Clustering
 Activity 3: Performing Customer Segmentation with k-medoids Clustering
 Deciding the Optimal Number of Clusters
 Types of Clustering Metrics
 Silhouette Score
 Exercise 6: Calculating the Silhouette Score
 Exercise 7: Identifying the Optimum Number of Clusters
 WSS/Elbow Method
 Exercise 8: Using WSS to Determine the Number of Clusters
 The Gap Statistic
 Exercise 9: Calculating the Ideal Number of Clusters with the Gap Statistic
 Activity 4: Finding the Ideal Number of Market Segments
 Summary
 Chapter 2

Advanced Clustering Methods
 Introduction
 Introduction to k-modes Clustering

Introduction to Density-Based Clustering (DBSCAN)
 Steps for DBSCAN
 Exercise 11: Implementing DBSCAN
 Uses of DBSCAN
 Activity 6: Implementing DBSCAN and Visualizing the Results
 Introduction to Hierarchical Clustering
 Types of Similarity Metrics
 Steps to Perform Agglomerative Hierarchical Clustering
 Exercise 12: Agglomerative Clustering with Different Similarity Measures
 Divisive Clustering
 Steps to Perform Divisive Clustering
 Exercise 13: Performing DIANA Clustering
 Activity 7: Performing Hierarchical Cluster Analysis on the Seeds Dataset
 Summary
 Chapter 3

Probability Distributions
 Introduction

Basic Terminology of Probability Distributions
 Uniform Distribution
 Exercise 14: Generating and Plotting Uniform Samples in R
 Normal Distribution
 Exercise 15: Generating and Plotting a Normal Distribution in R
 Skew and Kurtosis
 Log-Normal Distributions
 Exercise 16: Generating a Log-Normal Distribution from a Normal Distribution
 The Binomial Distribution
 Exercise 17: Generating a Binomial Distribution
 The Poisson Distribution
 The Pareto Distribution
 Introduction to Kernel Density Estimation
 Introduction to the Kolmogorov-Smirnov Test
 Summary
 Chapter 4

Dimension Reduction
 Introduction

Market Basket Analysis
 Exercise 22: Data Preparation for the Apriori Algorithm
 Exercise 23: Passing through the Data to Find the Most Common Baskets
 Exercise 24: More Passes through the Data
 Exercise 25: Generating Associative Rules as the Final Step of the Apriori Algorithm
 Principal Component Analysis
 Linear Algebra Refresher
 Matrices
 Variance
 Covariance
 Exercise 26: Examining Variance and Covariance on the Wine Dataset
 Eigenvectors and Eigenvalues
 The Idea of PCA
 Exercise 27: Performing PCA
 Exercise 28: Performing Dimension Reduction with PCA
 Activity 10: Performing PCA and Market Basket Analysis on a New Dataset
 Summary
 Chapter 5

Data Comparison Methods
 Introduction

Analytic Signatures
 Exercise 31: Perform the Data Preparation for Creating an Analytic Signature for an Image
 Exercise 32: Creating a Brightness Comparison Function
 Exercise 33: Creating a Function to Compare Image Sections to All of the Neighboring Sections
 Exercise 34: Creating a Function that Generates an Analytic Signature for an Image
 Activity 11: Creating an Image Signature for a Photograph of a Person
 Comparison of Signatures
 Latent Variable Models – Factor Analysis
 Summary
 Chapter 6

Anomaly Detection
 Introduction

Univariate Outlier Detection
 Exercise 37: Performing an Exploratory Visual Check for Outliers Using R's boxplot Function
 Exercise 38: Transforming a Fat-Tailed Dataset to Improve Outlier Classification
 Exercise 39: Finding Outliers without Using R's Built-In boxplot Function
 Exercise 40: Detecting Outliers Using a Parametric Method
 Multivariate Outlier Detection
 Exercise 41: Calculating Mahalanobis Distance
 Detecting Anomalies in Clusters
 Other Methods for Multivariate Outlier Detection
 Exercise 42: Classifying Outliers based on Comparisons of Mahalanobis Distances
 Detecting Outliers in Seasonal Data
 Exercise 43: Performing Seasonality Modeling
 Exercise 44: Finding Anomalies in Seasonal Data Using a Parametric Method
 Contextual and Collective Anomalies
 Exercise 45: Detecting Contextual Anomalies
 Exercise 46: Detecting Collective Anomalies
 Kernel Density
 Summary
 Appendix
Product Information
 Title: Applied Unsupervised Learning with R
 Author(s):
 Release date: March 2019
 Publisher(s): Packt Publishing
 ISBN: 9781789956399