Book description
Design clever algorithms that can uncover interesting structures and hidden relationships in unstructured, unlabeled data
Key Features
 Learn how to select the most suitable Python library to solve your problem
 Compare k-Nearest Neighbor (k-NN) and non-parametric methods and decide when to use them
 Delve into the applications of neural networks using real-world datasets
Book Description
Unsupervised learning is a useful and practical solution in situations where labeled data is not available.
Applied Unsupervised Learning with Python guides you through the best practices for using unsupervised learning techniques in tandem with Python libraries and extracting meaningful information from unstructured data. The book begins by explaining how basic clustering works to find similar data points in a set. Once you are well-versed in the k-means algorithm and how it operates, you'll learn what dimensionality reduction is and where to apply it. As you progress, you'll learn various neural network techniques and how they can improve your model. While studying the applications of unsupervised learning, you will also understand how to mine topics that are trending on Twitter and Facebook and build a news recommendation engine for users. You will complete the course by challenging yourself through various interesting activities, such as performing a Market Basket Analysis and identifying relationships between different items of merchandise.
By the end of this book, you will have the skills you need to confidently build your own models using Python.
What you will learn
 Understand the basics and importance of clustering
 Build k-means, hierarchical, and DBSCAN clustering algorithms from scratch and with built-in packages
 Explore dimensionality reduction and its applications
 Use scikit-learn (sklearn) to implement and analyze principal component analysis (PCA) on the Iris dataset
 Employ Keras to build autoencoder models for the CIFAR-10 dataset
 Apply the Apriori algorithm with machine learning extensions (Mlxtend) to study transaction data
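The list above mentions building k-means from scratch. As a rough illustration only (not the book's own code), here is a minimal NumPy sketch of the standard k-means loop (Lloyd's algorithm): assign each point to its nearest centroid by Euclidean distance, then move each centroid to the mean of the points assigned to it, repeating until the centroids stop moving. The function name `kmeans`, its parameters, and the choice to initialise centroids from randomly picked data points are assumptions made for this example.

```python
import numpy as np


def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means sketch (Lloyd's algorithm); returns (centroids, labels).

    Hypothetical helper for illustration; initialisation picks k distinct
    data points at random, which is an assumption, not the book's method.
    """
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    def assign(cents):
        # Euclidean distance from every point to every centroid, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iters):
        labels = assign(centroids)
        # Move each centroid to the mean of its points; keep it if its cluster is empty
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment so labels always match the returned centroids
    return centroids, assign(centroids)


# Usage: two well-separated groups of 2-D points
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
centroids, labels = kmeans(X, k=2)
print(labels)
```

Because the initial centroids are chosen at random, different seeds can produce different clusterings on less well-separated data; scikit-learn's `KMeans` addresses this with its `init` (for example k-means++) and `n_init` parameters.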
Who this book is for
This course is designed for developers, data scientists, and machine learning enthusiasts who are interested in unsupervised learning. Some familiarity with Python programming along with basic knowledge of mathematical concepts including exponents, square roots, means, and medians will be beneficial.
Downloading the example code for this ebook: You can download the example code files for this ebook on GitHub at the following link: https://github.com/TrainingByPackt/AppliedUnsupervisedLearningwithPython. If you require support, please email customercare@packt.com.
Table of contents

Preface

About the Book
 About the Authors
 Learning Objectives
 Audience
 Approach
 Hardware Requirements
 Software Requirements
 Conventions
 Installation and Setup
 Install Anaconda on Windows
 Install Anaconda on Linux
 Install Anaconda on macOS
 Install Python on Windows
 Install Python on Linux
 Install Python on macOS X
 Additional Resources

 Chapter 1

Introduction to Clustering
 Introduction
 Unsupervised Learning versus Supervised Learning
 Clustering

Introduction to k-means Clustering
 No-Math k-means Walkthrough
 k-means Clustering In-Depth Walkthrough
 Alternative Distance Metric – Manhattan Distance
 Deeper Dimensions
 Exercise 2: Calculating Euclidean Distance in Python
 Exercise 3: Forming Clusters with the Notion of Distance
 Exercise 4: Implementing k-means from Scratch
 Exercise 5: Implementing k-means with Optimization
 Clustering Performance: Silhouette Score
 Exercise 6: Calculating the Silhouette Score
 Activity 1: Implementing k-means Clustering
 Summary
 Chapter 2
 Hierarchical Clustering
 Chapter 3

Neighborhood Approaches and DBSCAN
 Introduction

Introduction to DBSCAN
 DBSCAN In-Depth
 Walkthrough of the DBSCAN Algorithm
 Exercise 9: Evaluating the Impact of Neighborhood Radius Size
 DBSCAN Attributes – Neighborhood Radius
 Activity 4: Implement DBSCAN from Scratch
 DBSCAN Attributes – Minimum Points
 Exercise 10: Evaluating the Impact of Minimum Points Threshold
 Activity 5: Comparing DBSCAN with k-means and Hierarchical Clustering
 DBSCAN Versus k-means and Hierarchical Clustering
 Summary
 Chapter 4

Dimension Reduction and PCA
 Introduction
 Overview of Dimensionality Reduction Techniques

PCA
 Mean
 Standard Deviation
 Covariance
 Covariance Matrix
 Exercise 11: Understanding the Foundational Concepts of Statistics
 Eigenvalues and Eigenvectors
 Exercise 12: Computing Eigenvalues and Eigenvectors
 The Process of PCA
 Exercise 13: Manually Executing PCA
 Exercise 14: Scikit-Learn PCA
 Activity 6: Manual PCA versus scikit-learn
 Restoring the Compressed Dataset
 Exercise 15: Visualizing Variance Reduction with Manual PCA
 Exercise 16: Visualizing Variance Reduction with
 Exercise 17: Plotting 3D Plots in Matplotlib
 Activity 7: PCA Using the Expanded Iris Dataset
 Summary
 Chapter 5

Autoencoders
 Introduction

Fundamentals of Artificial Neural Networks
 The Neuron
 Sigmoid Function
 Rectified Linear Unit (ReLU)
 Exercise 18: Modeling the Neurons of an Artificial Neural Network
 Activity 8: Modeling Neurons with a ReLU Activation Function
 Neural Networks: Architecture Definition
 Exercise 19: Defining a Keras Model
 Neural Networks: Training
 Exercise 20: Training a Keras Neural Network Model
 Activity 9: MNIST Neural Network
 Autoencoders
 Summary
 Chapter 6
 t-Distributed Stochastic Neighbor Embedding (t-SNE)
 Chapter 7

Topic Modeling
 Introduction
 Cleaning Text Data

Latent Dirichlet Allocation
 Variational Inference
 Bag of Words
 Exercise 31: Creating a Bag-of-Words Model Using the Count Vectorizer
 Perplexity
 Exercise 32: Selecting the Number of Topics
 Exercise 33: Running Latent Dirichlet Allocation
 Exercise 34: Visualize LDA
 Exercise 35: Trying Four Topics
 Activity 16: Latent Dirichlet Allocation and Health Tweets
 Bag-of-Words Follow-Up
 Exercise 36: Creating a Bag-of-Words Using TF-IDF
 Non-Negative Matrix Factorization
 Summary
 Chapter 8
 Market Basket Analysis
 Chapter 9

Hotspot Analysis
 Introduction

Kernel Density Estimation
 The Bandwidth Value
 Exercise 46: The Effect of the Bandwidth Value
 Selecting the Optimal Bandwidth
 Exercise 47: Selecting the Optimal Bandwidth Using Grid Search
 Kernel Functions
 Exercise 48: The Effect of the Kernel Function
 Kernel Density Estimation Derivation
 Exercise 49: Simulating the Derivation of Kernel Density Estimation
 Activity 21: Estimating Density in One Dimension
 Hotspot Analysis
 Summary

Appendix
 Chapter 1: Introduction to Clustering
 Chapter 2: Hierarchical Clustering
 Chapter 3: Neighborhood Approaches and DBSCAN
 Chapter 4: Dimension Reduction and PCA
 Chapter 5: Autoencoders
 Chapter 6: t-Distributed Stochastic Neighbor Embedding (t-SNE)
 Chapter 7: Topic Modeling
 Chapter 8: Market Basket Analysis
 Chapter 9: Hotspot Analysis
Product information
 Title: Applied Unsupervised Learning with Python
 Author(s):
 Release date: May 2019
 Publisher(s): Packt Publishing
 ISBN: 9781789952292
You might also like
book
Mastering Machine Learning Algorithms - Second Edition
Updated and revised second edition of the bestselling guide to exploring and mastering the most important …
book
Hands-On Unsupervised Learning with Python
Discover the skillsets required to implement various approaches to Machine Learning with Python Key Features Explore …
book
Python Crash Course, 2nd Edition
This is the second edition of the best-selling Python book in the world. Python Crash …
book
Deep Learning for Coders with fastai and PyTorch
Deep learning is often viewed as the exclusive domain of math PhDs and big tech companies. …