Book description
Design clever algorithms that can uncover interesting structures and hidden relationships in unstructured, unlabeled data
Key Features
- Learn how to select the most suitable Python library to solve your problem
- Compare k-Nearest Neighbor (k-NN) and non-parametric methods and decide when to use them
- Delve into the applications of neural networks using real-world datasets
Book Description
Unsupervised learning is a useful and practical solution in situations where labeled data is not available.
Applied Unsupervised Learning with Python guides you through best practices for using unsupervised learning techniques with Python libraries to extract meaningful information from unstructured data. The book begins by explaining how basic clustering finds similar data points in a set. Once you are well versed in the k-means algorithm and how it operates, you'll learn what dimensionality reduction is and where to apply it. As you progress, you'll learn various neural network techniques and how they can improve your model. While studying the applications of unsupervised learning, you will also learn how to mine topics that are trending on Twitter and Facebook and build a news recommendation engine for users. You will complete the course by challenging yourself with activities such as performing a Market Basket Analysis and identifying relationships between different items of merchandise.
By the end of this book, you will have the skills you need to confidently build your own models using Python.
What you will learn
- Understand the basics and importance of clustering
- Build k-means, hierarchical, and DBSCAN clustering algorithms from scratch and with built-in packages
- Explore dimensionality reduction and its applications
- Use scikit-learn (sklearn) to implement and analyze principal component analysis (PCA) on the Iris dataset
- Employ Keras to build autoencoder models for the CIFAR-10 dataset
- Apply the Apriori algorithm with machine learning extensions (Mlxtend) to study transaction data
Who this book is for
This course is designed for developers, data scientists, and machine learning enthusiasts who are interested in unsupervised learning. Some familiarity with Python programming, along with basic knowledge of mathematical concepts such as exponents, square roots, means, and medians, will be beneficial.
Table of contents
- Preface
- About the Book
- About the Authors
- Learning Objectives
- Audience
- Approach
- Hardware Requirements
- Software Requirements
- Conventions
- Installation and Setup
- Install Anaconda on Windows
- Install Anaconda on Linux
- Install Anaconda on macOS
- Install Python on Windows
- Install Python on Linux
- Install Python on macOS X
- Additional Resources
- Chapter 1
- Introduction to Clustering
- Introduction
- Unsupervised Learning versus Supervised Learning
- Clustering
- Introduction to k-means Clustering
- No-Math k-means Walkthrough
- k-means Clustering In-Depth Walkthrough
- Alternative Distance Metric – Manhattan Distance
- Deeper Dimensions
- Exercise 2: Calculating Euclidean Distance in Python
- Exercise 3: Forming Clusters with the Notion of Distance
- Exercise 4: Implementing k-means from Scratch
- Exercise 5: Implementing k-means with Optimization
- Clustering Performance: Silhouette Score
- Exercise 6: Calculating the Silhouette Score
- Activity 1: Implementing k-means Clustering
- Summary
- Chapter 2
- Hierarchical Clustering
- Chapter 3
- Neighborhood Approaches and DBSCAN
- Introduction
- Introduction to DBSCAN
- DBSCAN In-Depth
- Walkthrough of the DBSCAN Algorithm
- Exercise 9: Evaluating the Impact of Neighborhood Radius Size
- DBSCAN Attributes – Neighborhood Radius
- Activity 4: Implement DBSCAN from Scratch
- DBSCAN Attributes – Minimum Points
- Exercise 10: Evaluating the Impact of Minimum Points Threshold
- Activity 5: Comparing DBSCAN with k-means and Hierarchical Clustering
- DBSCAN Versus k-means and Hierarchical Clustering
- Summary
- Chapter 4
- Dimension Reduction and PCA
- Introduction
- Overview of Dimensionality Reduction Techniques
- PCA
- Mean
- Standard Deviation
- Covariance
- Covariance Matrix
- Exercise 11: Understanding the Foundational Concepts of Statistics
- Eigenvalues and Eigenvectors
- Exercise 12: Computing Eigenvalues and Eigenvectors
- The Process of PCA
- Exercise 13: Manually Executing PCA
- Exercise 14: Scikit-Learn PCA
- Activity 6: Manual PCA versus scikit-learn
- Restoring the Compressed Dataset
- Exercise 15: Visualizing Variance Reduction with Manual PCA
- Exercise 16: Visualizing Variance Reduction with
- Exercise 17: Plotting 3D Plots in Matplotlib
- Activity 7: PCA Using the Expanded Iris Dataset
- Summary
- Chapter 5
- Autoencoders
- Introduction
- Fundamentals of Artificial Neural Networks
- The Neuron
- Sigmoid Function
- Rectified Linear Unit (ReLU)
- Exercise 18: Modeling the Neurons of an Artificial Neural Network
- Activity 8: Modeling Neurons with a ReLU Activation Function
- Neural Networks: Architecture Definition
- Exercise 19: Defining a Keras Model
- Neural Networks: Training
- Exercise 20: Training a Keras Neural Network Model
- Activity 9: MNIST Neural Network
- Autoencoders
- Summary
- Chapter 6
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Chapter 7
- Topic Modeling
- Introduction
- Cleaning Text Data
- Latent Dirichlet Allocation
- Variational Inference
- Bag of Words
- Exercise 31: Creating a Bag-of-Words Model Using the Count Vectorizer
- Perplexity
- Exercise 32: Selecting the Number of Topics
- Exercise 33: Running Latent Dirichlet Allocation
- Exercise 34: Visualize LDA
- Exercise 35: Trying Four Topics
- Activity 16: Latent Dirichlet Allocation and Health Tweets
- Bag-of-Words Follow-Up
- Exercise 36: Creating a Bag-of-Words Using TF-IDF
- Non-Negative Matrix Factorization
- Summary
- Chapter 8
- Market Basket Analysis
- Chapter 9
- Hotspot Analysis
- Introduction
- Kernel Density Estimation
- The Bandwidth Value
- Exercise 46: The Effect of the Bandwidth Value
- Selecting the Optimal Bandwidth
- Exercise 47: Selecting the Optimal Bandwidth Using Grid Search
- Kernel Functions
- Exercise 48: The Effect of the Kernel Function
- Kernel Density Estimation Derivation
- Exercise 49: Simulating the Derivation of Kernel Density Estimation
- Activity 21: Estimating Density in One Dimension
- Hotspot Analysis
- Summary
- Appendix
- Chapter 1: Introduction to Clustering
- Chapter 2: Hierarchical Clustering
- Chapter 3: Neighborhood Approaches and DBSCAN
- Chapter 4: Dimension Reduction and PCA
- Chapter 5: Autoencoders
- Chapter 6: t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Chapter 7: Topic Modeling
- Chapter 8: Market Basket Analysis
- Chapter 9: Hotspot Analysis
Product information
- Title: Applied Unsupervised Learning with Python
- Author(s):
- Release date: May 2019
- Publisher(s): Packt Publishing
- ISBN: 9781789952292