Hands-On Unsupervised Learning Using Python

by Ankur A. Patel

March 2019

Intermediate to advanced

359 pages

8h 46m

English

O'Reilly Media, Inc.

A Brief History of Machine Learning; AI Is Back, but Why Now?; The Emergence of Applied AI; Major Milestones in Applied AI over the Past 20 Years; From Narrow AI to AGI; Objective and Approach; Prerequisites; Roadmap; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
Basic Machine Learning Terminology; Rules-Based vs. Machine Learning; Supervised vs. Unsupervised; The Strengths and Weaknesses of Supervised Learning; The Strengths and Weaknesses of Unsupervised Learning; Using Unsupervised Learning to Improve Machine Learning Solutions; A Closer Look at Supervised Algorithms; Linear Methods; Neighborhood-Based Methods; Tree-Based Methods; Support Vector Machines; Neural Networks; A Closer Look at Unsupervised Algorithms; Dimensionality Reduction; Clustering; Feature Extraction; Unsupervised Deep Learning; Sequential Data Problems Using Unsupervised Learning; Reinforcement Learning Using Unsupervised Learning; Semisupervised Learning; Successful Applications of Unsupervised Learning; Anomaly Detection; Conclusion
Environment Setup; Version Control: Git; Clone the Hands-On Unsupervised Learning Git Repository; Scientific Libraries: Anaconda Distribution of Python; Neural Networks: TensorFlow and Keras; Gradient Boosting, Version One: XGBoost; Gradient Boosting, Version Two: LightGBM; Clustering Algorithms; Interactive Computing Environment: Jupyter Notebook; Overview of the Data; Data Preparation; Data Acquisition; Data Exploration; Generate Feature Matrix and Labels Array; Feature Engineering and Feature Selection; Data Visualization; Model Preparation; Split into Training and Test Sets; Select Cost Function; Create k-Fold Cross-Validation Sets; Machine Learning Models (Part I); Model #1: Logistic Regression; Evaluation Metrics; Confusion Matrix; Precision-Recall Curve; Receiver Operating Characteristic; Machine Learning Models (Part II); Model #2: Random Forests; Model #3: Gradient Boosting Machine (XGBoost); Model #4: Gradient Boosting Machine (LightGBM); Evaluation of the Four Models Using the Test Set; Ensembles; Stacking; Final Model Selection; Production Pipeline; Conclusion
The Motivation for Dimensionality Reduction; The MNIST Digits Database; Dimensionality Reduction Algorithms; Linear Projection vs. Manifold Learning; Principal Component Analysis; PCA, the Concept; PCA in Practice; Incremental PCA; Sparse PCA; Kernel PCA; Singular Value Decomposition; Random Projection; Gaussian Random Projection; Sparse Random Projection; Isomap; Multidimensional Scaling; Locally Linear Embedding; t-Distributed Stochastic Neighbor Embedding; Other Dimensionality Reduction Methods; Dictionary Learning; Independent Component Analysis; Conclusion
Credit Card Fraud Detection; Prepare the Data; Define Anomaly Score Function; Define Evaluation Metrics; Define Plotting Function; Normal PCA Anomaly Detection; PCA Components Equal Number of Original Dimensions; Search for the Optimal Number of Principal Components; Sparse PCA Anomaly Detection; Kernel PCA Anomaly Detection; Gaussian Random Projection Anomaly Detection; Sparse Random Projection Anomaly Detection; Nonlinear Anomaly Detection; Dictionary Learning Anomaly Detection; ICA Anomaly Detection; Fraud Detection on the Test Set; Normal PCA Anomaly Detection on the Test Set; ICA Anomaly Detection on the Test Set; Dictionary Learning Anomaly Detection on the Test Set; Conclusion
MNIST Digits Dataset; Data Preparation; Clustering Algorithms; k-Means; k-Means Inertia; Evaluating the Clustering Results; k-Means Accuracy; k-Means and the Number of Principal Components; k-Means on the Original Dataset; Hierarchical Clustering; Agglomerative Hierarchical Clustering; The Dendrogram; Evaluating the Clustering Results; DBSCAN; DBSCAN Algorithm; Applying DBSCAN to Our Dataset; HDBSCAN; Conclusion
Lending Club Data; Data Preparation; Transform String Format to Numerical Format; Impute Missing Values; Engineer Features; Select Final Set of Features and Perform Scaling; Designate Labels for Evaluation; Goodness of the Clusters; k-Means Application; Hierarchical Clustering Application; HDBSCAN Application; Conclusion

Neural Networks; TensorFlow; Keras; Autoencoder: The Encoder and the Decoder; Undercomplete Autoencoders; Overcomplete Autoencoders; Dense vs. Sparse Autoencoders; Denoising Autoencoder; Variational Autoencoder; Conclusion
Data Preparation; The Components of an Autoencoder; Activation Functions; Our First Autoencoder; Loss Function; Optimizer; Training the Model; Evaluating on the Test Set; Two-Layer Undercomplete Autoencoder with Linear Activation Function; Increasing the Number of Nodes; Adding More Hidden Layers; Nonlinear Autoencoder; Overcomplete Autoencoder with Linear Activation; Overcomplete Autoencoder with Linear Activation and Dropout; Sparse Overcomplete Autoencoder with Linear Activation; Sparse Overcomplete Autoencoder with Linear Activation and Dropout; Working with Noisy Datasets; Denoising Autoencoder; Two-Layer Denoising Undercomplete Autoencoder with Linear Activation; Two-Layer Denoising Overcomplete Autoencoder with Linear Activation; Two-Layer Denoising Overcomplete Autoencoder with ReLu Activation; Conclusion
Data Preparation; Supervised Model; Unsupervised Model; Semisupervised Model; The Power of Supervised and Unsupervised; Conclusion
Boltzmann Machines; Restricted Boltzmann Machines; Recommender Systems; Collaborative Filtering; The Netflix Prize; MovieLens Dataset; Data Preparation; Define the Cost Function: Mean Squared Error; Perform Baseline Experiments; Matrix Factorization; One Latent Factor; Three Latent Factors; Five Latent Factors; Collaborative Filtering Using RBMs; RBM Neural Network Architecture; Build the Components of the RBM Class; Train RBM Recommender System; Conclusion
Deep Belief Networks in Detail; MNIST Image Classification; Restricted Boltzmann Machines; Build the Components of the RBM Class; Generate Images Using the RBM Model; View the Intermediate Feature Detectors; Train the Three RBMs for the DBN; Examine Feature Detectors; View Generated Images; The Full DBN; How Training of a DBN Works; Train the DBN; How Unsupervised Learning Helps Supervised Learning; Generate Images to Build a Better Image Classifier; Image Classifier Using LightGBM; Supervised Only; Unsupervised and Supervised Solution; Conclusion
GANs, the Concept; The Power of GANs; Deep Convolutional GANs; Convolutional Neural Networks; DCGANs Revisited; Generator of the DCGAN; Discriminator of the DCGAN; Discriminator and Adversarial Models; DCGAN for the MNIST Dataset; MNIST DCGAN in Action; Synthetic Image Generation; Conclusion
ECG Data; Approach to Time Series Clustering; k-Shape; Time Series Clustering Using k-Shape on ECGFiveDays; Data Preparation; Training and Evaluation; Time Series Clustering Using k-Shape on ECG5000; Data Preparation; Training and Evaluation; Time Series Clustering Using k-Means on ECG5000; Time Series Clustering Using Hierarchical DBSCAN on ECG5000; Comparing the Time Series Clustering Algorithms; Full Run with k-Shape; Full Run with k-Means; Full Run with HDBSCAN; Comparing All Three Time Series Clustering Approaches; Conclusion
Supervised Learning; Unsupervised Learning; Scikit-Learn; TensorFlow and Keras; Reinforcement Learning; Most Promising Areas of Unsupervised Learning Today; The Future of Unsupervised Learning; Final Words

Content preview from Hands-On Unsupervised Learning Using Python

Chapter 6. Group Segmentation

In Chapter 5, we introduced clustering, an unsupervised learning approach that identifies the underlying structure in data and groups points based on similarity. These groups (known as clusters) should be homogeneous and distinct. In other words, the members within a group should be very similar to each other and very distinct from the members of any other group.
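To make "homogeneous and distinct" concrete, here is a minimal sketch in plain Python. It is not taken from the book: the toy points, both labelings, and the two helper functions are hypothetical, chosen only for illustration. It compares the average distance between members of the same group with the average distance between members of different groups.

```python
import math
from itertools import combinations


def mean_within_distance(points, labels):
    """Average Euclidean distance over pairs that share a group label."""
    dists = [
        math.dist(p, q)
        for (p, a), (q, b) in combinations(zip(points, labels), 2)
        if a == b
    ]
    return sum(dists) / len(dists)


def mean_between_distance(points, labels):
    """Average Euclidean distance over pairs with different group labels."""
    dists = [
        math.dist(p, q)
        for (p, a), (q, b) in combinations(zip(points, labels), 2)
        if a != b
    ]
    return sum(dists) / len(dists)


# Hypothetical toy data: two features for six members (assumed values).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]

# One labeling that follows the visible structure, one that ignores it.
good_labels = [0, 0, 0, 1, 1, 1]
poor_labels = [0, 1, 0, 1, 0, 1]

good_within = mean_within_distance(points, good_labels)
good_between = mean_between_distance(points, good_labels)
poor_within = mean_within_distance(points, poor_labels)
poor_between = mean_between_distance(points, poor_labels)

print(f"good grouping: within={good_within:.2f}, between={good_between:.2f}")
print(f"poor grouping: within={poor_within:.2f}, between={poor_between:.2f}")
```

In the grouping that follows the structure, the within-group distance is small relative to the between-group distance. In the arbitrary grouping, the gap disappears. Clustering quality measures generally rest on this same contrast, although the specific metric used later in the chapter may differ.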

From an applied perspective, the ability to segment members into groups based on similarity, without any guidance from labels, is very powerful. For example, an online retailer could use such a technique to find different consumer groups and customize a marketing strategy for each of them (e.g., budget shoppers, fashionistas, sneakerheads, techies, and audiophiles). Group segmentation could also improve targeting in online advertising and recommendations in recommender systems for movies, music, news, social networking, dating, and more.

In this chapter, we will build an applied unsupervised learning solution using the clustering algorithms from the previous chapter—more specifically, we will perform group segmentation.

Lending Club Data

For this chapter, we will use loan data from Lending Club, a US peer-to-peer lending company. Borrowers on the platform can borrow between $1,000 and $40,000 in the form of unsecured personal loans, for a term of either three or five years.

Investors can browse the loan applications and choose to finance the loans based on the credit history ...

Publisher Resources

ISBN: 9781492035633
