
Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits

Name: Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Author: Tarek Amr
ISBN: 9781838826048

by Tarek Amr

July 2020

Intermediate to advanced

384 pages

8h 38m

English

Packt Publishing


Title Page
Copyright and Credits
Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
About Packt
Why subscribe?
Contributors
About the author; About the reviewers; Packt is searching for authors like you
Preface
Who this book is for; What this book covers; To get the most out of this book; Download the example code files; Download the color images; Conventions used; Get in touch; Reviews
Section 1: Supervised Learning
Introduction to Machine Learning
Understanding machine learning; Types of machine learning algorithms; Supervised learning; Classification versus regression; Supervised learning evaluation; Unsupervised learning; Reinforcement learning; The model development life cycle; Understanding a problem; Splitting our data; Finding the best manner to split the data; Making sure the training and the test datasets are separate; Development set; Evaluating our model; Deploying in production and monitoring; Iterating; When to use machine learning; Introduction to scikit-learn; It plays well with the Python data ecosystem; Practical level of abstraction; When not to use scikit-learn; Installing the packages you need; Introduction to pandas; Python's scientific computing ecosystem conventions; Summary; Further reading
Making Decisions with Trees
Understanding decision trees; What are decision trees?; Iris classification; Loading the Iris dataset; Splitting the data; Training the model and using it for prediction; Evaluating our predictions; Which features were more important?; Displaying the internal tree decisions; How do decision trees learn?; Splitting criteria; Preventing overfitting; Predictions; Getting a more reliable score; What to do now to get a more reliable score; ShuffleSplit; Tuning the hyperparameters for higher accuracy; Splitting the data; Trying different hyperparameter values; Comparing the accuracy scores; Visualizing the tree's decision boundaries; Feature engineering; Building decision tree regressors; Predicting people's heights; Regressor's evaluation; Setting sample weights; Summary
Making Decisions with Linear Equations
Understanding linear models; Linear equations; Linear regression; Estimating the amount paid to the taxi driver; Predicting house prices in Boston; Data exploration; Splitting the data; Calculating a baseline; Training the linear regressor; Evaluating our model's accuracy; Showing feature coefficients; Scaling for more meaningful coefficients; Adding polynomial features; Fitting the linear regressor with the derived features; Regularizing the regressor; Training the lasso regressor; Finding the optimum regularization parameter; Finding regression intervals; Getting to know additional linear regressors; Using logistic regression for classification; Understanding the logistic function; Plugging the logistic function into a linear model; Objective function; Regularization; Solvers; Configuring the logistic regression classifier; Classifying the Iris dataset using logistic regression; Understanding the classifier's decision boundaries; Getting to know additional linear classifiers; Summary
Preparing Your Data
Imputing missing values; Setting missing values to 0; Setting missing values to the mean; Using informed estimations for missing values; Encoding non-numerical columns; One-hot encoding; Ordinal encoding; Target encoding; Homogenizing the columns' scale; The standard scaler; The MinMax scaler; RobustScaler; Selecting the most useful features; VarianceThreshold; Filters; f-regression and f-classif; Mutual information; Comparing and using the different filters; Evaluating multiple features at a time; Summary

Image Processing with Nearest Neighbors
Nearest neighbors; Loading and displaying images; Image classification; Using a confusion matrix to understand the model's mistakes; Picking a suitable metric; Setting the correct K; Hyperparameter tuning using GridSearchCV; Using custom distances; Using nearest neighbors for regression; More neighborhood algorithms; Radius neighbors; Nearest centroid classifier; Reducing the dimensions of our image data; Principal component analysis; Neighborhood component analysis; Comparing PCA to NCA; Picking the most informative components; Using the centroid classifier with PCA; Restoring the original image from its components; Finding the most informative pixels; Summary
Classifying Text Using Naive Bayes
Splitting sentences into tokens; Tokenizing with string split; Tokenizing using regular expressions; Using placeholders before tokenizing; Vectorizing text into matrices; Vector space model; Bag of words; Different sentences, same representation; N-grams; Using characters instead of words; Capturing important words with TF-IDF; Representing meanings with word embedding; Word2Vec; Understanding Naive Bayes; The Bayes rule; Calculating the likelihood naively; Naive Bayes implementations; Additive smoothing; Classifying text using a Naive Bayes classifier; Downloading the data; Preparing the data; Precision, recall, and F1 score; Pipelines; Optimizing for different scores; Creating a custom transformer; Summary
Section 2: Advanced Supervised Learning
Neural Networks – Here Comes Deep Learning
Getting to know MLP; Understanding the algorithm's architecture; Training the neural network; Configuring the solvers; Classifying items of clothing; Downloading the Fashion-MNIST dataset; Preparing the data for classification; Experiencing the effects of the hyperparameters; Learning not too quickly and not too slowly; Picking a suitable batch size; Checking whether more training samples are needed; Checking whether more epochs are needed; Choosing the optimum architecture and hyperparameters; Adding your own activation function; Untangling the convolutions; Extracting features by convolving; Reducing the dimensionality of the data via max pooling; Putting it all together; MLP regressors; Summary
Ensembles – When One Model Is Not Enough
Answering the question why ensembles?; Combining multiple estimators via averaging; Boosting multiple biased estimators; Downloading the UCI Automobile dataset; Dealing with missing values; Differentiating between numerical features and categorical ones; Splitting the data into training and test sets; Imputing the missing values and encoding the categorical features; Using random forest for regression; Checking the effect of the number of trees; Understanding the effect of each training feature; Using random forest for classification; The ROC curve; Using bagging regressors; Preparing a mixture of numerical and categorical features; Combining KNN estimators using a bagging meta-estimator; Using gradient boosting to predict automobile prices; Plotting the learning deviance; Comparing the learning rate settings; Using different sample sizes; Stopping earlier and adapting the learning rate; Regression ranges; Using AdaBoost ensembles; Exploring more ensembles; Voting ensembles; Stacking ensembles; Random tree embedding; Summary
The Y is as Important as the X
Scaling your regression targets; Estimating multiple regression targets; Building a multi-output regressor; Chaining multiple regressors; Dealing with compound classification targets; Converting a multi-class problem into a set of binary classifiers; Estimating multiple classification targets; Calibrating a classifier's probabilities; Calculating the precision at k; Summary
Imbalanced Learning – Not Even 1% Win the Lottery
Getting the click prediction dataset; Installing the imbalanced-learn library; Predicting the CTR; Weighting the training samples differently; The effect of the weighting on the ROC; Sampling the training data; Undersampling the majority class; Oversampling the minority class; Combining data sampling with ensembles; Equal opportunity score; Summary
Section 3: Unsupervised Learning and More
Clustering – Making Sense of Unlabeled Data
Understanding clustering; K-means clustering; Creating a blob-shaped dataset; Visualizing our sample data; Clustering with K-means; The silhouette score; Choosing the initial centroids; Agglomerative clustering; Tracing the agglomerative clustering's children; The adjusted Rand index; Choosing the cluster linkage; DBSCAN; Summary
Anomaly Detection – Finding Outliers in Data
Unlabeled anomaly detection; Generating sample data; Detecting anomalies using basic statistics; Using percentiles for multi-dimensional data; Detecting outliers using EllipticEnvelope; Outlier and novelty detection using LOF; Novelty detection using LOF; Detecting outliers using isolation forest; Summary
Recommender System – Getting to Know Their Taste
The different recommendation paradigms; Downloading surprise and the dataset; Downloading the KDD Cup 2012 dataset; Processing and splitting the dataset; Creating a random recommender; Using KNN-inspired algorithms; Using baseline algorithms; Using singular value decomposition; Extracting latent information via SVD; Comparing the similarity measures for the two matrices; Click prediction using SVD; Deploying machine learning models in production; Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think

Overview

This book, "Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits", is a practical guide to implementing supervised and unsupervised machine learning algorithms in Python. You will learn to design, implement, and deploy machine learning models that solve real-world problems.

What this book will help me do

Differentiate between supervised, unsupervised, and reinforcement learning and recognize the appropriate scenarios for each.
Prepare datasets correctly for machine learning tasks and ensure proper preprocessing techniques are applied.
Handle imbalanced data distributions and tune models to balance the bias-variance tradeoff.
Apply supervised and unsupervised learning algorithms to a variety of problems.
Build scalable machine learning solutions, evaluate their performance, and deploy them into production environments.

Author(s)

Tarek Amr is a seasoned data scientist and machine learning expert with extensive experience in leveraging scikit-learn and the Python ecosystem to develop practical solutions. He brings a clear and approachable writing style, emphasizing hands-on learning and application of concepts to empower readers in their machine learning journeys.

Who is it for?

This book is aimed at data scientists, machine learning practitioners, developers, and analysts who want to broaden their expertise in machine learning. Readers should have a working knowledge of Python and a basic understanding of mathematics and statistics. The book helps them advance their skills so they can design and implement data-driven solutions in a professional context.


