Book description
Learn to use scikit-learn operations and functions for machine learning and deep learning applications.
About This Book
 Handle a variety of machine learning tasks effortlessly by leveraging the power of scikit-learn
 Perform supervised and unsupervised learning with ease, and evaluate the performance of your model
 Practical, easy-to-understand recipes aimed at helping you choose the right machine learning algorithm
Who This Book Is For
Data analysts who are already familiar with Python but less so with scikit-learn, and who want quick solutions to common machine learning problems, will find this book very useful. If you are a Python programmer who wants to dive into the world of machine learning in a practical manner, this book will help you too.
What You Will Learn
 Build predictive models in minutes by using scikit-learn
 Understand the differences and relationships between classification and regression, two types of supervised learning
 Use distance metrics to predict in clustering, a type of unsupervised learning
 Find points with similar characteristics with nearest neighbors
 Use automation and cross-validation to find the best model and focus on it for a data product
 Choose the best of many algorithms or use them together in an ensemble
 Create your own estimator with the simple syntax of sklearn
 Explore the feed-forward neural networks available in scikit-learn
In Detail
Python is quickly becoming the go-to language for analysts and data scientists due to its simplicity and flexibility, and within the Python data space, scikit-learn is the unequivocal choice for machine learning. This book includes walkthroughs and solutions to common as well as not-so-common problems in machine learning, and shows how scikit-learn can be leveraged to perform various machine learning tasks effectively.
The second edition begins by taking you through recipes on evaluating the statistical properties of data and generating synthetic data for machine learning modelling. As you progress through the chapters, you will come across recipes that teach you to implement techniques such as data preprocessing, linear regression, logistic regression, KNN, Naïve Bayes, classification, decision trees, ensembles, and much more. Furthermore, you'll learn to optimize your models with multiclass classification, cross-validation, and model evaluation, and dive deeper into implementing deep learning with scikit-learn. Along with covering the enhanced features for model selection, the API, and new features such as classifiers, regressors, and estimators, the book also contains recipes on evaluating and fine-tuning the performance of your model.
By the end of this book, you will have explored a plethora of features offered by scikit-learn for Python to help you solve the machine learning problems you come across.
Style and Approach
This book consists of practical recipes on scikit-learn that target novices as well as intermediate users. It goes deep into the technical issues, covers additional protocols, and offers many more real-life examples so that you can apply them in your everyday scenarios.
Table of contents
 Preface

High-Performance Machine Learning – NumPy
 Introduction
 NumPy basics
 Loading the iris dataset
 Viewing the iris dataset
 Viewing the iris dataset with Pandas
 Plotting with NumPy and matplotlib
 A minimal machine learning recipe – SVM classification
 Introducing cross-validation
 Putting it all together
 Machine learning overview – classification versus regression

Pre-Model Workflow and Pre-Processing
 Introduction
 Creating sample data for toy analysis
 Scaling data to the standard normal distribution
 Creating binary features through thresholding
 Working with categorical variables
 Imputing missing values through various strategies
 A linear model in the presence of outliers
 Putting it all together with pipelines
 Using Gaussian processes for regression
 Using SGD for regression

Dimensionality Reduction
 Introduction
 Reducing dimensionality with PCA
 Using factor analysis for decomposition
 Using kernel PCA for nonlinear dimensionality reduction
 Using truncated SVD to reduce dimensionality
 Using decomposition to classify with DictionaryLearning
 Doing dimensionality reduction with manifolds – t-SNE
 Testing methods to reduce dimensionality with pipelines

Linear Models with scikit-learn
 Introduction
 Fitting a line through data
 Fitting a line through data with machine learning
 Evaluating the linear regression model
 Using ridge regression to overcome linear regression's shortfalls
 Optimizing the ridge regression parameter
 Using sparsity to regularize models
 Taking a more fundamental approach to regularization with LARS
 References

Linear Models – Logistic Regression
 Introduction
 Loading data from the UCI repository
 Viewing the Pima Indians diabetes dataset with pandas
 Looking at the UCI Pima Indians dataset web page
 Machine learning with logistic regression
 Examining logistic regression errors with a confusion matrix
 Varying the classification threshold in logistic regression
 Receiver operating characteristic – ROC analysis
 Plotting an ROC curve without context
 Putting it all together – UCI breast cancer dataset

Building Models with Distance Metrics
 Introduction
 Using k-means to cluster data
 Optimizing the number of centroids
 Assessing cluster correctness
 Using MiniBatch k-means to handle more data
 Quantizing an image with k-means clustering
 Finding the closest object in the feature space
 Probabilistic clustering with Gaussian mixture models
 Using k-means for outlier detection
 Using KNN for regression

Cross-Validation and Post-Model Workflow
 Introduction
 Selecting a model with cross-validation
 K-fold cross-validation
 Balanced cross-validation
 Cross-validation with ShuffleSplit
 Time series cross-validation
 Grid search with scikit-learn
 Randomized search with scikit-learn
 Classification metrics
 Regression metrics
 Clustering metrics
 Using dummy estimators to compare results
 Feature selection
 Feature selection on L1 norms
 Persisting models with joblib or pickle
 Support Vector Machines

Tree Algorithms and Ensembles
 Introduction
 Doing basic classifications with decision trees
 Visualizing a decision tree with pydot
 Tuning a decision tree
 Using decision trees for regression
 Reducing overfitting with cross-validation
 Implementing random forest regression
 Bagging regression with nearest neighbors
 Tuning gradient boosting trees
 Tuning an AdaBoost regressor
 Writing a stacking aggregator with scikit-learn
 Text and Multiclass Classification with scikit-learn
 Neural Networks
 Create a Simple Estimator
Product information
 Title: scikit-learn Cookbook - Second Edition
 Author(s):
 Release date: November 2017
 Publisher(s): Packt Publishing
 ISBN: 9781787286382