Book description
Implement scikit-learn in every step of the data science pipeline
About This Book
 Use Python and scikit-learn to create intelligent applications
 Discover how to apply algorithms in a variety of situations to tackle common and not-so-common challenges in the machine learning domain
 A practical, example-based guide to help you gain expertise in implementing and evaluating machine learning systems using scikit-learn
Who This Book Is For
If you are a programmer and want to explore machine learning and data-based methods to build intelligent applications and enhance your programming skills, this is the course for you. No previous experience with machine-learning algorithms is required.
What You Will Learn
 Review fundamental concepts including supervised and unsupervised experiences, common tasks, and performance metrics
 Classify objects (from documents to human faces and flower species) based on some of their features, using a variety of methods from Support Vector Machines to Naïve Bayes
 Use Decision Trees to explain the main causes of certain phenomena such as passenger survival on the Titanic
 Evaluate the performance of machine learning systems in common tasks
 Master algorithms of various levels of complexity and learn how to analyze data at the same time
 Learn just enough math to think about the connections between various algorithms
 Customize machine learning algorithms to fit your problem, and learn how to modify them when the situation calls for it
 Incorporate other packages from the Python ecosystem to munge and visualize your dataset
 Improve the way you build your models using parallelization techniques
In Detail
Machine learning, the art of creating applications that learn from experience and data, has been around for many years. Python is quickly becoming the go-to language for analysts and data scientists due to its simplicity and flexibility; moreover, within the Python data space, scikit-learn is the unequivocal choice for machine learning. The course combines an introduction to some of the main concepts and methods in machine learning with practical, hands-on examples of real-world problems. The course starts by walking through different methods to prepare your data, be it a dataset with missing values or text columns that require the categories to be turned into indicator variables. After the data is ready, you'll learn different techniques aligned with different objectives, be it a dataset with known outcomes such as sales by state, or more complicated problems such as clustering similar customers. Finally, you'll learn how to polish your algorithm to ensure that it's both accurate and resilient to new datasets. You will learn to incorporate machine learning in your applications. Ranging from handwritten digit recognition to document classification, examples are solved step by step using scikit-learn and Python. By the end of this course you will have learned how to build applications that learn from experience, by applying the main concepts and techniques of machine learning.
Style and Approach
Implement scikit-learn using engaging examples and fun exercises, and with a gentle and friendly but comprehensive "learn-by-doing" approach. This is a practical course, which analyzes compelling data about life, health, and death with the help of tutorials. It offers you a useful way of interpreting the data that's specific to this course, but that can also be applied to any other data. This course is designed to be both a guide and a reference for moving beyond the basics of scikit-learn.
Table of contents

scikit-learn: Machine Learning Simplified
 Table of Contents
 Credits
 Preface
 1. Module 1

2. Module 2

1. Premodel Workflow
 Introduction
 Getting sample data from external sources
 Creating sample data for toy analysis
 Scaling data to the standard normal
 Creating binary features through thresholding
 Working with categorical variables
 Binarizing label features
 Imputing missing values through various strategies
 Using Pipelines for multiple preprocessing steps
 Reducing dimensionality with PCA
 Using factor analysis for decomposition
 Kernel PCA for nonlinear dimensionality reduction
 Using truncated SVD to reduce dimensionality
 Decomposition to classify with DictionaryLearning
 Putting it all together with Pipelines
 Using Gaussian processes for regression
 Defining the Gaussian process object directly
 Using stochastic gradient descent for regression

2. Working with Linear Models
 Introduction
 Fitting a line through data
 Evaluating the linear regression model
 Using ridge regression to overcome linear regression's shortfalls
 Optimizing the ridge regression parameter
 Using sparsity to regularize models
 Taking a more fundamental approach to regularization with LARS
 Using linear methods for classification – logistic regression
 Directly applying Bayesian ridge regression
 Using boosting to learn from errors

3. Building Models with Distance Metrics
 Introduction
 Using KMeans to cluster data
 Optimizing the number of centroids
 Assessing cluster correctness
 Using MiniBatch KMeans to handle more data
 Quantizing an image with KMeans clustering
 Finding the closest objects in the feature space
 Probabilistic clustering with Gaussian Mixture Models
 Using KMeans for outlier detection
 Using kNN for regression

4. Classifying Data with scikit-learn
 Introduction
 Doing basic classifications with Decision Trees
 Tuning a Decision Tree model
 Using many Decision Trees – random forests
 Tuning a random forest model
 Classifying data with support vector machines
 Generalizing with multiclass classification
 Using LDA for classification
 Working with QDA – a nonlinear LDA
 Using Stochastic Gradient Descent for classification
 Classifying documents with Naïve Bayes
 Label propagation with semi-supervised learning

5. Postmodel Workflow
 Introduction
 K-fold cross validation
 Automatic cross validation
 Cross validation with ShuffleSplit
 Stratified k-fold
 Poor man's grid search
 Brute force grid search
 Using dummy estimators to compare results
 Regression model evaluation
 Feature selection
 Feature selection on L1 norms
 Persisting models with joblib

3. Module 3
 1. The Fundamentals of Machine Learning
 2. Linear Regression
 3. Feature Extraction and Preprocessing
 4. From Linear Regression to Logistic Regression
 5. Nonlinear Classification and Regression with Decision Trees
 6. Clustering with K-Means
 7. Dimensionality Reduction with PCA
 8. The Perceptron
 9. From the Perceptron to Support Vector Machines
 10. From the Perceptron to Artificial Neural Networks
 Bibliography
 Index
Product information
 Title: scikit-learn: Machine Learning Simplified
 Author(s):
 Release date: November 2017
 Publisher(s): Packt Publishing
 ISBN: 9781788833479