Book description
Integrate scikit-learn into every step of the data science pipeline
About This Book
- Use Python and scikit-learn to create intelligent applications
- Discover how to apply algorithms in a variety of situations to tackle common and not-so-common challenges in the machine learning domain
- A practical, example-based guide to help you gain expertise in implementing and evaluating machine learning systems using scikit-learn
Who This Book Is For
If you are a programmer and want to explore machine learning and data-based methods to build intelligent applications and enhance your programming skills, this is the course for you. No previous experience with machine-learning algorithms is required.
What You Will Learn
- Review fundamental concepts including supervised and unsupervised learning, common tasks, and performance metrics
- Classify objects (from documents to human faces and flower species) based on some of their features, using a variety of methods from Support Vector Machines to Naïve Bayes
- Use Decision Trees to explain the main causes of certain phenomena such as passenger survival on the Titanic
- Evaluate the performance of machine learning systems in common tasks
- Master algorithms of various levels of complexity and learn how to analyze data at the same time
- Learn just enough math to think about the connections between various algorithms
- Customize machine learning algorithms to fit your problem, and learn how to modify them when the situation calls for it
- Incorporate other packages from the Python ecosystem to munge and visualize your dataset
- Improve the way you build your models using parallelization techniques
In Detail
Machine learning, the art of creating applications that learn from experience and data, has been around for many years. Python is quickly becoming the go-to language for analysts and data scientists due to its simplicity and flexibility; moreover, within the Python data space, scikit-learn is the unequivocal choice for machine learning. The course combines an introduction to some of the main concepts and methods in machine learning with practical, hands-on examples of real-world problems. It starts by walking through different methods to prepare your data, whether that is a dataset with missing values or text columns whose categories must be turned into indicator variables. Once the data is ready, you'll learn techniques aligned with different objectives, from datasets with known outcomes, such as sales by state, to more complicated problems such as clustering similar customers. Finally, you'll learn how to polish your algorithm to ensure that it's both accurate and resilient to new datasets. You will learn to incorporate machine learning in your applications. Examples ranging from handwritten digit recognition to document classification are solved step by step using scikit-learn and Python. By the end of this course, you will have learned how to build applications that learn from experience by applying the main concepts and techniques of machine learning.
Style and Approach
Learn scikit-learn through engaging examples and fun exercises, with a gentle, friendly, yet comprehensive "learn-by-doing" approach. This is a practical course that analyzes compelling data about life, health, and death with the help of tutorials. It offers a way of interpreting the course's data that can also be applied to other data. This course is designed to be both a guide and a reference for moving beyond the basics of scikit-learn.
Table of contents
- scikit-learn: Machine Learning Simplified
- Table of Contents
- Credits
- Preface
- 1. Module 1
- 2. Module 2
- 1. Premodel Workflow
- Introduction
- Getting sample data from external sources
- Creating sample data for toy analysis
- Scaling data to the standard normal
- Creating binary features through thresholding
- Working with categorical variables
- Binarizing label features
- Imputing missing values through various strategies
- Using Pipelines for multiple preprocessing steps
- Reducing dimensionality with PCA
- Using factor analysis for decomposition
- Kernel PCA for nonlinear dimensionality reduction
- Using truncated SVD to reduce dimensionality
- Decomposition to classify with DictionaryLearning
- Putting it all together with Pipelines
- Using Gaussian processes for regression
- Defining the Gaussian process object directly
- Using stochastic gradient descent for regression
- 2. Working with Linear Models
- Introduction
- Fitting a line through data
- Evaluating the linear regression model
- Using ridge regression to overcome linear regression's shortfalls
- Optimizing the ridge regression parameter
- Using sparsity to regularize models
- Taking a more fundamental approach to regularization with LARS
- Using linear methods for classification – logistic regression
- Directly applying Bayesian ridge regression
- Using boosting to learn from errors
- 3. Building Models with Distance Metrics
- Introduction
- Using KMeans to cluster data
- Optimizing the number of centroids
- Assessing cluster correctness
- Using MiniBatch KMeans to handle more data
- Quantizing an image with KMeans clustering
- Finding the closest objects in the feature space
- Probabilistic clustering with Gaussian Mixture Models
- Using KMeans for outlier detection
- Using k-NN for regression
- 4. Classifying Data with scikit-learn
- Introduction
- Doing basic classifications with Decision Trees
- Tuning a Decision Tree model
- Using many Decision Trees – random forests
- Tuning a random forest model
- Classifying data with support vector machines
- Generalizing with multiclass classification
- Using LDA for classification
- Working with QDA – a nonlinear LDA
- Using Stochastic Gradient Descent for classification
- Classifying documents with Naïve Bayes
- Label propagation with semi-supervised learning
- 5. Postmodel Workflow
- Introduction
- K-fold cross validation
- Automatic cross validation
- Cross validation with ShuffleSplit
- Stratified k-fold
- Poor man's grid search
- Brute force grid search
- Using dummy estimators to compare results
- Regression model evaluation
- Feature selection
- Feature selection on L1 norms
- Persisting models with joblib
- 3. Module 3
- 1. The Fundamentals of Machine Learning
- 2. Linear Regression
- 3. Feature Extraction and Preprocessing
- 4. From Linear Regression to Logistic Regression
- 5. Nonlinear Classification and Regression with Decision Trees
- 6. Clustering with K-Means
- 7. Dimensionality Reduction with PCA
- 8. The Perceptron
- 9. From the Perceptron to Support Vector Machines
- 10. From the Perceptron to Artificial Neural Networks
- Bibliography
- Index
Product information
- Title: scikit-learn: Machine Learning Simplified
- Author(s):
- Release date: November 2017
- Publisher(s): Packt Publishing
- ISBN: 9781788833479