Book description
This practical guide provides nearly 200 self-contained recipes to help you solve machine learning challenges you may encounter in your daily work. If you’re comfortable with Python and its libraries, including pandas and scikit-learn, you’ll be able to address specific problems such as loading data, handling text or numerical data, model selection, and dimensionality reduction, among many other topics.
Each recipe includes code that you can copy, paste, and run against a toy dataset to confirm that it actually works. From there, you can insert, combine, or adapt the code to help construct your application. Recipes also include a discussion that explains the solution and provides meaningful context. This cookbook takes you beyond theory and concepts by providing the nuts and bolts you need to construct working machine learning applications.
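For example, a recipe in this style pairs a short, runnable snippet with toy data (a minimal sketch using NumPy, in the spirit of the book's Chapter 1 recipes; not taken verbatim from the book):

```python
# Load library
import numpy as np

# Create a small matrix of toy data
matrix = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])

# View the number of rows and columns
print(matrix.shape)  # (3, 2)
```

Running the snippet as-is verifies it works; from there, the toy matrix can be swapped for your own data.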
You’ll find recipes for:
- Vectors, matrices, and arrays
- Handling numerical and categorical data, text, images, and dates and times
- Dimensionality reduction using feature extraction or feature selection
- Model evaluation and selection
- Linear and logistic regression, trees and forests, and k-nearest neighbors
- Support vector machines (SVM), naïve Bayes, clustering, and neural networks
- Saving and loading trained models
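As a taste of the numerical-data recipes listed above, a feature can be standardized to zero mean and unit variance with scikit-learn (a minimal illustrative sketch; the toy values are invented and the snippet is not taken from the book):

```python
# Load libraries
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature values (invented for illustration)
feature = np.array([[-500.5], [-100.1], [0.0], [100.1], [900.9]])

# Standardize the feature to mean 0 and standard deviation 1
scaler = StandardScaler()
standardized = scaler.fit_transform(feature)

# The transformed feature has (approximately) zero mean and unit variance
print(standardized.mean(), standardized.std())
```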
Table of contents
- Preface
- 1. Vectors, Matrices, and Arrays
- 1.0. Introduction
- 1.1. Creating a Vector
- 1.2. Creating a Matrix
- 1.3. Creating a Sparse Matrix
- 1.4. Selecting Elements
- 1.5. Describing a Matrix
- 1.6. Applying Operations to Elements
- 1.7. Finding the Maximum and Minimum Values
- 1.8. Calculating the Average, Variance, and Standard Deviation
- 1.9. Reshaping Arrays
- 1.10. Transposing a Vector or Matrix
- 1.11. Flattening a Matrix
- 1.12. Finding the Rank of a Matrix
- 1.13. Calculating the Determinant
- 1.14. Getting the Diagonal of a Matrix
- 1.15. Calculating the Trace of a Matrix
- 1.16. Finding Eigenvalues and Eigenvectors
- 1.17. Calculating Dot Products
- 1.18. Adding and Subtracting Matrices
- 1.19. Multiplying Matrices
- 1.20. Inverting a Matrix
- 1.21. Generating Random Values
- 2. Loading Data
- 3. Data Wrangling
- 3.0. Introduction
- 3.1. Creating a Data Frame
- 3.2. Describing the Data
- 3.3. Navigating DataFrames
- 3.4. Selecting Rows Based on Conditionals
- 3.5. Replacing Values
- 3.6. Renaming Columns
- 3.7. Finding the Minimum, Maximum, Sum, Average, and Count
- 3.8. Finding Unique Values
- 3.9. Handling Missing Values
- 3.10. Deleting a Column
- 3.11. Deleting a Row
- 3.12. Dropping Duplicate Rows
- 3.13. Grouping Rows by Values
- 3.14. Grouping Rows by Time
- 3.15. Looping Over a Column
- 3.16. Applying a Function Over All Elements in a Column
- 3.17. Applying a Function to Groups
- 3.18. Concatenating DataFrames
- 3.19. Merging DataFrames
- 4. Handling Numerical Data
- 4.0. Introduction
- 4.1. Rescaling a Feature
- 4.2. Standardizing a Feature
- 4.3. Normalizing Observations
- 4.4. Generating Polynomial and Interaction Features
- 4.5. Transforming Features
- 4.6. Detecting Outliers
- 4.7. Handling Outliers
- 4.8. Discretizing Features
- 4.9. Grouping Observations Using Clustering
- 4.10. Deleting Observations with Missing Values
- 4.11. Imputing Missing Values
- 5. Handling Categorical Data
- 6. Handling Text
- 7. Handling Dates and Times
- 7.0. Introduction
- 7.1. Converting Strings to Dates
- 7.2. Handling Time Zones
- 7.3. Selecting Dates and Times
- 7.4. Breaking Up Date Data into Multiple Features
- 7.5. Calculating the Difference Between Dates
- 7.6. Encoding Days of the Week
- 7.7. Creating a Lagged Feature
- 7.8. Using Rolling Time Windows
- 7.9. Handling Missing Data in Time Series
- 8. Handling Images
- 8.0. Introduction
- 8.1. Loading Images
- 8.2. Saving Images
- 8.3. Resizing Images
- 8.4. Cropping Images
- 8.5. Blurring Images
- 8.6. Sharpening Images
- 8.7. Enhancing Contrast
- 8.8. Isolating Colors
- 8.9. Binarizing Images
- 8.10. Removing Backgrounds
- 8.11. Detecting Edges
- 8.12. Detecting Corners
- 8.13. Creating Features for Machine Learning
- 8.14. Encoding Mean Color as a Feature
- 8.15. Encoding Color Histograms as Features
- 9. Dimensionality Reduction Using Feature Extraction
- 10. Dimensionality Reduction Using Feature Selection
- 11. Model Evaluation
- 11.0. Introduction
- 11.1. Cross-Validating Models
- 11.2. Creating a Baseline Regression Model
- 11.3. Creating a Baseline Classification Model
- 11.4. Evaluating Binary Classifier Predictions
- 11.5. Evaluating Binary Classifier Thresholds
- 11.6. Evaluating Multiclass Classifier Predictions
- 11.7. Visualizing a Classifier’s Performance
- 11.8. Evaluating Regression Models
- 11.9. Evaluating Clustering Models
- 11.10. Creating a Custom Evaluation Metric
- 11.11. Visualizing the Effect of Training Set Size
- 11.12. Creating a Text Report of Evaluation Metrics
- 11.13. Visualizing the Effect of Hyperparameter Values
- 12. Model Selection
- 12.0. Introduction
- 12.1. Selecting Best Models Using Exhaustive Search
- 12.2. Selecting Best Models Using Randomized Search
- 12.3. Selecting Best Models from Multiple Learning Algorithms
- 12.4. Selecting Best Models When Preprocessing
- 12.5. Speeding Up Model Selection with Parallelization
- 12.6. Speeding Up Model Selection Using Algorithm-Specific Methods
- 12.7. Evaluating Performance After Model Selection
- 13. Linear Regression
- 14. Trees and Forests
- 14.0. Introduction
- 14.1. Training a Decision Tree Classifier
- 14.2. Training a Decision Tree Regressor
- 14.3. Visualizing a Decision Tree Model
- 14.4. Training a Random Forest Classifier
- 14.5. Training a Random Forest Regressor
- 14.6. Identifying Important Features in Random Forests
- 14.7. Selecting Important Features in Random Forests
- 14.8. Handling Imbalanced Classes
- 14.9. Controlling Tree Size
- 14.10. Improving Performance Through Boosting
- 14.11. Evaluating Random Forests with Out-of-Bag Errors
- 15. K-Nearest Neighbors
- 16. Logistic Regression
- 17. Support Vector Machines
- 18. Naive Bayes
- 19. Clustering
- 20. Neural Networks
- 20.0. Introduction
- 20.1. Preprocessing Data for Neural Networks
- 20.2. Designing a Neural Network
- 20.3. Training a Binary Classifier
- 20.4. Training a Multiclass Classifier
- 20.5. Training a Regressor
- 20.6. Making Predictions
- 20.7. Visualizing Training History
- 20.8. Reducing Overfitting with Weight Regularization
- 20.9. Reducing Overfitting with Early Stopping
- 20.10. Reducing Overfitting with Dropout
- 20.11. Saving Model Training Progress
- 20.12. k-Fold Cross-Validating Neural Networks
- 20.13. Tuning Neural Networks
- 20.14. Visualizing Neural Networks
- 20.15. Classifying Images
- 20.16. Improving Performance with Image Augmentation
- 20.17. Classifying Text
- 21. Saving and Loading Trained Models
- Index
Product information
- Title: Machine Learning with Python Cookbook
- Author(s):
- Release date: March 2018
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491989388