Book description
Gain expert guidance on how to successfully develop machine learning models in Python and build your own unique data platforms
Key Features
- Gain a full understanding of the model production and deployment process
- Build your first machine learning model in just five minutes and gain hands-on machine learning experience
- Understand how to deal with common challenges in data science projects
Book Description
Where there’s data, there’s insight. With so much data being generated, there is immense scope to extract meaningful information that’ll boost business productivity and profitability. By learning to convert raw data into game-changing insights, you’ll open new career paths and opportunities.
The Data Science Workshop begins by introducing different types of projects and showing you how to incorporate machine learning algorithms in them. You’ll learn to select a relevant metric and even assess the performance of your model. To tune the hyperparameters of an algorithm and improve its accuracy, you’ll get hands-on with approaches such as grid search and random search.
Next, you’ll learn dimensionality reduction techniques to easily handle many variables at once, before exploring how to use model ensembling techniques and create new features to enhance model performance. To help you automatically create new features that improve your model, the book demonstrates how to use an automated feature engineering tool. You’ll also learn how to use orchestration and scheduling workflows to deploy machine learning models in batch.
By the end of this book, you’ll have the skills to start working on data science projects confidently.
What you will learn
- Explore the key differences between supervised learning and unsupervised learning
- Manipulate and analyze data using the scikit-learn and pandas libraries
- Understand key concepts such as regression, classification, and clustering
- Discover advanced techniques to improve the accuracy of your model
- Understand how to speed up the process of adding new features
- Simplify your machine learning workflow for production
Who this book is for
This is one of the most useful data science books for aspiring data analysts, data scientists, database engineers, and business analysts. It is aimed at those who want to kick-start their careers in data science by quickly learning data science techniques without going through all the mathematics behind machine learning algorithms. Basic knowledge of the Python programming language will help you easily grasp the concepts explained in this book.
Table of contents
- The Data Science Workshop - Second Edition
- Preface
- 1. Introduction to Data Science in Python
- 2. Regression
- Introduction
- Simple Linear Regression
- Multiple Linear Regression
- Conducting Regression Analysis Using Python
- Exercise 2.01: Loading and Preparing the Data for Analysis
- The Correlation Coefficient
- Exercise 2.02: Graphical Investigation of Linear Relationships Using Python
- Exercise 2.03: Examining a Possible Log-Linear Relationship Using Python
- The Statsmodels formula API
- Exercise 2.04: Fitting a Simple Linear Regression Model Using the Statsmodels formula API
- Analyzing the Model Summary
- The Model Formula Language
- Intercept Handling
- Activity 2.01: Fitting a Log-Linear Model Using the Statsmodels Formula API
- Multiple Regression Analysis
- Assumptions of Regression Analysis
- Explaining the Results of Regression Analysis
- Summary
- 3. Binary Classification
- Introduction
- Understanding the Business Context
- Business Discovery
- Exercise 3.01: Loading and Exploring the Data from the Dataset
- Testing Business Hypotheses Using Exploratory Data Analysis
- Visualization for Exploratory Data Analysis
- Exercise 3.02: Business Hypothesis Testing for Age versus Propensity for a Term Loan
- Intuitions from the Exploratory Analysis
- Activity 3.01: Business Hypothesis Testing to Find Employment Status versus Propensity for Term Deposits
- Feature Engineering
- Data-Driven Feature Engineering
- Correlation Matrix and Visualization
- Exercise 3.05: Finding the Correlation in Data to Generate a Correlation Plot Using Bank Data
- Skewness of Data
- Histograms
- Density Plots
- Other Feature Engineering Methods
- Summarizing Feature Engineering
- Building a Binary Classification Model Using the Logistic Regression Function
- Logistic Regression Demystified
- Metrics for Evaluating Model Performance
- Confusion Matrix
- Accuracy
- Classification Report
- Data Preprocessing
- Exercise 3.06: A Logistic Regression Model for Predicting the Propensity of Term Deposit Purchases in a Bank
- Activity 3.02: Model Iteration 2 – Logistic Regression Model with Feature Engineered Variables
- Next Steps
- Summary
- 4. Multiclass Classification with RandomForest
- 5. Performing Your First Cluster Analysis
- 6. How to Assess Performance
- Introduction
- Splitting Data
- Assessing Model Performance for Regression Models
- Assessing Model Performance for Classification Models
- The Confusion Matrix
- Exercise 6.06: Generating a Confusion Matrix for the Classification Model
- More on the Confusion Matrix
- Precision
- Exercise 6.07: Computing Precision for the Classification Model
- Recall
- Exercise 6.08: Computing Recall for the Classification Model
- F1 Score
- Exercise 6.09: Computing the F1 Score for the Classification Model
- Accuracy
- Exercise 6.10: Computing Model Accuracy for the Classification Model
- Logarithmic Loss
- Exercise 6.11: Computing the Log Loss for the Classification Model
- Receiver Operating Characteristic Curve
- Area Under the ROC Curve
- Saving and Loading Models
- Summary
- 7. The Generalization of Machine Learning Models
- 8. Hyperparameter Tuning
- 9. Interpreting a Machine Learning Model
- 10. Analyzing a Dataset
- 11. Data Preparation
- 12. Feature Engineering
- Introduction
- Merging Datasets
- Exercise 12.01: Merging the ATO Dataset with the Postcode Data
- Binning Variables
- Exercise 12.02: Binning the YearBuilt Variable from the AMES Housing Dataset
- Manipulating Dates
- Exercise 12.03: Date Manipulation on Financial Services Consumer Complaints
- Performing Data Aggregation
- Exercise 12.04: Feature Engineering Using Data Aggregation on the AMES Housing Dataset
- Activity 12.01: Feature Engineering on a Financial Dataset
- Summary
- 13. Imbalanced Datasets
- Introduction
- Understanding the Business Context
- Challenges of Imbalanced Datasets
- Strategies for Dealing with Imbalanced Datasets
- Generating Synthetic Samples
- Implementation of SMOTE and MSMOTE
- Exercise 13.03: Implementing SMOTE on Our Banking Dataset to Find the Optimal Result
- Exercise 13.04: Implementing MSMOTE on Our Banking Dataset to Find the Optimal Result
- Applying Balancing Techniques on a Telecom Dataset
- Activity 13.01: Finding the Best Balancing Technique by Fitting a Classifier on the Telecom Churn Dataset
- Summary
- 14. Dimensionality Reduction
- Introduction
- Creating a High-Dimensional Dataset
- Strategies for Addressing High-Dimensional Datasets
- Backward Feature Elimination (Recursive Feature Elimination)
- Exercise 14.02: Dimensionality Reduction Using Backward Feature Elimination
- Forward Feature Selection
- Exercise 14.03: Dimensionality Reduction Using Forward Feature Selection
- Principal Component Analysis (PCA)
- Exercise 14.04: Dimensionality Reduction Using PCA
- Independent Component Analysis (ICA)
- Exercise 14.05: Dimensionality Reduction Using Independent Component Analysis
- Factor Analysis
- Exercise 14.06: Dimensionality Reduction Using Factor Analysis
- Comparing Different Dimensionality Reduction Techniques
- Summary
- 15. Ensemble Learning
Product information
- Title: The Data Science Workshop - Second Edition
- Author(s):
- Release date: August 2020
- Publisher(s): Packt Publishing
- ISBN: 9781800566927
You might also like
- The Applied Data Science Workshop - Second Edition: Designed with beginners in mind, this workshop helps you make the most of Python libraries and …
- The Data Science Workshop: Cut through the noise and get real results with a step-by-step approach to data science Key …
- The Data Analysis Workshop: Learn how to analyze data using Python models with the help of real-world use cases and …
- The Data Wrangling Workshop - Second Edition: A beginner's guide to simplifying Extract, Transform, Load (ETL) processes with the help of hands-on tips, …