Data Science with Python

Book description

Leverage the power of the Python data science libraries and advanced machine learning techniques to analyze large unstructured datasets and predict the occurrence of a particular future event.

Key Features

  • Explore the depths of data science, from data collection through to visualization
  • Learn pandas, scikit-learn, and Matplotlib in detail
  • Study various data science algorithms using real-world datasets

Book Description

Data Science with Python begins by introducing you to data science and teaches you to install the packages you need to create a data science coding environment. You will learn three major techniques in machine learning: unsupervised learning, supervised learning, and reinforcement learning. You will also explore basic classification and regression techniques, such as support vector machines, decision trees, and logistic regression.

As you make your way through the chapters, you will study the basic functions, data structures, and syntax of the Python language that are used to handle large datasets with ease. You will learn about the NumPy and pandas libraries for matrix calculations and data manipulation, study how to use Matplotlib to create highly customizable visualizations, and apply the boosting algorithm XGBoost to make predictions. In the concluding chapters, you will explore convolutional neural networks (CNNs), deep learning algorithms used to predict what is in an image. You will also learn how to feed human sentences to a neural network, make the model process contextual information, and create human language processing systems to predict the outcome.

By the end of this book, you will be able to understand and implement any new data science algorithm and have the confidence to experiment with tools or libraries other than those covered in the book.

What you will learn

  • Pre-process data to make it ready to use for machine learning
  • Create data visualizations with Matplotlib
  • Use scikit-learn to perform dimension reduction using principal component analysis (PCA)
  • Solve classification and regression problems
  • Get predictions using the XGBoost library
  • Process images and create machine learning models to decode them
  • Process human language for prediction and classification
  • Use TensorBoard to monitor training metrics in real time
  • Find the best hyperparameters for your model with AutoML
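
As a rough illustration of the first item above (pre-processing data for machine learning), the sketch below rescales a numeric column in two common ways: standardization and min-max scaling. It is a library-free approximation written for this description, not code from the book; the function names and the `ages` sample are hypothetical. Standardization here divides by the population standard deviation, which is also the convention used by scikit-learn's StandardScaler.

```python
from statistics import mean, pstdev


def standard_scale(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid division by zero for a constant column.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample column, used only for illustration.
ages = [20, 30, 40, 50, 60]

standardized = standard_scale(ages)
normalized = min_max_scale(ages)

print(normalized)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

In practice, the scaling parameters (mean, standard deviation, minimum, maximum) are usually computed on the training data only and then reused on the test data, so that no information from the test set leaks into the model.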

Who this book is for

Data Science with Python is designed for data analysts, data scientists, database engineers, and business analysts who want to move towards using Python and machine learning techniques to analyze data and predict outcomes. Basic knowledge of Python and data analytics will help you understand the concepts explained in this book.

Downloading the example code for this ebook: You can download the example code files for this ebook on GitHub at the following link: https://github.com/TrainingByPackt/Data-Science-with-Python. If you require support, please email customercare@packt.com.

Table of contents

  1. Preface
    1. About the Book
      1. About the Authors 
      2. Learning Objectives
      3. Audience
      4. Approach
      5. Minimum Hardware Requirements
      6. Software Requirements
      7. Installation and Setup
      8. Using Kaggle for Faster Experimentation
      9. Conventions
      10. Installing the Code Bundle
  2. Chapter 1
  3. Introduction to Data Science and Data Pre-Processing
    1. Introduction
    2. Python Libraries
    3. Roadmap for Building Machine Learning Models
    4. Data Representation
      1. Independent and Target Variables
      2. Exercise 1: Loading a Sample Dataset and Creating the Feature Matrix and Target Matrix
    5. Data Cleaning
      1. Exercise 2: Removing Missing Data
      2. Exercise 3: Imputing Missing Data
      3. Exercise 4: Finding and Removing Outliers in Data
    6. Data Integration
      1. Exercise 5: Integrating Data
    7. Data Transformation
      1. Handling Categorical Data
      2. Exercise 6: Simple Replacement of Categorical Data with a Number
      3. Exercise 7: Converting Categorical Data to Numerical Data Using Label Encoding
      4. Exercise 8: Converting Categorical Data to Numerical Data Using One-Hot Encoding
    8. Data in Different Scales
      1. Exercise 9: Implementing Scaling Using the Standard Scaler Method
      2. Exercise 10: Implementing Scaling Using the MinMax Scaler Method
    9. Data Discretization
      1. Exercise 11: Discretization of Continuous Data
    10. Train and Test Data
      1. Exercise 12: Splitting Data into Train and Test Sets
      2. Activity 1: Pre-Processing Using the Bank Marketing Subscription Dataset
    11. Supervised Learning
    12. Unsupervised Learning
    13. Reinforcement Learning
    14. Performance Metrics
    15. Summary
  4. Chapter 2
  5. Data Visualization
    1. Introduction
    2. Functional Approach
      1. Exercise 13: Functional Approach – Line Plot
      2. Exercise 14: Functional Approach – Add a Second Line to the Line Plot
      3. Activity 2: Line Plot
      4. Exercise 15: Creating a Bar Plot
      5. Activity 3: Bar Plot
      6. Exercise 16: Functional Approach – Histogram
      7. Exercise 17: Functional Approach – Box-and-Whisker plot
      8. Exercise 18: Scatterplot
    3. Object-Oriented Approach Using Subplots
      1. Exercise 19: Single Line Plot using Subplots
      2. Exercise 20: Multiple Line Plots Using Subplots
      3. Activity 4: Multiple Plot Types Using Subplots
    4. Summary
  6. Chapter 3
  7. Introduction to Machine Learning via Scikit-Learn
    1. Introduction
    2. Introduction to Linear and Logistic Regression
      1. Simple Linear Regression
      2. Exercise 21: Preparing Data for a Linear Regression Model
      3. Exercise 22: Fitting a Simple Linear Regression Model and Determining the Intercept and Coefficient
      4. Exercise 23: Generating Predictions and Evaluating the Performance of a Simple Linear Regression Model
    3. Multiple Linear Regression
      1. Exercise 24: Fitting a Multiple Linear Regression Model and Determining the Intercept and Coefficients
      2. Activity 5: Generating Predictions and Evaluating the Performance of a Multiple Linear Regression Model
    4. Logistic Regression
      1. Exercise 25: Fitting a Logistic Regression Model and Determining the Intercept and Coefficients
      2. Exercise 26: Generating Predictions and Evaluating the Performance of a Logistic Regression Model
      3. Exercise 27: Tuning the Hyperparameters of a Multiple Logistic Regression Model
      4. Activity 6: Generating Predictions and Evaluating Performance of a Tuned Logistic Regression Model
    5. Max Margin Classification Using SVMs
      1. Exercise 28: Preparing Data for the Support Vector Classifier (SVC) Model
      2. Exercise 29: Tuning the SVC Model Using Grid Search
      3. Activity 7: Generating Predictions and Evaluating the Performance of the SVC Grid Search Model
    6. Decision Trees
      1. Activity 8: Preparing Data for a Decision Tree Classifier
      2. Exercise 30: Tuning a Decision Tree Classifier Using Grid Search
      3. Exercise 31: Programmatically Extracting Tuned Hyperparameters from a Decision Tree Classifier Grid Search Model
      4. Activity 9: Generating Predictions and Evaluating the Performance of a Decision Tree Classifier Model
    7. Random Forests
      1. Exercise 32: Preparing Data for a Random Forest Regressor
      2. Activity 10: Tuning a Random Forest Regressor
      3. Exercise 33: Programmatically Extracting Tuned Hyperparameters and Determining Feature Importance from a Random Forest Regressor Grid Search Model
      4. Activity 11: Generating Predictions and Evaluating the Performance of a Tuned Random Forest Regressor Model
    8. Summary
  8. Chapter 4
  9. Dimensionality Reduction and Unsupervised Learning
    1. Introduction
    2. Hierarchical Cluster Analysis (HCA)
      1. Exercise 34: Building an HCA Model
      2. Exercise 35: Plotting an HCA Model and Assigning Predictions
    3. K-means Clustering
      1. Exercise 36: Fitting k-means Model and Assigning Predictions
      2. Activity 12: Ensemble k-means Clustering and Calculating Predictions
      3. Exercise 37: Calculating Mean Inertia by n_clusters
      4. Exercise 38: Plotting Mean Inertia by n_clusters
    4. Principal Component Analysis (PCA)
      1. Exercise 39: Fitting a PCA Model
      2. Exercise 40: Choosing n_components using Threshold of Explained Variance
      3. Activity 13: Evaluating Mean Inertia by Cluster after PCA Transformation
      4. Exercise 41: Visual Comparison of Inertia by n_clusters
    5. Supervised Data Compression using Linear Discriminant Analysis (LDA)
      1. Exercise 42: Fitting LDA Model
      2. Exercise 43: Using LDA Transformed Components in Classification Model
    6. Summary
  10. Chapter 5
  11. Mastering Structured Data
    1. Introduction
    2. Boosting Algorithms
      1. Gradient Boosting Machine (GBM)
      2. XGBoost (Extreme Gradient Boosting)
      3. Exercise 44: Using the XGBoost library to Perform Classification
    3. XGBoost Library
      1. Controlling Model Overfitting
      2. Handling Imbalanced Datasets
      3. Activity 14: Training and Predicting the Income of a Person
    4. External Memory Usage
    5. Cross-validation
      1. Exercise 45: Using Cross-validation to Find the Best Hyperparameters
    6. Saving and Loading a Model
      1. Exercise 46: Creating a Python Script that Predicts Based on Real-time Input
      2. Activity 15: Predicting the Loss of Customers
    7. Neural Networks
      1. What Is a Neural Network?
      2. Optimization Algorithms
      3. Hyperparameters
    8. Keras
      1. Exercise 47: Installing the Keras library for Python and Using it to Perform Classification
      2. Keras Library
      3. Exercise 48: Predicting Avocado Price Using Neural Networks
    9. Categorical Variables
      1. One-hot Encoding
      2. Entity Embedding
      3. Exercise 49: Predicting Avocado Price Using Entity Embedding
      4. Activity 16: Predicting a Customer's Purchase Amount
    10. Summary
  12. Chapter 6
  13. Decoding Images
    1. Introduction
    2. Images
      1. Exercise 50: Classify MNIST Using a Fully Connected Neural Network
    3. Convolutional Neural Networks
      1. Convolutional Layer
    4. Pooling Layer
    5. Adam Optimizer
    6. Cross-entropy Loss
      1. Exercise 51: Classify MNIST Using a CNN
    7. Regularization
      1. Dropout Layer
      2. L1 and L2 Regularization
      3. Batch Normalization
      4. Exercise 52: Improving Image Classification Using Regularization Using CIFAR-10 images
    8. Image Data Preprocessing
      1. Normalization
      2. Converting to Grayscale
      3. Getting All Images to the Same Size
      4. Other Useful Image Operations
      5. Activity 17: Predict if an Image Is of a Cat or a Dog
    9. Data Augmentation
    10. Generators
      1. Exercise 53: Classify CIFAR-10 Images with Image Augmentation
      2. Activity 18: Identifying and Augmenting an Image
    11. Summary
  14. Chapter 7
  15. Processing Human Language
    1. Introduction
    2. Text Data Processing
      1. Regular Expressions
      2. Exercise 54: Using RegEx for String Cleaning
      3. Basic Feature Extraction
      4. Text Preprocessing
      5. Exercise 55: Preprocessing the IMDB Movie Review Dataset
      6. Text Processing
      7. Exercise 56: Creating Word Embeddings Using Gensim
      8. Activity 19: Predicting Sentiments of Movie Reviews
    3. Recurrent Neural Networks (RNNs)
      1. LSTMs
      2. Exercise 57: Performing Sentiment Analysis Using LSTM
      3. Activity 20: Predicting Sentiments from Tweets
    4. Summary
  16. Chapter 8
  17. Tips and Tricks of the Trade
    1. Introduction
    2. Transfer Learning
      1. Transfer Learning for Image Data
      2. Exercise 58: Using InceptionV3 to Compare and Classify Images
      3. Activity 21: Classifying Images using InceptionV3
    3. Useful Tools and Tips
      1. Train, Development, and Test Datasets
      2. Working with Unprocessed Datasets
      3. pandas Profiling
      4. TensorBoard
    4. AutoML
      1. Exercise 59: Get a Well-Performing Network Using Auto-Keras
      2. Model Visualization Using Keras
      3. Activity 22: Using Transfer Learning to Predict Images
    5. Summary
  18. Appendix
    1. Chapter 1: Introduction to Data Science and Data Preprocessing
      1. Activity 1: Pre-Processing Using the Bank Marketing Subscription Dataset
    2. Chapter 2: Data Visualization
      1. Activity 2: Line Plot
      2. Activity 3: Bar Plot
      3. Activity 4: Multiple Plot Types Using Subplots
    3. Chapter 3: Introduction to Machine Learning via Scikit-Learn
      1. Activity 5: Generating Predictions and Evaluating the Performance of a Multiple Linear Regression Model
      2. Activity 6: Generating Predictions and Evaluating Performance of a Tuned Logistic Regression Model
      3. Activity 7: Generating Predictions and Evaluating the Performance of the SVC Grid Search Model
      4. Activity 8: Preparing Data for a Decision Tree Classifier
      5. Activity 9: Generating Predictions and Evaluating the Performance of a Decision Tree Classifier Model
      6. Activity 10: Tuning a Random Forest Regressor
      7. Activity 11: Generating Predictions and Evaluating the Performance of a Tuned Random Forest Regressor Model
    4. Chapter 4: Dimensionality Reduction and Unsupervised Learning
      1. Activity 12: Ensemble k-means Clustering and Calculating Predictions
      2. Activity 13: Evaluating Mean Inertia by Cluster after PCA Transformation
    5. Chapter 5: Mastering Structured Data
      1. Activity 14: Training and Predicting the Income of a Person
      2. Activity 15: Predicting the Loss of Customers
      3. Activity 16: Predicting a Customer's Purchase Amount
    6. Chapter 6: Decoding Images
      1. Activity 17: Predict if an Image Is of a Cat or a Dog
      2. Activity 18: Identifying and Augmenting an Image
    7. Chapter 7: Processing Human Language
      1. Activity 19: Predicting Sentiments of Movie Reviews
      2. Activity 20: Predicting Sentiments from Tweets
    8. Chapter 8: Tips and Tricks of the Trade
      1. Activity 21: Classifying Images using InceptionV3
      2. Activity 22: Using Transfer Learning to Predict Images

Product information

  • Title: Data Science with Python
  • Author(s): Rohan Chopra, Aaron England, Mohamed Noordeen Alaudeen
  • Release date: July 2019
  • Publisher(s): Packt Publishing
  • ISBN: 9781838552862