Hands-On Predictive Analytics with Python

Book description

A step-by-step guide to building high-performing predictive applications

Key Features

  • Use the Python data analytics ecosystem to implement end-to-end predictive analytics projects
  • Explore advanced predictive modeling algorithms, with an emphasis on theory and intuitive explanations
  • Learn to deploy a predictive model's results as an interactive application

Book Description

Predictive analytics is an applied field that uses a variety of quantitative methods to make predictions from data. It involves much more than just feeding data into a computer to build a model. This book provides practical coverage to help you understand the most important concepts of predictive analytics. Through practical, step-by-step examples, we build predictive analytics solutions using cutting-edge Python tools and packages.

The book's step-by-step approach starts by defining the problem and moves on to identifying relevant data. It then covers preparing the data, exploring and visualizing relationships, and building, tuning, evaluating, and deploying the model.

Each stage has relevant practical examples and efficient Python code. You will work with models such as KNN, Random Forests, and neural networks using the most important libraries in Python's data science stack: NumPy, pandas, Matplotlib, Seaborn, Keras, Dash, and so on. In addition to hands-on code examples, you will find intuitive explanations of the inner workings of the main techniques and algorithms used in predictive analytics.

By the end of this book, you will be ready to build high-performance predictive analytics solutions using Python.

What you will learn

  • Get to grips with the main concepts and principles of predictive analytics
  • Learn about the stages involved in producing complete predictive analytics solutions
  • Understand how to define a problem, propose a solution, and prepare a dataset
  • Use visualizations to explore relationships and gain insights into the dataset
  • Learn to build regression and classification models using scikit-learn
  • Use Keras to build powerful neural network models that produce accurate predictions
  • Learn to serve a model's predictions as a web application

Who this book is for

This book is for data analysts, data scientists, data engineers, and Python developers who want to learn about predictive modeling and would like to implement predictive analytics solutions using Python's data stack. People from other backgrounds who would like to enter this exciting field will greatly benefit from reading this book. All you need is to be proficient in Python programming and have a basic understanding of statistics and college-level algebra.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Hands-On Predictive Analytics with Python
  3. About Packt
    1. Why subscribe?
    2. Packt.com
  4. Contributors
    1. About the author
    2. About the reviewer
    3. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. The Predictive Analytics Process
    1. Technical requirements
    2. What is predictive analytics?
    3. Reviewing important concepts of predictive analytics
    4. The predictive analytics process
      1. Problem understanding and definition
      2. Data collection and preparation
      3. Dataset understanding using EDA
      4. Model building
      5. Model evaluation
      6. Communication and/or deployment
      7. CRISP-DM and other approaches
    5. A quick tour of Python's data science stack
      1. Anaconda
      2. Jupyter
      3. NumPy
        1. A mini NumPy tutorial
      4. SciPy
      5. pandas
      6. Matplotlib
      7. Seaborn
      8. Scikit-learn
      9. TensorFlow and Keras
      10. Dash
    6. Summary
    7. Further reading
  7. Problem Understanding and Data Preparation
    1. Technical requirements
    2. Understanding the business problem and proposing a solution
      1. Context is everything
      2. Define what is going to be predicted
      3. Make explicit the data that will be required
      4. Think about access to the data
      5. Proposing a solution
        1. Define your methodology
        2. Define key metrics of model performance
        3. Define the deliverables of the project
    3. Practical project – diamond prices
      1. Diamond prices – problem understanding and definition
      2. Getting more context
      3. Diamond prices – proposing a solution at a high level
        1. Goal
        2. Methodology
        3. Metrics for the model
        4. Deliverables for the project
      4. Diamond prices – data collection and preparation
        1. Dealing with missing values
    4. Practical project – credit card default
      1. Credit card default – problem understanding and definition
      2. Credit card default – proposing a solution
        1. Goal
        2. Methodology
        3. Metrics for the model
        4. Deliverables of the project
      3. Credit card default – data collection and preparation
        1. Credit card default – numerical features
        2. Encoding categorical features
        3. Low variance features
        4. Near collinearity
        5. One-hot encoding with pandas
        6. A brief introduction to feature engineering
    5. Summary
    6. Further reading
  8. Dataset Understanding – Exploratory Data Analysis
    1. Technical requirements
    2. What is EDA?
    3. Univariate EDA
      1. Univariate EDA for numerical features
      2. Univariate EDA for categorical features
    4. Bivariate EDA
      1. Two numerical features
        1. Scatter plots
        2. The Pearson correlation coefficient
      2. Two categorical features
        1. Cross tables
        2. Barplots for two categorical variables
      3. One numerical feature and one categorical feature
    5. Introduction to graphical multivariate EDA
    6. Summary
    7. Further reading
  9. Predicting Numerical Values with Machine Learning
    1. Technical requirements
    2. Introduction to ML
      1. Tasks in supervised learning
      2. Creating your first ML model
      3. The goal of ML models – generalization
      4. Overfitting
      5. Evaluation function and optimization
    3. Practical considerations before modeling
      1. Introducing scikit-learn
      2. Further feature transformations
        1. Train-test split
        2. Dimensionality reduction using PCA
        3. Standardization – centering and scaling
    4. MLR
    5. Lasso regression
    6. KNN
    7. Training versus testing error
    8. Summary
    9. Further reading
  10. Predicting Categories with Machine Learning
    1. Technical requirements
    2. Classification tasks
      1. Predicting categories and probabilities
    3. Credit card default dataset
    4. Logistic regression
      1. A simple logistic regression model
      2. A complete logistic regression model
    5. Classification trees
      1. How trees work
      2. The good and the bad of trees
      3. Training a larger classification tree
    6. Random forests
    7. Training versus testing error
    8. Multiclass classification
    9. Naive Bayes classifiers
      1. Conditional probability
      2. Bayes' theorem
        1. Using Bayesian terms
      3. Back to the classification problem
      4. Gaussian Naive Bayes
        1. Gaussian Naive Bayes with scikit-learn
    10. Summary
    11. Further reading
  11. Introducing Neural Nets for Predictive Analytics
    1. Technical requirements
    2. Introducing neural network models
      1. Deep learning
      2. Anatomy of an MLP – elements of a neural network model
      3. How MLPs learn
    3. Introducing TensorFlow and Keras
      1. TensorFlow
      2. Keras – deep learning for humans
    4. Regressing with neural networks
      1. Building the MLP for predicting diamond prices
      2. Training the MLP
      3. Making predictions with the neural network
    5. Classification with neural networks
      1. Building the MLP for predicting credit card default
      2. Evaluating predictions
    6. The dark art of training neural networks
      1. So many decisions; so little time
      2. Regularization for neural networks
        1. Using a validation set
        2. Early stopping
        3. Dropout
      3. Practical advice on training neural networks
    7. Summary
    8. Further reading
  12. Model Evaluation
    1. Technical requirements
    2. Evaluation of regression models
      1. Metrics for regression models
        1. MSE and Root Mean Squared Error (RMSE)
        2. MAE
        3. R-squared (R2)
        4. Defining a custom metric
      2. Visualization methods for evaluating regression models
    3. Evaluation for classification models
      1. Confusion matrix and related metrics
      2. Visualization methods for evaluating classification models
        1. Visualizing probabilities
        2. Receiver Operating Characteristic (ROC) and precision-recall curves
        3. Defining a custom metric for classification
    4. The k-fold cross-validation
    5. Summary
    6. Further reading
  13. Model Tuning and Improving Performance
    1. Technical requirements
    2. Hyperparameter tuning
      1. Optimizing a single hyperparameter
      2. Optimizing more than one parameter
    3. Improving performance
      1. Improving our diamond price predictions
        1. Fitting a neural network
        2. Transforming the target
        3. Analyzing the results
      2. Not only a technical problem but a business problem
    4. Summary
  14. Implementing a Model with Dash
    1. Technical requirements
    2. Model communication and/or deployment phase
      1. Using a technical report
      2. A feature of an existing product
      3. Using an analytic application
    3. Introducing Dash
      1. What is Dash?
      2. Plotly
      3. Installation
      4. The application layout
      5. Building a basic static app
      6. Building a basic interactive app
    4. Implementing a predictive model as a web application
      1. Producing the predictive model objects
      2. Building the web application
    5. Summary
    6. Further reading
  15. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Hands-On Predictive Analytics with Python
  • Author(s): Alvaro Fuentes
  • Release date: December 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781789138719