Training Systems Using Python Statistical Modeling

Book description

Leverage the power of Python and statistical modeling techniques for building accurate predictive models

Key Features

  • Get started with Python's rich suite of libraries for statistical modeling
  • Implement regression and clustering, and train neural networks from scratch
  • Discover real-world examples of training end-to-end machine learning systems in Python

Book Description

Python's ease of use and multi-purpose nature have made it one of the most popular tools for data scientists and machine learning developers. Its rich libraries are widely used for data analysis and, more importantly, for building state-of-the-art predictive models. This book is designed to guide you through using these libraries to implement effective statistical models for predictive analytics.

You'll start by delving into classical statistical analysis, where you will learn to compute descriptive statistics using pandas. You will then focus on supervised learning, exploring the principles of machine learning and training different machine learning models from scratch. Next, you will work with binary prediction models, classifying data with k-nearest neighbors, decision trees, and random forests. The book also covers algorithms for regression analysis, such as ridge and lasso regression, and their implementation in Python. In later chapters, you will learn how neural networks can be trained and deployed for more accurate predictions, and which Python libraries can be used to implement them.

By the end of this book, you will have the knowledge you need to design, build, and deploy enterprise-grade statistical models for machine learning using Python and its rich ecosystem of libraries for predictive analytics.

What you will learn

  • Understand the importance of statistical modeling
  • Learn about the different Python packages for statistical analysis
  • Implement algorithms such as Naive Bayes and random forests
  • Build predictive models from scratch using Python's scikit-learn library
  • Implement regression analysis and clustering
  • Learn how to train a neural network in Python

Who this book is for

If you are a data scientist, a statistician, or a machine learning developer looking to train and deploy effective machine learning models using popular statistical techniques, then this book is for you. Knowledge of Python programming is required to get the most out of this book.

Downloading the example code for this ebook: The example code files are available on GitHub at the following link: https://github.com/PacktPublishing/Training-Systems-Using-Python-Statistical-Modeling. If you require support, please email customercare@packt.com.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Training Systems Using Python Statistical Modeling
  3. About Packt
    1. Why subscribe?
    2. Packt.com
  4. Contributors
    1. About the author
    2. Packt is searching for authors like you
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  6. Classical Statistical Analysis
    1. Technical requirements
    2. Computing descriptive statistics
      1. Preprocessing the data
      2. Computing basic statistics
    3. Classical inference for proportions
      1. Computing confidence intervals for proportions
      2. Hypothesis testing for proportions
      3. Testing for common proportions
    4. Classical inference for means
      1. Computing confidence intervals for means
      2. Hypothesis testing for means
      3. Testing with two samples
      4. One-way analysis of variance (ANOVA)
    5. Diving into Bayesian analysis
      1. How Bayesian analysis works
      2. Using Bayesian analysis to solve a hit-and-run
    6. Bayesian analysis for proportions
      1. Conjugate priors for proportions
      2. Credible intervals for proportions
      3. Bayesian hypothesis testing for proportions
      4. Comparing two proportions
    7. Bayesian analysis for means
      1. Credible intervals for means
      2. Bayesian hypothesis testing for means
      3. Testing with two samples
    8. Finding correlations
      1. Testing for correlation
    9. Summary
  7. Introduction to Supervised Learning
    1. Principles of machine learning
      1. Checking the variables using the iris dataset
      2. The goal of supervised learning
    2. Training models
      1. Issues in training supervised learning models
      2. Splitting data
      3. Cross-validation
    3. Evaluating models
      1. Accuracy
      2. Precision
      3. Recall
      4. F1 score
      5. Classification report
      6. Bayes factor
    4. Summary
  8. Binary Prediction Models
    1. K-nearest neighbors classifier
      1. Training a kNN classifier
      2. Hyperparameters in kNN classifiers
    2. Decision trees
      1. Fitting the decision tree
      2. Visualizing the tree
      3. Restricting tree depth
    3. Random forests
      1. Optimizing hyperparameters
    4. Naive Bayes classifier
      1. Preprocessing the data
      2. Training the classifier
    5. Support vector machines
      1. Training a SVM
    6. Logistic regression
      1. Fitting a logit model
    7. Extending beyond binary classifiers
      1. Multiple outcomes for decision trees
      2. Multiple outcomes for random forests
      3. Multiple outcomes for Naive Bayes
      4. One-versus-all and one-versus-one classification
    8. Summary
  9. Regression Analysis and How to Use It
    1. Linear models
      1. Fitting a linear model with OLS
        1. Performing cross-validation
    2. Evaluating linear models
      1. Using AIC to pick models
    3. Bayesian linear models
      1. Choosing a polynomial
      2. Performing Bayesian regression
    4. Ridge regression
      1. Finding the right alpha value
    5. LASSO regression
    6. Spline interpolation
      1. Using SciPy for interpolation
      2. 2D interpolation
    7. Summary
  10. Neural Networks
    1. An introduction to perceptrons
    2. Neural networks
      1. The structure of a neural network
      2. Types of neural networks
      3. The MLP model
    3. MLPs for classification
      1. Optimization techniques
      2. Training the network
      3. Fitting an MLP to the iris dataset
      4. Fitting an MLP to the digits dataset
    4. MLP for regression
    5. Summary
  11. Clustering Techniques
    1. Introduction to clustering
      1. Computing distances
    2. Exploring the k-means algorithm
      1. Clustering the iris dataset
      2. Compressing images with k-means
    3. Evaluating clusters
      1. The elbow method
      2. The silhouette method
    4. Hierarchical clustering
      1. Clustering the iris dataset
      2. Clustering the Headlines dataset
    5. Spectral clustering
      1. Clustering the Headlines dataset
    6. Summary
  12. Dimensionality Reduction
    1. Introducing dimensionality reduction
      1. Uses of dimensionality reduction
    2. Principal component analysis
      1. Demonstration of PCA
        1. Choosing the number of components
    3. Singular value decomposition
      1. SVD for image compression
        1. Low-rank approximation
        2. Reconstructing the image using compact SVD
    4. Low-dimensional representation
      1. Example of MDS
      2. MDS in action
        1. How MDS comes into the picture
      3. Constructing distances
    5. Summary
  13. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Training Systems Using Python Statistical Modeling
  • Author(s): Curtis Miller
  • Release date: May 2019
  • Publisher(s): Packt Publishing
  • ISBN: 9781838823733