Building Statistical Models in Python

Book description

Make data-driven, informed decisions and enhance your statistical expertise in Python by turning raw data into meaningful insights. Purchase of the print or Kindle book includes a free PDF eBook.

Key Features

  • Gain expertise in identifying and modeling patterns that generate success
  • Explore the concepts with Python using important libraries such as statsmodels
  • Learn how to build models on real-world data sets and find solutions to practical challenges

Book Description

The ability to perform statistical modeling proficiently is a fundamental skill for data scientists and essential for businesses that rely on data insights. Building Statistical Models in Python is a comprehensive guide that will empower you to leverage mathematical and statistical principles to assess data, understand it, and draw inferences from it.

This book not only equips you with the skills to navigate the complexities of statistical modeling but also provides practical guidance for immediate implementation through illustrative examples. With its emphasis on application and code examples, the book helps you understand the concepts while gaining hands-on experience. With the help of Python and its essential libraries, you’ll explore key statistical methods and models, including hypothesis testing, regression, time series analysis, classification, and more.

By the end of this book, you’ll be fluent in statistical modeling and able to harness the full potential of Python’s rich ecosystem for data analysis.

What you will learn

  • Explore the use of statistics to make decisions under uncertainty
  • Answer questions about data using hypothesis tests
  • Understand the difference between regression and classification models
  • Build models with statsmodels in Python
  • Analyze time series data and provide forecasts
  • Discover survival analysis and the problems it can solve

Who this book is for

If you are looking to get started with building statistical models for your data sets, this book is for you! Building Statistical Models in Python bridges the gap between statistical theory and its practical application in Python. Because the book takes you on a comprehensive journey through both theory and application, no previous knowledge of statistics is required, but some experience with Python will be useful.

Table of contents

  1. Building Statistical Models in Python
  2. Contributors
  3. About the authors
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Conventions used
    6. Get in touch
    7. Share Your Thoughts
    8. Download a free PDF copy of this book
  6. Part 1: Introduction to Statistics
  7. Chapter 1: Sampling and Generalization
    1. Software and environment setup
    2. Population versus sample
    3. Population inference from samples
      1. Randomized experiments
      2. Observational study
    4. Sampling strategies – random, systematic, stratified, and clustering
      1. Probability sampling
      2. Non-probability sampling
    5. Summary
  8. Chapter 2: Distributions of Data
    1. Technical requirements
    2. Understanding data types
      1. Nominal data
      2. Ordinal data
      3. Interval data
      4. Ratio data
      5. Visualizing data types
    3. Measuring and describing distributions
      1. Measuring central tendency
      2. Measuring variability
      3. Measuring shape
    4. The normal distribution and central limit theorem
      1. The Central Limit Theorem
    5. Bootstrapping
      1. Confidence intervals
      2. Standard error
      3. Correlation coefficients (Pearson’s correlation)
    6. Permutations
      1. Permutations and combinations
      2. Permutation testing
    7. Transformations
    8. Summary
    9. References
  9. Chapter 3: Hypothesis Testing
    1. The goal of hypothesis testing
      1. Overview of a hypothesis test for the mean
      2. Scope of inference
      3. Hypothesis test steps
    2. Type I and Type II errors
      1. Type I errors
      2. Type II errors
    3. Basics of the z-test – the z-score, z-statistic, critical values, and p-values
      1. The z-score and z-statistic
      2. A z-test for means
      3. z-test for proportions
      4. Power analysis for a two-population pooled z-test
    4. Summary
  10. Chapter 4: Parametric Tests
    1. Assumptions of parametric tests
      1. Normally distributed population data
      2. Equal population variance
    2. T-test – a parametric hypothesis test
      1. T-test for means
      2. Two-sample t-test – pooled t-test
      3. Two-sample t-test – Welch’s t-test
      4. Paired t-test
    3. Tests with more than two groups and ANOVA
      1. Multiple tests for significance
      2. ANOVA
      3. Pearson’s correlation coefficient
      4. Power analysis examples
    4. Summary
    5. References
  11. Chapter 5: Non-Parametric Tests
    1. When parametric test assumptions are violated
      1. Permutation tests
    2. The Rank-Sum test
      1. The test statistic procedure
      2. Normal approximation
      3. Rank-Sum example
    3. The Signed-Rank test
    4. The Kruskal-Wallis test
    5. Chi-square distribution
    6. Chi-square goodness-of-fit
    7. Chi-square test of independence
    8. Chi-square goodness-of-fit test power analysis
    9. Spearman’s rank correlation coefficient
    10. Summary
  12. Part 2: Regression Models
  13. Chapter 6: Simple Linear Regression
    1. Simple linear regression using OLS
    2. Coefficients of correlation and determination
      1. Coefficients of correlation
      2. Coefficients of determination
    3. Required model assumptions
      1. A linear relationship between the variables
      2. Normality of the residuals
      3. Homoscedasticity of the residuals
      4. Sample independence
    4. Testing for significance and validating models
      1. Model validation
    5. Summary
  14. Chapter 7: Multiple Linear Regression
    1. Multiple linear regression
      1. Adding categorical variables
      2. Evaluating model fit
      3. Interpreting the results
    2. Feature selection
      1. Statistical methods for feature selection
      2. Performance-based methods for feature selection
      3. Recursive feature elimination
    3. Shrinkage methods
      1. Ridge regression
      2. LASSO regression
      3. Elastic Net
    4. Dimension reduction
      1. PCA – a hands-on introduction
      2. PCR – a hands-on salary prediction study
    5. Summary
  15. Part 3: Classification Models
  16. Chapter 8: Discrete Models
    1. Probit and logit models
    2. Multinomial logit model
    3. Poisson model
      1. The Poisson distribution
      2. Modeling count data
    4. The negative binomial regression model
      1. Negative binomial distribution
    5. Summary
  17. Chapter 9: Discriminant Analysis
    1. Bayes’ theorem
      1. Probability
      2. Conditional probability
      3. Discussing Bayes’ Theorem
    2. Linear Discriminant Analysis
      1. Supervised dimension reduction
    3. Quadratic Discriminant Analysis
    4. Summary
  18. Part 4: Time Series Models
  19. Chapter 10: Introduction to Time Series
    1. What is a time series?
    2. Goals of time series analysis
    3. Statistical measurements
      1. Mean
      2. Variance
      3. Autocorrelation
      4. Cross-correlation
    4. The white-noise model
    5. Stationarity
    6. Summary
    7. References
  20. Chapter 11: ARIMA Models
    1. Technical requirements
    2. Models for stationary time series
      1. Autoregressive (AR) models
      2. Moving average (MA) models
      3. Autoregressive moving average (ARMA) models
    3. Models for non-stationary time series
      1. ARIMA models
    4. Seasonal ARIMA models
    5. More on model evaluation
    6. Summary
    7. References
  21. Chapter 12: Multivariate Time Series
    1. Multivariate time series
      1. Time-series cross-correlation
    2. ARIMAX
      1. Preprocessing the exogenous variables
      2. Fitting the model
      3. Assessing model performance
    3. VAR modeling
      1. Step 1 – visual inspection
      2. Step 2 – selecting the order of AR(p)
      3. Step 3 – assessing cross-correlation
      4. Step 4 – building the VAR(p,q) model
      5. Step 5 – testing the forecast
      6. Step 6 – building the forecast
    4. Summary
    5. References
  22. Part 5: Survival Analysis
  23. Chapter 13: Time-to-Event Variables – An Introduction
    1. What is censoring?
      1. Left censoring
      2. Right censoring
      3. Interval censoring
      4. Type I and Type II censoring
    2. Survival data
    3. Survival Function, Hazard and Hazard Ratio
    4. Summary
  24. Chapter 14: Survival Models
    1. Technical requirements
    2. Kaplan-Meier model
      1. Model definition
      2. Model example
    3. Exponential model
      1. Model example
    4. Cox Proportional Hazards regression model
      1. Step 1
      2. Step 2
      3. Step 3
      4. Step 4
      5. Step 5
    5. Summary
  25. Index
    1. Why subscribe?
  26. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts
    3. Download a free PDF copy of this book

Product information

  • Title: Building Statistical Models in Python
  • Author(s): Huy Hoang Nguyen, Paul N Adams, Stuart J Miller
  • Release date: August 2023
  • Publisher(s): Packt Publishing
  • ISBN: 9781804614280