5 Statistical Modeling

5.1 Concepts in Regression

What is statistical modeling?

  • It is a formalization of relationships between variables in the form of mathematical equations.
  • It describes how one or more random variables are related to one or more other variables.
  • The variables are not deterministically but stochastically related.

Reading: Leo Breiman, "Statistical Modeling: The Two Cultures," http://projecteuclid.org/download/pdf_1/euclid.ss/1009213726


  • Height and age each follow a probability distribution across the human population.
  • They are stochastically related: knowing that a person is 30 years old changes the probability that the person is 4 feet tall, and knowing that a person is 13 years old changes the probability that the person is 6 feet tall.
  • Model 1
    • $\text{height}_i = b_0 + b_1\,\text{age}_i + \varepsilon_i$, where $b_0$ is the intercept, $b_1$ is the coefficient that multiplies age to predict height, $\varepsilon_i$ is the error term, and $i$ indexes the subject.
  • Model 2
    • $\text{height}_i = b_0 + b_1\,\text{age}_i + b_2\,\text{sex}_i + \varepsilon_i$, where the variable sex is dichotomous.
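
As a rough illustration of Model 1 and Model 2, the sketch below fits both by ordinary least squares with NumPy. The data are synthetic and the generating coefficients (80, 5, 4) are made up for this example; they are assumptions, not measurements or values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic data (an assumption for illustration, not real measurements)
n = 200
age = rng.uniform(5, 18, size=n)          # ages in years
sex = rng.integers(0, 2, size=n)          # dichotomous variable coded 0 or 1
noise = rng.normal(0, 5, size=n)          # plays the role of the error term
height = 80 + 5 * age + 4 * sex + noise   # heights in cm, made-up coefficients

# Model 1: height_i = b0 + b1 * age_i + error_i
X1 = np.column_stack([np.ones(n), age])   # column of ones gives the intercept
coef1, *_ = np.linalg.lstsq(X1, height, rcond=None)

# Model 2: height_i = b0 + b1 * age_i + b2 * sex_i + error_i
X2 = np.column_stack([np.ones(n), age, sex])
coef2, *_ = np.linalg.lstsq(X2, height, rcond=None)

print("Model 1 (b0, b1):", coef1)
print("Model 2 (b0, b1, b2):", coef2)
```

Because the error term is random, the estimates land near the generating coefficients rather than exactly on them, which is the stochastic relationship the bullets describe.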

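Regression models involve the following components:

  • The unknown parameters
  • The independent variables, X
  • The dependent variable, Y

Common forms include:

  • $Y = a + bX$ is the simplest form of regression
  • Linear regression: $Y = a + bX + \varepsilon$
  • Multiple regression (several independent variables, sometimes loosely called multivariate regression): $Y = a + b_1 X_1 + b_2 X_2 + \varepsilon$
  • Logistic regression: $\ln\left(\frac{p}{1 - p}\right) = a + bX$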
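
The logistic form relates the log-odds, ln(p / (1 − p)), to a linear predictor a + bX. The sketch below shows that link and its inverse using only the standard library. The function names and the coefficients a = -3.0 and b = 0.5 are hypothetical, chosen only for illustration.

```python
import math

def logit(p):
    """Log-odds ln(p / (1 - p)) for a probability 0 < p < 1."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Map a linear predictor z = a + b*X back to a probability."""
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration
a, b = -3.0, 0.5

for x in (0, 6, 12):
    z = a + b * x            # linear predictor on the log-odds scale
    p = inverse_logit(z)     # corresponding probability
    print(f"X={x:2d}  log-odds={z:+.1f}  p={p:.3f}")
```

With these assumed coefficients the log-odds are 0 at X = 6, so the probability there is exactly 0.5. The model is linear on the log-odds scale even though p itself follows an S-shaped curve in X.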


Okun’s Law: The relationship between an economy’s unemployment rate and its gross national product (GNP). Economist Arthur Okun developed this idea, ...
