Live online training

Probability from Scratch

Topic: Data
Thomas Nield

Probability is a foundational pillar of data science, machine learning, analytics, and statistics. While measuring random events can seem difficult and abstract, there are practical and intuitive ways to discover probability with simple Python code—no libraries required.

Join expert Thomas Nield for a tour of foundational concepts in probability, including conditional probability, Bayes’s theorem, discrete distributions, and continuous distributions. While balancing frequentist and Bayesian ideas, you’ll learn how to write models completely from scratch, without needing to use libraries like NumPy. You’ll even discover how to fit a normal distribution to a set of data using simple hill climbing algorithms and approximate areas under curves without calculus. Statistics libraries don’t need to be black boxes—learn how to use them with more insight and confidence.
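As a rough illustration of approximating areas under curves without calculus, the sketch below sums thin rectangles under a standard normal PDF. It is not taken from the course materials; the function names `normal_pdf` and `area_under_curve` and the choice of midpoint rectangles are assumptions made for this example.

```python
import math

def normal_pdf(x, mean=0.0, std=1.0):
    """Probability density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def area_under_curve(f, a, b, rectangles=10_000):
    """Approximate the area under f between a and b with midpoint rectangles."""
    width = (b - a) / rectangles
    total = 0.0
    for i in range(rectangles):
        midpoint = a + (i + 0.5) * width  # sample the curve at each rectangle's center
        total += f(midpoint) * width
    return total

# Probability of landing within one standard deviation of the mean
area_within_one_std = area_under_curve(normal_pdf, -1.0, 1.0)
print(round(area_within_one_std, 4))  # close to 0.6827
```

More rectangles give a tighter approximation at the cost of more function evaluations.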

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • The fundamentals of probability math, as well as nuances and pitfalls
  • How continuous and discrete probability distributions work (including PDF, CDF, and quantile functions)
  • Foundational Bayesian techniques and their practical applications

And you’ll be able to:

  • Build probability distributions completely from scratch, gaining insight into how these models work
  • Apply intuition to real-life problems and measure uncertainty more objectively
  • Leverage clever hacks to extract probabilities, confidence intervals, and random numbers from distributions with ease

This training course is for you because...

  • You’re a budding data science professional who wants to build foundational knowledge in probability before diving into statistics and machine learning.
  • You’re a programmer interested in random numbers and probabilistic modeling.
  • You want to see what Bayes’s theorem is all about.

Prerequisites

  • A working knowledge of Python (e.g., variables, functions, loops, generators, and classes)
  • A basic understanding of mathematical functions, logarithms, and exponents

Recommended preparation:

  • Read “A Crash Course in Python” (chapter 2 in Data Science from Scratch, second edition)

Recommended follow-up:

  • Read Think Stats, second edition (book)
  • Read Bayesian Statistics the Fun Way (book)
  • Read chapters 1–7 in Data Science from Scratch, second edition (book)

About your instructor

  • Thomas Nield is an operations research consultant as well as a writer, conference speaker, and trainer. He enjoys making technical content relatable and relevant to those who are unfamiliar with or intimidated by it. Thomas regularly teaches classes on analytics, machine learning, and mathematical optimization. He’s authored two books, Getting Started with SQL (O'Reilly) and Learning RxJava (Packt), and has written several popular articles, including “How It Feels to Learn Data Science in 2019” and “Is Deep Learning Already Hitting Its Limitations?”

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Introduction to probability (20 minutes)

  • Katacoda interactive exercise: Explore the Monty Hall problem
  • Group discussion: What is probability? What is probability versus statistics? What’s the difference between frequentism and Bayesianism?
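The Monty Hall problem lends itself to a quick simulation. The following is a minimal sketch, not the course's Katacoda exercise; the function names, the trial count, and the fixed seed are assumptions chosen for illustration.

```python
import random

def play_monty_hall(switch, rng):
    """Play one round; return True if the contestant wins the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    first_pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize
    openable = [d for d in doors if d != first_pick and d != prize]
    opened = rng.choice(openable)
    if switch:
        # Switch to the one remaining closed door
        final_pick = next(d for d in doors if d != first_pick and d != opened)
    else:
        final_pick = first_pick
    return final_pick == prize

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the win probability by repeated simulation."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    wins = sum(play_monty_hall(switch, rng) for _ in range(trials))
    return wins / trials

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}, switch: {switch_rate:.3f}")
```

Staying should win roughly one time in three and switching roughly two times in three, which is the counterintuitive result the problem is known for.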

Probability fundamentals (40 minutes)

  • Presentation: Adding probabilities; multiplying probabilities; logarithmic addition; mutually exclusive events; independent events; conditional probability; splitting an event
  • Katacoda interactive exercises: Perform probability addition and multiplication; explore coin flip and dice outcomes; perform conditional probability
  • Q&A

Break (5 minutes)

Bayes’s theorem (5 minutes)

  • Presentation: Bayes’s theorem; revisiting the Monty Hall problem; the prosecutor's fallacy; naive Bayes
  • Hands-on exercise: Determine if your diagnosis is incorrect
  • Q&A
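To give a flavor of the diagnosis question, here is a small sketch of Bayes’s theorem applied to a medical test. The prevalence, sensitivity, and false positive rate below are made-up numbers for illustration, not figures from the course, and the function name `posterior` is likewise an assumption.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive test) via Bayes's theorem."""
    # Total probability of a positive test, with or without the condition
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical inputs: 1% prevalence, 99% sensitivity, 5% false positive rate
p_condition_given_positive = posterior(
    prior=0.01, likelihood=0.99, false_positive_rate=0.05
)
print(round(p_condition_given_positive, 4))  # about 0.1667
```

Under these assumed numbers, a positive result still leaves only about a one-in-six chance of having the condition, because false positives from the large healthy population outnumber true positives.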

Discovering the binomial and beta distributions (50 minutes)

  • Presentation: Bernoulli distribution; binomial distribution; unfair coin flip problem; beta distribution; PDF, CDF, and quantile; updating predictions with beta
  • Katacoda interactive exercises: Predict airline no-shows; predict confidence intervals
  • Q&A
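A binomial distribution can be built with nothing beyond the standard library. The sketch below is an illustration rather than the course's exercise; the function names and the airline scenario (100 bookings, each passenger independently showing up with probability 0.9) are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

# Hypothetical flight: 100 bookings, each passenger shows up with probability 0.9
bookings, show_probability = 100, 0.9
# Chance that more than 95 passengers show up
p_more_than_95 = 1.0 - binomial_cdf(95, bookings, show_probability)
print(f"P(more than 95 show up) = {p_more_than_95:.4f}")
```

`math.comb` requires Python 3.8 or later; on older versions the binomial coefficient could be computed from factorials instead.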

Break (5 minutes)

Discovering the normal distribution (45 minutes)

  • Presentation: Uniform distribution; discovering the normal distribution; PDF, CDF, and quantile; fitting a normal distribution; the central limit theorem
  • Katacoda interactive exercises: Determine if this observation is unlikely; fit and analyze data
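One way to fit a normal distribution with simple hill climbing is to nudge the mean and standard deviation at random and keep only the nudges that raise the log-likelihood of the data. The sketch below assumes that approach; it is not the course's implementation, and the sample data, starting point, step size, iteration count, and seed are all arbitrary choices for illustration.

```python
import math
import random

def log_likelihood(data, mean, std):
    """Sum of log normal densities of the data for the given parameters."""
    total = 0.0
    for x in data:
        z = (x - mean) / std
        total += -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)
    return total

def fit_normal_hill_climb(data, iterations=20_000, step=0.1, seed=0):
    """Randomly perturb (mean, std) and keep changes that improve the fit."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    mean, std = 0.0, 1.0       # arbitrary starting guess
    best = log_likelihood(data, mean, std)
    for _ in range(iterations):
        new_mean = mean + rng.gauss(0.0, step)
        new_std = std + rng.gauss(0.0, step)
        if new_std <= 0.0:
            continue  # a standard deviation must stay positive
        candidate = log_likelihood(data, new_mean, new_std)
        if candidate > best:
            mean, std, best = new_mean, new_std, candidate
    return mean, std

# Made-up sample measurements
sample = [4.9, 5.1, 5.0, 4.8, 5.2, 5.3, 4.7, 5.0]
fitted_mean, fitted_std = fit_normal_hill_climb(sample)
print(f"mean = {fitted_mean:.2f}, std = {fitted_std:.2f}")
```

For a normal distribution the best-fit parameters have a closed form (the sample mean and the population standard deviation), so hill climbing here serves mainly as a way to see the fitting process step by step.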

Wrap-up and Q&A (10 minutes)