Live Online training

Why and What If – Causal Analysis for Everyone

Online Experiments, Bayesian Networks, Causal Models and Interventions

Topic: Data
Bruno Gonçalves

How do causes lead to effects? Given an observed effect, can you identify the cause behind it? Big Data opens the door to answering questions such as these, but before we can do so, we must go beyond classical probability theory and dive into the field of Causal Inference.

In this course, we will explore the three steps in the ladder of causation (1. Association, 2. Intervention, 3. Counterfactuals), using simple rules and techniques to move up the ladder from simple correlational studies to fully causal analyses. We will cover the fundamentals of this powerful set of techniques, which allow us to answer practical causal questions such as “Does A cause B?” and “If I change A, how does that impact B?”
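As a rough illustration of the gap between the first two steps of the ladder (association and intervention), here is a minimal sketch in Python. It is not taken from the course materials: the toy model, the function name `simulate`, and all probabilities are assumptions chosen for illustration. A hidden variable Z drives both A and B, while A has no effect on B at all.

```python
import random


def simulate(n, intervene, seed=0):
    """Return mean(B | A=1) - mean(B | A=0) in a toy model (hypothetical).

    Z is a hidden common cause of A and B. A never influences B.
    """
    rng = random.Random(seed)
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for _ in range(n):
        z = rng.random() < 0.5  # hidden confounder
        if intervene:
            # Intervention: A is set by a coin flip, ignoring Z.
            a = rng.random() < 0.5
        else:
            # Observation: A depends on Z.
            a = rng.random() < (0.9 if z else 0.1)
        # B depends on Z only, never on A.
        b = (2.0 if z else 0.0) + rng.gauss(0, 1)
        sums[int(a)] += b
        counts[int(a)] += 1
    return sums[1] / counts[1] - sums[0] / counts[0]


observed = simulate(100_000, intervene=False)
intervened = simulate(100_000, intervene=True)
print(f"observational difference:  {observed:.2f}")
print(f"interventional difference: {intervened:.2f}")
```

In the observational data, A and B look strongly related (the expected difference is 1.6 under these assumed parameters), yet the difference vanishes once A is assigned at random. This is the kind of discrepancy that separates “A is associated with B” from “changing A changes B.”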

What you'll learn and how you can apply it

  • Understand the fundamental difference between correlation and causation
  • Be able to identify and handle confounders
  • Build and evaluate simple causal models
  • Adopt a causal frame of mind
  • Combine Machine Learning and Causal approaches

This training course is for you because...

  • You’re a data scientist who wants to go beyond association analyses to answer causal questions
  • You want to be able to identify the causal mechanisms at work in your data
  • You want to take advantage of causal structure to improve your understanding of your dataset and speed up your computations

Prerequisites

  • Basic Python
  • Jupyter

Course Set-up

  • Scientific Python distribution like Anaconda

Recommended Preparation

Recommended Follow-up

About your instructor

  • Bruno Gonçalves is currently a Senior Data Scientist working at the intersection of Data Science and Finance. Previously, he was a Data Science fellow at NYU's Center for Data Science while on leave from a tenured faculty position at Aix-Marseille Université. Since completing his PhD in the Physics of Complex Systems in 2008, he has been pursuing the use of Data Science and Machine Learning to study Human Behavior. Using large datasets from Twitter, Wikipedia, web access logs, and Yahoo! Meme, he studied how we can observe both large-scale and individual human behavior in an unobtrusive and widespread manner. The main applications have been to the study of Computational Linguistics, Information Diffusion, Behavioral Change and Epidemic Spreading. In 2015 he was awarded the Complex Systems Society's Junior Scientific Award for "outstanding contributions in Complex Systems Science" and in 2018 he was named a Science Fellow of the Institute for Scientific Interchange in Turin, Italy.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Segment 1 – Approaches to Causality (55 min)

  • Probability Theory
  • Simpson's Paradox
  • A/B Testing
  • Granger Causality
  • Graphical Models
  • The Ladder of Causation

Break (10 min)

Segment 2 – Properties of Graphical Models (50 min)

  • Chains
  • Forks
  • Colliders
  • d-separation

Break (10 min)

Segment 3 – Interventions (50 min)

  • Interventions
  • Back-door criterion
  • Front-door criterion
  • Mediation

Break (10 min)

Segment 4 – Counterfactuals (30 min)

  • The fundamental laws of counterfactuals
  • Graphical representation
  • Practical Applications

Break (5 min)

Segment 5 – Connections to Machine Learning (30 min)

  • Structure Identifiability
  • Semi-Supervised learning
  • Applications to time-series