World Cup Analytics with ML

Published by O'Reilly Media, Inc.

Content level: Beginner

Engineer high-performance predictive models with soccer data

What you’ll learn and how you can apply it

  • Execute a complete end-to-end soccer analytics project, from loading and preparing raw StatsBomb event data to interpreting the output of multiple predictive models
  • Create a variety of soccer-specific visualizations (e.g., shot maps, pass networks) to explore the tactical dynamics of a single high-stakes match
  • Build and apply fundamental machine learning models for both classification (expected goals) and regression (predicting player performance) in a soccer context
  • Implement a simple deep learning model in PyTorch or TensorFlow to predict match outcomes

Course description

With the 2026 FIFA World Cup on the horizon, the soccer world is buzzing with excitement. Join soccer fans Guanyu Hu and Ari Joury for a practical way to leverage your passion as you wait for the tournament: learning the core modeling techniques that are transforming the sport. The course is built around the analysis, from start to finish, of a classic World Cup match. You’ll follow a complete data science workflow, beginning with raw event data from StatsBomb and culminating in the application of classification, regression, and even deep learning models to understand and predict the game’s key moments.

This project-based approach ensures that you immediately apply and contextualize what you’re learning. You’ll understand how to create compelling, soccer-specific visualizations to uncover tactical patterns, build a classic expected goals (xG) model to evaluate shot quality, and explore advanced methods for predicting outcomes. The course content is curated from the companion book, Soccer Analytics with Machine Learning, and provides a cohesive and in-depth learning experience that’s directly applicable to real-world sports analytics challenges.

This live event is for you because...

  • You’re a beginner or advanced data scientist, data analyst, or machine learning engineer who wants to apply existing technical skills to the dynamic and growing domain of sports.
  • You work with sports data in a professional capacity and want to deepen your analytical toolkit.
  • You want to become a sports data analyst and are looking to add a unique specialization that sets you apart in the job market.
  • You work with other data (e.g., healthcare) but want to grow and develop your skills in a new way.

Prerequisites

  • Python 3.9 or higher installed on your machine (the Anaconda Distribution is recommended, as it includes Jupyter Notebook and the most common data science libraries)
  • Jupyter Notebook or JupyterLab installed for interactive coding and visualization

Recommended preparation:

  • The course notebooks and requirements.txt file will be made available in a public GitHub repository prior to the course (please install the required Python libraries by running pip install -r requirements.txt)
  • Verify your setup by running a short test notebook provided in the GitHub repository to confirm that all libraries load correctly and that you can access the StatsBomb open data
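
Until the test notebook is available, a quick import check along the following lines can confirm the environment. This is only a sketch. The package list is an assumption: statsbombpy and mplsoccer are named in the schedule, while pandas, matplotlib, and sklearn are guesses at what requirements.txt might contain. `check_libraries` is a hypothetical helper, not part of the course materials.

```python
import importlib.util


def check_libraries(names):
    """Return {module_name: bool} telling whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumed package list; the authoritative one is the course's requirements.txt.
ASSUMED_PACKAGES = ["statsbombpy", "mplsoccer", "pandas", "matplotlib", "sklearn"]

for name, found in check_libraries(ASSUMED_PACKAGES).items():
    print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

Any package reported as MISSING can usually be fixed by rerunning pip install -r requirements.txt in the same environment that Jupyter uses.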

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

The project and the landscape (20 minutes)

  • Presentation: Introduction to the course project—a deep dive into a classic World Cup match; core concepts and the data-driven revolution in soccer; outline of the plan to dissect the match using a full suite of analytical tools

First look at the data (40 minutes)

  • Presentation: Understanding the structure and richness of professional event data (the raw material for the project); using the statsbombpy library to load the competition data and find the match_id; loading the event data for that single match; filtering the data for all shots and using mplsoccer to plot them on a pitch, color-coding by team
  • Hands-on exercises: Load the match; create a shot map
  • Group discussion: A quick thumbs up/down to see if you’ve successfully created the visualization; brief discussion of common errors or interesting findings
  • Q&A
  • Break
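
The load-and-plot steps in this section could look roughly like the sketch below. It assumes statsbombpy's flattened events table exposes `type`, `team`, and `location` columns and that mplsoccer's `Pitch` is drawn in StatsBomb coordinates. The helper names (`load_shot_records`, `shot_points`, `plot_shot_map`) and the two colors are hypothetical choices for illustration, not the course's own code. Third-party imports sit inside the functions so the pure-Python grouping step can be tried on its own.

```python
def load_shot_records(match_id):
    """Fetch one match from StatsBomb open data and keep only the shot events."""
    from statsbombpy import sb  # third-party; needs network access

    events = sb.events(match_id=match_id)
    shots = events[events["type"] == "Shot"]
    return shots[["type", "team", "location"]].to_dict("records")


def shot_points(records):
    """Group shot coordinates by team: {team: [(x, y), ...]}."""
    by_team = {}
    for rec in records:
        if rec.get("type") != "Shot":
            continue  # ignore passes, carries, and other event types
        x, y = rec["location"][:2]
        by_team.setdefault(rec["team"], []).append((x, y))
    return by_team


def plot_shot_map(by_team, colors=("tab:blue", "tab:red")):
    """Scatter each team's shots on a StatsBomb-sized pitch, one color per team."""
    from mplsoccer import Pitch  # third-party

    pitch = Pitch(pitch_type="statsbomb")
    fig, ax = pitch.draw(figsize=(10, 7))
    for i, (team, points) in enumerate(by_team.items()):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        pitch.scatter(xs, ys, ax=ax, s=80, alpha=0.7,
                      color=colors[i % len(colors)], label=team)
    ax.legend(loc="upper left")
    return fig, ax
```

With the libraries installed, `sb.matches(competition_id=..., season_id=...)` lists the matches of a competition so the right `match_id` can be picked out, after which `plot_shot_map(shot_points(load_shot_records(match_id)))` draws the map. Note that both teams are plotted in the raw event coordinates, so whether they appear at the same or opposite ends depends on the data's coordinate convention.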

Building an expected goals model (35 minutes)

  • Presentation: Classification and the expected goals (xG) metric; the theory of binary classification; why xG is a more informative measure of performance than raw goal or shot counts
  • Hands-on exercises: Engineer two features for an xG model; train a logistic regression model on those features as a simple xG model; evaluate its accuracy
  • Q&A
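
As a rough illustration of the exercises above, the sketch below engineers two shot features and fits a logistic regression. The listing does not say which two features are used, so distance to goal and the angle subtended by the goal mouth are assumptions here, as are the helper names. Coordinates follow the StatsBomb pitch (120 x 80, goal centred at (120, 40), posts 8 units apart). scikit-learn is imported inside `fit_xg_model` so the feature functions run with the standard library alone.

```python
import math

GOAL_X, GOAL_Y, GOAL_WIDTH = 120.0, 40.0, 8.0  # StatsBomb pitch coordinates


def shot_distance(x, y):
    """Straight-line distance from the shot location to the centre of the goal."""
    return math.hypot(GOAL_X - x, GOAL_Y - y)


def shot_angle(x, y):
    """Angle (radians) between the lines from the shot location to the two posts."""
    dx, dy = GOAL_X - x, GOAL_Y - y
    return math.atan2(GOAL_WIDTH * dx, dx * dx + dy * dy - (GOAL_WIDTH / 2) ** 2)


def build_features(locations):
    """Turn [(x, y), ...] into [[distance, angle], ...] rows for a classifier."""
    return [[shot_distance(x, y), shot_angle(x, y)] for x, y in locations]


def fit_xg_model(locations, is_goal, seed=0):
    """Fit logistic regression on the two features; return (model, held-out accuracy)."""
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X = build_features(locations)
    X_train, X_test, y_train, y_test = train_test_split(
        X, is_goal, test_size=0.25, random_state=seed, stratify=is_goal
    )
    model = LogisticRegression().fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))
```

The xG value of a shot is then `model.predict_proba(build_features([(x, y)]))[0, 1]`. Two caveats apply. A single match contains only a handful of goals, so a stable fit usually needs shots pooled from many matches. And because goals are rare, accuracy can look high even for a model that never predicts a goal, so it is worth inspecting the predicted probabilities as well.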

Regression, deep learning, and optimization (50 minutes)

  • Presentation: Predicting continuous values; using neural networks; regression techniques for counting goals; using a simple neural network for match outcome prediction
  • Hands-on exercise: Build a basic Poisson regression model to predict the number of goals each team might score based on their non-shot xG
  • Group discussion: What each model (xG, Poisson) tells you about the match and how their predictions differ; how to use optimization techniques to go further
  • Q&A
  • Break
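
One way to picture the Poisson exercise and the model comparison in the discussion is the sketch below. It fits a one-feature Poisson regression (goals as a function of non-shot xG) with scikit-learn's `PoissonRegressor`, then converts two predicted scoring rates into win, draw, and loss probabilities. Treating the two teams' goal counts as independent Poisson variables is a simplifying assumption made here, and the function names are hypothetical.

```python
import math


def fit_goal_model(non_shot_xg, goals):
    """Fit goals ~ Poisson(exp(b0 + b1 * non_shot_xg)) on per-team, per-match rows."""
    from sklearn.linear_model import PoissonRegressor  # third-party

    X = [[value] for value in non_shot_xg]
    return PoissonRegressor(alpha=0.0).fit(X, goals)  # alpha=0.0: no regularization


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number of goals is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(lam_a, lam_b, max_goals=15):
    """Return (P(A wins), P(draw), P(B wins)) assuming independent Poisson scores."""
    win = draw = loss = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, lam_a) * poisson_pmf(b, lam_b)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
    return win, draw, loss
```

After fitting, `model.predict([[x]])` gives the expected goals for a team with non-shot xG `x`, and passing both teams' expected goals to `outcome_probabilities` yields a match-level forecast. This can be set beside the shot-level xG model when comparing what each model says about the match.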

The full picture and your path forward (35 minutes)

  • Presentation: Reviewing the complete worked example; paths for continued learning
  • Group discussion: Career opportunities and open source communities
  • Hands-on exercise: Answer survey
  • Q&A

Your Instructors

  • Dr. Guanyu Hu

    Guanyu Hu is an associate professor at Michigan State University, where his work centers on sports analytics and statistics. His publications span sports-related methodological and applied research. Previously, he served as chair of the American Statistical Association’s Statistics in Sports Section, contributing leadership and service to the broader sports analytics community. He’s also an organizer of the American Soccer Insights Summit, helping bring together researchers, analysts, and practitioners to advance innovation in soccer analytics.

  • Ari Joury

    Ari Joury is a machine learning engineer and CEO of Wangari Global, working at the intersection of AI, decision-making systems, and real-world production environments. With a background spanning theoretical physics, applied ML, and enterprise analytics, Ari’s focus is on building AI systems that move from prediction to action—and helping teams integrate AI assistants into serious engineering workflows.

Skill covered

Data Science