O'Reilly Live Online Training

Introduction to Machine Learning for Algorithmic Trading

Train algorithms to discover trading signals using Python

Deepak Kanungo

In the 20th century, traders and system developers worked together to explicitly formulate all the rules executed by their algorithmic trading systems. In the 21st century, financial data scientists are training computer algorithms to discover complex functional relationships in data from multiple sources, augmenting the insights of traders. These machine learning (ML) models now generate many of the rules used in every aspect of the trading process, from idea generation to execution and portfolio management. ML-based algorithmic trading has contributed significantly to the frenetic pace of automation in the investment management industry, where over 75% of daily equity trading is done algorithmically.

Linear models play a pivotal role in modern financial research and practice. They have the longest history in the industry and serve as the baseline models for making financial inferences and predictions. They are also intuitive and transparent. That’s why this introductory course focuses on supervised linear ML models for regression and classification. The course covers the fundamental concepts, process, and technological tools for applying machine learning models to algorithmic trading strategies. Note that live trading is out of scope for the course.

This is part of a four-course series on algorithms in finance, trading, and investing. After this course, we recommend taking the following courses, in this order:

What you'll learn and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • The benefits and challenges of applying machine learning models to algorithmic trading and investing
  • The various types of machine learning models used in algorithmic trading
  • The concepts, process and tools used for researching, designing and developing them
  • How to manage the trade-off between bias and variance in ML models
  • Pitfalls of cross-validation and backtesting for evaluating their performance
  • The paramount importance of domain expertise and feature engineering

And you’ll be able to:

  • Use the Scikit-learn library to analyze, design and develop linear ML models for regression and classification
  • Leverage the Statsmodels library to diagnose the robustness of your ML models
  • Train and test linear ML models for algorithmic trading
  • Evaluate the performance of ML models
  • Fine-tune the hyperparameters of your ML models to improve their performance

This training course is for you because...

  • You’re a retail equity investor, financial analyst, or trader who wants to use machine learning models to discover new trading signals.

Prerequisites

  • Basic experience trading and investing in equities
  • Familiarity with Python and pandas data frames

Recommended preparation:

Recommended follow-up:

About your instructor

  • Deepak Kanungo is the founder and CEO of Hedged Capital LLC, an AI-powered trading and advisory firm. Previously, Deepak was a financial advisor at Morgan Stanley, a Silicon Valley fintech entrepreneur, and a Director in the Global Planning Department at MasterCard International. Deepak was educated at Princeton University (Astrophysics) and The London School of Economics (Finance and Information Systems). Hedged Capital’s trading algorithms use probabilistic models and technologies. In 2005, Deepak invented a project portfolio management system using Bayesian inference, the foundation of all probabilistic programming languages.


Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

1. Overview of the various types of ML models and the ML development process (55 minutes)

  • Poll
  • Presentation: Brief overview of the different types of machine learning models used in finance, including supervised, unsupervised, deep learning, and reinforcement learning models. The need for domain knowledge to curate data sources. The paramount importance of feature engineering. The trade-off between bias and variance in ML models.
  • Discussion: ML concepts, benefits and issues
  • Presentation: Overview of the ML development process for algorithmic trading
  • Exercise: Set up a Colab notebook. Create pandas DataFrames that combine data from freely available public sources such as FRED (economic), Yahoo Finance (equity), and Quandl (various).
  • Q&A
  • Break (5 minutes)
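The data-assembly step in this exercise can be sketched with pandas as below. This is a minimal illustration under stated assumptions, not the course notebook: the series are synthetic stand-ins for downloaded data, and the ticker XYZ and the column names are hypothetical. It shows how `pd.concat` aligns sources on a shared date index and why gaps in a slower-moving economic series are filled forward, never backward.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for downloaded data (hypothetical values). In practice
# these would come from public sources such as FRED or Yahoo Finance.
dates = pd.bdate_range("2020-01-01", periods=10)  # 10 business days
rng = np.random.default_rng(seed=0)

# Daily closing prices for a hypothetical ticker "XYZ".
equity = pd.DataFrame(
    {"XYZ_close": 100 + rng.normal(0, 1, len(dates)).cumsum()}, index=dates
)

# A slower-moving economic series, observed only every fifth business day.
economic = pd.DataFrame({"rate": [1.5, 1.6]}, index=dates[::5])

# Concatenate column-wise: rows are aligned on the shared date index, and
# dates missing from one source become NaN in that source's columns.
combined = pd.concat([equity, economic], axis=1)

# Forward-fill the economic series so each day uses the latest value observed
# so far. Backward-filling would leak future information (look-ahead bias).
combined["rate"] = combined["rate"].ffill()

# Daily percentage return of XYZ, a typical prediction target.
combined["XYZ_return"] = combined["XYZ_close"].pct_change()
combined = combined.dropna()  # drops the first day, which has no prior price
print(combined)
```

With real downloads, the same alignment and forward-fill steps apply; only the data-loading lines change.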

2. Using linear regression models to forecast stock price returns (55 minutes)

  • Presentation: Training an ordinary least squares (OLS) linear regression model. Using lasso and ridge regression to prevent overfitting to noisy financial data.
  • Exercise: Use the Scikit-learn library to train and test three types of linear regression models (OLS, lasso, and ridge) to predict stock price returns.
  • Q&A
  • Break (5 minutes)
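A minimal sketch of the three regression models from this module, assuming synthetic data: the features stand in for lagged predictors and the target for next-day returns, and the `alpha` values are illustrative assumptions rather than tuned settings. It uses Scikit-learn's `LinearRegression`, `Lasso`, and `Ridge` estimators with a chronological train/test split.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(seed=42)

# Synthetic (hypothetical) data: 500 days, three standardized features
# standing in for lagged predictors; only the first two carry signal.
n_days = 500
X = rng.normal(size=(n_days, 3))
true_coefs = np.array([0.003, -0.002, 0.0])
y = X @ true_coefs + rng.normal(0.0, 0.01, size=n_days)  # noisy "returns"

# Chronological split: fit on the first 80% of days, test on the last 20%.
# Shuffling here would leak future information into the training set.
split = int(n_days * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

models = {
    "OLS": LinearRegression(),
    "Lasso": Lasso(alpha=1e-3),  # L1 penalty: can set coefficients to 0
    "Ridge": Ridge(alpha=10.0),  # L2 penalty: shrinks all coefficients
}

test_mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_mse[name] = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name:<6} test MSE={test_mse[name]:.2e} coefs={np.round(model.coef_, 4)}")
```

Comparing the fitted coefficients shows the effect of regularization: lasso can shrink weak coefficients all the way to zero, while ridge pulls every coefficient toward zero without eliminating any.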

3. Using linear classification models to predict an economic recession (55 minutes)

  • Presentation: Training a logistic regression model. Using lasso (L1) and ridge (L2) regularization to prevent overfitting to noisy financial data. Understanding how the classifier assigns probabilities.
  • Exercise: Use the Scikit-learn library to train and test logistic regression classifiers to predict an economic recession.
  • Q&A
  • Break (5 minutes)
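The sketch below, built on synthetic data, fits lasso (L1) and ridge (L2) regularized logistic regression in Scikit-learn and then recomputes P(recession) by hand from the fitted coefficients to show how the classifier assigns probabilities. The two indicators (a yield-curve spread and a change in unemployment), their assumed effects, and the value of `C` are hypothetical choices for illustration only. In Scikit-learn, `C` is the inverse of the regularization strength, so smaller values mean stronger shrinkage.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(seed=7)

# Hypothetical monthly indicators (synthetic, not real data):
# column 0 ~ yield-curve spread, column 1 ~ change in unemployment rate.
n_months = 600
X = rng.normal(size=(n_months, 2))
# Assumed relationship: a low spread and rising unemployment raise recession odds.
logits = -2.0 - 1.5 * X[:, 0] + 1.0 * X[:, 1]
y = (rng.random(n_months) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Chronological split; the scaler is fit on training data only.
split = int(n_months * 0.8)
scaler = StandardScaler().fit(X[:split])
X_train, X_test = scaler.transform(X[:split]), scaler.transform(X[split:])
y_train, y_test = y[:split], y[split:]

classifiers = {
    "L1": LogisticRegression(penalty="l1", C=0.5, solver="liblinear"),
    "L2": LogisticRegression(penalty="l2", C=0.5),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    # Column 1 of predict_proba is P(recession); column 0 is P(no recession).
    p_recession = clf.predict_proba(X_test)[:, 1]
    # The same probability by hand: the logistic (sigmoid) of the linear score.
    score = X_test @ clf.coef_.ravel() + clf.intercept_[0]
    p_manual = 1.0 / (1.0 + np.exp(-score))
    print(f"{name}: coefs={np.round(clf.coef_.ravel(), 3)}, "
          f"max |manual - sklearn| = {np.abs(p_manual - p_recession).max():.1e}")
```

By default, `predict` labels a month as a recession when this probability exceeds 0.5; for rare events such as recessions, choosing a different threshold is often a deliberate modeling decision.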

4. Evaluating and improving linear regression and classification ML models (60 minutes)

  • Presentation: Using cross-validation and grid search to improve performance. Issues with applying cross-validation techniques to financial data. Performance metrics: risk-adjusted business metrics, binary classification metrics, and regression metrics. Backtesting and forward testing algorithms using market data. Discussion of common pitfalls and how to try to remedy them.
  • Exercise: Use Scikit-learn and Statsmodels to evaluate and diagnose both types of linear models and fine-tune them to improve their performance.
  • Q&A
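One way to address the cross-validation issue with financial data is Scikit-learn's `TimeSeriesSplit`, which always validates on observations that come after the training window. The sketch below combines it with `GridSearchCV` to tune a ridge model's `alpha`, then computes an annualized Sharpe ratio, one common risk-adjusted metric, for a naive long/short rule on held-out data. The synthetic data, the `alpha` grid, the trading rule, the 252-day annualization, the zero risk-free rate, and the helper `annualized_sharpe` are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(seed=1)

# Synthetic (hypothetical) daily features and next-day returns.
n_days = 750
X = rng.normal(size=(n_days, 5))
y = X @ np.array([0.002, -0.001, 0.0, 0.0, 0.0005]) + rng.normal(0.0, 0.01, n_days)

split = int(n_days * 0.8)  # the last 20% is never seen during tuning
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Expanding-window folds: each validation fold follows its training data.
cv = TimeSeriesSplit(n_splits=5)
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.1, 1.0, 10.0, 100.0]},
    cv=cv,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)  # refits the best model on all of X_train


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Hypothetical helper: annualized Sharpe ratio, zero risk-free rate."""
    return np.sqrt(periods_per_year) * daily_returns.mean() / daily_returns.std(ddof=1)


# Naive held-out backtest: long when the forecast is positive, short when
# negative. Ignores transaction costs, slippage, and borrowing fees.
positions = np.sign(search.predict(X_test))
strategy_returns = positions * y_test

print("best alpha:", search.best_params_["alpha"])
print(f"held-out Sharpe (annualized): {annualized_sharpe(strategy_returns):.2f}")
```

`TimeSeriesSplit` also accepts a `gap` argument that leaves a buffer between each training and validation fold, which can help when features are built from overlapping windows. A favorable Sharpe ratio from a cost-free backtest like this one is not evidence of live performance.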