Live online training

Quantitative Trading Next Steps

Using alternative data and machine learning to build trading signals in Python

Topic: Data
Chakri Cherukuri

Investors have long developed quantitative trading strategies using structured financial datasets like stock price time series and fundamental data. Recently, unstructured datasets (such as text and images) and corresponding machine learning methods to process them have grown in popularity. These unstructured datasets are called “alternative datasets” or “alt data” for short. With an exponential increase in the amount of data available and advances in machine learning/deep learning, it is now possible to process these alternative datasets and try to use them for building trading signals.

In this training course, we will start with a brief overview of quantitative trading and look at some examples of backtesting simple strategies. We will then cover different flavors of alt data and focus on one particular dataset in detail: text data based on news stories and tweets. We’ll explore the machine learning models needed to process these datasets, and then look at quantitative approaches for constructing portfolios of stocks based on sentiment scores. Along the way, we will learn how to construct long-short factor portfolios, backtest them, and compute performance statistics.

What you'll learn and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • What “alternative data” is
  • Different flavors of alternative datasets
  • How to apply machine learning algorithms to process alt data
  • How to build “factor scores” from these datasets
  • How to construct factor portfolios and backtest them

And you’ll be able to:

  • Build machine learning models and apply NLP techniques in Python to process text datasets
  • Construct long-short factor portfolios
  • Backtest trading strategies and compute performance statistics

This training course is for you because...

  • You’re a retail equity investor or a trader who wants to build quantitative strategies
  • You work at a buy-side trading firm and want to know how to use machine learning to build trading signals using alternative data


Prerequisites

  • Experience in equities trading and investing
  • Comfort with Python, including familiarity with numpy, pandas, and building classifiers with scikit-learn
  • Familiarity with the principles of machine learning, text processing, and natural language processing

Recommended preparation:

Recommended follow-up:

About your instructor

  • Chakri Cherukuri is a senior researcher in the Quantitative Financial Research group at Bloomberg LP. His research interests include quantitative portfolio management, algorithmic trading strategies and applied machine learning/deep learning. Previously, he built analytical tools for the trading desks at Goldman Sachs and Lehman Brothers. Before that, he worked in Silicon Valley for startups building enterprise software applications. He has extensive experience in scientific computing and software development. He is a core contributor to bqplot, a 2D plotting library for the Jupyter notebook. He holds an undergraduate degree in mechanical engineering from the Indian Institute of Technology (IIT), Madras, an MS in computer science from Arizona State University and another MS in computational finance from Carnegie Mellon University.


The timeframes are only estimates and may vary according to how the class is progressing.

Overview of quantitative trading and alternative data (55 min)

  • Poll (5 min)
  • Presentation: Overview of quantitative trading and signal generation (20 min)
  • Exercise: Backtesting a simple moving average crossover strategy using pandas (10 min)
  • Presentation: Overview of alternative data and its popularity in finance (20 min)
  • Break (5 min)
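
As a rough illustration of the moving average crossover exercise listed above, here is a minimal pandas sketch. It is not the course's own material: the function name `sma_crossover_backtest`, the 20/50-day windows, the long-or-flat rule, and the synthetic random-walk prices are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def sma_crossover_backtest(prices: pd.Series, fast: int = 20, slow: int = 50) -> pd.DataFrame:
    """Backtest a long-or-flat simple moving average crossover (illustrative only)."""
    fast_ma = prices.rolling(fast).mean()
    slow_ma = prices.rolling(slow).mean()

    # Signal is 1 (long) when the fast average is above the slow one, else 0 (flat).
    # Comparisons against NaN are False, so the warm-up period stays flat.
    signal = (fast_ma > slow_ma).astype(int)

    # Shift by one bar so today's position uses only yesterday's signal (no look-ahead).
    position = signal.shift(1).fillna(0)

    asset_ret = prices.pct_change().fillna(0.0)
    strategy_ret = position * asset_ret

    return pd.DataFrame(
        {
            "position": position,
            "asset_ret": asset_ret,
            "strategy_ret": strategy_ret,
            "equity": (1 + strategy_ret).cumprod(),
        }
    )


# Synthetic prices (assumption: a random walk stands in for real market data).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=500)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, 500))), index=dates)

result = sma_crossover_backtest(prices)
print(result.tail())
```

The one-bar shift is the key design choice here: without it, the strategy would trade on information it could not have had at the time.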

Machine learning for processing alternative data (55 min)

  • Presentation: Machine learning models for text processing (20 min)
  • Exercise: Building a sentiment classifier on text data using scikit-learn (15 min)
  • Image classification models (10 min)
  • Vendors offering machine learning analytics on alt data (5 min)
  • Q&A (5 min)
  • Break (5 min)
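
The sentiment classifier exercise above could be approached in many ways. The sketch below shows one common baseline, TF-IDF features fed into logistic regression, using scikit-learn. The headlines and labels are made up for illustration, and the choice of model is an assumption rather than the method taught in the course.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set (made-up headlines; 1 = positive, 0 = negative).
texts = [
    "Company beats earnings expectations and raises guidance",
    "Strong sales growth lifts shares to record high",
    "Analysts upgrade stock after impressive quarter",
    "New product launch receives great reviews",
    "Company misses earnings and cuts guidance",
    "Shares plunge after weak sales report",
    "Analysts downgrade stock on rising debt concerns",
    "Product recall triggers lawsuit and losses",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Unigram and bigram TF-IDF features followed by a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

new_texts = [
    "Earnings beat expectations on strong sales",
    "Stock plunges after company cuts guidance",
]

# Columns of predict_proba follow the sorted class labels, so column 0 is
# negative (0) and column 1 is positive (1).
proba = model.predict_proba(new_texts)

# Map class probabilities to a sentiment score in [-1, 1].
scores = proba[:, 1] - proba[:, 0]
for text, score in zip(new_texts, scores):
    print(f"{score:+.3f}  {text}")
```

A score built this way can later be aggregated per stock and per day to form a factor score; a real model would need far more labeled data than this toy set.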

Factor investing and backtesting (55 min)

  • Presentation: Basics of factor investing, Fama-French factors (30 min)
  • Exercise: Factor scoring of S&P stocks based on size (20 min)
  • Q&A (5 min)
  • Break (5 min)
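
To make the factor scoring exercise above more concrete, here is a minimal sketch of scoring stocks on size and turning the scores into long-short weights. The tickers and market caps are placeholders, and the helper `size_factor_weights`, the z-scored negative log market cap, and the 20% quantile cutoffs are assumptions for illustration. The long-small, short-large direction follows the usual small-minus-big convention for the size factor.

```python
import numpy as np
import pandas as pd


def size_factor_weights(market_cap: pd.Series, quantile: float = 0.2) -> pd.Series:
    """Equal-weighted long-short weights from a size score (illustrative only)."""
    # Smaller companies get higher scores: negative log market cap, z-scored.
    raw = -np.log(market_cap)
    score = (raw - raw.mean()) / raw.std()

    lower_cut = score.quantile(quantile)
    upper_cut = score.quantile(1 - quantile)
    longs = score >= upper_cut   # smallest companies
    shorts = score <= lower_cut  # largest companies

    # Long leg sums to +1 and short leg sums to -1, so the portfolio is dollar neutral.
    weights = pd.Series(0.0, index=score.index)
    weights.loc[longs] = 1.0 / longs.sum()
    weights.loc[shorts] = -1.0 / shorts.sum()
    return weights


# Placeholder universe: ten hypothetical tickers with market caps of 1 to 10 (in billions).
tickers = [f"S{i:02d}" for i in range(1, 11)]
market_cap = pd.Series(np.arange(1.0, 11.0), index=tickers)

weights = size_factor_weights(market_cap)
print(weights)
```

The same weighting function can be reused with any other factor score, for example a sentiment score, by skipping the size-specific first step.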

Twitter sentiment model and sentiment portfolios (60 min)

  • Presentation: Twitter sentiment ML model (15 min)
  • Demo: Twitter sentiment model (20 min)
  • Presentation: Constructing sentiment portfolios and backtesting (20 min)
  • Q&A (5 min)
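
Once a sentiment portfolio has been backtested, its daily returns can be summarized with a few standard performance statistics. The sketch below is one possible way to compute them. The function name `performance_stats`, the 252-day year, the zero risk-free rate, and the synthetic returns are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def performance_stats(daily_returns: pd.Series, periods_per_year: int = 252) -> dict:
    """Summary statistics for a series of daily strategy returns (illustrative only)."""
    equity = (1 + daily_returns).cumprod()
    n_periods = len(daily_returns)

    # Geometric annualized return from the final equity value.
    ann_return = equity.iloc[-1] ** (periods_per_year / n_periods) - 1
    ann_vol = daily_returns.std() * np.sqrt(periods_per_year)

    # Sharpe ratio with the risk-free rate assumed to be zero.
    sharpe = daily_returns.mean() / daily_returns.std() * np.sqrt(periods_per_year)

    # Drawdown relative to the running peak; the peak is floored at the
    # starting capital of 1.0 so that an initial loss counts as a drawdown.
    running_peak = equity.cummax().clip(lower=1.0)
    max_drawdown = (equity / running_peak - 1).min()

    return {
        "annualized_return": ann_return,
        "annualized_volatility": ann_vol,
        "sharpe_ratio": sharpe,
        "max_drawdown": max_drawdown,
    }


# Synthetic daily returns (assumption: random noise stands in for a real backtest).
rng = np.random.default_rng(1)
returns = pd.Series(rng.normal(0.0004, 0.008, 252))

stats = performance_stats(returns)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```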