Live online training

Unsupervised Learning for Algorithmic Trading

Enable algorithms to find structure in unlabeled data and engineer features using Python

Topic: Business
Deepak Kanungo

Traders and investors are deluged with unending streams of financial data with very low signal-to-noise ratios. Often traders don’t know what features or patterns they should be looking for in these voluminous streams of noisy data, especially if market conditions are unfamiliar or have changed substantially. These problems make data processing and engineering models for algorithmic trading extremely challenging, even for experts.

To tackle these challenges, traders and investors are increasingly enlisting the help of advanced machine learning algorithms and techniques that don’t require direct supervision of the learning process. These unsupervised learning algorithms not only help traders discover and visualize the structure of unlabeled financial data but also help them get rid of redundant features in their models, greatly enhancing the efficiency and effectiveness of their algorithmic trading systems.

Expert Deepak Kanungo takes you through the ins and outs of unsupervised machine learning in algorithmic trading and investing. You’ll examine fundamental concepts, the pros and cons of using unsupervised learning, how to use algorithms like k-means clustering and principal component analysis (PCA), and more. Join in to learn how to use unsupervised learning to engineer new features in your trading models and redesign your investment portfolio to reduce correlations in asset price returns.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • The advantages and disadvantages of unsupervised machine learning in algorithmic trading and investing
  • The fundamental concepts underlying unsupervised data exploration algorithms such as k-means clustering
  • How data exploration algorithms enable the understanding and processing of unlabeled financial data
  • The fundamental concepts underlying unsupervised data transformation algorithms such as principal component analysis (PCA)
  • How data transformation algorithms help in engineering the features of models used in algorithmic trading strategies and investment portfolios

And you’ll be able to:

  • Use scikit-learn to explore and transform input data for your algorithmic trading models without supplying labeled output data
  • Allow unsupervised algorithms to assist you in finding similarities and hierarchies among securities, sectors, and styles
  • Enable unsupervised algorithms to engineer new features and eliminate others in your trading models
  • Empower unsupervised algorithms to redesign your investment portfolio to reduce correlations in asset price returns
  • Visualize complex patterns in your financial data with dendrograms and heatmaps
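
As a rough illustration of the dendrogram idea mentioned above, the sketch below builds a hierarchy of securities from a correlation-based distance. It is a minimal example under stated assumptions, not the course's own material: the tickers and returns are synthetic, and the distance sqrt(0.5 * (1 - correlation)) with average linkage is just one common choice.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import squareform

# Synthetic returns (assumption): two groups of tickers, each driven by its own factor
rng = np.random.default_rng(3)
n_days = 500
tickers = ["A1", "A2", "A3", "B1", "B2", "B3"]  # hypothetical names
factor_a = rng.normal(0, 0.01, n_days)
factor_b = rng.normal(0, 0.01, n_days)
returns = np.column_stack(
    [factor_a + rng.normal(0, 0.003, n_days) for _ in range(3)]
    + [factor_b + rng.normal(0, 0.003, n_days) for _ in range(3)]
)

# Turn correlations into distances: highly correlated securities end up close together
corr = np.corrcoef(returns, rowvar=False)
dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))
np.fill_diagonal(dist, 0.0)

# Agglomerative (hierarchical) clustering on the condensed distance matrix
Z = linkage(squareform(dist, checks=False), method="average")

# no_plot=True returns the tree layout without drawing; drop it to plot with matplotlib
tree = dendrogram(Z, labels=tickers, no_plot=True)
leaf_order = tree["ivl"]

# Cut the tree into two flat clusters
flat_labels = fcluster(Z, t=2, criterion="maxclust")
```

Here `leaf_order` lists the tickers in the order they would appear along the dendrogram axis, and the `corr` matrix is the kind of input typically passed to a heatmap.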

This training course is for you because...

  • You’re an investor, trader, or financial analyst or work closely with them.
  • You want to take advantage of unsupervised machine learning technologies.

Prerequisites


  • Basic experience trading and investing in equities
  • Familiarity with Python and pandas DataFrames

About your instructor

  • Deepak Kanungo is the founder and CEO of Hedged Capital LLC, an AI-powered, proprietary trading and analytics firm. Previously, Deepak was a financial advisor at Morgan Stanley, a Silicon Valley fintech entrepreneur, and a director in the Global Planning Department at Mastercard International. He was educated at Princeton University (astrophysics) and the London School of Economics (finance and information systems).

Schedule


The timeframes are only estimates and may vary according to how the class is progressing.

Unsupervised machine learning in algorithmic trading and investing (55 minutes)

  • Presentation: Conceptual overview of unsupervised data exploration and transformation machine learning algorithms; the paramount importance of feature engineering in finance and how these algorithms help; the differences between theoretical and data-driven model building in trading and investing; how algorithms improve on theoretical asset pricing models, portfolio management techniques, and investment style classifications
  • Hands-on exercises: Set up your Colab notebook; create pandas DataFrames to concatenate data from freely available public sources such as FRED (economic), Yahoo (equity), and Quandl (various); preprocess and standardize data for unsupervised learning algorithms
  • Q&A
  • Break (5 minutes)
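
The concatenate-and-standardize step from the hands-on exercise might look roughly like the sketch below. It is only an illustration: the random numbers stand in for data that would be downloaded from sources such as FRED or Yahoo, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins (assumption) for downloaded equity and economic series
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=250)
equity = pd.DataFrame(
    rng.normal(0.0005, 0.02, size=(250, 3)),
    index=dates,
    columns=["AAA", "BBB", "CCC"],  # hypothetical tickers
)
macro = pd.DataFrame(
    rng.normal(0.0, 0.001, size=(250, 1)),
    index=dates,
    columns=["RATE_CHG"],  # hypothetical economic series
)

# Concatenate column-wise on the shared date index and drop incomplete rows
data = pd.concat([equity, macro], axis=1).dropna()

# Standardize each column to zero mean and unit variance
scaler = StandardScaler()
standardized = pd.DataFrame(
    scaler.fit_transform(data), index=data.index, columns=data.columns
)
```

Standardizing matters because k-means and PCA are both sensitive to the scale of each feature; without it, the most volatile series tends to dominate.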

Using clustering algorithms to explore equity data for pairs trading (55 minutes)

  • Presentation: Overview of pairs trading and clustering algorithms, including k-means, hierarchical, and affinity propagation; enabling clustering algorithms to explore and select equities for pairs trading; visualizing clustering hierarchies using dendrograms
  • Hands-on exercise: Use scikit-learn’s clustering algorithms to select equity pairs for trading
  • Q&A
  • Break (5 minutes)
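
One way to let a clustering algorithm shortlist equities for pairs trading is sketched below with scikit-learn's `KMeans`. The returns are synthetic, the tickers are hypothetical, and the choice of two clusters is an assumption made for the illustration; it is not presented as the method taught in the session.

```python
import itertools

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic returns (assumption): three tickers follow factor_a, three follow factor_b
rng = np.random.default_rng(42)
n_days = 500
tickers = ["A1", "A2", "A3", "B1", "B2", "B3"]  # hypothetical names
factor_a = rng.normal(0, 0.01, n_days)
factor_b = rng.normal(0, 0.01, n_days)
returns = np.column_stack(
    [factor_a + rng.normal(0, 0.002, n_days) for _ in range(3)]
    + [factor_b + rng.normal(0, 0.002, n_days) for _ in range(3)]
)  # shape: (n_days, n_tickers)

# Standardize each ticker's return series, then transpose so that
# each ticker is one sample and each trading day is one feature
X = StandardScaler().fit_transform(returns).T

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Group tickers by cluster and enumerate the within-cluster pairs
clusters = {}
for ticker, label in zip(tickers, labels):
    clusters.setdefault(int(label), []).append(ticker)

candidate_pairs = [
    pair
    for members in clusters.values()
    for pair in itertools.combinations(members, 2)
]
```

The resulting `candidate_pairs` are only a shortlist of securities that moved together in the sample. In practice each pair would usually be checked further, for example with a cointegration test, before being traded.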

Using the PCA algorithm to build data-driven asset pricing models (55 minutes)

  • Presentation: Overview of various theoretical asset pricing models; how the PCA algorithm builds data-driven factor models for asset pricing without requiring any preconceived economic theory
  • Hands-on exercises: Use scikit-learn’s PCA algorithm to build factor models for equity prices; visualize factors using heatmaps
  • Q&A
  • Break (5 minutes)
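
A data-driven factor model of the kind described above can be sketched with scikit-learn's `PCA`. The example below is a minimal illustration on synthetic returns generated from a single hypothetical market factor; the number of components kept (three) is an arbitrary assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic returns (assumption): eight assets exposed to one common factor plus noise
rng = np.random.default_rng(7)
n_days, n_assets = 750, 8
market = rng.normal(0, 0.01, size=(n_days, 1))
betas = rng.uniform(0.5, 1.5, size=(1, n_assets))
returns = market @ betas + rng.normal(0, 0.005, size=(n_days, n_assets))

# Standardize so that no single asset dominates because of its volatility
X = StandardScaler().fit_transform(returns)

pca = PCA(n_components=3)
factor_returns = pca.fit_transform(X)        # statistical factor time series, (n_days, 3)
loadings = pca.components_                   # each asset's exposure to each factor, (3, n_assets)
explained = pca.explained_variance_ratio_    # share of total variance captured per factor
```

The `loadings` array is what a heatmap would typically display, with factors on one axis and assets on the other (for example via `seaborn.heatmap`). No economic theory is supplied to the algorithm: the factors are simply the directions of greatest variance in the data.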

Using the PCA algorithm to increase diversification in investment portfolios (55 minutes)

  • Presentation: Overview of Markowitz’s mean-variance portfolio management theory; how the PCA algorithm improves on it by reducing correlations among asset price returns
  • Hands-on exercises: Use scikit-learn’s PCA algorithm to design investment portfolios with reduced correlations among assets; derive portfolio weights with principal components
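
One common way to derive portfolio weights from principal components is to treat each component as a portfolio of the original assets. The sketch below illustrates that idea on synthetic returns; scaling each weight vector to unit gross exposure is an assumption made for the example, not necessarily the approach used in the session.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic returns (assumption): five assets sharing one common driver
rng = np.random.default_rng(11)
n_days, n_assets = 1000, 5
common = rng.normal(0, 0.01, size=(n_days, 1))
returns = common + rng.normal(0, 0.006, size=(n_days, n_assets))

# PCA on the raw returns: each component is an eigenvector of the sample covariance matrix
pca = PCA(n_components=n_assets).fit(returns)

# Scale each component so its absolute weights sum to 1 (unit gross exposure)
weights = pca.components_ / np.abs(pca.components_).sum(axis=1, keepdims=True)

# Each column is the return series of one component-based portfolio
portfolio_returns = returns @ weights.T

asset_corr = np.corrcoef(returns, rowvar=False)
portfolio_corr = np.corrcoef(portfolio_returns, rowvar=False)
```

In this sample the original assets are strongly correlated, while the component-based portfolios are uncorrelated with one another by construction. That property holds in-sample only; out-of-sample correlations will generally drift away from zero.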

Wrap-up and Q&A (5 minutes)