Advances in Financial Machine Learning

Book Description

Machine learning (ML) is changing virtually every aspect of our lives. Today, ML algorithms accomplish tasks that until recently only expert humans could perform. As it relates to finance, this is the most exciting time to adopt a disruptive technology that will transform how everyone invests for generations. Readers will learn how to structure big data in a way that is amenable to ML algorithms; how to conduct research with ML algorithms on that data; how to use supercomputing methods; and how to backtest their discoveries while avoiding false positives. The book addresses real-life problems faced by practitioners on a daily basis and explains scientifically sound solutions using math, supported by code and examples. Readers become active users who can test the proposed solutions in their particular setting. Written by a recognized expert and portfolio manager, this book will equip investment professionals with the groundbreaking tools needed to succeed in modern finance.

Table of Contents

  1. About the Author
  2. PREAMBLE
    1. Chapter 1 Financial Machine Learning as a Distinct Subject
      1. 1.1 Motivation
      2. 1.2 The Main Reason Financial Machine Learning Projects Usually Fail
      3. 1.3 Book Structure
      4. 1.4 Target Audience
      5. 1.5 Requisites
      6. 1.6 FAQs
      7. 1.7 Acknowledgments
      8. Exercises
      9. References
      10. Bibliography
      11. Notes
  3. PART 1 DATA ANALYSIS
    1. Chapter 2 Financial Data Structures
      1. 2.1 Motivation
      2. 2.2 Essential Types of Financial Data
      3. 2.3 Bars
      4. 2.4 Dealing with Multi-Product Series
      5. 2.5 Sampling Features
      6. Exercises
      7. References
    2. Chapter 3 Labeling
      1. 3.1 Motivation
      2. 3.2 The Fixed-Time Horizon Method
      3. 3.3 Computing Dynamic Thresholds
      4. 3.4 The Triple-Barrier Method
      5. 3.5 Learning Side and Size
      6. 3.6 Meta-Labeling
      7. 3.7 How to Use Meta-Labeling
      8. 3.8 The Quantamental Way
      9. 3.9 Dropping Unnecessary Labels
      10. Exercises
      11. Bibliography
      12. Note
    3. Chapter 4 Sample Weights
      1. 4.1 Motivation
      2. 4.2 Overlapping Outcomes
      3. 4.3 Number of Concurrent Labels
      4. 4.4 Average Uniqueness of a Label
      5. 4.5 Bagging Classifiers and Uniqueness
      6. 4.6 Return Attribution
      7. 4.7 Time Decay
      8. 4.8 Class Weights
      9. Exercises
      10. References
      11. Bibliography
    4. Chapter 5 Fractionally Differentiated Features
      1. 5.1 Motivation
      2. 5.2 The Stationarity vs. Memory Dilemma
      3. 5.3 Literature Review
      4. 5.4 The Method
      5. 5.5 Implementation
      6. 5.6 Stationarity with Maximum Memory Preservation
      7. 5.7 Conclusion
      8. Exercises
      9. References
      10. Bibliography
  4. PART 2 MODELLING
    1. Chapter 6 Ensemble Methods
      1. 6.1 Motivation
      2. 6.2 The Three Sources of Errors
      3. 6.3 Bootstrap Aggregation
      4. 6.4 Random Forest
      5. 6.5 Boosting
      6. 6.6 Bagging vs. Boosting in Finance
      7. 6.7 Bagging for Scalability
      8. Exercises
      9. References
      10. Bibliography
      11. Notes
    2. Chapter 7 Cross-Validation in Finance
      1. 7.1 Motivation
      2. 7.2 The Goal of Cross-Validation
      3. 7.3 Why K-Fold CV Fails in Finance
      4. 7.4 A Solution: Purged K-Fold CV
      5. 7.5 Bugs in Sklearn's Cross-Validation
      6. Exercises
      7. Bibliography
    3. Chapter 8 Feature Importance
      1. 8.1 Motivation
      2. 8.2 The Importance of Feature Importance
      3. 8.3 Feature Importance with Substitution Effects
      4. 8.4 Feature Importance without Substitution Effects
      5. 8.5 Parallelized vs. Stacked Feature Importance
      6. 8.6 Experiments with Synthetic Data
      7. Exercises
      8. References
      9. Note
    4. Chapter 9 Hyper-Parameter Tuning with Cross-Validation
      1. 9.1 Motivation
      2. 9.2 Grid Search Cross-Validation
      3. 9.3 Randomized Search Cross-Validation
      4. 9.4 Scoring and Hyper-parameter Tuning
      5. Exercises
      6. References
      7. Bibliography
      8. Notes
  5. PART 3 BACKTESTING
    1. Chapter 10 Bet Sizing
      1. 10.1 Motivation
      2. 10.2 Strategy-Independent Bet Sizing Approaches
      3. 10.3 Bet Sizing from Predicted Probabilities
      4. 10.4 Averaging Active Bets
      5. 10.5 Size Discretization
      6. 10.6 Dynamic Bet Sizes and Limit Prices
      7. Exercises
      8. References
      9. Bibliography
      10. Notes
    2. Chapter 11 The Dangers of Backtesting
      1. 11.1 Motivation
      2. 11.2 Mission Impossible: The Flawless Backtest
      3. 11.3 Even If Your Backtest Is Flawless, It Is Probably Wrong
      4. 11.4 Backtesting Is Not a Research Tool
      5. 11.5 A Few General Recommendations
      6. 11.6 Strategy Selection
      7. Exercises
      8. References
      9. Bibliography
      10. Note
    3. Chapter 12 Backtesting through Cross-Validation
      1. 12.1 Motivation
      2. 12.2 The Walk-Forward Method
      3. 12.3 The Cross-Validation Method
      4. 12.4 The Combinatorial Purged Cross-Validation Method
      5. 12.5 How Combinatorial Purged Cross-Validation Addresses Backtest Overfitting
      6. Exercises
      7. References
    4. Chapter 13 Backtesting on Synthetic Data
      1. 13.1 Motivation
      2. 13.2 Trading Rules
      3. 13.3 The Problem
      4. 13.4 Our Framework
      5. 13.5 Numerical Determination of Optimal Trading Rules
      6. 13.6 Experimental Results
      7. 13.7 Conclusion
      8. Exercises
      9. References
      10. Notes
    5. Chapter 14 Backtest Statistics
      1. 14.1 Motivation
      2. 14.2 Types of Backtest Statistics
      3. 14.3 General Characteristics
      4. 14.4 Performance
      5. 14.5 Runs
      6. 14.6 Implementation Shortfall
      7. 14.7 Efficiency
      8. 14.8 Classification Scores
      9. 14.9 Attribution
      10. Exercises
      11. References
      12. Bibliography
      13. Notes
    6. Chapter 15 Understanding Strategy Risk
      1. 15.1 Motivation
      2. 15.2 Symmetric Payouts
      3. 15.3 Asymmetric Payouts
      4. 15.4 The Probability of Strategy Failure
      5. Exercises
      6. References
    7. Chapter 16 Machine Learning Asset Allocation
      1. 16.1 Motivation
      2. 16.2 The Problem with Convex Portfolio Optimization
      3. 16.3 Markowitz's Curse
      4. 16.4 From Geometric to Hierarchical Relationships
      5. 16.5 A Numerical Example
      6. 16.6 Out-of-Sample Monte Carlo Simulations
      7. 16.7 Further Research
      8. 16.8 Conclusion
      9. APPENDICES
      10. 16.A.1 Correlation-based Metric
      11. 16.A.2 Inverse Variance Allocation
      12. 16.A.3 Reproducing the Numerical Example
      13. 16.A.4 Reproducing the Monte Carlo Experiment
      14. Exercises
      15. References
      16. Notes
  6. PART 4 USEFUL FINANCIAL FEATURES
    1. Chapter 17 Structural Breaks
      1. 17.1 Motivation
      2. 17.2 Types of Structural Break Tests
      3. 17.3 CUSUM Tests
      4. 17.4 Explosiveness Tests
      5. Exercises
      6. References
    2. Chapter 18 Entropy Features
      1. 18.1 Motivation
      2. 18.2 Shannon's Entropy
      3. 18.3 The Plug-in (or Maximum Likelihood) Estimator
      4. 18.4 Lempel-Ziv Estimators
      5. 18.5 Encoding Schemes
      6. 18.6 Entropy of a Gaussian Process
      7. 18.7 Entropy and the Generalized Mean
      8. 18.8 A Few Financial Applications of Entropy
      9. Exercises
      10. References
      11. Bibliography
      12. Note
    3. Chapter 19 Microstructural Features
      1. 19.1 Motivation
      2. 19.2 Review of the Literature
      3. 19.3 First Generation: Price Sequences
      4. 19.4 Second Generation: Strategic Trade Models
      5. 19.5 Third Generation: Sequential Trade Models
      6. 19.6 Additional Features from Microstructural Datasets
      7. 19.7 What Is Microstructural Information?
      8. Exercises
      9. References
  7. PART 5 HIGH-PERFORMANCE COMPUTING RECIPES
    1. Chapter 20 Multiprocessing and Vectorization
      1. 20.1 Motivation
      2. 20.2 Vectorization Example
      3. 20.3 Single-Thread vs. Multithreading vs. Multiprocessing
      4. 20.4 Atoms and Molecules
      5. 20.5 Multiprocessing Engines
      6. 20.6 Multiprocessing Example
      7. Exercises
      8. Reference
      9. Bibliography
      10. Notes
    2. Chapter 21 Brute Force and Quantum Computers
      1. 21.1 Motivation
      2. 21.2 Combinatorial Optimization
      3. 21.3 The Objective Function
      4. 21.4 The Problem
      5. 21.5 An Integer Optimization Approach
      6. 21.6 A Numerical Example
      7. Exercises
      8. References
    3. Chapter 22 High-Performance Computational Intelligence and Forecasting Technologies
      1. 22.1 Motivation
      2. 22.2 Regulatory Response to the Flash Crash of 2010
      3. 22.3 Background
      4. 22.4 HPC Hardware
      5. 22.5 HPC Software
      6. 22.6 Use Cases
      7. 22.7 Summary and Call for Participation
      8. 22.8 Acknowledgments
      9. References
      10. Notes
  8. Index
  9. EULA