Advances in Financial Machine Learning

by Marcos Lopez de Prado

February 2018

Intermediate to advanced

400 pages

10h 17m

English

Wiley

1.1 Motivation; 1.2 The Main Reason Financial Machine Learning Projects Usually Fail; 1.3 Book Structure; 1.4 Target Audience; 1.5 Requisites; 1.6 FAQs; 1.7 Acknowledgments; Exercises; References; Bibliography; Notes
2.1 Motivation; 2.2 Essential Types of Financial Data; 2.3 Bars; 2.4 Dealing with Multi-Product Series; 2.5 Sampling Features; Exercises; References
3.1 Motivation; 3.2 The Fixed-Time Horizon Method; 3.3 Computing Dynamic Thresholds; 3.4 The Triple-Barrier Method; 3.5 Learning Side and Size; 3.6 Meta-Labeling; 3.7 How to Use Meta-Labeling; 3.8 The Quantamental Way; 3.9 Dropping Unnecessary Labels; Exercises; Bibliography; Note
4.1 Motivation; 4.2 Overlapping Outcomes; 4.3 Number of Concurrent Labels; 4.4 Average Uniqueness of a Label; 4.5 Bagging Classifiers and Uniqueness; 4.6 Return Attribution; 4.7 Time Decay; 4.8 Class Weights; Exercises; References; Bibliography
5.1 Motivation; 5.2 The Stationarity vs. Memory Dilemma; 5.3 Literature Review; 5.4 The Method; 5.5 Implementation; 5.6 Stationarity with Maximum Memory Preservation; 5.7 Conclusion; Exercises; References; Bibliography
6.1 Motivation; 6.2 The Three Sources of Errors; 6.3 Bootstrap Aggregation; 6.4 Random Forest; 6.5 Boosting; 6.6 Bagging vs. Boosting in Finance; 6.7 Bagging for Scalability; Exercises; References; Bibliography; Notes
7.1 Motivation; 7.2 The Goal of Cross-Validation; 7.3 Why K-Fold CV Fails in Finance; 7.4 A Solution: Purged K-Fold CV; 7.5 Bugs in Sklearn's Cross-Validation; Exercises; Bibliography
8.1 Motivation; 8.2 The Importance of Feature Importance; 8.3 Feature Importance with Substitution Effects; 8.4 Feature Importance without Substitution Effects; 8.5 Parallelized vs. Stacked Feature Importance; 8.6 Experiments with Synthetic Data; Exercises; References; Note
9.1 Motivation; 9.2 Grid Search Cross-Validation; 9.3 Randomized Search Cross-Validation; 9.4 Scoring and Hyper-parameter Tuning; Exercises; References; Bibliography; Notes
10.1 Motivation; 10.2 Strategy-Independent Bet Sizing Approaches; 10.3 Bet Sizing from Predicted Probabilities; 10.4 Averaging Active Bets; 10.5 Size Discretization; 10.6 Dynamic Bet Sizes and Limit Prices; Exercises; References; Bibliography; Notes
11.1 Motivation; 11.2 Mission Impossible: The Flawless Backtest; 11.3 Even If Your Backtest Is Flawless, It Is Probably Wrong; 11.4 Backtesting Is Not a Research Tool; 11.5 A Few General Recommendations; 11.6 Strategy Selection; Exercises; References; Bibliography; Note
12.1 Motivation; 12.2 The Walk-Forward Method; 12.3 The Cross-Validation Method; 12.4 The Combinatorial Purged Cross-Validation Method; 12.5 How Combinatorial Purged Cross-Validation Addresses Backtest Overfitting; Exercises; References
13.1 Motivation; 13.2 Trading Rules; 13.3 The Problem; 13.4 Our Framework; 13.5 Numerical Determination of Optimal Trading Rules; 13.6 Experimental Results; 13.7 Conclusion; Exercises; References; Notes
14.1 Motivation; 14.2 Types of Backtest Statistics; 14.3 General Characteristics; 14.4 Performance; 14.5 Runs; 14.6 Implementation Shortfall; 14.7 Efficiency; 14.8 Classification Scores; 14.9 Attribution; Exercises; References; Bibliography; Notes
15.1 Motivation; 15.2 Symmetric Payouts; 15.3 Asymmetric Payouts; 15.4 The Probability of Strategy Failure; Exercises; References
16.1 Motivation; 16.2 The Problem with Convex Portfolio Optimization; 16.3 Markowitz's Curse; 16.4 From Geometric to Hierarchical Relationships; 16.5 A Numerical Example; 16.6 Out-of-Sample Monte Carlo Simulations; 16.7 Further Research; 16.8 Conclusion; Appendices: 16.A.1 Correlation-based Metric; 16.A.2 Inverse Variance Allocation; 16.A.3 Reproducing the Numerical Example; 16.A.4 Reproducing the Monte Carlo Experiment; Exercises; References; Notes
17.1 Motivation; 17.2 Types of Structural Break Tests; 17.3 CUSUM Tests; 17.4 Explosiveness Tests; Exercises; References
18.1 Motivation; 18.2 Shannon's Entropy; 18.3 The Plug-in (or Maximum Likelihood) Estimator; 18.4 Lempel-Ziv Estimators; 18.5 Encoding Schemes; 18.6 Entropy of a Gaussian Process; 18.7 Entropy and the Generalized Mean; 18.8 A Few Financial Applications of Entropy; Exercises; References; Bibliography; Note
19.1 Motivation; 19.2 Review of the Literature; 19.3 First Generation: Price Sequences; 19.4 Second Generation: Strategic Trade Models; 19.5 Third Generation: Sequential Trade Models; 19.6 Additional Features from Microstructural Datasets; 19.7 What Is Microstructural Information?; Exercises; References
20.1 Motivation; 20.2 Vectorization Example; 20.3 Single-Thread vs. Multithreading vs. Multiprocessing; 20.4 Atoms and Molecules; 20.5 Multiprocessing Engines; 20.6 Multiprocessing Example; Exercises; Reference; Bibliography; Notes
21.1 Motivation; 21.2 Combinatorial Optimization; 21.3 The Objective Function; 21.4 The Problem; 21.5 An Integer Optimization Approach; 21.6 A Numerical Example; Exercises; References
22.1 Motivation; 22.2 Regulatory Response to the Flash Crash of 2010; 22.3 Background; 22.4 HPC Hardware; 22.5 HPC Software; 22.6 Use Cases; 22.7 Summary and Call for Participation; 22.8 Acknowledgments; References; Notes

Content preview from Advances in Financial Machine Learning

CHAPTER 6 Ensemble Methods

6.1 Motivation

In this chapter we will discuss two of the most popular ML ensemble methods.¹ In the references and footnotes you will find books and articles that introduce these techniques. As everywhere else in this book, the assumption is that you have already used these approaches. The goal of this chapter is to explain what makes them effective, and how to avoid common errors that lead to their misuse in finance.

6.2 The Three Sources of Errors

ML models generally suffer from three errors:²

Bias: This error is caused by unrealistic assumptions. When bias is high, the ML algorithm has failed to recognize important relations between features and outcomes. In this situation, the algorithm is said to be “underfit.”
Variance: This error is caused by sensitivity to small changes in the training set. When variance is high, the algorithm has overfit the training set, and that is why even minimal changes in the training set can produce wildly different predictions. Rather than modelling the general patterns in the training set, the algorithm has mistaken noise for signal.
Noise: This error is caused by the variance of the observed values, like unpredictable changes or measurement errors. This is the irreducible error, which cannot be explained by any model.
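
The three error sources can be illustrated with a small simulation. The sketch below is not taken from the book: it assumes a hypothetical true function (a sine wave), Gaussian noise with a known standard deviation, a fixed grid of inputs, and polynomial fits of different degrees standing in for "simple" and "flexible" models. It refits the model on many training sets that differ only in their noise draws, then estimates the squared bias and the variance of the predictions. The noise term is known here only because the data are simulated.

```python
import numpy as np


def decompose_error(degree, n_train=30, n_sets=500, noise_sd=0.3, seed=0):
    """Estimate squared bias, variance and noise for a polynomial fit.

    All settings are illustrative assumptions, not values from the text.
    """
    rng = np.random.default_rng(seed)

    def true_f(x):
        # Hypothetical "true" relation between feature and outcome.
        return np.sin(2 * np.pi * x)

    x_train = np.linspace(0.0, 1.0, n_train)   # fixed design points
    x_test = np.linspace(0.05, 0.95, 50)       # points where errors are measured

    preds = np.empty((n_sets, x_test.size))
    for s in range(n_sets):
        # Each training set shares the same inputs but has fresh noise.
        y_train = true_f(x_train) + rng.normal(0.0, noise_sd, n_train)
        coefs = np.polyfit(x_train, y_train, degree)
        preds[s] = np.polyval(coefs, x_test)

    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - true_f(x_test)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    noise = noise_sd ** 2                      # irreducible by construction
    return bias_sq, variance, noise


if __name__ == "__main__":
    for degree in (1, 7):
        b, v, n = decompose_error(degree)
        print(f"degree={degree}: bias^2={b:.4f}, variance={v:.4f}, noise={n:.4f}")
```

Under these assumptions, the straight-line fit (degree 1) should show the larger squared bias, since it cannot follow the sine wave, while the degree-7 fit should show the larger variance, since it reacts more strongly to each noise draw. The noise term is the same for both and no choice of model reduces it.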

Consider a training set of observations {x_i}_{i = 1, …, n} and real-valued outcomes {y_i}_{i = 1, …, n}. Suppose a function f[x] exists, such that y = f[x] + ϵ, where ϵ is white noise with E[ϵ_i] = 0 ...

Publisher Resources

ISBN: 9781119482086
