CHAPTER 9 Hyper-Parameter Tuning with Cross-Validation
9.1 Motivation
Hyper-parameter tuning is an essential step in fitting an ML algorithm. When this is not done properly, the algorithm is likely to overfit, and live performance will disappoint. The ML literature places special attention on cross-validating any tuned hyper-parameter. As we have seen in Chapter 7, cross-validation (CV) in finance is an especially difficult problem, where solutions from other fields are likely to fail. In this chapter we will discuss how to tune hyper-parameters using the purged k-fold CV method. The references section lists studies that propose alternative methods that may be useful in specific problems.
9.2 Grid Search Cross-Validation
Grid search cross-validation conducts an exhaustive search for the combination of parameters that maximizes the CV performance, according to some user-defined score function. When we do not know much about the underlying structure of the data, this is a reasonable first approach. Scikit-learn implements this logic in the class GridSearchCV, which accepts a CV generator as an argument. For the reasons explained in Chapter 7, we need to pass our PurgedKFold class (Snippet 7.3) in order to prevent GridSearchCV from overfitting the ML estimator to leaked information.
Snippet 9.1 Grid search with purged k-fold cross-validation
Snippet 9.1 lists function ...
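The snippet itself is not reproduced in this excerpt. As an illustration of the approach it describes, here is a minimal sketch: a simplified purged k-fold splitter (modeled on the idea behind Chapter 7's Snippet 7.3, not the book's exact code) passed to scikit-learn's GridSearchCV. The toy data, parameter grid, and the 5-day label horizon are assumptions for the example only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


class PurgedKFold:
    """Simplified purged k-fold CV (illustrative, after Snippet 7.3).

    Test folds are contiguous blocks. Training samples whose label
    interval (observation time -> t1) overlaps a test block are purged,
    and an embargo gap is applied after each test block.
    """

    def __init__(self, n_splits=3, t1=None, pct_embargo=0.0):
        self.n_splits = n_splits
        self.t1 = t1            # pd.Series: observation time -> label end time
        self.pct_embargo = pct_embargo

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits

    def split(self, X, y=None, groups=None):
        indices = np.arange(X.shape[0])
        embargo = int(X.shape[0] * self.pct_embargo)
        for block in np.array_split(indices, self.n_splits):
            start, end = block[0], block[-1] + 1
            test_indices = indices[start:end]
            t0 = self.t1.index[start]  # start of the test block
            # keep training samples whose labels end before the test block
            train_indices = self.t1.index.searchsorted(
                self.t1[self.t1 <= t0].index)
            # resume training after the last test label ends (+ embargo)
            max_t1_pos = self.t1.index.searchsorted(
                self.t1.iloc[test_indices].max())
            train_indices = np.concatenate(
                (train_indices, indices[max_t1_pos + embargo:]))
            yield train_indices, test_indices


# toy data: daily features, each label spanning the next 5 days (assumption)
rng = np.random.RandomState(0)
idx = pd.date_range('2020-01-01', periods=200, freq='D')
X = pd.DataFrame(rng.randn(200, 4), index=idx)
y = pd.Series((X.iloc[:, 0] > 0).astype(int), index=idx)
t1 = pd.Series(idx, index=idx).shift(-5).fillna(idx[-1])

cv = PurgedKFold(n_splits=3, t1=t1, pct_embargo=0.01)
gs = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=10, random_state=0),
    param_grid={'max_depth': [2, 4]},   # illustrative grid
    scoring='neg_log_loss',
    cv=cv,
)
gs.fit(X, y)
print(gs.best_params_)
```

Because GridSearchCV only requires that the cv argument expose split and get_n_splits, the purging logic plugs in without modifying the estimator or the search itself.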