book

Advances in Financial Machine Learning

by Marcos Lopez de Prado

February 2018

Intermediate to advanced

400 pages

10h 17m

English

Wiley

Audiobook available

1.1 Motivation; 1.2 The Main Reason Financial Machine Learning Projects Usually Fail; 1.3 Book Structure; 1.4 Target Audience; 1.5 Requisites; 1.6 FAQs; 1.7 Acknowledgments; Exercises; References; Bibliography; Notes
2.1 Motivation; 2.2 Essential Types of Financial Data; 2.3 Bars; 2.4 Dealing with Multi-Product Series; 2.5 Sampling Features; Exercises; References
3.1 Motivation; 3.2 The Fixed-Time Horizon Method; 3.3 Computing Dynamic Thresholds; 3.4 The Triple-Barrier Method; 3.5 Learning Side and Size; 3.6 Meta-Labeling; 3.7 How to Use Meta-Labeling; 3.8 The Quantamental Way; 3.9 Dropping Unnecessary Labels; Exercises; Bibliography; Note
4.1 Motivation; 4.2 Overlapping Outcomes; 4.3 Number of Concurrent Labels; 4.4 Average Uniqueness of a Label; 4.5 Bagging Classifiers and Uniqueness; 4.6 Return Attribution; 4.7 Time Decay; 4.8 Class Weights; Exercises; References; Bibliography
5.1 Motivation; 5.2 The Stationarity vs. Memory Dilemma; 5.3 Literature Review; 5.4 The Method; 5.5 Implementation; 5.6 Stationarity with Maximum Memory Preservation; 5.7 Conclusion; Exercises; References; Bibliography
6.1 Motivation; 6.2 The Three Sources of Errors; 6.3 Bootstrap Aggregation; 6.4 Random Forest; 6.5 Boosting; 6.6 Bagging vs. Boosting in Finance; 6.7 Bagging for Scalability; Exercises; References; Bibliography; Notes
7.1 Motivation; 7.2 The Goal of Cross-Validation; 7.3 Why K-Fold CV Fails in Finance; 7.4 A Solution: Purged K-Fold CV; 7.5 Bugs in Sklearn's Cross-Validation; Exercises; Bibliography
8.1 Motivation; 8.2 The Importance of Feature Importance; 8.3 Feature Importance with Substitution Effects; 8.4 Feature Importance without Substitution Effects; 8.5 Parallelized vs. Stacked Feature Importance; 8.6 Experiments with Synthetic Data; Exercises; References; Note
9.1 Motivation; 9.2 Grid Search Cross-Validation; 9.3 Randomized Search Cross-Validation; 9.4 Scoring and Hyper-parameter Tuning; Exercises; References; Bibliography; Notes
10.1 Motivation; 10.2 Strategy-Independent Bet Sizing Approaches; 10.3 Bet Sizing from Predicted Probabilities; 10.4 Averaging Active Bets; 10.5 Size Discretization; 10.6 Dynamic Bet Sizes and Limit Prices; Exercises; References; Bibliography; Notes
11.1 Motivation; 11.2 Mission Impossible: The Flawless Backtest; 11.3 Even If Your Backtest Is Flawless, It Is Probably Wrong; 11.4 Backtesting Is Not a Research Tool; 11.5 A Few General Recommendations; 11.6 Strategy Selection; Exercises; References; Bibliography; Note
12.1 Motivation; 12.2 The Walk-Forward Method; 12.3 The Cross-Validation Method; 12.4 The Combinatorial Purged Cross-Validation Method; 12.5 How Combinatorial Purged Cross-Validation Addresses Backtest Overfitting; Exercises; References
13.1 Motivation; 13.2 Trading Rules; 13.3 The Problem; 13.4 Our Framework; 13.5 Numerical Determination of Optimal Trading Rules; 13.6 Experimental Results; 13.7 Conclusion; Exercises; References; Notes
14.1 Motivation; 14.2 Types of Backtest Statistics; 14.3 General Characteristics; 14.4 Performance; 14.5 Runs; 14.6 Implementation Shortfall; 14.7 Efficiency; 14.8 Classification Scores; 14.9 Attribution; Exercises; References; Bibliography; Notes
15.1 Motivation; 15.2 Symmetric Payouts; 15.3 Asymmetric Payouts; 15.4 The Probability of Strategy Failure; Exercises; References
16.1 Motivation; 16.2 The Problem with Convex Portfolio Optimization; 16.3 Markowitz's Curse; 16.4 From Geometric to Hierarchical Relationships; 16.5 A Numerical Example; 16.6 Out-of-Sample Monte Carlo Simulations; 16.7 Further Research; 16.8 Conclusion; Appendices: 16.A.1 Correlation-based Metric; 16.A.2 Inverse Variance Allocation; 16.A.3 Reproducing the Numerical Example; 16.A.4 Reproducing the Monte Carlo Experiment; Exercises; References; Notes
17.1 Motivation; 17.2 Types of Structural Break Tests; 17.3 CUSUM Tests; 17.4 Explosiveness Tests; Exercises; References
18.1 Motivation; 18.2 Shannon's Entropy; 18.3 The Plug-in (or Maximum Likelihood) Estimator; 18.4 Lempel-Ziv Estimators; 18.5 Encoding Schemes; 18.6 Entropy of a Gaussian Process; 18.7 Entropy and the Generalized Mean; 18.8 A Few Financial Applications of Entropy; Exercises; References; Bibliography; Note
19.1 Motivation; 19.2 Review of the Literature; 19.3 First Generation: Price Sequences; 19.4 Second Generation: Strategic Trade Models; 19.5 Third Generation: Sequential Trade Models; 19.6 Additional Features from Microstructural Datasets; 19.7 What Is Microstructural Information?; Exercises; References
20.1 Motivation; 20.2 Vectorization Example; 20.3 Single-Thread vs. Multithreading vs. Multiprocessing; 20.4 Atoms and Molecules; 20.5 Multiprocessing Engines; 20.6 Multiprocessing Example; Exercises; Reference; Bibliography; Notes
21.1 Motivation; 21.2 Combinatorial Optimization; 21.3 The Objective Function; 21.4 The Problem; 21.5 An Integer Optimization Approach; 21.6 A Numerical Example; Exercises; References
22.1 Motivation; 22.2 Regulatory Response to the Flash Crash of 2010; 22.3 Background; 22.4 HPC Hardware; 22.5 HPC Software; 22.6 Use Cases; 22.7 Summary and Call for Participation; 22.8 Acknowledgments; References; Notes

CHAPTER 14 Backtest Statistics

14.1 Motivation

In the previous chapters, we have studied three backtesting paradigms: first, historical simulations (the walk-forward method, Chapters 11 and 12); second, scenario simulations (the CV and CPCV methods, Chapter 12); and third, simulations on synthetic data (Chapter 13). Regardless of the backtesting paradigm you choose, you need to report the results according to a series of statistics that investors will use to compare and judge your strategy against competitors. In this chapter we will discuss some of the most commonly used performance evaluation statistics. Some of these statistics are included in the Global Investment Performance Standards (GIPS);¹ however, a comprehensive analysis of performance requires metrics specific to the ML strategies under scrutiny.

14.2 Types of Backtest Statistics

Backtest statistics comprise metrics used by investors to assess and compare various investment strategies. They should help us uncover potentially problematic aspects of the strategy, such as substantial asymmetric risks or low capacity. Overall, they can be categorized into general characteristics, performance, runs/drawdowns, implementation shortfall, return/risk efficiency, classification scores, and attribution.

14.3 General Characteristics

The following statistics inform us about the general characteristics of the backtest:

Time range: The time range specifies the start and end dates of the backtest. The period used to test the strategy should be sufficiently ...
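The time range can be read directly off the timestamps of a backtest. The following is a minimal Python sketch, assuming the backtest dates are available as `datetime.date` objects. The function name `time_range` and the 365.25-day year used to express the span in years are our own illustrative choices, not taken from the book.

```python
from datetime import date


def time_range(dates):
    """Return (start, end, span_in_years) for the dates covered by a backtest.

    Hypothetical helper: assumes `dates` is an iterable of datetime.date
    objects (not necessarily sorted). The span is expressed in years using
    an assumed 365.25-day year convention.
    """
    ordered = sorted(dates)
    if not ordered:
        raise ValueError("a backtest needs at least one date")
    start, end = ordered[0], ordered[-1]
    # Subtracting two dates yields a timedelta; .days is the whole-day count
    span_years = (end - start).days / 365.25
    return start, end, span_years


if __name__ == "__main__":
    start, end, years = time_range([date(2010, 1, 4), date(2018, 2, 1), date(2012, 6, 1)])
    print(start, end, round(years, 2))
```

Sorting first means the input need not be chronologically ordered; reporting the span in years makes it easier to judge whether the period is long enough for the strategy being tested.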

ISBN: 9781119482086