
Advances in Financial Machine Learning

by Marcos Lopez de Prado

February 2018

Intermediate to advanced

400 pages

10h 17m

English

Wiley


1.1 Motivation; 1.2 The Main Reason Financial Machine Learning Projects Usually Fail; 1.3 Book Structure; 1.4 Target Audience; 1.5 Requisites; 1.6 FAQs; 1.7 Acknowledgments; Exercises; References; Bibliography; Notes
2.1 Motivation; 2.2 Essential Types of Financial Data; 2.3 Bars; 2.4 Dealing with Multi-Product Series; 2.5 Sampling Features; Exercises; References
3.1 Motivation; 3.2 The Fixed-Time Horizon Method; 3.3 Computing Dynamic Thresholds; 3.4 The Triple-Barrier Method; 3.5 Learning Side and Size; 3.6 Meta-Labeling; 3.7 How to Use Meta-Labeling; 3.8 The Quantamental Way; 3.9 Dropping Unnecessary Labels; Exercises; Bibliography; Note
4.1 Motivation; 4.2 Overlapping Outcomes; 4.3 Number of Concurrent Labels; 4.4 Average Uniqueness of a Label; 4.5 Bagging Classifiers and Uniqueness; 4.6 Return Attribution; 4.7 Time Decay; 4.8 Class Weights; Exercises; References; Bibliography
5.1 Motivation; 5.2 The Stationarity vs. Memory Dilemma; 5.3 Literature Review; 5.4 The Method; 5.5 Implementation; 5.6 Stationarity with Maximum Memory Preservation; 5.7 Conclusion; Exercises; References; Bibliography
6.1 Motivation; 6.2 The Three Sources of Errors; 6.3 Bootstrap Aggregation; 6.4 Random Forest; 6.5 Boosting; 6.6 Bagging vs. Boosting in Finance; 6.7 Bagging for Scalability; Exercises; References; Bibliography; Notes
7.1 Motivation; 7.2 The Goal of Cross-Validation; 7.3 Why K-Fold CV Fails in Finance; 7.4 A Solution: Purged K-Fold CV; 7.5 Bugs in Sklearn's Cross-Validation; Exercises; Bibliography
8.1 Motivation; 8.2 The Importance of Feature Importance; 8.3 Feature Importance with Substitution Effects; 8.4 Feature Importance without Substitution Effects; 8.5 Parallelized vs. Stacked Feature Importance; 8.6 Experiments with Synthetic Data; Exercises; References; Note
9.1 Motivation; 9.2 Grid Search Cross-Validation; 9.3 Randomized Search Cross-Validation; 9.4 Scoring and Hyper-parameter Tuning; Exercises; References; Bibliography; Notes
10.1 Motivation; 10.2 Strategy-Independent Bet Sizing Approaches; 10.3 Bet Sizing from Predicted Probabilities; 10.4 Averaging Active Bets; 10.5 Size Discretization; 10.6 Dynamic Bet Sizes and Limit Prices; Exercises; References; Bibliography; Notes
11.1 Motivation; 11.2 Mission Impossible: The Flawless Backtest; 11.3 Even If Your Backtest Is Flawless, It Is Probably Wrong; 11.4 Backtesting Is Not a Research Tool; 11.5 A Few General Recommendations; 11.6 Strategy Selection; Exercises; References; Bibliography; Note
12.1 Motivation; 12.2 The Walk-Forward Method; 12.3 The Cross-Validation Method; 12.4 The Combinatorial Purged Cross-Validation Method; 12.5 How Combinatorial Purged Cross-Validation Addresses Backtest Overfitting; Exercises; References
13.1 Motivation; 13.2 Trading Rules; 13.3 The Problem; 13.4 Our Framework; 13.5 Numerical Determination of Optimal Trading Rules; 13.6 Experimental Results; 13.7 Conclusion; Exercises; References; Notes
14.1 Motivation; 14.2 Types of Backtest Statistics; 14.3 General Characteristics; 14.4 Performance; 14.5 Runs; 14.6 Implementation Shortfall; 14.7 Efficiency; 14.8 Classification Scores; 14.9 Attribution; Exercises; References; Bibliography; Notes
15.1 Motivation; 15.2 Symmetric Payouts; 15.3 Asymmetric Payouts; 15.4 The Probability of Strategy Failure; Exercises; References
16.1 Motivation; 16.2 The Problem with Convex Portfolio Optimization; 16.3 Markowitz's Curse; 16.4 From Geometric to Hierarchical Relationships; 16.5 A Numerical Example; 16.6 Out-of-Sample Monte Carlo Simulations; 16.7 Further Research; 16.8 Conclusion; Appendices: 16.A.1 Correlation-based Metric; 16.A.2 Inverse Variance Allocation; 16.A.3 Reproducing the Numerical Example; 16.A.4 Reproducing the Monte Carlo Experiment; Exercises; References; Notes
17.1 Motivation; 17.2 Types of Structural Break Tests; 17.3 CUSUM Tests; 17.4 Explosiveness Tests; Exercises; References
18.1 Motivation; 18.2 Shannon's Entropy; 18.3 The Plug-in (or Maximum Likelihood) Estimator; 18.4 Lempel-Ziv Estimators; 18.5 Encoding Schemes; 18.6 Entropy of a Gaussian Process; 18.7 Entropy and the Generalized Mean; 18.8 A Few Financial Applications of Entropy; Exercises; References; Bibliography; Note
19.1 Motivation; 19.2 Review of the Literature; 19.3 First Generation: Price Sequences; 19.4 Second Generation: Strategic Trade Models; 19.5 Third Generation: Sequential Trade Models; 19.6 Additional Features from Microstructural Datasets; 19.7 What Is Microstructural Information?; Exercises; References
20.1 Motivation; 20.2 Vectorization Example; 20.3 Single-Thread vs. Multithreading vs. Multiprocessing; 20.4 Atoms and Molecules; 20.5 Multiprocessing Engines; 20.6 Multiprocessing Example; Exercises; Reference; Bibliography; Notes
21.1 Motivation; 21.2 Combinatorial Optimization; 21.3 The Objective Function; 21.4 The Problem; 21.5 An Integer Optimization Approach; 21.6 A Numerical Example; Exercises; References
22.1 Motivation; 22.2 Regulatory Response to the Flash Crash of 2010; 22.3 Background; 22.4 HPC Hardware; 22.5 HPC Software; 22.6 Use Cases; 22.7 Summary and Call for Participation; 22.8 Acknowledgments; References; Notes

CHAPTER 4 Sample Weights

4.1 Motivation

Chapter 3 presented several new methods for labeling financial observations. We introduced two novel concepts, the triple-barrier method and meta-labeling, and explained how they are useful in financial applications, including quantamental investment strategies. In this chapter you will learn how to use sample weights to address another problem ubiquitous in financial applications, namely that observations are not generated by independent and identically distributed (IID) processes. Most of the ML literature is based on the IID assumption, and one reason many ML applications fail in finance is that this assumption is unrealistic in the case of financial time series.

4.2 Overlapping Outcomes

In Chapter 3 we assigned a label $y_i$ to an observed feature $X_i$, where $y_i$ was a function of price bars that occurred over an interval $[t_{i,0}, t_{i,1}]$. When $t_{i,1} > t_{j,0}$ and $i < j$, then $y_i$ and $y_j$ will both depend on a common return $r_{t_{j,0},\min\{t_{i,1},t_{j,1}\}}$, that is, the return over the interval $[t_{j,0}, \min\{t_{i,1}, t_{j,1}\}]$. The implication is that the series of labels, $\{y_i\}_{i=1,\dots,I}$, is not IID whenever there is an overlap between any two consecutive outcomes, $\exists i \,|\, t_{i,1} > t_{i+1,0}$.
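The overlap condition above can be checked mechanically. The following is a minimal sketch, not the book's code. It assumes labels are stored as two plain Python lists of start times $t_{i,0}$ and end times $t_{i,1}$, sorted by start time so that $i < j$ implies $t_{i,0} \le t_{j,0}$. The helper names `shared_interval` and `has_overlap` are hypothetical.

```python
from typing import List, Optional, Tuple


def shared_interval(t_i0: float, t_i1: float,
                    t_j0: float, t_j1: float) -> Optional[Tuple[float, float]]:
    """Hypothetical helper: for labels i < j (assumed t_i0 <= t_j0), return the
    interval [t_j0, min(t_i1, t_j1)] whose return both labels depend on,
    or None when label i ends at or before label j starts."""
    if t_i1 > t_j0:
        return (t_j0, min(t_i1, t_j1))
    return None


def has_overlap(t0: List[float], t1: List[float]) -> bool:
    """Hypothetical helper: True if some consecutive pair satisfies
    t1[i] > t0[i + 1], i.e. the label series cannot be treated as IID."""
    return any(t1[i] > t0[i + 1] for i in range(len(t0) - 1))


# Example: the first label ends at 3 but the second starts at 2,
# so both depend on the return over [2, 3].
t0 = [0, 2, 5]
t1 = [3, 4, 6]
print(has_overlap(t0, t1))                        # True
print(shared_interval(t0[0], t1[0], t0[1], t1[1]))  # (2, 3)
```

Only consecutive pairs are tested, which mirrors the condition $\exists i \,|\, t_{i,1} > t_{i+1,0}$ in the text. If any pair of labels overlaps, then with sorted start times the pair $(i, i+1)$ also overlaps.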

Suppose that we circumvent this problem by restricting the bet horizon to $t_{i,1} \le t_{i+1,0}$. In this case there is no overlap, because every feature outcome is determined before or at the ...
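One way to impose the restriction $t_{i,1} \le t_{i+1,0}$ is to cap each label's end time at the next label's start time. The sketch below assumes start times sorted in ascending order. `restrict_horizon` is a hypothetical helper, and capping, rather than dropping labels, is an illustrative choice rather than the book's prescription.

```python
from typing import List


def restrict_horizon(t0: List[float], t1: List[float]) -> List[float]:
    """Hypothetical helper: return new end times with t1[i] <= t0[i + 1].
    Assumes t0 is sorted ascending; the last label's end time is unchanged."""
    # Cap each end time at the start of the following label.
    capped = [min(t1[i], t0[i + 1]) for i in range(len(t0) - 1)]
    if t1:
        capped.append(t1[-1])  # nothing follows the last label, so keep it
    return capped


t0 = [0, 2, 5]
t1 = [3, 4, 6]
print(restrict_horizon(t0, t1))  # [2, 4, 6]
```

Because start times are sorted, capping against the immediately following label is sufficient. Each end time is then at or before every later start time, so no pair of outcomes overlaps.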


ISBN: 9781119482086
