CHAPTER 3 Labeling

3.1 Motivation

In Chapter 2 we discussed how to produce a matrix X of financial features out of an unstructured dataset. Unsupervised learning algorithms can learn the patterns from that matrix X, for example whether it contains hierarchical clusters. On the other hand, supervised learning algorithms require that the rows in X are associated with an array of labels or values y, so that those labels or values can be predicted on unseen features samples. In this chapter we will discuss ways to label financial data.

3.2 The Fixed-Time Horizon Method

As it relates to finance, virtually all ML papers label observations using the fixed-time horizon method. This method can be described as follows. Consider a features matrix X with I rows, {Xi}i = 1, …, I, drawn from some bars with index t = 1, …, T, where IT. Chapter 2, Section 2.5 discussed sampling methods that produce the set of features {Xi}i = 1, …, I. An observation Xi is assigned a label yi ∈ { − 1, 0, 1},

numbered Display Equation

where τ is a pre-defined constant threshold, ti, 0 is the index of the bar immediately after Xi takes place, ti, 0 + h is the index of the h-th bar after ti, 0, and is the price return over a bar horizon h,

Because the literature almost always works with time bars, h implies a fixed-time horizon. ...

Get Advances in Financial Machine Learning now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.