Chapter 3: Supervised Learning

Introduction

Neural networks, discussed in Chapter 2, may fall into supervised, semi-supervised, or unsupervised categories, depending on their design and, consequently, on how much researcher involvement they require. In this chapter, we discuss other, non-neural supervised models and their applications.

Supervised learning (SL) is most akin to econometrics. As such, SL models tend to work with perfectly cleaned and organized data. Of course, financial data rarely come to the analyst in a format perfect for econometrics. Figure 3.1 shows a snippet of trading data logs from a BATS equities exchange. The information shown is neither neat nor readable without the help of a thick instruction manual.

As a result, legions of quants, with their advanced degrees costing billions, have been deployed in banks and funds to scrub, polish, and organize data in order to make it presentable to their econometrics-trained portfolio managers. Companies like Bloomberg and Reuters amassed fortunes greater than those of many sovereign funds by processing and reselling financial data in econometrics-friendly formats to hedge funds, pension funds, banks, and endowments.

Cleaning and organizing data are traditional pre-processing tasks that must be completed before decisions can be based on the data. Indexing and structured databases speed up the algorithms that search the data, which are written in various database languages and, increasingly, Python.
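To make the pre-processing idea concrete, the following is a minimal Python sketch of cleaning a handful of raw trade records and loading them into an indexed, structured table. The pipe-delimited record layout, the sample values, and the names RAW_LINES, clean, and load are all hypothetical and chosen for illustration only; the layout is not the BATS log format shown in Figure 3.1. The sketch uses the sqlite3 module from the Python standard library as a stand-in for a structured database.

```python
import sqlite3

# Hypothetical raw trade records: timestamp|symbol|price|size.
# This layout is invented for illustration; it is NOT the BATS log format.
RAW_LINES = [
    "09:30:00.125| aapl |189.52|100",
    "09:30:00.125| aapl |189.52|100",  # exact duplicate of the first record
    "09:30:01.004|MSFT|n/a|50",        # price cannot be parsed
    "09:30:01.377|msft |402.10|200",
    "",                                # blank line
    "09:30:02.450|AAPL|189.55|300",
]


def clean(lines):
    """Return unique, well-formed (timestamp, symbol, price, size) tuples."""
    seen = set()
    rows = []
    for line in lines:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        ts, symbol, price, size = parts
        try:
            row = (ts, symbol.upper(), float(price), int(size))
        except ValueError:
            continue  # skip records with non-numeric price or size
        if row in seen:
            continue  # skip exact duplicates
        seen.add(row)
        rows.append(row)
    return rows


def load(rows):
    """Load cleaned rows into an in-memory table indexed by symbol."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trades (ts TEXT, symbol TEXT, price REAL, size INTEGER)"
    )
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?)", rows)
    # The index lets lookups by symbol avoid scanning the whole table.
    conn.execute("CREATE INDEX idx_trades_symbol ON trades (symbol)")
    conn.commit()
    return conn


rows = clean(RAW_LINES)
conn = load(rows)
aapl = conn.execute(
    "SELECT ts, price, size FROM trades WHERE symbol = ? ORDER BY ts",
    ("AAPL",),
).fetchall()
print(aapl)
```

On six records the index makes no practical difference; its benefit shows up only as the table grows, when a lookup by symbol would otherwise require reading every row.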

Data science evolved from a different origin than econometrics. Its appetite ...
