Chapter 6: Big Data Factor Models

As shown in Chapter 5, unsupervised learning delivers essential cleansing of the data, separating “macro” signal components common to all the dataset elements from the idiosyncratic noise of individual data constituents. The singular vectors produced by SVD are orthogonal by design and also serve as data factors.
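The separation described above can be illustrated with a minimal sketch in Python. It is not the book's own code. The data are simulated under a loud assumption: every asset's return is one common "macro" driver scaled by a hypothetical loading, plus independent noise. The names `common`, `loadings`, and `noise`, and all parameter values, are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_assets = 500, 20

# Simulated (hypothetical) data: one common driver plus idiosyncratic noise.
common = rng.normal(0.0, 1.0, size=n_days)           # "macro" signal
loadings = rng.uniform(0.5, 1.5, size=n_assets)      # each asset's exposure
noise = rng.normal(0.0, 0.3, size=(n_days, n_assets))
returns = np.outer(common, loadings) + noise

# Demean each column so the SVD describes co-movement, not average levels.
X = returns - returns.mean(axis=0)

# Thin SVD: X = U @ diag(s) @ Vt, with singular values sorted in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# The right singular vectors (rows of Vt) are orthonormal by construction.
orthogonal = np.allclose(Vt @ Vt.T, np.eye(n_assets))

# Share of total variance carried by each singular direction.
explained = s**2 / np.sum(s**2)

# Time series of the first factor, compared with the simulated common driver.
# The sign of a singular vector is arbitrary, so compare absolute correlation.
factor1 = U[:, 0] * s[0]
corr = abs(np.corrcoef(factor1, common)[0, 1])

print("singular vectors orthonormal:", orthogonal)
print("variance share of first factor:", round(float(explained[0]), 3))
print("|corr(first factor, common driver)|:", round(float(corr), 3))
```

In this simulated setting the first singular direction carries most of the variance and tracks the common driver closely, while the remaining directions mostly absorb the idiosyncratic noise. Real return data will rarely separate this cleanly.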

Factoring financial data has been widely accepted and practiced. Sharpe's (1964) and Lintner's (1965) Capital Asset Pricing Model (CAPM) factorizes returns of financial instruments vis-à-vis market returns. Ross's (1976; 1977) Arbitrage Pricing Theory (APT) factors financial returns over a wide spectrum of diverse explanatory variables. The famed Fama-French factors (Fama and French 1992) explain securities returns by stock characteristics. Harvey, Liu, and Zhu (2016) point out that, over the years, researchers in finance have come up with over 300 factors capable of explaining various aspects of financial returns, all published in the financial research literature. The sheer number of proposed financial factors led Cochrane (2011) to refer to the multitude of variables as a “factor zoo” and to question various factors' validity and relative importance.

As this chapter shows, unsupervised learning techniques deliver a factorization that is optimal in the least-squares sense: for any given number of factors, no other linear factorization reconstructs the data with a smaller squared error. As such, SVD and PCA are well positioned to sort through the “zoo” quickly and efficiently, extracting the most meaningful factors from the explanatory variables proposed to date.
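As a rough illustration of sorting through a crowded set of candidate factors, the sketch below builds a simulated "zoo" and applies PCA through the SVD. Everything about the data is an assumption made for the example: 30 hypothetical candidate variables are generated as noisy mixtures of only 3 latent drivers, and the 90% variance threshold is an arbitrary choice, not a recommendation from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_latent, n_candidates = 1000, 3, 30

# Simulated (hypothetical) "zoo": many candidates driven by a few latent sources.
latent = rng.normal(size=(n_obs, n_latent))
mixing = rng.normal(size=(n_latent, n_candidates))
candidates = latent @ mixing + 0.1 * rng.normal(size=(n_obs, n_candidates))

# Standardize each candidate so no variable dominates because of its scale.
Z = (candidates - candidates.mean(axis=0)) / candidates.std(axis=0)

# PCA via SVD: squared singular values are proportional to component variances.
s = np.linalg.svd(Z, compute_uv=False)
explained = s**2 / np.sum(s**2)
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative variance share reaches 90%.
n_keep = int(np.searchsorted(cumulative, 0.90) + 1)

print("components needed for 90% of variance:", n_keep)
print("cumulative share of first 3 components:", round(float(cumulative[2]), 3))
```

Because the 30 simulated candidates are largely redundant, a handful of principal components accounts for nearly all of their joint variance. How far real published factors collapse in this way is an empirical question that the simulation does not answer.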
