Chapter 7. Data as a Signal versus Noise

Introduction

A recent Business Insider article reported that data spending by financial companies had reached a record $7 billion and was expected to grow further. Even so, financial firms find themselves struggling to figure out what to do with this data, or even whether the data they are buying has much meaning. To address the issue, hedge funds, banks, and similar firms turn to pricey data scientists, employees, and consultants to help them figure out the value of data and data applications. Anecdotally, the newly hired, highly paid data scientist is often handed a “bag of data stuff” the company previously paid to acquire and is asked to tell the firm how to make money from all of it.

To someone trained in classical econometrics, the question of “how to make money from all that” can be daunting. Econometrics teaches students how to select a proper distribution and estimation model for the yes/no questions a researcher may have. Econometrics does not, however, work well with open-ended questions of “how” and “why.”

This is where data science comes in quite handy. In particular, the Big Data techniques discussed in this chapter can help answer quickly whether “the bag of data” is valuable.

Random Data Shows in Eigenvalue Distribution

To begin our examination of the data bag, we require several concepts. An n × n symmetric matrix A (i.e., ...
