CHAPTER 2 Taming Big Data

Rado Lipuš and Daryl Smith


Around 20 years ago, alternative data and machine learning techniques were being used by only a select group of innovative hedge funds and asset managers. In recent years, however, both the number of fund managers using alternative data and the supply of new commercially available data sources have increased dramatically.

We have identified over 600 alternative datasets that have become commercially available in the past few years. Currently, around 40 new, thoroughly vetted alternative datasets are added to the Neudata platform each month. We expect the total number of datasets to increase steadily over the next few years as (i) more data exhaust firms monetize their existing data, and (ii) new and existing start‐ups enter the space with fresh alternative data offerings.

2.1.1 Definition: Why ‘alternative’? Opposition with conventional

For the uninitiated, the term ‘alternative data’ refers to novel data sources that can be used for investment management analysis and decision‐making in both quantitative and discretionary investment strategies. Essentially, alternative data refers to data that was, for the most part, created in the past seven years and that until very recently had not been available to the investment world. In some cases, the original purpose for creating alternative data was to provide an analysis tool ...
