This section describes a generic 10-step quantitative investment process, outlining the basic steps that may aid in building quantitative investment models.

37.3.1 Step 1: Gather Data

Any good model begins with clean input data. The expression “garbage in, garbage out” resonates with anyone who has wasted time struggling with a model, only to discover that the original raw data set had problems. Common data-set problems include corrupted data and misaligned time stamps.
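As an illustration of the checks this implies, the following minimal Python sketch scans a daily price series for corrupted values and time-stamp problems. The function name `find_data_problems`, the `(date, price)` row layout, and the sample figures are assumptions made for this example only; a production pipeline would likely add vendor-specific checks.

```python
from datetime import date
import math


def find_data_problems(rows, expected_dates):
    """Return a list of (date, issue) tuples for a daily price series.

    rows: list of (date, price) tuples (a hypothetical input layout).
    expected_dates: the trading dates the series is supposed to cover.
    """
    problems = []
    seen = set()
    latest = None
    for day, price in rows:
        # Corrupted values: missing, non-numeric, NaN, or non-positive prices.
        if price is None or not isinstance(price, (int, float)):
            problems.append((day, "missing or non-numeric price"))
        elif math.isnan(price) or price <= 0:
            problems.append((day, "invalid price"))
        # Misaligned time stamps: duplicates or records out of order.
        if day in seen:
            problems.append((day, "duplicate time stamp"))
        elif latest is not None and day < latest:
            problems.append((day, "out-of-order time stamp"))
        seen.add(day)
        latest = day if latest is None else max(latest, day)
    # Dates that should be present but never appear in the series.
    for day in expected_dates:
        if day not in seen:
            problems.append((day, "missing observation"))
    return problems


# Made-up sample data containing one instance of each problem.
rows = [
    (date(2024, 1, 2), 100.0),
    (date(2024, 1, 3), float("nan")),  # corrupted value
    (date(2024, 1, 5), 101.5),
    (date(2024, 1, 4), -1.0),          # invalid price, and out of order
    (date(2024, 1, 5), 102.0),         # duplicate time stamp
]
expected = [date(2024, 1, d) for d in (2, 3, 4, 5, 8)]

problems = find_data_problems(rows, expected)
for day, issue in problems:
    print(day.isoformat(), issue)
```

Running checks of this kind before any modeling work begins makes it more likely that data problems surface at the start rather than after time has been spent on the model.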

The data available will typically depend on the resources of the fund or organization: gathering and cleaning tick data could be the responsibility of another department or of the group itself. Note that cleaning tick data is much more difficult than gathering, cleaning, and storing daily trade data.

Many vendors offer data services for a fee, and most exchanges provide prepackaged data bundles. If proprietary data exist, such as a difficult-to-obtain information source, chances are that the more unique and unmined the data are, the greater the potential benefits. As such, even the source of the input data is proprietary information.

The industry has recently witnessed some models that incorporate news data from social media sources such as Twitter feeds and popular financial blogs, thereby going beyond the traditional news-feed data available from Reuters or Bloomberg. ...

