6.4. Is There Any Hope for Data Miners?

The central problem in mining financial data like this is that the market has only one past, and that constraint will not go away. Some forecasters simply ignore this fact, dive in, and hope for the best, which makes about as much sense as the "butter in Bangladesh" story. It is healthier to take measures that mitigate the risk of data mining. Here are a few suggestions:

  • Avoid the other pitfalls of investment simulations. These include survivor bias, look-ahead bias, use of revised data not available at the time of the forecasts, and ignoring transaction costs and liquidity constraints. There are many ways to fool yourself, even before you set foot in the data mine.

  • Use holdback samples, temporal and cross-sectional. Reserve some of the data for out-of-sample testing. This can be hard to do when the history is short or the frequency is low, as is the case for monthly data. Be cautious about going to the holdback set, since with each new visit you are mining that as well. Temporal holdback samples are easier to set up with higher-frequency data, such as daily information or ticks. In these cases, a three-level holdback protocol using in-sample, out-of-sample, and in-the-vault out-of-sample data can be (and is) used.

    When there are multiple securities to analyze, you can also hold back a cross-sectional sample. As an example, if you were developing a model to forecast individual stock returns, keeping back all the stocks with symbols in the second half ...
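
The holdback ideas above can be made concrete with a small sketch. What follows is only an illustration under stated assumptions, not a protocol taken from the text: the function names three_level_split and cross_sectional_holdback, the 60/20/20 proportions, and the placeholder symbols are all hypothetical. The cross-sectional rule is deliberately left to the caller rather than hard-coded.

```python
from datetime import date, timedelta


def three_level_split(dates, in_frac=0.6, out_frac=0.2):
    """Split dates chronologically into (in_sample, out_of_sample, vault).

    The default fractions are illustrative assumptions. The three blocks
    are contiguous in time, so no later block leaks into an earlier one.
    """
    ordered = sorted(dates)
    n = len(ordered)
    i = round(n * in_frac)          # end of the in-sample block
    j = i + round(n * out_frac)     # end of the out-of-sample block
    return ordered[:i], ordered[i:j], ordered[j:]


def cross_sectional_holdback(symbols, is_held_back):
    """Partition symbols into (working, held_back).

    is_held_back is a caller-supplied rule (hypothetical here); any rule
    works as long as it is fixed before model development begins.
    """
    working = [s for s in symbols if not is_held_back(s)]
    held_back = [s for s in symbols if is_held_back(s)]
    return working, held_back


if __name__ == "__main__":
    # 100 consecutive calendar days stand in for a daily price history.
    days = [date(2020, 1, 1) + timedelta(days=k) for k in range(100)]
    in_sample, out_of_sample, vault = three_level_split(days)
    print(len(in_sample), len(out_of_sample), len(vault))

    # Placeholder symbols and an explicit held-back set, both made up.
    universe = ["SYM1", "SYM2", "SYM3", "SYM4"]
    reserved = {"SYM3", "SYM4"}
    working, held_back = cross_sectional_holdback(universe, lambda s: s in reserved)
    print(working, held_back)
```

In a workflow like this, models would be fitted on the in-sample block only, checked sparingly against the out-of-sample block, and run against the vault once, at the very end. Fixing the split before any modeling starts is what keeps each visit to the holdback data from turning into more mining.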
