CHAPTER 8

Case Study of Rule Data Mining for the S&P 500

This chapter describes a case study in rule data mining, the results of which are reported in Chapter 9. The study evaluates the statistical significance of 6,402 individual TA rules back tested on the S&P 500 Index over the period from November 1, 1980 through July 1, 2005.

DATA MINING BIAS AND RULE EVALUATION

The primary purpose of the case study is to illustrate the application of statistical methods that take into account the effects of data-mining bias. To recap, data mining is a process in which the profitability of many rules is compared so that one or more superior rules can be selected. As pointed out in Chapter 6, this selection process causes an upward bias in the performance of the selected rule or rules: the observed back-test performance of the best rules overstates their expected performance in the future. This bias complicates the evaluation of statistical significance and may lead a data miner to select a rule with no predictive power (i.e., one whose past performance was pure luck). This is the fool’s gold of the objective technician.
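The upward bias described above can be illustrated with a small simulation. The sketch below is not the case study's procedure. It assumes that every rule is useless and that its daily returns are independent draws from a normal distribution with zero mean. The function name, the parameter values, and the return distribution are all hypothetical choices made for illustration. In each trial, the best rule's observed mean return is compared with the mean return of a rule chosen without looking at performance.

```python
import random
import statistics


def simulate_selection_bias(n_rules=50, n_days=250, n_trials=30,
                            daily_vol=0.01, seed=0):
    """Compare an arbitrary rule with the best-of-N rule when no rule has merit.

    Assumption (illustrative only): each rule's daily return is an independent
    draw from N(0, daily_vol), so every rule's true expected return is zero.
    Returns (average mean return of an arbitrary rule,
             average mean return of the best rule), averaged over trials.
    """
    rng = random.Random(seed)
    arbitrary_means = []
    best_means = []
    for _ in range(n_trials):
        # Back test every rule: compute its mean daily return over the sample.
        rule_means = [
            statistics.fmean(rng.gauss(0.0, daily_vol) for _ in range(n_days))
            for _ in range(n_rules)
        ]
        # A rule picked without regard to performance (here, the first one).
        arbitrary_means.append(rule_means[0])
        # The rule a data miner would select: the best observed performer.
        best_means.append(max(rule_means))
    return statistics.fmean(arbitrary_means), statistics.fmean(best_means)


if __name__ == "__main__":
    arbitrary, best = simulate_selection_bias()
    print(f"arbitrary rule, average mean daily return: {arbitrary:.5f}")
    print(f"best-of-N rule, average mean daily return: {best:.5f}")
```

Under these assumptions the arbitrary rule's average return stays close to its true value of zero, while the selected rule's observed return is positive even though its expected future return is also zero. That gap is the data-mining bias that the significance tests must account for.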

This problem can be minimized by using specialized statistical-inference tests. The case study illustrates the application of two such methods: an enhanced version of White’s reality check and Masters’s Monte-Carlo permutation method. Both take advantage of a recent improvement,1 which reduces the probability that a good rule will be overlooked (Type II ...
