CHAPTER 8
Case Study of Rule Data Mining for the S&P 500
This chapter describes a case study in rule data mining, the results of which are reported in Chapter 9. The study evaluates the statistical significance of 6,402 individual TA rules back-tested on the S&P 500 Index over the period from November 1, 1980, through July 1, 2005.
DATA MINING BIAS AND RULE EVALUATION
The primary purpose of the case study is to illustrate the application of statistical methods that take into account the effects of data-mining bias. To recap, data mining is a process in which the profitability of many rules is compared so that one or more superior rules can be selected. As pointed out in Chapter 6, this selection process causes an upward bias in the performance of the selected rule(s). In other words, the observed performance of the best rule(s) in the back test overstates its (their) expected performance in the future. This bias complicates the evaluation of statistical significance and may lead a data miner to select a rule with no predictive power (i.e., its past performance was pure luck). This is the fool’s gold of the objective technician.
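The upward bias described above can be seen in a small simulation. The sketch below is illustrative only and is not the procedure used in the case study: it assumes hypothetical rules whose daily returns are pure Gaussian noise with a true mean of zero, and the function names, noise level, and sample sizes are all assumptions chosen for the example. It compares the average back-tested return of a single worthless rule with the average back-tested return of the best of many worthless rules.

```python
import random
import statistics


def best_rule_mean(n_rules, n_days, rng):
    """Back test n_rules worthless rules; return the best observed mean daily return."""
    best = float("-inf")
    for _ in range(n_rules):
        # Assumption: every rule has zero expected return, so its
        # daily returns are pure noise (Gaussian, 1% standard deviation).
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        best = max(best, statistics.fmean(returns))
    return best


def simulate(n_rules=50, n_days=100, n_trials=100, seed=0):
    """Average observed performance of one rule vs. the best of n_rules rules."""
    rng = random.Random(seed)
    # No selection: a single rule is back tested in each trial.
    single = [best_rule_mean(1, n_days, rng) for _ in range(n_trials)]
    # Data mining: the best of n_rules rules is selected in each trial.
    mined = [best_rule_mean(n_rules, n_days, rng) for _ in range(n_trials)]
    return statistics.fmean(single), statistics.fmean(mined)


if __name__ == "__main__":
    single_avg, mined_avg = simulate()
    print(f"single rule, no selection: {single_avg:+.5f}")
    print(f"best of 50 mined rules:    {mined_avg:+.5f}")
```

Under these assumptions the single rule's average observed return stays close to its true value of zero, while the selected rule's average is clearly positive even though no rule has any predictive power. That gap is the data-mining bias, and it is why the significance of a mined rule cannot be judged as if it were the only rule tested.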
This problem can be minimized by using specialized statistical-inference tests. The case study illustrates the application of two such methods: an enhanced version of White’s reality check and Masters’s Monte-Carlo permutation method. Both take advantage of a recent improvement,¹ which reduces the probability that a good rule will be overlooked (Type II ...