WORKING WITH DECISION TREES

When you work with decision trees, you always need some validation to ensure that the patterns you identify are generic for all customers within your scope, and not just a local phenomenon, caused by sampling, in the data you analyzed. The following list describes different ways you can validate your decision tree. Although some of the methods might seem very complex, they are standard functionality in professional data mining tools. The first one, however, is the best one, and it is known as common sense:

  • Ask subject matter experts to make a sanity check on your model.
  • Use other algorithms to find the same results.
  • Split the data in two, and then see whether the same results come out of both halves.
  • Split the data in two, and then see whether the decision tree developed on the first half of the data can predict the buyers in the second half of the data.
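
The last two checks in the list can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular data mining tool does it: the customer data is synthetic, the buying pattern (customers with turnover above 60 buy more often) is invented for the example, and a one-split tree on turnover stands in for a full decision tree. All function and variable names are hypothetical.

```python
import random

def make_customers(n, seed):
    """Synthetic customers as (turnover, is_buyer) pairs. Purely illustrative data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        turnover = rng.uniform(0, 100)
        # Assumed pattern: high-turnover customers buy more often.
        p_buy = 0.8 if turnover > 60 else 0.2
        rows.append((turnover, rng.random() < p_buy))
    return rows

def accuracy(threshold, rows):
    """Share of customers classified correctly by 'buyer if turnover > threshold'."""
    return sum((t > threshold) == buyer for t, buyer in rows) / len(rows)

def fit_stump(rows):
    """Fit a one-split tree: pick the turnover threshold with the best accuracy."""
    return max(range(5, 100, 5), key=lambda threshold: accuracy(threshold, rows))

customers = make_customers(2000, seed=1)
# Shuffle before splitting so neither half inherits an ordering from the source data.
random.Random(2).shuffle(customers)
half_a, half_b = customers[:1000], customers[1000:]

# Check 1: do both halves lead to (roughly) the same split?
threshold_a = fit_stump(half_a)
threshold_b = fit_stump(half_b)

# Check 2: does the tree built on the first half predict buyers in the second half?
holdout_accuracy = accuracy(threshold_a, half_b)

print("split found on first half: ", threshold_a)
print("split found on second half:", threshold_b)
print("first-half tree on second half:", round(holdout_accuracy, 3))
```

If the two halves suggest very different splits, or if the tree built on the first half predicts the second half much worse than the half it was built on, the pattern is probably a sampling artifact rather than something generic for your customers.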

For further information about data mining and how to validate data, see Chapter 4 of my previous book, Business Analytics for Managers, which provides a detailed discussion about the difference between data mining and statistics.1

Very often, manually guiding the decision tree is also recommended, which means that the analyst, not the algorithm, chooses the splits in the decision tree. As you develop the tree, you can discuss with subject matter experts whether a given split is a relevant way of dividing the customers. Perhaps you want to make the first split on turnover, since ...
