Business Analytics for Sales and Marketing Managers: How to Compete in the Information Age
by Gert H.N. Laursen
WORKING WITH DECISION TREES
When you work with decision trees, you always need to validate the results to ensure that the patterns you identify are generic for all customers within your scope, and not just a local phenomenon caused by the particular sample of data you analyzed. The following list describes different ways to validate a decision tree. Although some of the methods might seem complex, they are standard functionality in professional data mining tools. The first, however, is the most important, and it amounts to common sense:
- Ask subject matter experts to make a sanity check on your model.
- Use other algorithms and check whether they find the same results.
- Split the data in two, and then see whether the same results come out of both halves.
- Split the data in two, and then see whether the decision tree built on the first half of the data can predict the buyers in the second half.
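The two split-half checks in the list above can be sketched in a few lines. This is a minimal illustration, not how any particular data mining tool implements validation: it assumes hypothetical customer records with two made-up features (`turnover` and `age`), uses a one-level decision tree (a "stump") in place of a full tree, and generates synthetic data in which buying depends only on turnover. It checks whether both halves choose the same split and how well the tree built on the first half predicts buyers in the second.

```python
import random

# Hypothetical feature names for illustration only.
FEATURES = ["turnover", "age"]


def make_customers(n, seed=0):
    """Synthetic customers: (features, bought). Buying depends only on turnover."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = {"turnover": rng.randint(0, 1000), "age": rng.randint(18, 80)}
        rows.append((x, x["turnover"] > 500))
    return rows


def split_half(rows, seed=0):
    """Shuffle the rows and split them into two halves."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def best_stump(rows, features):
    """Find the single (feature, threshold) split with the fewest misclassified rows.

    Each branch predicts its majority class; ties predict 'not a buyer'.
    Returns (feature, threshold, left_prediction, right_prediction).
    """
    best = None
    for f in features:
        for t in sorted({x[f] for x, _ in rows}):
            left = [b for x, b in rows if x[f] <= t]
            right = [b for x, b in rows if x[f] > t]
            # A branch's errors are its minority-class count.
            errors = sum(min(s.count(True), s.count(False)) for s in (left, right))
            if best is None or errors < best[0]:
                left_pred = left.count(True) > len(left) / 2
                right_pred = right.count(True) > len(right) / 2
                best = (errors, f, t, left_pred, right_pred)
    return best[1:]


def predict(stump, x):
    f, t, left_pred, right_pred = stump
    return left_pred if x[f] <= t else right_pred


def accuracy(stump, rows):
    return sum(predict(stump, x) == b for x, b in rows) / len(rows)


if __name__ == "__main__":
    rows = make_customers(200, seed=42)
    first, second = split_half(rows)
    stump_a = best_stump(first, FEATURES)
    stump_b = best_stump(second, FEATURES)
    # Do both halves pick the same split?
    print("first half splits on:", stump_a[0], "at", stump_a[1])
    print("second half splits on:", stump_b[0], "at", stump_b[1])
    # Does the tree built on the first half predict buyers in the second?
    print("holdout accuracy:", round(accuracy(stump_a, second), 3))
```

On real data, the split thresholds will rarely match exactly between halves; what matters is whether the same variables drive the splits and whether the holdout accuracy stays close to the accuracy on the data the tree was built on.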
For further information about data mining and how to validate data, see Chapter 4 of my previous book, Business Analytics for Managers, which provides a detailed discussion about the difference between data mining and statistics.1
Manually guiding the decision tree is also often recommended, which means that the analyst, not the algorithm, decides where the tree splits. As you develop the decision tree, you can discuss with subject matter experts whether each split is a relevant way of dividing the customers. Perhaps you want to make the first split on turnover, since ...
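A manually chosen split can be summarized so that subject matter experts can judge whether it makes sense. The sketch below is illustrative only: the `Customer` record, the `manual_split` helper, the turnover threshold of 500, and the sample customers are all hypothetical. It divides customers on an analyst-chosen turnover threshold and reports the size and buyer rate of each branch.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    turnover: float  # hypothetical turnover figure per customer
    bought: bool     # did the customer buy the product?


def manual_split(customers, threshold):
    """Split customers on an analyst-chosen turnover threshold.

    Returns {branch name: (number of customers, share who bought)}.
    """
    branches = {
        f"turnover <= {threshold}": [c for c in customers if c.turnover <= threshold],
        f"turnover > {threshold}": [c for c in customers if c.turnover > threshold],
    }
    summary = {}
    for name, group in branches.items():
        buyers = sum(c.bought for c in group)
        rate = buyers / len(group) if group else 0.0
        summary[name] = (len(group), rate)
    return summary


if __name__ == "__main__":
    customers = [
        Customer(120, False), Customer(300, False), Customer(450, True),
        Customer(600, False), Customer(800, True), Customer(950, True),
    ]
    for branch, (size, rate) in manual_split(customers, 500).items():
        print(f"{branch}: {size} customers, buyer rate {rate:.0%}")
```

Each branch could then be split again by hand in the same way, so the analyst keeps control of the tree's structure while the numbers show how well each split separates buyers from non-buyers.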