6.4 DECISION TREES

6.4.1 Overview

It is often necessary to ask a series of questions before coming to a decision. The answer to one question may lead to another question or to a decision. For example, you may visit a doctor who asks you to describe your symptoms. You respond by saying you have a stuffy nose. In trying to diagnose your condition, the doctor may ask further questions, such as whether you are suffering from extreme exhaustion. Answering yes would suggest you have the flu, whereas answering no would suggest you have a cold. This line of questioning is common to many decision-making processes and can be represented visually as a decision tree, as shown in Figure 6.31.
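The doctor's line of questioning can also be written as nested conditions, which is one way to see why it maps naturally onto a tree. The following is a minimal sketch, not a diagnostic procedure: the function name `diagnose` and the "undetermined" outcome for a patient without a stuffy nose are assumptions made for illustration, since the example only describes the stuffy-nose branch.

```python
def diagnose(stuffy_nose: bool, extreme_exhaustion: bool) -> str:
    """Follow the example's questions from the first one down to a decision."""
    if not stuffy_nose:
        # Assumption: the example does not say what happens on this branch.
        return "undetermined"
    # Second question, asked only after a stuffy nose is reported.
    if extreme_exhaustion:
        return "flu"
    return "cold"


print(diagnose(stuffy_nose=True, extreme_exhaustion=True))
print(diagnose(stuffy_nose=True, extreme_exhaustion=False))
```

Each `if` corresponds to a decision point in the tree, and each `return` corresponds to a leaf where a decision is reached.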

Decision trees are often generated by hand to define a decision-making process precisely and consistently. However, they can also be generated automatically from data. They consist of a series of decision points based on certain variables. Figure 6.32 illustrates a simple decision tree. This tree was generated from a data set of cars that included a variable for the number of cylinders (Cylinders) along with each car's fuel efficiency (MPG). The decision tree attempts to group cars by number of cylinders in order to classify the observations according to their fuel efficiency. At the top of the tree is a node representing the entire data set of 392 observations (Size = 392). The data set is initially divided into two subsets, on the left of ...
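The idea of a single decision point that divides a data set into two subsets can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used to build Figure 6.32: the seven observations are made-up toy values rather than the 392-car data set, and the threshold of 5 cylinders and the helper names `split_by_threshold` and `summarize` are hypothetical.

```python
from statistics import mean

# Hypothetical toy observations as (cylinders, mpg) pairs.
# These values are invented for illustration, not taken from the data set.
cars = [(4, 31.0), (4, 28.5), (4, 26.0), (6, 20.5), (6, 19.0), (8, 14.0), (8, 15.5)]


def split_by_threshold(rows, threshold):
    """Divide rows into two subsets based on the number of cylinders."""
    left = [row for row in rows if row[0] < threshold]
    right = [row for row in rows if row[0] >= threshold]
    return left, right


def summarize(rows):
    """Report the size of a node and the mean MPG of its observations."""
    return {"size": len(rows), "mean_mpg": mean(mpg for _, mpg in rows)}


root_summary = summarize(cars)
# Assumed threshold: fewer than 5 cylinders goes left, the rest go right.
left, right = split_by_threshold(cars, threshold=5)
left_summary = summarize(left)
right_summary = summarize(right)

print("root:", root_summary)
print("left:", left_summary)
print("right:", right_summary)
```

A useful split produces subsets whose MPG values are more alike within each subset than in the parent node, which is what lets the tree classify observations by fuel efficiency.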
