The R Book by Michael J. Crawley

Classification trees for replicated data

In this next example from plant taxonomy, the response variable is a four-level, categorical variable called Taxon (it is a label expressed as Roman numerals I to IV). The aim is to use the measurements from the seven morphological explanatory variables to construct the best key to separate these four taxa (the ‘best’ key is the one with the lowest error rate – the key that misclassifies the smallest possible number of cases).
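To make the error-rate criterion concrete, here is a minimal sketch in base R. The helper name error_rate and the two label vectors are assumptions made up for illustration; they are not values from taxonomy.txt.

```r
# Hypothetical helper: proportion of cases whose predicted label
# differs from the observed label.
error_rate <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  mean(as.character(observed) != as.character(predicted))
}

# Made-up labels for illustration only (not from taxonomy.txt)
observed  <- c("I", "I", "II", "III", "IV", "IV")
predicted <- c("I", "II", "II", "III", "IV", "IV")

error_rate(observed, predicted)   # one of the six cases is misclassified
```

A key that classifies every case correctly has an error rate of zero under this definition.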

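# stringsAsFactors=TRUE reads Taxon as a factor, which tree() needs for
# classification (since R 4.0 the default is FALSE)
taxonomy<-read.table("c:\\temp\\taxonomy.txt",header=TRUE,stringsAsFactors=TRUE)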
attach(taxonomy)
names(taxonomy)

[1] "Taxon" "Petals" "Internode" "Sepal" "Bract" "Petiole"
[7] "Leaf" "Fruit"

Using the tree model for classification could not be simpler:

library(tree)
model1<-tree(Taxon~.,taxonomy)

We begin by looking at the plot of the tree:

plot(model1)
text(model1)

With only a small degree of rounding on the suggested break points, the tree model suggests a simple (and for these 120 plants, completely error-free) key for distinguishing the four taxa:

1. Sepal length > 4.0        Taxon IV
1. Sepal length <= 4.0       2.
2. Leaf width > 2.0          Taxon III
2. Leaf width <= 2.0         3.
3. Petiole length < 10       Taxon II
3. Petiole length >= 10      Taxon I
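The key above can also be written as a small function, which makes it easy to apply to new measurements. This is only a sketch: classify_by_key is a hypothetical name, the break points are the rounded values from the key rather than the exact splits chosen by tree(), and the measurements passed in are made up.

```r
# Hypothetical helper that applies the three-step key.
# Break points are the rounded values from the key, not the exact
# split points chosen by tree().
classify_by_key <- function(Sepal, Leaf, Petiole) {
  ifelse(Sepal > 4.0, "IV",                 # couplet 1
    ifelse(Leaf > 2.0, "III",               # couplet 2
      ifelse(Petiole < 10, "II", "I")))     # couplet 3
}

# Made-up measurements, one plant per element
key_labels <- classify_by_key(Sepal   = c(4.5, 3.0, 3.0, 3.0),
                              Leaf    = c(1.0, 2.5, 1.5, 1.5),
                              Petiole = c(12, 12, 8, 12))
key_labels   # "IV" "III" "II" "I"
```

With the taxonomy data attached, table(Taxon, classify_by_key(Sepal, Leaf, Petiole)) would cross-tabulate the key's labels against the recorded taxa, assuming the columns hold the sepal length, leaf width and petiole length used in the key.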

The summary function for classification trees produces the following:

summary(model1)

Classification tree:
tree(formula = Taxon ~ ., data = taxonomy)
Variables actually used in tree construction:
[1] "Sepal" "Leaf" ...
