Performing the analyses in R

Now that we have our data ready, we will focus on performing the analyses in R.

Classification with C4.5

We will first predict the income of the participants using C4.5.

The unpruned tree

We will start by examining the unpruned tree. It is built with the J48() function in RWeka, which uses the formula notation we have seen previously. The dot (.) after the tilde indicates that all attributes except the class attribute are to be used as predictors. We use the control argument, set to Weka_control(U = TRUE), to tell R that we want an unpruned tree (we will discuss pruning later):

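library(RWeka)
# income is the class; the dot uses every other attribute as a predictor
# U = TRUE asks Weka for an unpruned C4.5 tree
C45tree = J48(income ~ ., data = AdultTrain,
   control = Weka_control(U = TRUE))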
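To see what the dot expands to, the sketch below uses base R's terms() on a small stand-in data frame. The data frame toy and its columns (age, education, hours_per_week) are hypothetical placeholders for AdultTrain, and only the column names matter here. The expanded predictor list contains every column except the response, income, and matches the explicitly written formula.

```r
# Hypothetical stand-in for AdultTrain: only the column names matter here
toy <- data.frame(
  age            = c(25, 38, 52),
  education      = factor(c("Bachelors", "HS-grad", "Masters")),
  hours_per_week = c(40, 50, 45),
  income         = factor(c("<=50K", ">50K", ">50K"))
)

# terms() needs the data to know what the dot stands for;
# the response (income) is left out of the predictors
dot_terms <- attr(terms(income ~ ., data = toy), "term.labels")
print(dot_terms)  # age, education, hours_per_week

# Writing the predictors out by hand gives the same set of terms
explicit_terms <- attr(terms(income ~ age + education + hours_per_week),
                       "term.labels")
print(identical(dot_terms, explicit_terms))
```

The dot is therefore a shorthand: adding a column to the data frame automatically adds it to the model, which is convenient but also means unwanted columns (such as identifiers) should be removed beforehand.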

You can examine the tree by typing its name:

C45tree

We will not display it here as it is very ...
