Practical Predictive Analytics by Ralph Winters

Discovering the important features

We now introduce the OneR package to discover some of the important features of the dataset. OneR produces a single decision rule for each feature and then ranks the features by the accuracy of their rules. Accuracy is the proportion of outcomes classified correctly, and it can be read from a confusion (error) matrix, which we have seen in previous chapters. The package also has other useful capabilities, such as binning integer variables optimally so that each yields the best possible predictor.
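To make the one-rule idea concrete, here is a minimal base R sketch of the general technique: build one rule per feature, score each rule by accuracy, and rank the features. It is not the OneR package's own code, and details such as tie-breaking may differ from the package. The `toy` data frame and the helpers `one_rule()` and `rule_accuracy()` are hypothetical names and values invented for illustration; they are not the book's dataset.

```r
# Toy data: hypothetical values for illustration only.
toy <- data.frame(
  outlook = c("sunny", "sunny", "rain", "rain",
              "overcast", "overcast", "sunny", "rain"),
  windy   = c("yes", "no", "yes", "no", "yes", "no", "no", "yes"),
  play    = c("no", "no", "no", "yes", "yes", "yes", "no", "no"),
  stringsAsFactors = TRUE
)

# Build a one-feature rule: map each level of the feature to the
# most frequent outcome observed for that level (ties go to the
# first outcome level).
one_rule <- function(feature, outcome) {
  counts <- table(feature, outcome)
  rule <- colnames(counts)[apply(counts, 1, which.max)]
  names(rule) <- rownames(counts)
  rule
}

# Accuracy: share of rows where the rule's prediction matches the outcome.
rule_accuracy <- function(feature, outcome) {
  rule <- one_rule(feature, outcome)
  predicted <- rule[as.character(feature)]
  mean(predicted == as.character(outcome))
}

# Score every feature and rank by accuracy, best first.
features <- setdiff(names(toy), "play")
accuracy <- sapply(features, function(f) rule_accuracy(toy[[f]], toy$play))
ranking <- sort(accuracy, decreasing = TRUE)
print(ranking)

# Confusion matrix for the best single-feature rule.
best <- names(ranking)[1]
predicted <- one_rule(toy[[best]], toy$play)[as.character(toy[[best]])]
confusion <- table(predicted = predicted, actual = toy$play)
print(confusion)
```

On this toy data, `outlook` ranks first with an accuracy of 0.875 (7 of 8 rows), ahead of `windy` at 0.625. The diagonal of the confusion matrix holds the correctly classified rows, so accuracy is the diagonal sum divided by the total row count.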

The OneR package does not run natively on Spark, so we first need to use the sample() and collect() functions to take a 95% sample of the Spark dataframe and then move it to ...