Practical Predictive Analytics by Ralph Winters

Running an alternative model in Python

In this example, we ran a decision tree in R by extracting a sample from the Spark dataframe and fitting the tree model in base R. That approach is perfectly acceptable (it forces you to think about sampling), but in many instances it is more efficient to run the model directly on the Spark dataframe using an MLlib package or equivalent.

For the version of Spark you should be working with (2.1), decision tree algorithms are not available to run under R. Fortunately, native Spark decision trees are already implemented in Python and Scala. We will illustrate the example using Python so that you can see that there are options available. If you will be following algorithm development in Spark ...
