O'Reilly logo

Practical Predictive Analytics by Ralph Winters

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Creating new columns

Usually it's necessary to create some new transformation based on existing variables which will improve a prediction. We have already seen that binning a variable is often done to create a nominal variable from a quantitative one.

Let's create a new column, called agecat, which divides age into two segments. To keep things simple, we will start off by rounding the age to the nearest integer.

filtered <- SparkR::filter(out_sd, "age > 0 AND insulin > 0") filtered$age <- round(filtered$age,0) filtered$agecat <- ifelse(filtered$age <= 35,"<= 35","35 Or Older") SparkR::head(SparkR::select(filtered, "age","agecat")) 

In the ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required