It is often necessary to derive new variables from existing ones in order to improve a prediction. We have already seen that binning is a common way to create a nominal variable from a quantitative one.
Let's create a new column, called agecat, which divides age into two segments. To keep things simple, we will start off by rounding the age to the nearest integer.
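# Keep only rows with positive age and insulin values
filtered <- SparkR::filter(out_sd, "age > 0 AND insulin > 0")
# Round age to the nearest integer
filtered$age <- round(filtered$age, 0)
# Bin age into two categories
filtered$agecat <- ifelse(filtered$age <= 35, "<= 35", "35 Or Older")
SparkR::head(SparkR::select(filtered, "age", "agecat"))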
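The same round-then-bin rule can be tried on a small local vector, without a Spark session, to see which category each value falls into. This is only a minimal sketch in base R: the `ages` values are hypothetical and do not come from the dataset used above.

```r
# Hypothetical ages for illustration only (not from the original data)
ages <- c(21.4, 35.2, 35.6, 50.1)

# Round to the nearest integer, as in the SparkR code
rounded <- round(ages, 0)

# Two-way bin using the same cutoff and labels as the agecat column
agecat <- ifelse(rounded <= 35, "<= 35", "35 Or Older")

print(data.frame(age = rounded, agecat = agecat))
```

Because the ages are rounded first, 35.2 becomes 35 and lands in the "<= 35" group, while 35.6 becomes 36 and lands in the "35 Or Older" group.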
In the ...