R Data Analysis Cookbook - Second Edition by Kuntal Ganguly

Classification with SparkR

Download the data files for this chapter from the book's website, and place the boston-housing-logistic.csv file in your R working directory:

> df <- read.df("boston-housing-logistic.csv", "csv", header = "true", inferSchema = "true", na.strings = "NA")
> traindata <- sample(df, FALSE, 0.8)
> testdata <- except(df, traindata)
> model <- glm(CLASS ~ NOX + DIS + RAD + TAX + PTRATIO, data = traindata, family = "binomial")
> predictions <- predict(model, newData = testdata)
> head(predictions)
    NOX    DIS RAD TAX PTRATIO      B CLASS label prediction
1 0.538 4.2579   4 307    21.0 386.75     0     0 0.10200645
2 0.437 4.2515   5 398    18.7 394.92     1     1 0.77829571
3 0.871 1.4191   5 403    14.7 172.91     0     0 0.03680177
4 0.464 4.4290   3 223    18.6 396.90     1     1 0.87503152
5 0.585 ...
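
The prediction column in the output above appears to hold the fitted probability of CLASS being 1, so one way to turn it into class labels is to apply a cutoff and compare the result with the label column. The following is a minimal sketch in base R, not part of the original recipe. It copies only the four complete rows shown above into a local data frame; the names local_preds, threshold, and predicted_class are illustrative, and the 0.5 cutoff is an assumption. With a live SparkR session you would first bring the two columns to the driver with collect(), as noted in the comment.

```r
# Local copy of the four complete rows from the head(predictions) output.
# With a real SparkR result, the equivalent local data frame could be obtained with:
#   local_preds <- collect(select(predictions, "label", "prediction"))
local_preds <- data.frame(
  label      = c(0, 1, 0, 1),
  prediction = c(0.10200645, 0.77829571, 0.03680177, 0.87503152)
)

# Assumed cutoff: a predicted probability of 0.5 or more counts as class 1.
threshold <- 0.5
local_preds$predicted_class <- as.integer(local_preds$prediction >= threshold)

# Cross-tabulate actual labels against predicted classes.
confusion <- table(actual = local_preds$label,
                   predicted = local_preds$predicted_class)
print(confusion)

# Fraction of rows where the predicted class matches the label.
accuracy <- mean(local_preds$predicted_class == local_preds$label)
print(accuracy)
```

Because this uses only four rows, the resulting numbers say nothing about the model's quality on the full test set; the point is the thresholding and tabulation pattern.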
