June 2017
Beginner to intermediate
576 pages
15h 22m
English
Now that we have constructed our test and training datasets, we will begin by building a logistic regression model that predicts the outcome as 1 or 0. As you will recall, 1 designates diabetes detected, while 0 designates diabetes not detected.
The syntax of a Spark glm is very similar to that of a normal glm. Specify the model using formula notation, and be sure to set family = "binomial" to indicate that the outcome variable has only two possible values:
# run glm model on Training dataset and assign it to object named "model"
model <- spark.glm(outcome ~ pregnant + glucose + pressure + triceps + insulin + pedigree + age,
                   family = "binomial", maxIter = 100, data = df)
summary(model)
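For comparison, here is a minimal sketch of the equivalent call with the ordinary glm function from base R, which shows how closely the two syntaxes match. The data frame local_df, its simulated values, and the object name local_model are assumptions made only so that the example runs without a Spark session; they are not the diabetes data used in this chapter.

```r
# Base R analogue of the spark.glm call, run on a small simulated data frame.
# The data are synthetic (hypothetical) and exist only to make the sketch runnable.
set.seed(42)
n <- 200
local_df <- data.frame(
  glucose = rnorm(n, mean = 120, sd = 30),
  age     = round(runif(n, min = 21, max = 80))
)

# Simulate a binary outcome whose log-odds rise with glucose and age
log_odds <- -8 + 0.04 * local_df$glucose + 0.03 * local_df$age
local_df$outcome <- rbinom(n, size = 1, prob = plogis(log_odds))

# Same formula notation and family argument as in the spark.glm call
local_model <- glm(outcome ~ glucose + age, family = "binomial", data = local_df)
summary(local_model)

# Predicted probabilities of outcome = 1 for the first few rows
head(predict(local_model, type = "response"))
```

The only structural differences are the function name and the type of the data argument: glm takes a local R data frame, whereas spark.glm takes a SparkDataFrame and fits the model on the cluster.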