June 2017
Beginner to intermediate
576 pages
English
Logistic regression in SparkR lacks cross-validation and some of the other features that you may be used to in base R. It is nevertheless a good starting point for running large-scale models. If you need the cross-validation techniques that have already been covered, you can extract a sample of the data, bring it into local memory (via collect), and run the regression in base R.
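The sample-then-collect approach can be sketched roughly as follows. This is a minimal illustration, not the book's own example: the mtcars data, the am ~ wt formula, the 0.8 sampling fraction, and the names cars_sdf, sample_sdf, sample_df, and fit are all assumptions chosen to keep the example small, and it presumes SparkR and a local Spark installation are available.

```r
library(SparkR)

# Start (or attach to) a local Spark session.
sparkR.session(master = "local[*]")

# Assumed stand-in for a large SparkDataFrame: mtcars, with `am` as a 0/1 outcome.
cars_sdf <- createDataFrame(mtcars)

# Draw a random sample inside Spark, without replacement.
# A fraction of 0.8 only makes sense because the demo data is tiny;
# on real data the fraction must be small enough to fit in local memory.
sample_sdf <- sample(cars_sdf, withReplacement = FALSE, fraction = 0.8, seed = 42)

# collect() copies the sampled rows into an ordinary local data.frame.
sample_df <- collect(sample_sdf)

# Fit the logistic regression in base R on the local sample.
fit <- glm(am ~ wt, data = sample_df, family = binomial())
print(summary(fit))

sparkR.session.stop()
```

Once the sample is a local data.frame, any base R cross-validation workflow can be applied to it. Keep in mind that collect pulls every sampled row onto the driver, so the sampling fraction determines how much local memory is needed.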
There are also techniques for producing pseudo R-squared values and other diagnostics while continuing to work within Spark, which we will demonstrate.
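As a point of reference, one widely used pseudo R-squared is McFadden's. The text does not say which measure it has in mind, so treat the choice here as an assumption made for illustration. It compares the log-likelihood of the fitted model with that of an intercept-only (null) model:

```latex
R^2_{\text{McFadden}}
  = 1 - \frac{\ln \hat{L}_{\text{model}}}{\ln \hat{L}_{\text{null}}}
  = 1 - \frac{D_{\text{model}}}{D_{\text{null}}}
```

The second form, written in terms of the residual deviance D_model and the null deviance D_null, holds for ungrouped binary outcomes, where the saturated model has a log-likelihood of zero and each deviance therefore equals -2 times the corresponding log-likelihood. Values closer to 1 indicate a larger improvement over the null model, but they are not directly comparable to the R-squared of a linear regression.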