
Practical Predictive Analytics by Ralph Winters

Generating the new test data with errors

Now we can introduce some variation into the test dataset by adding a random percentage of each variable to itself. For each variable, we first draw a single value from the error distribution (x) and then add that fraction of the variable back to it.
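A minimal base-R sketch of the adjustment follows. It assumes a hypothetical error distribution `x` of fractional changes from -10% to +10% and a small hypothetical data frame `toy`. The book defines its own `x` earlier, and that definition is not reproduced here. Adding `v * e` to `v` is the same as scaling `v` by `(1 + e)`, so a draw of 0.05 raises every value in the column by 5%.

```r
# Hypothetical error distribution: fractional changes from -10% to +10%
x <- (-10:10) / 100

# Hypothetical toy data standing in for the test set
toy <- data.frame(age = c(25, 40, 61))
original_age <- toy$age

set.seed(42)                 # make the draw reproducible
e <- base::sample(x, 1)      # one fraction, shared by every row of the column

# Add e (as a percentage) of the variable to itself
toy$age <- toy$age + toy$age * e

# The result is the original column scaled by (1 + e)
print(e)
print(all.equal(toy$age, original_age * (1 + e)))
```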

Note that we need to prefix the sample function with base::. Once SparkR is loaded, its own sample function masks base R's, and the two have different syntax. If you do not do this, you will get an error.
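The masking problem can be reproduced without Spark. The sketch below defines a hypothetical stand-in named `sample` in the global environment. Its signature loosely mimics SparkR's `sample()` for SparkDataFrames, but it is not the SparkR function. It is only a placeholder that always raises an error. An unqualified `sample()` call finds the stand-in first, while `base::sample()` goes directly to base R. The error distribution `x` is again an assumption.

```r
# Hypothetical stand-in that masks base::sample, loosely mimicking how
# attaching SparkR places its own sample() ahead of base R's
sample <- function(x, withReplacement = FALSE, fraction, seed) {
  stop("stand-in sample(): expects a SparkDataFrame, not a numeric vector")
}

x <- (-10:10) / 100  # hypothetical error distribution

# Unqualified call: resolves to the masking stand-in and fails
res <- tryCatch(sample(x, 1), error = function(err) conditionMessage(err))
print(res)

# Namespace-qualified call: always reaches base R's sample()
set.seed(1)
draw <- base::sample(x, 1, replace = FALSE, prob = NULL)
print(draw)

rm(sample)  # remove the stand-in so the unqualified name resolves to base again
```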

    # alter the test data set by sampling from the 'x' distribution and adding
    # or subtracting the introduced error adjustment.
    test$age = test$age + test$age*base::sample(x, 1, replace = FALSE, prob = NULL)
    test$pregnant = test$pregnant + test$pregnant*base::sample(x, 1, replace = FALSE, prob = NULL)
    test$glucose = test$glucose + test$glucose*base::sample(x, ...
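Each line above repeats the same pattern, so the adjustment could be wrapped in a small helper and applied to a list of columns. The following is a minimal base-R sketch on a hypothetical local data frame. `perturb_columns()` is a hypothetical helper and is not part of the book's code or of SparkR. `x` is again an assumed error distribution. As in the original lines, each column gets its own single draw, so all rows of a column move by the same percentage.

```r
# Hypothetical helper: add a sampled percentage of each listed column to itself
perturb_columns <- function(df, cols, x) {
  for (col in cols) {
    e <- base::sample(x, 1)               # one draw per column, as in the original lines
    df[[col]] <- df[[col]] + df[[col]] * e
  }
  df
}

x <- (-10:10) / 100  # hypothetical error distribution: -10% to +10%

# Hypothetical stand-in for the test set (column names taken from the text)
test_df <- data.frame(age      = c(25, 40, 61),
                      pregnant = c(0, 2, 5),
                      glucose  = c(90, 120, 150))

set.seed(123)
noisy <- perturb_columns(test_df, c("age", "pregnant", "glucose"), x)
print(noisy)
```

Drawing once per column keeps the relative ordering of rows within each column. Giving every row its own noise would instead mean drawing `nrow(df)` values with `replace = TRUE`, which is a different perturbation from the one used here. Adapting the helper to a SparkDataFrame is not shown.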