Now we can introduce some variation into the dataset by adding a random percentage of each variable to itself. To do this, we first sample an error fraction from the error distribution (x) and then add that fraction of the variable to the variable itself.
Note that we need to prefix the sample function with base::, since the base sample function has a different syntax from the Spark sample function. If you omit the prefix, you will get an error:
# alter the test data set by sampling from the 'x' distribution and adding or subtracting the introduced error adjustment.
test$age = test$age + test$age*base::sample(x, 1, replace = FALSE, prob = NULL)
test$pregnant = test$pregnant + test$pregnant*base::sample(x, 1, replace = FALSE, prob = NULL)
test$glucose = test$glucose + test$glucose*base::sample(x, ...
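To see what this adjustment does, here is a minimal sketch that runs in plain R on a local data frame rather than a Spark DataFrame. The definition of x (a normal distribution centered on zero) and the small test data frame are assumptions made for illustration only; they are not the objects built earlier in the text. Because base::sample(x, 1) returns a single value, each column is scaled by one common error fraction, so every row in that column shifts by the same percentage.

```r
# Hypothetical stand-ins: 'x' and 'test' here are NOT the objects from the text.
set.seed(42)                                # make the draws reproducible
x <- rnorm(1000, mean = 0, sd = 0.05)       # assumed error distribution (roughly +/- 5%)
test <- data.frame(age     = c(25, 40, 61),
                   glucose = c(90, 130, 155))
original <- test                            # keep a copy for comparison

# Draw one error fraction per column and apply it to the whole column.
for (col in names(test)) {
  err <- base::sample(x, 1)                 # a single scalar draw from x
  test[[col]] <- test[[col]] + test[[col]] * err
}

# Ratio of adjusted to original values: constant within each column.
print(test / original)
```

If a different error per row were wanted in a local data frame like this one, drawing nrow(test) values with replace = TRUE would be one option, but that is a different technique from the single draw used in the text.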