Practical Predictive Analytics by Ralph Winters

Generating the training datasets

Since we want 80% of our data to be training data, first take all of the sample_bin values that lie between the low and high cutoff values. Define the cutoff range as 20% of the difference between the highest and lowest values of sample_bin.

Set the low cutoff as the lowest value plus the cutoff range defined previously, and the high cutoff as the highest value minus the cutoff range:

# compute the minimum and maximum values of sample bin
set.seed(123)
sample_bin_min <- as.integer(collect(select(out_sd, min(out_sd$sample_bin))))
sample_bin_max <- as.integer(collect(select(out_sd, max(out_sd$sample_bin))))

Cutoff <- .20*(sample_bin_max - sample_bin_min)
Cutoff_low <- sample_bin_min + Cutoff
Cutoff_high ...
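To make the cutoff arithmetic concrete, here is a minimal base R sketch that does not need a Spark session. The vector sample_bin_demo and the lower-case variable names are hypothetical stand-ins for the out_sd$sample_bin column, and the values 1 to 100 are made up for illustration. The high cutoff follows the prose description above (highest value minus the cutoff range).

```r
# Hypothetical stand-in for out_sd$sample_bin: evenly spread values 1..100
sample_bin_demo <- 1:100

# Minimum and maximum of the bin values
demo_min <- min(sample_bin_demo)
demo_max <- max(sample_bin_demo)

# Cutoff range: 20% of the spread between the highest and lowest value
demo_cutoff <- 0.20 * (demo_max - demo_min)

# Low cutoff = lowest value plus the cutoff range;
# high cutoff = highest value minus the cutoff range
demo_cutoff_low  <- demo_min + demo_cutoff
demo_cutoff_high <- demo_max - demo_cutoff

# Flag the values that lie strictly between the two cutoffs
in_range <- sample_bin_demo > demo_cutoff_low & sample_bin_demo < demo_cutoff_high

# Share of values that fall between the cutoffs
in_range_share <- mean(in_range)
print(c(low = demo_cutoff_low, high = demo_cutoff_high, share = in_range_share))
```

The share of rows that lands between the cutoffs depends on how sample_bin is distributed, so it is worth checking in_range_share (or its Spark equivalent) against the split you intend before building the training set.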
