June 2017
Beginner to intermediate
576 pages
15h 22m
English
Since we want 80% of our data to be training data, first take all of the sample_bin numbers which lie between the high and low cutoff values. We can define the cutoff range as 20% of the difference between the highest and lowest value of sample_bin.
Set the low cutoff to the lowest value plus the cutoff range defined previously, and the high cutoff to the highest value minus the cutoff range:
# compute the minimum and maximum values of sample_bin
set.seed(123)
sample_bin_min <- as.integer(collect(select(out_sd, min(out_sd$sample_bin))))
sample_bin_max <- as.integer(collect(select(out_sd, max(out_sd$sample_bin))))
# cutoff range: 20% of the spread between the largest and smallest sample_bin
Cutoff <- .20*(sample_bin_max - sample_bin_min)
# the low cutoff sits one cutoff range above the minimum
Cutoff_low <- sample_bin_min + Cutoff
Cutoff_high ...
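For readers without a Spark session at hand, the cutoff arithmetic can be tried in base R. This is a minimal sketch, not the book's code. It assumes a plain local data.frame named `local_sd` (hypothetical) standing in for the Spark DataFrame `out_sd`, and a hypothetical helper `compute_cutoffs()`. It applies the rule described above and keeps the rows whose `sample_bin` lies between the two cutoffs. Treating the bounds as inclusive is an assumption of this sketch.

```r
# Hypothetical helper (not from the book): computes the cutoff range and the
# low/high cutoffs for a numeric vector, using the 20% rule from the text.
compute_cutoffs <- function(x, fraction = 0.20) {
  x_min <- min(x)
  x_max <- max(x)
  cutoff <- fraction * (x_max - x_min)   # cutoff range
  list(cutoff = cutoff,
       low    = x_min + cutoff,          # lowest value plus the cutoff range
       high   = x_max - cutoff)          # highest value minus the cutoff range
}

# Hypothetical local stand-in for out_sd (assumption: sample_bin is numeric)
local_sd <- data.frame(id = 1:101, sample_bin = 0:100)

cuts <- compute_cutoffs(local_sd$sample_bin)

# Keep the rows whose sample_bin lies between the low and high cutoffs
# (inclusive bounds are an assumption of this sketch)
in_band <- local_sd[local_sd$sample_bin >= cuts$low &
                    local_sd$sample_bin <= cuts$high, ]

print(unlist(cuts))
print(nrow(in_band))
```

Working on a local vector keeps the arithmetic easy to inspect; in SparkR, the same comparison would be expressed as a filter on the `sample_bin` column of `out_sd`.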