We now have two separate Spark dataframes corresponding to the positive and negative cases. For certain types of analysis it would make sense to keep the outcomes separate; for illustration purposes, however, we will combine them into a single dataset using the unionAll() function.
out_sd <- unionAll(out_sd1, out_sd2)
nrow(out_sd)
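The same row arithmetic can be checked on a small scale without a Spark session. The following is a minimal base R sketch, not the SparkR code itself: the data frames pos and neg and their columns are made up for illustration, and rbind() stands in for unionAll(). Like unionAll(), it simply stacks the rows and keeps duplicates, so the combined row count should equal the sum of the two inputs.

```r
# Two small, made-up data frames standing in for the positive and negative cases
# (hypothetical names and values, not the data used in the text)
pos <- data.frame(outcome = rep(1, 3), value = c(10, 20, 30))
neg <- data.frame(outcome = rep(0, 2), value = c(40, 50))

# rbind() stacks the rows of both data frames and keeps any duplicates
combined <- rbind(pos, neg)

# The combined row count should equal the sum of the two inputs
nrow(combined) == nrow(pos) + nrow(neg)
```

Note that in newer SparkR releases unionAll() is marked as deprecated in favor of union(), which also keeps duplicate rows; which of the two is available depends on the Spark version in use.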
The output from nrow(out_sd) indicates a total of 768,000 rows. This number represents our original 768 rows multiplied by a factor of 1,000: