June 2017
Beginner to intermediate
576 pages
15h 22m
English
The prior summary shows that we have no missing values; however, quite a few variables contain zeros that do not make any sense. For example, a blood pressure reading of zero is impossible, whereas a 0 for the number of times pregnant is perfectly valid. So, for most of these variables, we will assume that zero was recorded in place of NA, and we will map the data accordingly:
# we see that there are 0's which are really NA's
# some 0's are really NA's, we will change them in Spark
# keep pregnant = 0
PimaIndians <- PimaIndiansDiabetes ...
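The zero-to-NA mapping described above can be sketched in base R. This is only an illustration, not the Spark-based code the text goes on to use: the data frame `pima_toy`, its made-up values, and the choice of columns in `zero_is_na` are all assumptions introduced for the example.

```r
# Toy stand-in for the Pima Indians data (made-up values, illustration only)
pima_toy <- data.frame(
  pregnant = c(0, 2, 5),
  glucose  = c(148, 0, 183),
  pressure = c(72, 66, 0),
  mass     = c(33.6, 0, 23.3)
)

# Columns where a 0 is assumed to mean "not recorded" (hypothetical selection);
# pregnant is deliberately left out because 0 is a valid value there
zero_is_na <- c("glucose", "pressure", "mass")

pima_clean <- pima_toy
for (col in zero_is_na) {
  # Replace zeros with NA in this column only
  pima_clean[[col]][pima_clean[[col]] == 0] <- NA
}

# Count the missing values per column after the mapping
na_counts <- colSums(is.na(pima_clean))
print(na_counts)
```

Keeping the affected columns in an explicit vector makes the assumption visible: any variable not listed, such as `pregnant`, keeps its zeros untouched.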