March 2018
Beginner to intermediate
570 pages
13h 42m
English
Another unfortunately common data integrity problem is the mislabeling of categorical variables. Two kinds of mislabeling can occur: an observation's class is mis-entered, mis-recorded, or mistaken for another class, or the observation's class is labeled in a way that is inconsistent with the rest of the labels. For an example of how to combat the former case, read the assertr vignette. The latter case covers instances where, for example, the species of iris is misspelled (such as versicolour or verginica) or where the pattern established by the majority of class names is ignored (iris setosa, i. setosa, or SETOSA). Either way, these ...
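To make the second kind of mislabeling concrete, here is a minimal sketch in base R of one way such labels might be detected and repaired. It is not necessarily the approach the text goes on to take. The `species` vector is hypothetical, hand-made data built from the misspellings mentioned above, and the names `valid`, `normalized`, `unexpected`, and `suggestion` are illustrative assumptions.

```r
# Hypothetical, hand-made labels (not the real iris data set), built from
# the kinds of inconsistencies described in the text
species <- c("setosa", "SETOSA", "i. setosa", "iris setosa",
             "versicolor", "versicolour", "virginica", "verginica")

# The set of class names we expect to see (an assumption for this sketch)
valid <- c("setosa", "versicolor", "virginica")

# Step 1: remove pattern inconsistencies (case, stray whitespace,
# and a leading "iris " or "i. " genus prefix)
normalized <- tolower(trimws(species))
normalized <- sub("^(iris|i\\.)\\s+", "", normalized)

# Step 2: flag the labels that still fall outside the expected set
unexpected <- unique(normalized[!normalized %in% valid])

# Step 3: suggest the closest valid label by edit distance
distances <- utils::adist(unexpected, valid)
suggestion <- valid[apply(distances, 1, which.min)]
names(suggestion) <- unexpected

print(unexpected)
print(suggestion)
```

Normalizing first keeps the edit-distance step focused on genuine misspellings. Suggestions produced this way are best reviewed by hand before any label is overwritten, since the nearest valid name is not guaranteed to be the intended one.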