O'Reilly logo

Data Analysis with R - Second Edition by Tony Fischetti

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Checking for unexpected categories

Another data integrity impropriety that is unfortunately very common is the mislabeling of categorical variables. There are two types of mislabeling of categories that can occur: an observation's class is mis-entered/mis-recorded/mistaken for that of another class, or the observation's class is labeled in a way that is not consistent with the rest of the labels. To see an example of what we can do to combat the former case, read assertr vignette. The latter case covers instances where, for example, the species of iris could be misspelled (such as versicolour or verginica) or cases where the pattern established by the majority of class names is ignored (iris setosa, i. setosa, or SETOSA). Either way, these ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required