Data Analysis with R - Second Edition by Tony Fischetti

Summary

However you define it, messy data is a huge roadblock for anyone who works with data. This chapter focused on two of the most notorious and prolific culprits: missing data, and data that has not been cleaned or audited for quality.

For unsanitized data, we saw that what is perhaps the ideal solution, visually auditing the data, is untenable for datasets of moderate size or larger. We discovered that the grammar of the assertr package provides a mechanism for offloading this auditing process to R. You now have a few assertr checking recipes under your belt for some of the more common mistakes that plague data that has not been scrutinized.
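As a reminder of what such a recipe can look like, here is a minimal sketch, not one of the chapter's own examples. It assumes the assertr package is installed and R 4.1 or later (for the native pipe), and it uses the built-in mtcars dataset purely as a stand-in; the bounds and column choices are illustrative assumptions.

```r
library(assertr)

# Illustrative audit of the built-in mtcars dataset (a stand-in, not the
# chapter's data). Each step passes the data along unchanged if its check
# holds, and raises an error otherwise.
audited <- mtcars |>
  verify(mpg > 0) |>                      # a logical expression evaluated on the data
  assert(not_na, mpg, cyl) |>             # no missing values in these columns
  assert(within_bounds(0, Inf), mpg) |>   # every value lies inside a range
  assert(in_set(0, 1), am, vs) |>         # only the allowed codes appear
  insist(within_n_sds(4), mpg)            # bounds are computed from the column itself

# A deliberately strict, hypothetical bound to show a failing check being caught.
failed <- tryCatch({
  mtcars |> assert(within_bounds(0, 30), mpg)
  FALSE
}, error = function(e) TRUE)
```

Because each check hands the data frame on when it succeeds, a chain like this can sit at the top of an analysis pipeline and stop it before bad values reach the modeling code.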
