Data scrubbing
It is a commonly repeated statistic that at least 80 percent of a data scientist's work is data scrubbing. This is the process of detecting potentially corrupt or incorrect data and either correcting or filtering it out.
Note
Data scrubbing is one of the most important (and time-consuming) aspects of working with data. It's a key step to ensuring that subsequent analysis is performed on data that is valid, accurate, and consistent.
The nil value at the end of the election year column may indicate dirty data that ought to be removed. We've already seen that filtering columns of data can be accomplished with Incanter's i/$ function. For filtering rows of data we can use Incanter's i/query-dataset function.
We let Incanter know which rows ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access