10. Cleaning and Categorizing the Collection

Standard: 2-DA-08: Collect data using computational tools and transform the data to make it more useful and reliable.

Down on your hands and knees in the dirt next to the stream, you spend several hours sifting through the pile of rocks, noting their color, weighing them, and trying not to disturb their placement so that you can accurately note their position in the stream. With the help of some obliging frogs who live in the stream, you start building a catalog of the stones as a table, with each stone listed by weight, size, placement, color, and other attributes. Rapidly inputting data, you discover that you have amassed a sizable amount of information. Pausing for breath, you look at your work.

Dismayed, you notice that your table of data has grown large but has gaps, inconsistencies, missing values, and other errors. Maybe you shouldn't have worked so quickly, or maybe you were tired and made mistakes. Maybe the rocks you categorized as brown are only brown when wet but turn a chalky gray when dry.

“Don't worry,” says your Guide, noticing your expression. “There are good ways to clean up your data, and we can help you do it. In the process, you'll also be able to discover some interesting facts about our rock collection and how the stones relate to each other.” The helpful frogs nod wisely, croaking “Rub it, rub it” to encourage you to clean up the data.

Do Some Research

In this chapter, you'll continue to work ...
