CHAPTER 8: Search for Hidden Groups

“If you mine the data hard enough, you can find messages from God.”

—Dilbert¹

Imagine you get a call from a friend. They're looking for help categorizing their music collection—a vintage set of vinyl records. You agree to help.

As you drive to your friend's house, you wonder how you would organize such a collection. You could start with some obvious categories. For instance, music is often organized into genres and subgenres. Or you could group the records by the musical period in which they were released. This information is readily available on an album cover.

When you arrive at your friend's house, however, you are handed a tall stack of black vinyl records—no album covers to be found.

You learn your friend purchased the stack of records at a yard sale and has no clue which (or how many) genres, artists, or musical periods it contains. Now you must leave your preconceived notions about how to categorize the records at the door—you no longer have predefined groups on the album cover to guide you. The task of categorizing records is suddenly much more difficult than you anticipated.

Determined, you and your friend break out the record player, listen to each album, and start grouping them into categories based on how similar they sound. As you listen to the records, new groups emerge, small groups might combine into one, and occasionally, a record moves from one group to another after a spirited debate about which group it sounds “closest” to.

In the ...
