Chapter 6

Identifying Similarities in Data

In This Chapter

Clustering data

Identifying hidden groups of similar information in your data

Finding associations among data items

Organizing data with biologically inspired clustering

There is so much data around us that it can feel overwhelming. Large amounts of information are constantly being generated, organized, analyzed, and stored. Data clustering can help you make sense of this flood of data by discovering hidden groupings of similar data items. In essence, it gives you a description of your data that says: your data contains x groups of similar data objects.
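To make the idea of "x groups of similar data objects" concrete, here is a minimal sketch in Python. The text above does not name a specific algorithm, so the sketch uses k-means as an illustrative stand-in; the function name `kmeans`, the sample points, and the naive choice of starting centroids are all assumptions made for this example, not a method taken from the chapter.

```python
import math

def kmeans(points, k, iterations=20):
    """Tiny k-means sketch: returns (centroids, labels)."""
    # Assumption: start from the first k points (naive but deterministic).
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, labels

# Made-up 2-D data with two visually obvious groups.
points = [(1.0, 1.1), (5.0, 5.2), (0.9, 1.0), (5.1, 4.9), (1.1, 0.9), (4.9, 5.0)]
centroids, labels = kmeans(points, k=2)
print(labels)     # one group label per point
print(centroids)  # one center per discovered group
```

Each label says which group a point fell into, and each centroid summarizes one group. Note that this sketch needs the number of groups `k` as an input; real data usually calls for a more careful choice of `k` and of starting centroids than shown here.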

Clustering, in the form of grouping similar things, is part of our daily activities. You use clustering any time you group similar items together. For example, when you store groceries in your fridge, you group the vegetables by themselves in the crisper, put frozen foods in their own section (the freezer), and so on. When you organize currency in your wallet, you arrange the bills by denomination: larger with larger, smaller with ...
