Chapter 7. Clustering and Segmentation in R

Big Bonanza Warehouse is at the beginning of a big change: they’re going to upgrade their current SAP system to S/4HANA. Furthermore, they’ve decided they will not migrate all of their old data unless necessary. Each department has been tasked with identifying its own crucial data. Rod works as a national account rep and his responsibility is to identify which customers in their system should be migrated. They have decades of customer data, much of which is obsolete.

Rod has long wanted to understand his customers better, so this process will be rewarding for him. Which customers are the highest value? Is it a simple calculation of the top N customers by sales? Is it how frequently a customer purchases? Maybe it is a combination of factors. He turns to Duane, his SAP Sales and Distribution Analyst, for suggestions on how to approach this. Duane, having read this book, thinks immediately, “This is a task for clustering and segmentation!”

Clustering is any one of several algorithmic approaches to dividing a dataset into smaller, meaningful groups. There’s no predetermined notion of which dimension (or dimensions) best allows that grouping. Practically speaking, you’ll almost always have some idea which dimensions (or features) you want to analyze. For example, suppose you have sales data and want to know customer value. Clearly, overall purchase history and dollar value are important. What about the frequency of a customer ...
