Cluster Analysis Part I: Using K-Means to Segment Your Customer Base

I work in the e-mail marketing industry for a website called MailChimp.com. We help customers send e-mail newsletters to their audience, and every time someone uses the term “e-mail blast,” a little part of me dies.

Why? Because e-mail addresses are no longer black boxes that you lob “blasts” at like flash grenades. No, in e-mail marketing (as with many other forms of online engagement, including tweets, Facebook posts, and Pinterest campaigns), a business receives feedback on how its audience engages at the individual level through click tracking, online purchases, social sharing, and so on. This data is not noise. It characterizes your audience. But to the uninitiated, it might as well be Greek. Or Esperanto.

How do you take a bunch of transactional data from your customers (or audience, users, subscribers, citizens, and so on) and use it to understand them? When you're dealing with lots of people, it's hard to understand each customer personally, especially when each one has engaged with you in a different way. And even if you could understand everyone at a personal level, that knowledge would be tough to act on.

You need to take this customer base and find a happy medium between “blasting” everyone as if they were the same faceless entity and understanding everything about everyone to create personalized marketing for each individual recipient. One way to strike this balance is to use clustering ...
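As a rough illustration of the k-means technique named in the title, here is a minimal sketch of Lloyd's iteration in plain Python. It assumes each customer has already been summarized as a numeric vector; the `kmeans` function, the choice of features (e-mails opened, purchases), and the toy data are all hypothetical and are not taken from the original text. Initial centroids are picked at random from the data, and the loop alternates between assigning each customer to the nearest centroid and moving each centroid to the mean of its assigned customers.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Hypothetical helper: plain Lloyd's k-means with random initial centroids."""
    rng = random.Random(seed)
    # Start from k distinct customers chosen at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        # (Euclidean distance; ties go to the lower cluster index).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical toy data: (e-mails opened, purchases) for six customers.
customers = [(1, 0), (2, 1), (1, 1), (20, 8), (22, 9), (19, 7)]
labels, centroids = kmeans(customers, k=2)
print(labels, centroids)
```

On this toy data the three low-engagement customers should land in one segment and the three high-engagement customers in the other, though which segment gets label 0 depends on the random seed. Real customer data would usually need feature scaling and several random restarts, since k-means can settle into a poor local optimum.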

Get Data Smart: Using Data Science to Transform Information into Insight now with O’Reilly online learning.