CHAPTER 13 Building Customer Signatures for Further Analysis

The combination of SQL and Excel is powerful for manipulating data, visualizing trends, exploring interesting features, and finding patterns. However, SQL is still a language designed for data access, and Excel is still a spreadsheet designed for investigating relatively small amounts of data. Although powerful, the combination has its limits.

The solution is to use more powerful data mining and statistical tools, such as SAS, SPSS, R, and Python (among others), or even special-purpose code. Assuming that the source data resides in a relational database, SQL still plays an important role in transforming it into the format needed for further analysis. Even NoSQL databases often use SQL-like syntax for accessing and processing data.

Preparing data for such applications is where customer signatures fit in. A customer signature contains summarized attributes of customers, putting important information in one place. This is useful both for building models and for scoring them, as well as for reporting and ad hoc analyses. The model sets discussed in the previous two chapters are examples of customer signature tables.
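As a rough illustration of the idea, the sketch below summarizes order rows into one row per customer. It is only a minimal example under stated assumptions: the `orders` table, its columns (`customer_id`, `order_date`, `total_price`), the sample rows, and the cutoff-date parameter are all hypothetical and are not taken from the book's data. SQLite is used through Python's standard library so the example runs on its own.

```python
import sqlite3

# Hypothetical sample data: one row per order (not the book's schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, order_date TEXT, total_price REAL);
INSERT INTO orders VALUES
  (1, '2015-01-10', 20.0),
  (1, '2015-03-05', 35.5),
  (2, '2015-02-14', 12.0);
""")

# One output row per customer, with summarized attributes in named columns.
# Only orders before the cutoff date are used (an assumed convention here).
SIGNATURE_SQL = """
SELECT customer_id,
       COUNT(*)         AS num_orders,
       SUM(total_price) AS total_spend,
       MIN(order_date)  AS first_order_date,
       MAX(order_date)  AS last_order_date
FROM orders
WHERE order_date < :cutoff_date
GROUP BY customer_id
ORDER BY customer_id
"""

def build_signatures(conn, cutoff_date):
    """Return a list of dicts, one customer signature per customer."""
    cur = conn.execute(SIGNATURE_SQL, {"cutoff_date": cutoff_date})
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]

signatures = build_signatures(conn, "2015-03-01")
for sig in signatures:
    print(sig)
```

Each resulting row gathers what is known about a customer as of the cutoff date, which is the shape that modeling and scoring tools typically expect as input.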

Signatures are useful beyond sophisticated modeling, having their roots in customer information files developed for reporting purposes. However, signatures are summaries designed for analytic purposes rather than reporting purposes, taking special care with regard to the naming of columns, the time frame of ...
