CHAPTER 11 Data Mining Models in SQL

Data mining is the process of finding meaningful patterns in large quantities of data. Traditionally, the subject is introduced through statistics and statistical modeling. This chapter takes an alternative approach that introduces data mining concepts using databases. This perspective presents the important concepts, sidestepping the rigor of theoretical statistics to focus instead on the most important practical aspect: data.

The next two chapters extend the discussion that this chapter begins. Chapter 12 covers linear regression, a more traditional starting point for modeling and data mining. Chapter 13 focuses on data preparation, often the most challenging part of a data mining endeavor.

Earlier chapters have already shown some powerful techniques implemented using SQL. Snobs may feel that data mining is more advanced than mere SQL queries. This sentiment downplays the importance of data manipulation, which lies at the heart of even the most advanced techniques. Some powerful techniques adapt well to databases, and learning how they work—both in terms of their application to business problems and their implementation on real data—provides a good foundation for understanding modeling. Some techniques do not adapt as well to databases, so they require more specialized software. The fundamental ideas about how to use models and how to evaluate the results remain the same, regardless of the sophistication of the modeling technique.


From Data Analysis Using SQL and Excel, 2nd Edition.
