Chapter 10. Data Mining Models in SQL

Data mining is the process of finding meaningful patterns in large quantities of data. Traditionally, the subject is introduced through statistics and statistical modeling. This chapter takes an alternative approach and introduces data mining concepts using databases. This perspective presents the key concepts, sidestepping the rigor of theoretical statistics to focus instead on the most important practical aspect: data.

The next two chapters extend the discussion begun in this chapter. Chapter 11 explains linear regression, a more traditional starting point for modeling, from the perspective of data mining. The final chapter focuses on data preparation. Whether the modeling techniques are within a database or in another tool, data preparation is often the most challenging part of a data mining endeavor.

Although earlier chapters have already shown the powerful techniques that are possible using SQL, snobs may feel that data mining is more advanced than mere querying of databases. Such a sentiment downplays the importance of data manipulation, which lies at the heart of even the most advanced techniques. Some powerful techniques adapt well to databases. Learning how they work, both in their application to business problems and in their implementation on real data, provides a good foundation for understanding modeling. Other techniques do not adapt as well to databases, so they require more specialized software. However, the fundamental ...
