Chapter 1. A Data Miner Looks at SQL

Data is being collected everywhere: every transaction, every web page visit, every payment. All these and much, much more are filling relational databases with raw data. Computing power and storage have been growing more cost-effective over the past decades, a trend destined to continue. Databases are no longer merely a platform for storing data. They are increasingly becoming powerful engines for transforming data into useful information about customers, products, and business practices.

The focus of data mining has historically been on complex algorithms developed by statisticians and machine learning specialists. Not too long ago, data mining required downloading source code from a research lab or university, compiling the code to get it to run, and sometimes even debugging it. By the time the data and software were ready, the business problem had lost its urgency.

This book takes a different approach because it starts with the data. The billions of transactions that occur every day (credit card swipes, web page visits, telephone calls, and so on) are now almost always stored in relational databases. This technology, which was only invented in the 1970s, is now the storehouse of the mountains of data available to businesses. Relational database engines count among the most powerful and sophisticated software products in the business world, so they are well suited for the task of extracting useful information.

The ...
