Chapter 1

Catching the Data-Mining Train

You’ve picked an exciting moment to become a data miner.

By some estimates, more than 15 exabytes of new data are now produced each year. How much is that? An exabyte is a billion gigabytes, so it’s really, ridiculously big. Why is this important? Most organizations have access to only a teeny, tiny fraction of that data, and they aren’t getting much value from what they have.

Data can be a valuable resource for business, government, and nonprofit organizations, but quantity isn’t what’s important about it. A greater quantity of data does not guarantee better understanding or competitive advantage. In fact, a little bit of relevant data, used well, provides more value than a gargantuan database used poorly. As a data miner, it’s your mission to make the most of the data you have.

This chapter goes over the basics of data mining. Here I explain what data miners do and the tools and methods they use to do it.

Getting Real about Data Mining

Maybe you’ve heard news reports or ads hinting that all you need to make valuable information pop out like magic is a big database and the latest software. That’s nonsense. Data miners have to work and think to make valuable discoveries.

Maybe you’ve heard that to get results out of your database, you must first hire one of a special breed of people who have nearly super-human knowledge of data, people known to be very expensive, nearly impossible to find, and absolutely necessary to your success. That’s nonsense, too. Data ...
