Chapter 3

Data Mining Processes and Knowledge Discovery

A general process is useful when conducting data mining analysis. This chapter describes a widely used industry standard process and a shorter vendor process. Not every step is needed in every analysis, but the process provides good coverage of the steps involved: data exploration, data collection, data processing, analysis, drawing inferences, and implementation.

Two standard processes for data mining have been presented. CRISP-DM (cross-industry standard process for data mining) is an industry standard, and SEMMA (sample, explore, modify, model, and assess) was developed by the SAS Institute Inc., a leading vendor of data mining software (and ...
