Algorithms
When working with data mining, it is useful to understand mining algorithm basics and when to apply each algorithm. Table 57.2 summarizes common algorithms used for the problem categories presented in this chapter's introduction.
Problem Type | Primary Algorithms |
Segmentation | Clustering, Sequence Clustering |
Classification | Decision Trees, Naive Bayes, Neural Network, Logistic Regression |
Association | Association Rules, Decision Trees |
Estimation | Decision Trees, Linear Regression, Logistic Regression, Neural Network |
Forecasting | Time Series |
Sequence Analysis | Sequence Clustering |
These are guidelines only, because not every data mining problem falls into one of these categories. In addition, other algorithms may also apply to the listed problem types.
Decision Trees
The decision trees algorithm is often the most accurate choice for many problems. It builds a decision tree beginning with the All node, which corresponds to all the training cases, as shown in Figure 57.3. An attribute is then chosen to split those cases into groups, each group is split again based on another attribute, and so on. The goal is to generate leaf nodes with a single predictable outcome. For example, if the goal is to identify who will purchase a bike, then each leaf node should contain cases that are either all bike buyers or all non-buyers, with no mixture of the two (or as close to that goal as possible).
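The splitting process described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the Microsoft Decision Trees implementation: it assumes entropy-based information gain as the rule for choosing the split attribute (the actual algorithm may score splits differently), and the function names (`entropy`, `best_split`, `build_tree`), the attribute names, and the four training cases are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Measure how mixed a list of outcomes is (0 means a single outcome)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_split(cases, attributes, target):
    """Return the attribute whose split yields the purest groups, or None."""
    base = entropy([c[target] for c in cases])
    best_attr, best_gain = None, 0.0
    for attr in attributes:
        # Group the target outcomes by each value of the candidate attribute.
        groups = {}
        for c in cases:
            groups.setdefault(c[attr], []).append(c[target])
        remainder = sum(len(g) / len(cases) * entropy(g) for g in groups.values())
        gain = base - remainder  # assumed criterion: information gain
        if gain > best_gain:
            best_attr, best_gain = attr, gain
    return best_attr


def build_tree(cases, attributes, target):
    """Recursively split cases until leaves are pure or no split helps."""
    attr = best_split(cases, attributes, target)
    if attr is None:
        # Leaf node: report the most common outcome among its cases.
        return Counter(c[target] for c in cases).most_common(1)[0][0]
    remaining = [a for a in attributes if a != attr]
    subsets = {}
    for c in cases:
        subsets.setdefault(c[attr], []).append(c)
    return {attr: {value: build_tree(subset, remaining, target)
                   for value, subset in subsets.items()}}


# Hypothetical training cases; the root call plays the role of the All node.
cases = [
    {"commute": "short", "cars": "0", "buyer": "yes"},
    {"commute": "short", "cars": "1", "buyer": "yes"},
    {"commute": "long", "cars": "0", "buyer": "no"},
    {"commute": "long", "cars": "1", "buyer": "no"},
]
tree = build_tree(cases, ["commute", "cars"], "buyer")
print(tree)  # {'commute': {'short': 'yes', 'long': 'no'}}
```

In this made-up data, splitting on `commute` separates buyers from non-buyers completely, so the sketch stops after one split and each leaf holds a single outcome. Splitting on `cars` would leave both groups mixed, which is why it is not chosen.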