Microsoft SQL Server 2012 Bible
by Adam Jorgensen, Jorge Segarra, Patrick LeBlanc, Jose Chinchilla, Aaron Nelson
Algorithms
When working with data mining, it is useful to understand mining algorithm basics and when to apply each algorithm. Table 57.2 summarizes common algorithms used for the problem categories presented in this chapter's introduction.
Table 57.2 Common Mining Algorithm Usage
| Problem Type | Primary Algorithms |
| --- | --- |
| Segmentation | Clustering, Sequence Clustering |
| Classification | Decision Trees, Naive Bayes, Neural Network, Logistic Regression |
| Association | Association Rules, Decision Trees |
| Estimation | Decision Trees, Linear Regression, Logistic Regression, Neural Network |
| Forecasting | Time Series |
| Sequence Analysis | Sequence Clustering |
Treat these as guidelines only, because not every data mining problem falls neatly into one of these categories. In addition, other algorithms may apply to the listed problem types.
Decision Trees
For many problems, the decision trees algorithm is the most accurate choice. It builds a decision tree beginning with the All node, which corresponds to all the training cases, as shown in Figure 57.3. An attribute is then chosen to split those cases into groups, each group is split again on another attribute, and so on. The goal is to generate leaf nodes with a single predictable outcome. For example, if the goal is to identify who will purchase a bike, then each leaf node should contain cases that are either all bike buyers or all non-buyers, not a mix of both (or as close to that goal as possible).
Figure 57.3 This is a great example of the decision tree ...
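The splitting process described above can be sketched in a few lines of Python. This is an illustration only, not the implementation used by SQL Server Analysis Services: it assumes entropy-based information gain as the scoring method, and the attribute names (`commute`, `owns_car`, `bike_buyer`) and the four training cases are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of outcomes; 0 means the node is pure."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def split_cases(cases, attribute):
    """Group cases by their value for one attribute."""
    groups = {}
    for case in cases:
        groups.setdefault(case[attribute], []).append(case)
    return groups


def best_split(cases, attributes, target):
    """Return (attribute, gain) for the split that most reduces entropy."""
    base = entropy([c[target] for c in cases])
    best_attr, best_gain = None, 1e-12  # ignore splits with no real gain
    for attr in attributes:
        groups = split_cases(cases, attr).values()
        remainder = sum(
            len(g) / len(cases) * entropy([c[target] for c in g]) for g in groups
        )
        gain = base - remainder
        if gain > best_gain:
            best_attr, best_gain = attr, gain
    return best_attr, best_gain


def build_tree(cases, attributes, target):
    """Split recursively, starting from the node that holds all cases."""
    attr, _ = best_split(cases, attributes, target)
    if attr is None:  # pure node, or no attribute improves purity
        return {"leaf": True, "counts": dict(Counter(c[target] for c in cases))}
    remaining = [a for a in attributes if a != attr]
    return {
        "leaf": False,
        "split_on": attr,
        "children": {
            value: build_tree(group, remaining, target)
            for value, group in split_cases(cases, attr).items()
        },
    }


# Hypothetical training cases (assumed data, for illustration only).
cases = [
    {"commute": "short", "owns_car": "no", "bike_buyer": "yes"},
    {"commute": "short", "owns_car": "yes", "bike_buyer": "yes"},
    {"commute": "long", "owns_car": "no", "bike_buyer": "no"},
    {"commute": "long", "owns_car": "yes", "bike_buyer": "no"},
]
tree = build_tree(cases, ["commute", "owns_car"], "bike_buyer")
```

In this toy data set, splitting on `commute` separates buyers from non-buyers completely, so the sketch stops after one split and both leaf nodes hold a single outcome. Real data rarely separates this cleanly, which is why the goal is stated as getting as close to pure leaves as possible.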