10.3. Microsoft Data Mining Algorithms

Data mining algorithms are the logic used to create the mining models. Several standard algorithms in the data mining community have been carefully tested and honed over time. For example, one of the algorithms used to build decision trees uses a Bayesian method to compute the score that decides where to split the branches of the tree. The roots of this method (so to speak) trace back to its namesake, Thomas Bayes, who first established a mathematical basis for probability inference in the 1700s.
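To make the Bayesian idea concrete, here is a minimal Python sketch of Bayes' rule itself, the building block behind Bayesian scoring. It is not the split score SQL Server uses; the function name `posterior` and the customer figures are hypothetical, chosen only for illustration.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E).

    prior            -- P(H), belief in the hypothesis before seeing evidence
    p_e_given_h      -- P(E | H), chance of the evidence if H is true
    p_e_given_not_h  -- P(E | not H), chance of the evidence if H is false
    """
    # Total probability of the evidence across both cases (H and not H).
    evidence = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("evidence has zero probability; posterior is undefined")
    return p_e_given_h * prior / evidence


# Hypothetical figures: 20% of customers buy a bike (prior). Among buyers,
# 60% commute under 5 miles; among non-buyers, 30% do.
p_buyer_given_short_commute = posterior(0.20, 0.60, 0.30)
print(f"P(buyer | short commute) = {p_buyer_given_short_commute:.3f}")
```

Under these assumed numbers, learning that a customer has a short commute raises the estimated chance of a purchase from 20 percent to about 33 percent. A Bayesian split score applies the same kind of reasoning to ask how much an attribute changes what we believe about the target.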

The Data Mining group at Microsoft has been working diligently to expand the number of algorithms offered in SQL Server 2005 and to improve their accuracy. SQL Server Data Mining includes seven algorithms that cover a large percentage of the common data mining application areas. The seven core algorithms are:

  • Decision Trees (and Linear Regression)

  • Naïve Bayes

  • Clustering

  • Sequence Clustering

  • Time Series

  • Association

  • Neural Network (and Logistic Regression)

The two regression algorithms, Linear Regression and Logistic Regression, are not separate implementations: each sets parameters on its main algorithm (Decision Trees and Neural Network, respectively) to generate the regression results. Some of the higher-level algorithms also include parameters the data miner can use to choose among several underlying algorithms when generating the model. If you plan to do serious data mining, you need to know what these algorithms are and how they work so you can apply them to the appropriate problems and get the best performance. We briefly describe each of these algorithms in the following list. The Books Online topic ...
