544 Delivering Business Intelligence with Microsoft SQL Server 2005
Data Mining Extensions
In Chapter 11, we learned about the MDX query language and its use in querying
OLAP cubes. Data mining structures also have their own query language, known as
Data Mining Extensions (DMX).
In this section, we look at the most prominent aspect of DMX, the prediction
query. The prediction query provides a means to have a data mining model create
predictions based on rows in another table. In fact, you already used DMX prediction
queries when you used the Mining Model Prediction tab.
Figure 14-23 The prediction join query result set with filter criteria
Chapter 14: Spelunking—Exploration Using Data Mining 545
Prediction Query Syntax
The prediction query looks much like a SELECT statement in T-SQL. In fact, it has
many of the same clauses as a SELECT statement. We look at each of those clauses
and how they are used.
The SELECT clause in a DMX query serves the same function as the SELECT
clause in a T-SQL statement. The SELECT clause speciﬁ es what ﬁ elds or
calculations are going to be included as columns in the result set. This is done by
placing a ﬁ eld list following the word “SELECT”. For example:
SELECT Fname, Lname, Address1, City, State_Province, Postal_Code
The SELECT statement has two special keywords that may precede the ﬁ eld list:
FLATTENED and TOP n. The FLATTENED keyword causes the prediction query to
return a ﬂ attened result set of rows and columns. When the FLATTENED keyword
is not used, the SELECT statement may return a hierarchical result set.
The TOP n keyword works the same here as it does in the T-SQL SELECT
statement: it limits the number of rows in the result set to the number speciﬁ ed
for n. We can order the result set, perhaps by a probability in descending order,
and then use the TOP n keywords to include only the rows with the highest
Figure 14-24 The save data mining query result dialog box
546 Delivering Business Intelligence with Microsoft SQL Server 2005
The FROM clause speciﬁ es the mining model being used in the query. For example:
FROM [Neural Network – Children At Home]
PREDICTION JOIN Clause
The PREDICTION JOIN clause lets us feed information into the inputs of the
mining model. The content or absence of this clause determines the type of
prediction query being run. See the section “Types of Prediction Queries” for details
on the syntax of the PREDICTION JOIN clause.
Figure 14-25 A DMX prediction join query
Chapter 14: Spelunking—Exploration Using Data Mining 547
The WHERE clause enables us to ﬁ lter the result of the SELECT statement. The
ﬁ lter conditions in the WHERE clause are applied after the predictions are made by
the mining model. Therefore, we can use the prediction result as part of the ﬁ lter.
For example, if the Neural Network - Children At Home mining model is predicting
Num Children At Home, we can create a ﬁ lter to give us only those records where
Num Children At Home was predicted to be 0:
WHERE [Neural Network – Children At Home].[Num Children At Home] = 0
ORDER BY Clause
The ORDER BY clause lets us sort the result set: we can sort in either ascending or
descending order. Adding DESC to the clause provides a descending sort. Adding
ASC to the clause provides an ascending sort. ASC is the default sort order when
neither ASC nor DESC is speciﬁ ed.
The ORDER BY clause in T-SQL includes a comma-separated list of ﬁ elds
to provide the sort columns. The ORDER BY clause in DMX is a bit different. It
allows only a single expression. If we want to sort by more than one column, we
need to create an expression concatenating these two columns, as follows:
ORDER BY [Lname] + [Fname] ASC
Types of Prediction Queries
Several types of prediction queries are available to us. As stated earlier, the format or
absence of the PREDICTION JOIN clause determines the type of prediction query
being created. Let’s look at the format and operation of each type.
The standard prediction join query enables us to take records from another data
source and feed those results into our mining model to create predictions. For this
to happen, we must specify where the joined data is coming from. We also need to
specify how the columns in the data source are mapped to the input columns of the
A typical PREDICTION JOIN clause looks like this:
OPENQUERY([Max Min Sales DM],