198 High-Function Business Intelligence in e-business
The data is copied into Lotus 1-2-3 and charted as shown in Figure 4-39.
Figure 4-39 September stock prices
4.3.8 Project growth rates of Web hits for capacity planning purposes
An online retailer of books and music recently created a new Web site devoted to
discount home electronics, and wants to assess the rate of increase of the Web
site hit rate during the first few months of operation. Typically, the daily hit rate for
a popular new news site grows rapidly for a while, and so it is natural to try and fit
a power (non-linear) curve to the data.
The objective is to project the growth rate in order to perform capacity planning.
Data
The major attributes in this scenario are the number of hits, and the day of the
hits on the Web site.
Chapter 4. Statistics, analytic, OLAP functions in business scenarios 199
BI functions showcased
REGR_COUNT, REGR_SLOPE, REGR_ICPT
Steps
Consider a non-linear equation of the form:
This is equivalent to:
log y = a log x + log b
It can be represented as shown in Example 4-33.
Example 4-33 Representing a non-linear equation
SELECT
REGR_SLOPE(LOG(y), LOG(x)) AS a,
EXP(REGR_ICPT(LOG(y), LOG(x))) AS b
.......
The aforementioned curve fitting is explored in the query shown in Example 4-34.
Example 4-34 Computer slope and intercept
SELECT
REGR_COUNT(hits,days) AS num_days,
REGR_SLOPE(LOG(hits),LOG(days)) AS a,
EXP(REGR_ICPT(LOG(hits),LOG(days))) AS b
FROM traffic_data
The foregoing query lists the number of non-null pairs of hits and day in the table,
and computes the values of a and b in the foregoing equation.
The results of the foregoing query is shown as follows, and indicates that there
were 100 non-null data points in the table, and that the estimated values for a
and b are 1.9874 and 21.4302 respectively.
num_days a b
------------------- ----------------------- -----------------------
100 1.9874 21.4302
R
2
provides the quality of the curve fitting, and is incorrectly
2
computed using
the SQL shown in Example 4-33.
2
This is the incorrect method because we are computing this function on the transformed data using
logarithmic function, rather than on the untransformed date.
ybx
=
a
200 High-Function Business Intelligence in e-business
Example 4-35 Compute
R
2
SELECT
REGR_COUNT(hits,days) AS num_days,
DECIMAL(REGR_SLOPE(log(hits), log(days)),10,4) AS a,
DECIMAL(EXP(REGR_ICPT(log(hits), log(days))),10,4) AS b,
DECIMAL(REGR_R2(log(hits), log(days)),10,4) AS r2
FROM traffic_data
The results of the foregoing query, using the built-in R2 function, are shown as
follows.
num_days a b r2
-------------- ------------- --------------- ----------------
100 1.9874 21.4302 0.9912
However, in order to correctly compute R
2
for the non-linear fit of the original
untransformed data, the SQL shown in Example 4-36 should be used.
Example 4-36 Correct
R
2
computation on original untransformed data
WITH coeffs(a,b) AS
(
SELECT
REGR_SLOPE(LOG(hits),LOG(days)) AS a,
EXP(REGR_ICPT(LOG(hits),LOG(days))) AS b
FROM traffic_data
),
residuals(days,hits,error) AS
(
SELECT
t.days,t.hits,t.hits - c.b*power(t.days,c.a)
FROM traffic_data t, coeffs c
)
SELECT 1e0-(SUM(error*error)/REGR_SYY(hits,days)) AS rr2
FROM residuals
The result of this query is as follows:
-------------------rr2
+9.55408646608249E-001
Note that the correct value R
2
is 0.955 which is lower than 0.991. This is typical.
Computing of R
2
for the transformed data usually results in an overestimate of
the goodness of fit.
The curve fitting data and R
2
value is shown in Figure 4-40 as charted by
Kaleidograph.

Get DB2 UDB's High-Function Business Intelligence in e-business now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.