38 Preparing for DB2 Near-Realtime Business Intelligence
That is where data integration technology comes into play. This technology
provides access to much more than relational, or even local, data. Once
federated access is set up for these various data sources, they appear as
relational structures that you can access as if they were part of the data
warehouse, using the same SQL that you use to access the tables in the data
warehouse.
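
As a rough illustration of the idea, the following Python sketch joins a
warehouse table with a table that lives in a separate database, using one
ordinary SQL statement. It is only an analogy: it uses SQLite's ATTACH DATABASE
as a stand-in for federated access, not the DB2 federation facilities, and the
table names (sales, mail.messages), columns, and sample rows are all
hypothetical.

```python
import sqlite3

# Stand-in for the data warehouse (an in-memory SQLite database).
conn = sqlite3.connect(":memory:")

# Stand-in for an external source. ATTACH only mimics the *effect* of
# federation (a second data store visible through the same connection);
# it is not how DB2 federated access is configured.
conn.execute("ATTACH DATABASE ':memory:' AS mail")

# Hypothetical warehouse table.
conn.execute("CREATE TABLE sales (client_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 45.0)],
)

# Hypothetical table in the external source.
conn.execute("CREATE TABLE mail.messages (client_id INTEGER, subject TEXT)")
conn.executemany(
    "INSERT INTO mail.messages VALUES (?, ?)",
    [(1, "Late delivery"), (2, "Thanks"), (1, "Refund request")],
)


def client_overview(connection):
    """Combine buying history with email counts in a single SQL query."""
    query = """
        SELECT s.client_id, s.total_spent, COUNT(m.subject) AS email_count
        FROM (SELECT client_id, SUM(amount) AS total_spent
              FROM sales
              GROUP BY client_id) AS s
        LEFT JOIN mail.messages AS m ON m.client_id = s.client_id
        GROUP BY s.client_id, s.total_spent
        ORDER BY s.client_id
    """
    return connection.execute(query).fetchall()


rows = client_overview(conn)
```

The point of the sketch is that the query itself does not care where
mail.messages is stored; the same SELECT and JOIN syntax is used for both the
warehouse table and the external one.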
One example of using this information in an analytical application is in the
client service area. When you analyze client satisfaction, you may want to
integrate emails from a client, by means of Web Services, into the analysis of
the client’s buying history. Instead of somehow bringing all client email text
into the data warehouse, you can use federated access from the data warehouse
to search the email sources.
This can be an effective way to bring realtime data into the warehouse. However,
before you decide whether to feed the data to the warehouse or to provide
federated access to it, you must analyze the data access requirements in terms
of frequency, result size, response time, and so forth.
The first two functions that we discussed, Capture and Deliver, are fairly
straightforward and well understood. These are mature technologies that have
existed in IT for quite a while, but have only recently been applied to data
warehousing. As you move toward the goal of realtime warehousing, transforming
the data into a form suitable for a data warehouse will be a very important,
and possibly difficult, area.
Transformations run the gamut from simple data type conversions, to heavy-duty
cleansing, to very complex auditing requirements. You have to weigh the
transformation requirements very carefully against the latency requirements.
The lower the required latency, the fewer transformations you will be able to
accomplish. If you absolutely have to perform a significant amount of data
transformation, then you may have to relax the latency requirement. For
example, some transformations must be batched to enable the appropriate
calculations: you may need to calculate total metrics for the batch to enable
derivation of percent-of-total for each item.
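
The percent-of-total case can be sketched in a few lines of Python. This is a
minimal illustration rather than a prescribed implementation; the function name
percent_of_total and the sample items are hypothetical. It shows why the
calculation cannot be done row by row as data arrives: the batch total must be
known before any single item's share can be derived.

```python
def percent_of_total(batch):
    """Return (item, amount, percent) for each row of a complete batch.

    `batch` is a list of (item, amount) pairs. The total is computed over
    the whole batch first, which is why the rows have to be collected
    before this transformation can run.
    """
    total = sum(amount for _, amount in batch)
    if total == 0:
        # Avoid division by zero for an empty or all-zero batch.
        return [(item, amount, 0.0) for item, amount in batch]
    return [(item, amount, 100.0 * amount / total) for item, amount in batch]


# Hypothetical batch of sales amounts per item.
batch = [("widgets", 50.0), ("gadgets", 30.0), ("gizmos", 20.0)]
result = percent_of_total(batch)
```

A larger batch gives a more meaningful total but adds latency, which is the
trade-off described above.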
The Transform function is fairly independent of the first two functions,
Capture and Deliver. You can capture and deliver the data to the data
warehouse environment in a near-realtime fashion, and then collect or stage
this realtime data and process it in batches. Or you can do the opposite:
capture and deliver the data in batches, and then process each batch as soon
as it arrives. The key is to get the data into