Chapter 5. More real-time enterprise enablers 245
5.3 SQL generators and applications
Right-time data warehousing typically implies some type of asynchronous
feeding of the data warehouse from the source systems that will involve some
type of latency greater than zero - with zero being the optimum of real-time. As
you move toward real-time data warehousing, you have to consider how to:
򐂰 Capture the data.
򐂰 Deliver that data to the data warehouse environment.
򐂰 Apply transformations to the data.
򐂰 Apply that data to the data warehouse with concurrent access.
You also have to consider the need to further feed the data to downstream
aggregates or data marts.
In this section we introduce a new acronym, CDTA. It is generated from the ELT
process (extract, load, and transform), and is used to describe the functional
capabilities that will be needed to implement real-time data warehousing. They
are Capture, Deliver, Transform, and Apply. These functions are depicted in
Figure 5-9, as part of the data warehousing process flow. Ideally, these functions
would be designed to be independent of each other. However, in an actual
implementation, some of the functions may be combined, depending on the
technology being implemented.
Figure 5-9 CDTA (ELT) data flow
5.3.1 Capture
There are many techniques that could be employed to capture data as it is
created, some of which are more intrusive to the operational systems than
others. The technologies employed in the capture function affects, or is affected
Data
Warehouse
Data
Source
Data
Source
Deliver
Capture
Apply
ODS
Transform
246 Moving Forward with the On Demand Real-time Enterprise
by, the type of delivery capability that is available. Therefore, the capture function
and the delivery function should go hand-in-hand.
One thing to investigate is whether to use the operational systems log or archive
the changes that occur. Then the information can be captured from those logs or
archives. For example, if the application maintains an audit trail, there could be a
transaction that would be executed and pick up the data changes from the audit
trail and provide them to the delivery function. If not, then perhaps changes could
be made to the operational application system itself. For example, the application
may provide the data changes to a delivery function, such as a message queue.
This is depicted as capability 3 (the number inside the circle) in Figure 5-10.
If a relational DBMS is used for the operational systems, you can take advantage
of capabilities such as triggers, or the DBMS change capture mechanism, to
provide the delivery function with the data changes that occur. Database triggers
work by taking an action whenever specifically defined changes to the data
occur. However, triggers are executed, or fired, during the unit of work for the
transaction that originally changed the data. This could possibly have an impact
on the response time of these transactions. DB2 supports changed data capture
by reading the database logs after the transaction has completed. See capability
1 (the number inside the circle) depicted in Figure 5-10. This will have less of an
impact on the operational systems than the trigger technique. Typically these
changed records will be stored in a relational table, but there is a capability to
place these changes in a message queue. See capability 2 (the number inside
the circle) depicted in Figure 5-10. There are also products from IBM that can
capture changes in other non-relational database systems, such as IMS.
Figure 5-10 Capture function example
Messages
Applications
Messages
Web
Services
Changes
DBMS
Databases
Delivery
1
2
3
4

Get Moving Forward with the On Demand Real-time Enterprise now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.