294 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
8.2.2 ETL application architecture and process schedule
The more data that exists in the data warehouse, the more actionable insight you
can gain by using the tools discussed in this book. Scheduling a BACKUP command
to coincide with the completion of an extract-transform-load (ETL) cycle is
highly effective: because the backup image already contains the changes made by
that cycle, a restore requires fewer transaction logs to be replayed, so recovery
is quicker.
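The effect of backup timing on roll-forward work can be illustrated with a small model. This is a minimal sketch under hypothetical assumptions, not a DB2 tool: log generation is summarized per hour, an ETL window from 01:00 to 04:00 writes heavily (20 GB/hour, an invented figure), the rest of the day trickles 0.5 GB/hour, and a restore from a backup taken at the start of hour B must replay all log written from hour B up to the failure. The names `log_volume_to_replay` and `HOURLY_LOG_GB` are illustrative only.

```python
# Sketch: how backup timing changes the log volume a roll-forward must replay.
# Hypothetical model for illustration only: log generation is summarized per
# hour, and a restore from a backup taken at the start of hour B must replay
# every log record written from hour B up to (not including) failure hour F.
from typing import Sequence


def log_volume_to_replay(hourly_log_gb: Sequence[float],
                         backup_hour: int, failure_hour: int) -> float:
    """Return the GB of log written in hours [backup_hour, failure_hour)."""
    if not 0 <= backup_hour <= failure_hour <= len(hourly_log_gb):
        raise ValueError(
            "need 0 <= backup_hour <= failure_hour <= len(hourly_log_gb)")
    return sum(hourly_log_gb[backup_hour:failure_hour])


# Hypothetical daily profile: 0.5 GB/hour background activity, plus an ETL
# window from 01:00 to 04:00 (hours 1, 2 and 3) writing 20 GB/hour.
HOURLY_LOG_GB = [0.5] * 24
HOURLY_LOG_GB[1:4] = [20.0] * 3

if __name__ == "__main__":
    failure = 10  # failure at 10:00
    for label, backup in [("00:00 (before ETL)", 0),
                          ("04:00 (at ETL completion)", 4)]:
        vol = log_volume_to_replay(HOURLY_LOG_GB, backup, failure)
        print(f"backup at {label}: replay {vol:.1f} GB")
```

With this profile, a backup taken at midnight leaves 63.5 GB of log to replay after a 10:00 failure, whereas a backup taken as the ETL cycle completes leaves 3.0 GB, because the heavy ETL logging is already captured in the image.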
When estimating the volume of data to be loaded on a given day, assume the upper
end of any range you are given; your backup and recovery strategy must be able to
cater for peak usage and for predicted future data growth.
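The sizing rule above (plan from the upper end of the range, then allow for growth) reduces to simple arithmetic. The sketch below assumes, hypothetically, that the daily load estimate is a (low, high) range in GB and that data volume grows at a fixed compound annual rate; the function name and the 200 to 300 GB/day, 20% per year, 3-year figures are invented for illustration.

```python
# Sketch: size the backup and recovery strategy for peak load plus growth.
# Hypothetical inputs: the daily load estimate arrives as a (low, high) range
# in GB, and data volume grows at a fixed compound annual rate.


def planning_daily_volume(load_range_gb: tuple[float, float],
                          annual_growth: float, years: int) -> float:
    """Daily load volume (GB) to design for: upper end of range, grown forward."""
    low, high = load_range_gb
    if low > high or years < 0:
        raise ValueError("expected (low, high) with low <= high and years >= 0")
    # Use the high estimate: the strategy must cater for peak usage.
    return high * (1.0 + annual_growth) ** years


if __name__ == "__main__":
    # 200-300 GB/day today, 20% annual growth, 3-year planning horizon.
    print(f"{planning_daily_volume((200.0, 300.0), 0.20, 3):.1f} GB/day")
```

For these inputs the planning figure is about 518.4 GB/day, roughly 73% above the current upper estimate, which is the volume the backup window and restore tests would then need to accommodate.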
Be aware of any features of the ETL application that can assist recovery. For
example, if the ETL application can identify and reprocess source data that has
already been processed, a recovery scenario might consist of a restore to the end
of the backup followed by reprocessing of the source data by the ETL application.
This reduces the need to restore and replay transaction logs, which can shorten
the outage.
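Whether ETL reprocessing actually shortens an outage depends on measured rates, so a back-of-envelope comparison can help. In this hypothetical sketch, each recovery path is modeled as restore time plus the time to reapply either log data (roll-forward) or source data (ETL reprocessing) at a given rate. The class name, path names, and every number are assumptions for illustration; real rates should be measured in a test environment.

```python
# Sketch: compare estimated outage time for two recovery paths after a restore.
# All figures are hypothetical; measure real restore, log-replay, and ETL
# rates in a test environment before relying on a comparison like this.
from dataclasses import dataclass


@dataclass
class RecoveryPath:
    name: str
    restore_hours: float         # time to restore the backup image
    catch_up_gb: float           # log data or source data to reapply afterward
    catch_up_gb_per_hour: float  # rate at which that data is reapplied

    def outage_hours(self) -> float:
        """Restore time plus time to reapply the remaining data."""
        return self.restore_hours + self.catch_up_gb / self.catch_up_gb_per_hour


def fastest(paths: list[RecoveryPath]) -> RecoveryPath:
    """Pick the path with the shortest estimated outage."""
    return min(paths, key=RecoveryPath.outage_hours)


LOG_REPLAY = RecoveryPath("restore + log replay", 2.0, 60.0, 15.0)
ETL_REPROCESS = RecoveryPath("restore + ETL reprocessing", 2.0, 40.0, 20.0)

if __name__ == "__main__":
    for p in (LOG_REPLAY, ETL_REPROCESS):
        print(f"{p.name}: {p.outage_hours():.1f} h")
    print("fastest:", fastest([LOG_REPLAY, ETL_REPROCESS]).name)
```

With these invented inputs, log replay gives a 6.0-hour outage and ETL reprocessing 4.0 hours; with different measured rates the ranking can reverse, which is why the comparison is worth running against real test figures.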
Similarly, some ETL applications in certain industry sectors can simulate or
regenerate data when data loss occurs, or can use the ETL process to recover lost
data from source files saved during the initial data load.
Data loss
Recovery method:
1. Use Recovery Expert for individual DML errors and to help determine root
cause and recovery paths.
2. Use Optim High Performance Unload to unload data from a backup image and
stream it to the table. Use Detach and Attach to replace erroneous data.
3. Perform full table or table space recovery as appropriate and inform the
business of the outage.
Impact:
- Low: Invalid data for a period of time. Online recovery.
- High: Table space is offline for period of ...
RTO/RPO:
- Data loss not ...
- n minutes (Use test environment to restore speed.)

Recovery method:
1. Implement log shipping.
2. Use the full database backup image prepared by DB2 Merge Backup to restore
to a secondary system. Make it available to the business and replay the ETL.
Impact:
- High: System unavailable until restore or ETL replay ...
- Low: Some data loss allowed. ETL replay is used to regain data lost since
the last merged backup.
RTO/RPO:
- n minutes (Frequent tests of DR procedures should take ...