76 Patterns: Information Aggregation and Data Integration with DB2 Information Integrator
and manipulate that data. The data may be stored in files and accessed through
file I/O routines or may be stored in a database with more structured and
managed access methods. Although omitted for simplicity of representation, an
Application Server/Services node can be substituted for the Data
Server/Services node where access to the data is provided through an
application API rather than directly to the database management system.
The Population node is a specialized processing node designed and optimized
for reading data from and writing data to data stores, and for transforming the
data, often in sophisticated ways, as it passes through. Some Population nodes
are further specialized for particular circumstances, such as efficient
throughput of large batches of records that require extensive transformation, or
fast throughput of individual records in near real time.
Multiple data sources may be involved in the base Population runtime pattern
process, and reasonably sophisticated filtering, cleansing, and transformations
may occur within the Population function. The main point is that this process can
occur in a single step.
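As an illustration only, the single-step idea can be sketched in Python: records from several sources are read, filtered, transformed, and written to the target in one pass, with no intermediate store. The function name `populate`, the record layout, and the sample data are hypothetical and are not taken from DB2 Information Integrator.

```python
from typing import Callable, Iterable

Record = dict  # hypothetical record shape: a plain dictionary


def populate(sources: Iterable[Iterable[Record]],
             keep: Callable[[Record], bool],
             transform: Callable[[Record], Record],
             target: list) -> int:
    """Read every source, filter and transform each record, and write it
    to the target in a single pass. Returns the number of records written."""
    written = 0
    for source in sources:
        for record in source:
            if not keep(record):
                continue  # filtering: skip records that fail the rule
            target.append(transform(record))
            written += 1
    return written


# Two hypothetical sources feeding one target.
orders_eu = [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": ""}]
orders_us = [{"id": 3, "amount": "7.25"}]
target = []

count = populate(
    sources=[orders_eu, orders_us],
    keep=lambda r: r["amount"] != "",  # drop records with no amount
    transform=lambda r: {**r, "amount": float(r["amount"])},  # text to number
    target=target,
)
```

In this sketch the filtering and transformation rules are passed in as functions, so the same single-step loop can serve different sources and targets.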
3.4.3 Population: Multi Step variation pattern
The Application and Runtime patterns for the Population: Multi Step variation
pattern are described here.
Population: Multi Step variation application pattern
Figure 3-8 on page 77 represents the Population: Multi Step variation application
pattern.
Chapter 3. Data Integration and Information Aggregation patterns 77
Figure 3-8 Population: Multi Step variation application pattern
In the Multi Step variation of the Population application pattern, the basic
population function of the Population application pattern is decomposed into its
three primary constituents or steps:
• Gather
• Process
• Apply
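A minimal sketch of the three steps, assuming in-memory lists stand in for the temporary stores, might look as follows. The function names, the record fields, and the cleansing rule are hypothetical and chosen only for illustration. Each step writes an intermediate result that the next step reads.

```python
def gather(source: list) -> list:
    """Gather: copy source records into a temporary store unchanged."""
    return [dict(r) for r in source]


def process(gathered: list) -> list:
    """Process: cleanse and transform gathered records into a second
    temporary store. Records with a blank name are dropped."""
    processed = []
    for r in gathered:
        name = r["name"].strip()
        if name:
            processed.append({"id": r["id"], "name": name.title()})
    return processed


def apply_step(processed: list, target: dict) -> None:
    """Apply: write processed records to the target, keyed by id."""
    for r in processed:
        target[r["id"]] = r


source = [
    {"id": 1, "name": "  alice "},
    {"id": 2, "name": "   "},
    {"id": 3, "name": "BOB"},
]
target = {}

temp_store_1 = gather(source)         # output of Gather, input to Process
temp_store_2 = process(temp_store_1)  # output of Process, input to Apply
apply_step(temp_store_2, target)
```

Keeping the steps separate means each one can be run, tuned, or replaced on its own, at the cost of materializing the intermediate data.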
The intermediate target data created by one step acts as the source data for the
subsequent step. In some cases, the temporary stores may be physically
Note: We have deliberately avoided using the traditional extract, transform,
and load terminology in order to accommodate the emerging functionality
requirements and variations of population patterns.
LEGEND:
Data sources are represented by disks in three different colors / shades:
Blue / plain: Read/write
Yellow / diagonal hatching: Read-only
Green / vertical hatching: Temporary
Read/write and read-only refer only to the interaction between the overall pattern and that data source
as also indicated in most cases by annotation on the linkages. In general we may assume that the
application associated with a particular data source has read/write access.
A dotted box around an application and source data indicates that the source data may need to be
accessed through the owning application via its API, or may be accessed directly via a database API.
In general, a dotted box around a number of components indicates that we are not specifying which
of those components we are interacting with.
A dashed line, arrow or component indicates an optional component.
[Figure 3-8 labels: Application, Source, Gather, Temporary store, Process, Temporary store, Apply, Target, Metadata]
