Chapter 3. Data Integration and Information Aggregation patterns 69
A Data Server/Services node is a generic data storage node that provides
managed, persistent storage of any type of data and a means to directly access
and manipulate that data. The data may be stored in files and accessed through
file I/O routines or may be stored in a database with more structured and
managed access methods.
The flow is as follows.
1. A requesting application queries the "federated" data source, for example
with a simple SQL SELECT request.
2. The Data Integration node processes the request and, using its metadata
(which defines the data sources), passes the request on to the appropriate
data sources.
In many cases, the data integration/federation logic within the Data
Integration node may be logically separate from the data connector logic. The
data connector logic spreads out the overhead of querying multiple data
sources, allowing the queries to run in parallel against each database.
When performance is a major concern, multiple logical data connectors may
exist to process queries against a single data source. The aim is to
prevent any single node in the process from becoming a bottleneck when
many requests run against one data source.
3. In all cases, the results that are returned from each individual data source
must then be aggregated and normalized by the data integration layer so that
these results appear to be from one "virtual" data source.
4. The results are then sent back to the requesting application, which has no
idea that multiple data sources were involved.
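The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular federation product is implemented: the two in-memory SQLite databases stand in for remote data sources, and the names METADATA, run_connector, and federated_select are hypothetical. The metadata maps one virtual table to a local query per source, the connectors run in parallel, and the integration layer normalizes the column names before returning a single result set.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_source(ddl, insert_sql, rows):
    """Build an in-memory SQLite database standing in for one remote data source."""
    # check_same_thread=False lets a worker thread use this connection later.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Two hypothetical sources holding customer data under different local schemas.
SOURCE_A = make_source(
    "CREATE TABLE customers (id INTEGER, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
SOURCE_B = make_source(
    "CREATE TABLE clients (client_no INTEGER, full_name TEXT)",
    "INSERT INTO clients VALUES (?, ?)",
    [(3, "Edsger")],
)

# Metadata (assumed shape): virtual table -> list of (source, local query).
METADATA = {
    "customer": [
        (SOURCE_A, "SELECT id, name FROM customers"),
        (SOURCE_B, "SELECT client_no, full_name FROM clients"),
    ]
}


def run_connector(source, local_sql):
    """Data connector: run the local query against one source (step 2)."""
    return source.execute(local_sql).fetchall()


def federated_select(virtual_table):
    """Fan the request out in parallel, then aggregate and normalize (steps 2-3)."""
    targets = METADATA[virtual_table]
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        partials = list(pool.map(lambda t: run_connector(*t), targets))
    # Normalize: every row gets the same virtual column names.
    merged = [
        {"customer_id": row[0], "customer_name": row[1]}
        for rows in partials
        for row in rows
    ]
    return sorted(merged, key=lambda r: r["customer_id"])


# Steps 1 and 4: the requester sees one "virtual" source and one result set.
result = federated_select("customer")
```

The requesting code calls only federated_select and never learns that two differently shaped sources were involved, which is the point of step 4.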
3.3.3 Federation: Cache variation pattern
Figure 3-5 on page 70 represents the Federation: Cache variation pattern.
Note: Although omitted for simplicity of representation, an Application
Server/Services node can be substituted for the Data Server/Services node
where access to the data is provided through an application API rather than
directly to the database management system.
70 Patterns: Information Aggregation and Data Integration with DB2 Information Integrator
Figure 3-5 Federation: Cache variation application pattern
Local temporary storage can be used to cache data returned from read-only
queries to remote data sources. Under defined circumstances, this cache can be
used to speed up query response time or to compensate for a data source that is
temporarily offline. This function must be used carefully, however, because the
cached data may lag behind its underlying source, so the two may no longer be
in sync.
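The caching behavior described above can be sketched as follows. This is a minimal sketch under assumptions, not a description of any product's cache: the class CachingFederation, the exception SourceUnavailable, and the max_age_seconds limit are hypothetical names chosen for illustration. A fresh cache entry answers the query directly, an expired entry triggers a remote read, and an expired entry is still returned (flagged as stale) when the remote source is offline.

```python
import time


class SourceUnavailable(Exception):
    """Raised by a (hypothetical) remote source that is temporarily offline."""


class CachingFederation:
    """Temporary store in front of a read-only remote query function."""

    def __init__(self, fetch_remote, max_age_seconds, clock=time.monotonic):
        self._fetch_remote = fetch_remote   # callable: sql -> rows
        self._max_age = max_age_seconds     # tolerated latency of cached data
        self._clock = clock                 # injectable so tests can control time
        self._store = {}                    # sql text -> (timestamp, rows)

    def query(self, sql):
        """Return (rows, origin); origin is 'remote', 'cache' or 'stale-cache'."""
        now = self._clock()
        entry = self._store.get(sql)
        if entry is not None and now - entry[0] <= self._max_age:
            return entry[1], "cache"        # fresh enough: skip the remote call
        try:
            rows = self._fetch_remote(sql)
        except SourceUnavailable:
            if entry is None:
                raise                       # nothing cached to fall back on
            return entry[1], "stale-cache"  # compensate for the offline source
        self._store[sql] = (now, rows)
        return rows, "remote"
```

Returning the origin alongside the rows is one way to let the caller decide whether possibly out-of-sync data is acceptable for a given request.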
LEGEND:
Data sources are represented by disks in three different colors / shades:
Blue / plain: Read/write
Yellow / diagonal hatching: Read-only
Green / vertical hatching: Temporary
Read/write and read-only refer only to the interaction between the overall pattern and that data source,
as also indicated in most cases by annotation on the linkages. In general we may assume that the
application associated with a particular data source has read/write access.
A dotted box around an application and source data indicates that the source data may need to be
accessed through the owning application via its API, or may be accessed directly via a database API.
In general, a dotted box around a number of components indicates that we are not specifying which
of those components we are interacting with.
A beveled box represents an additional Application pattern.
A dashed line, arrow or component indicates an optional component.
[Figure 3-5 diagram labels: Population, Federation, Metadata, Application, Source / Target, Source, Temporary store; linkages annotated "read only" and "read/write"]
