Chapter 3. Data Integration and Information Aggregation patterns 69
A Data Server/Services node is a generic data storage node that provides
managed, persistent storage of any type of data and a means to directly access
and manipulate that data. The data may be stored in files and accessed through
file I/O routines or may be stored in a database with more structured and
managed access methods.
The flow is as follows.
1. A requesting application queries the "federated" data source, for
example, with a simple SQL SELECT statement.
2. The Data Integration node processes the request and, using its metadata
(which defines the data sources), passes the request on to the appropriate
data sources.
In many cases, the data integration/federation logic within the Data
Integration node may be logically separate from the data connector logic. The
data connector logic spreads out the overhead of querying multiple data
sources, allowing the queries to run in parallel against each database.
When performance is a major concern, multiple logical data connectors may
exist to process queries against a single data source. The idea is to
prevent any single node in the process from becoming a bottleneck if too
many requests run against one data source.
3. In all cases, the results returned from each data source must then be
aggregated and normalized by the data integration layer so that they appear
to come from one "virtual" data source.
4. The results are then sent back to the requesting application, which is
unaware that multiple data sources were involved.
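
The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pattern's reference implementation: the two in-memory SQLite databases stand in for the Data Server/Services nodes, and the names METADATA, SOURCES, connector_query, and federated_select are hypothetical. The sketch shows the metadata lookup (step 2), connectors that run in parallel, and aggregation and normalization into one "virtual" result (steps 3 and 4).

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical metadata held by the Data Integration node: for each data
# source, the local table and how its columns map to the virtual schema.
METADATA = {
    "sales_east": {"table": "orders",
                   "columns": {"id": "order_id", "amt": "amount"}},
    "sales_west": {"table": "purchases",
                   "columns": {"pid": "order_id", "total": "amount"}},
}


def make_source(table, columns, rows):
    """Create an in-memory SQLite database standing in for one data source."""
    # check_same_thread=False lets a worker thread use this connection.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return conn


# Assumed sample data; each source uses its own table and column names.
SOURCES = {
    "sales_east": make_source("orders", ["id", "amt"], [(1, 10.0), (2, 25.5)]),
    "sales_west": make_source("purchases", ["pid", "total"], [(7, 40.0)]),
}


def connector_query(source_name):
    """Data connector: query one source and rename columns to the virtual schema."""
    meta = METADATA[source_name]
    local_cols = list(meta["columns"])
    sql = f"SELECT {', '.join(local_cols)} FROM {meta['table']}"
    rows = SOURCES[source_name].execute(sql).fetchall()
    virtual_cols = [meta["columns"][c] for c in local_cols]
    return [dict(zip(virtual_cols, row)) for row in rows]


def federated_select():
    """Data Integration node: fan out in parallel, then aggregate the results."""
    with ThreadPoolExecutor(max_workers=len(METADATA)) as pool:
        partials = list(pool.map(connector_query, METADATA))
    merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda r: r["order_id"])


if __name__ == "__main__":
    for row in federated_select():
        print(row)
```

The requesting application calls only federated_select() and receives rows with the uniform keys order_id and amount; nothing in the result reveals which source a row came from. A production federation layer would also need to push predicates down to the sources, handle source failures, and reconcile data types, all of which this sketch omits.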
3.3.3 Federation: Cache variation pattern
Figure 3-5 on page 70 represents the Federation: Cache variation pattern.
Note: Although omitted for simplicity of representation, an Application
Server/Services node can be substituted for the Data Server/Services node
where access to the data is provided through an application API rather than
directly to the database management system.