Chapter 4. IBM technologies supporting real-time 181
palette helps developers diagram the flow of data through their environment via
GUI-driven drag-and-drop design components. Developers also benefit from
a scripting language, debugging capabilities, and an open application
programming interface (API) for leveraging external code. The WebSphere
DataStage Designer tool is depicted in Figure 4-46.
Figure 4-46 WebSphere DataStage Designer
4.3.3 WebSphere ProfileStage
WebSphere ProfileStage allows users to integrate multiple disparate systems by
providing a complete understanding of the metadata, and by discovering
dependencies within and across tables and databases. Because the metadata is
derived from the actual source data, its accuracy is nearly 100%, which reduces
project risk by uncovering integration issues before development begins.
WebSphere ProfileStage brings automation to the critical and fundamental task
of data source analysis - expediting comprehensive data analysis, reducing the
time-to-market, and minimizing overall costs and resources for critical data
integration projects. It profiles source data - analyzing column values and
structures - and provides target database recommendations, such as primary
keys, foreign keys, and table normalizations. Armed with this information, it
builds a model of the data to facilitate the source-to-target mapping and
automatically generates integration jobs.
182 Moving Forward with the On Demand Real-time Enterprise
Some of the functions/features of WebSphere ProfileStage are:
• Analyzes and profiles source and target systems to enable discovery and
documentation of data anomalies.
• Validates the content, quality, and structure of your data from disparate
systems - without programming.
• Enables metadata exchange within the integration platform.
• Provides a single and open repository for ease of maintenance and reporting.
No assumptions are made about the content of the data. The user supplies a
description of the record layouts. Then WebSphere ProfileStage reads the
source data, and automatically analyzes and profiles the data, so that the
properties of the data (defined by the metadata) are generated without error. The
properties include the tables, columns, probable keys and interrelationships
among the data. Once these properties are known and verified, WebSphere
ProfileStage automatically generates a normalized target database schema.
The business intelligence reports and source data to target database
transformations are specified as part of the construction of this target database.
After the source data is understood, it must be transformed into a relational
database. This process is automated by ProfileStage, yielding a proposal for the
target database that can be edited to get the best possible results.
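The idea of inferring probable keys from the data itself can be illustrated with a short Python sketch. It is not ProfileStage's algorithm, which is not described here. It only assumes that a probable key is a minimal set of columns whose combined values are non-NULL and unique across all rows. The function name `probable_keys`, the `max_width` parameter, and the sample `orders` rows are hypothetical.

```python
from itertools import combinations


def probable_keys(rows, max_width=2):
    """Return minimal column combinations whose values are unique and non-NULL."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    found = []
    for width in range(1, max_width + 1):
        for combo in combinations(columns, width):
            # A superset of a key already found is not minimal, so skip it.
            if any(set(key) <= set(combo) for key in found):
                continue
            values = [tuple(row[c] for c in combo) for row in rows]
            # A key column may not contain NULL (None) values.
            if any(v is None for value in values for v in value):
                continue
            # Unique across every row: the combination is a key candidate.
            if len(set(values)) == len(values):
                found.append(combo)
    return found


# Hypothetical sample data, used only for illustration.
orders = [
    {"order_id": 1, "customer": "A", "line": 1},
    {"order_id": 2, "customer": "A", "line": 2},
    {"order_id": 3, "customer": "B", "line": 1},
]
print(probable_keys(orders))
```

A candidate found this way is only a proposal. Uniqueness in the current data does not prove that a column is a key by design, which is consistent with the text's point that the proposed target database can be edited.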
The following is a description of the process and major components for profiling:
• Column Analysis: Examines all values in a column to infer the column
definition and other properties, such as domain values, statistical
measures, and minimum and maximum values. Each available column of each
source table is examined individually and in depth. The properties observed
and recorded include:
  - Minimum, maximum, and average length
  - Precision and scale for numeric values
  - Basic data types encountered, including different date and time formats
  - Minimum, maximum, and average numeric values
  - Counts of empty values, NULL values, and non-NULL/empty values
  - Count of distinct values (cardinality)
• Table Analysis: Examines a random sample of the data values for all columns
of a table in order to compute the functional dependencies for that table.
The purpose is to find associations between different columns in the same
table. A functional dependency exists in a table if one set of columns is
dependent on another set of columns. Each functional dependency has two
components:
  - Determinant: The set of one or more columns, in the same table, that
    determines the dependency.
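The Column Analysis and Table Analysis steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation: rows are plain dictionaries, NULL is represented by `None`, and the functional dependency check scans every row rather than a random sample. The names `profile_column`, `holds`, `functional_dependencies`, and the sample `rows` are hypothetical.

```python
from itertools import combinations


def profile_column(values):
    """Column Analysis sketch: summarize the observed values of one column."""
    non_empty = [v for v in values if v is not None and v != ""]
    lengths = [len(str(v)) for v in non_empty]
    numbers = [v for v in non_empty
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return {
        "null_count": sum(1 for v in values if v is None),
        "empty_count": sum(1 for v in values if v == ""),
        "non_empty_count": len(non_empty),
        "cardinality": len(set(non_empty)),  # count of distinct values
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "avg_length": sum(lengths) / len(lengths) if lengths else None,
        "min_value": min(numbers) if numbers else None,
        "max_value": max(numbers) if numbers else None,
        "types": sorted({type(v).__name__ for v in non_empty}),
    }


def holds(rows, determinant, dependent):
    """True if every determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        if seen.setdefault(key, row[dependent]) != row[dependent]:
            return False
    return True


def functional_dependencies(rows, max_width=2):
    """Table Analysis sketch: list minimal (determinant, dependent) pairs."""
    columns = list(rows[0].keys()) if rows else []
    found = []
    for dependent in columns:
        others = [c for c in columns if c != dependent]
        for width in range(1, max_width + 1):
            for determinant in combinations(others, width):
                # Skip determinants that contain a smaller one already found.
                if any(set(d) <= set(determinant)
                       for d, dep in found if dep == dependent):
                    continue
                if holds(rows, determinant, dependent):
                    found.append((determinant, dependent))
    return found


# Hypothetical sample data, used only for illustration.
rows = [
    {"zip": "10001", "city": "New York", "amount": 10},
    {"zip": "10001", "city": "New York", "amount": 25},
    {"zip": "94105", "city": "San Francisco", "amount": 10},
    {"zip": "60601", "city": "Chicago", "amount": None},
]
print(profile_column([r["amount"] for r in rows]))
print(functional_dependencies(rows))
```

In this sample, `zip` determines `city` and `city` determines `zip`, while `amount` is determined by neither. A dependency found in a sample is only a hypothesis about the full table, so it would still need to be verified against all of the data.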
