190 Improving Business Performance Insight
execution, which allows them to create a visual sequential data flow. A graphical
palette helps developers diagram the flow of data through their environment via
GUI-driven drag-and-drop design components. Developers also benefit from
a scripting language, debugging capabilities, and an open application
programming interface (API) for leveraging external code. The WebSphere
DataStage Designer tool is depicted in Figure 6-43.
Figure 6-43 WebSphere DataStage Designer
6.2.3 WebSphere ProfileStage
WebSphere ProfileStage allows users to integrate multiple disparate systems by
providing a complete understanding of the metadata and by discovering
dependencies within and across tables and databases. Because the metadata is
based upon the actual source data, accuracy is nearly 100%, reducing the
project risk by uncovering integration issues before development begins.
WebSphere ProfileStage brings automation to the critical and fundamental tasks
of data source analysis, expediting comprehensive data analysis, reducing the
time-to-market, and minimizing overall costs and resources for critical data
integration projects. It profiles source data by analyzing column values and
structures and provides target database recommendations, such as primary
keys, foreign keys, and table normalizations. Armed with this information, it
builds a model of the data to facilitate the source-to-target mapping and
automatically generates integration jobs.
Some of the functions and features of WebSphere ProfileStage are:
• Analyzes and profiles source and target systems to enable discovery and
documentation of data anomalies
• Validates the content, quality, and structure of your data from disparate
systems without programming
• Enables metadata exchange within the integration platform
• Provides a single and open repository for ease of maintenance and reporting
No assumptions are made about the content of the data. The user supplies a
description of the record layouts. Then WebSphere ProfileStage reads the
source data and automatically analyzes and profiles the data so that the
properties of the data (defined by the metadata) are generated without error. The
properties include the tables, columns, probable keys, and interrelationships
among the data. Once these properties are known and verified, WebSphere
ProfileStage automatically generates a normalized target database schema.
You specify the business intelligence reports and source data to target database
transformations as part of the construction of this target database. After the
source data is understood, it must be transformed into a relational database. This
process is automated by ProfileStage, yielding a proposal for the target database
that can be edited to get the best possible results.
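As a rough illustration of how a "probable key" might be inferred from the data itself, the following minimal Python sketch flags any column whose values are all present and all distinct. This is not how WebSphere ProfileStage is implemented; the function name probable_keys, the row-as-dictionary layout, and the sample customers data are all assumptions made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical row layout: one dict per record, column name -> value.
Row = Dict[str, Optional[str]]


def probable_keys(rows: List[Row]) -> List[str]:
    """Return columns whose values are all non-empty and all distinct."""
    if not rows:
        return []
    candidates = []
    for column in rows[0]:
        values = [row.get(column) for row in rows]
        has_gaps = any(v is None or v == "" for v in values)
        # A key candidate must identify every row: no gaps, no repeats.
        if not has_gaps and len(set(values)) == len(values):
            candidates.append(column)
    return candidates


# Invented sample data, for illustration only.
customers = [
    {"cust_id": "1", "email": "a@example.com", "city": "Austin"},
    {"cust_id": "2", "email": "b@example.com", "city": "Austin"},
    {"cust_id": "3", "email": "", "city": "Boston"},
]

print(probable_keys(customers))  # ['cust_id']
```

A column that passes this check is only a candidate. Uniqueness in the data that was read does not prove the column is a key, which is why the text describes the properties as being verified before the target schema is generated.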
The following is a description of the process and major components for profiling:
• Column Analysis: Here we examine all values for the same column to infer
the column definition and other properties such as domain values, statistical
measures, and min/max values. During Column Analysis, each available
column of each table of source data is individually examined in depth. It is
here that many properties of the data are observed and recorded, such as
minimum, maximum, and average length; precision and scale for numeric
values; basic data types encountered, including different date and time
formats; minimum, maximum, and average numeric values; counts of empty
values, NULL values, and non-NULL/empty values; and the count of distinct
values, or cardinality.
• Table Analysis: This is the process of examining a random data sample
selected from the data values for all columns of a table in order to compute
the functional dependencies for this table. The purpose is to find associations
between different columns in the same table. A functional dependency exists
in a table if one set of columns is dependent on another set of columns. Each
functional dependency has two components:
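
As an illustration of the Column Analysis and Table Analysis steps described in this list, the following minimal Python sketch computes a few of the column properties named above (NULL and empty counts, cardinality, and length statistics) and then does a brute-force search for functional dependencies within a single table. It is a sketch under stated assumptions, not the algorithm WebSphere ProfileStage uses. The names profile_column, holds, and functional_dependencies, the lhs/rhs labels, the max_lhs limit, and the orders sample are all hypothetical. The sketch checks whatever rows it is given, so a caller would pass in a random sample to mirror the description above.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Hypothetical row layout: one dict per record, column name -> value.
Row = Dict[str, Optional[str]]


def profile_column(rows: List[Row], column: str) -> Dict[str, object]:
    """Summarize one column: NULL/empty counts, cardinality, and lengths."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    lengths = [len(v) for v in present]
    return {
        "null_count": sum(v is None for v in values),
        "empty_count": sum(v == "" for v in values),
        "non_empty_count": len(present),
        "cardinality": len(set(present)),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "avg_length": sum(lengths) / len(lengths) if lengths else None,
    }


def holds(rows: List[Row], lhs: Tuple[str, ...], rhs: str) -> bool:
    """True if each distinct combination of lhs values maps to one rhs value."""
    seen: Dict[tuple, Optional[str]] = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def functional_dependencies(
    rows: List[Row], max_lhs: int = 2
) -> List[Tuple[Tuple[str, ...], str]]:
    """List (lhs, rhs) pairs that hold in the given rows, smallest lhs first."""
    columns = list(rows[0]) if rows else []
    found: List[Tuple[Tuple[str, ...], str]] = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(columns, size):
            for rhs in columns:
                if rhs in lhs:
                    continue
                # Skip if a smaller subset of lhs already determines rhs.
                if any(set(l) < set(lhs) and r == rhs for l, r in found):
                    continue
                if holds(rows, lhs, rhs):
                    found.append((lhs, rhs))
    return found


# Invented sample data, for illustration only.
orders = [
    {"order_id": "100", "zip": "78701", "city": "Austin"},
    {"order_id": "101", "zip": "78701", "city": "Austin"},
    {"order_id": "102", "zip": "78702", "city": "Austin"},
    {"order_id": "103", "zip": "02108", "city": "Boston"},
    {"order_id": "104", "zip": None, "city": ""},
]

print(profile_column(orders, "zip"))
print(functional_dependencies(orders))
```

In this sample, order_id determines both zip and city, and zip determines city, but city does not determine zip because Austin appears with two different zip values. A dependency found this way holds only for the rows examined, so it is a candidate to be verified rather than a guaranteed property of the full table.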
