Chapter 5. More real-time enterprise enablers 255
5.4.2 A WebSphere DataStage project
A WebSphere DataStage project comprises the following components:
DataStage jobs
Built-in components
User-defined components
Jobs
A DataStage job consists of a series of individual stages, linked together to
describe the flow of data from a data source to the data warehouse or other data
target. Each stage describes a particular phase of the process. For example, one
stage may extract data from a data source, while another transforms it. Stages
are added to a job and linked together using the DataStage Designer.
You must specify the data you want at each stage, and how it is handled. For
example, do you want all columns in the source data or only a particular subset?
Should the data be aggregated or converted before being passed on to the next
stage?
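The idea of a job as a chain of linked stages can be sketched in a few lines of Python. This is only an illustrative model, not DataStage code: the names Row, Stage, select_columns, convert, and run_job are hypothetical, and the sample rows are invented for the example. It shows one stage keeping a subset of columns and another converting a column before the data is passed on.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical model: a row is a dict, a stage is a function over rows.
Row = Dict[str, object]
Stage = Callable[[Iterable[Row]], Iterable[Row]]


def select_columns(names: List[str]) -> Stage:
    """Build a stage that keeps only the named columns."""
    def stage(rows: Iterable[Row]) -> Iterable[Row]:
        for row in rows:
            yield {name: row[name] for name in names}
    return stage


def convert(column: str, func: Callable[[object], object]) -> Stage:
    """Build a stage that converts one column with func."""
    def stage(rows: Iterable[Row]) -> Iterable[Row]:
        for row in rows:
            yield {**row, column: func(row[column])}
    return stage


def run_job(source: Iterable[Row], stages: List[Stage]) -> List[Row]:
    """Pass the source rows through each stage in order (the 'links')."""
    data = source
    for stage in stages:
        data = stage(data)
    return list(data)


# Invented sample data standing in for a data source.
source = [
    {"id": "1", "region": "east", "amount": "10.5", "note": "x"},
    {"id": "2", "region": "west", "amount": "4.0", "note": "y"},
]

job = [select_columns(["region", "amount"]), convert("amount", float)]
result = run_job(source, job)
```

Here the order of the list plays the role of the links between stages: each stage receives only what the previous stage passed on.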
Data properties are defined by:
Table definitions: These specify the data you want, and each table definition
contains:
– Information about the table or file that holds the data records.
– A description of the individual columns.
Data elements: Each data element describes one type of data that can be
stored in a column. The data element associated with a column defines the
operations that can be performed on that column. DataStage has numerous
predefined data elements representing commonly required data types, such
as date, time, number and string. You can also define your own special data
elements.
Transforms: These convert and cleanse the data by transforming it into the
format desired and defined for your data warehouse. DataStage provides a
large library of built-in transforms to get you started fast in this phase.
Together, these properties determine what occurs at each stage of a DataStage
job. The properties are set up project-wide and are shared by all the jobs in a
project.
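How the three kinds of data property fit together can be illustrated with a small Python model. It is a sketch under stated assumptions, not the DataStage repository format: the classes DataElement, Column, and TableDefinition, the TRANSFORMS registry, and the orders.csv example are all hypothetical. A table definition names the source and describes its columns, each column refers to a data element, and the data element selects the transform applied to the raw value.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DataElement:
    """Hypothetical data element: one type of data a column can hold."""
    name: str


@dataclass(frozen=True)
class Column:
    """Description of an individual column."""
    name: str
    element: DataElement
    description: str = ""


@dataclass
class TableDefinition:
    """Information about the table or file, plus its column descriptions."""
    source: str
    columns: List[Column]


DATE = DataElement("date")
NUMBER = DataElement("number")
STRING = DataElement("string")

# Hypothetical transform library, keyed by data element name.
TRANSFORMS: Dict[str, Callable[[str], object]] = {
    "date": lambda s: datetime.strptime(s.strip(), "%Y-%m-%d").date(),
    "number": lambda s: float(s.strip()),
    "string": lambda s: s.strip(),
}


def apply_definition(table: TableDefinition, raw: Dict[str, str]) -> Dict[str, object]:
    """Cleanse one raw record using the transform for each column's element."""
    return {
        col.name: TRANSFORMS[col.element.name](raw[col.name])
        for col in table.columns
    }


orders = TableDefinition(
    source="orders.csv",  # invented file name
    columns=[
        Column("order_date", DATE, "Date the order was placed"),
        Column("total", NUMBER, "Order total"),
        Column("customer", STRING, "Customer name"),
    ],
)

clean = apply_definition(
    orders, {"order_date": " 2006-03-01", "total": "19.90 ", "customer": " Acme "}
)
```

Because the definitions live outside any single job in this model, several jobs could reuse the same orders definition, which mirrors the project-wide sharing described above.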
DataStage supports three types of jobs:
1. Server jobs are both developed and compiled using DataStage client tools.
Compilation of a server job creates an executable that is scheduled and run
from the DataStage Director.
2. Parallel jobs are developed and compiled using DataStage client tools.
Compilation of a parallel job creates an executable that is scheduled and run
from the DataStage Director. However, parallel jobs require a UNIX server for
compiling and running. Parallel jobs support parallel processing on SMP,
MPP, and cluster systems.
3. Mainframe jobs are developed using the same DataStage client tools as
server jobs, but compilation and execution occur on a mainframe computer.
The Designer generates a COBOL source file and supporting JCL script, then
enables uploading them to the target mainframe computer. The job is
compiled and run on the mainframe computer under the control of native
mainframe software.
When a job runs, the processing stages described in the job design are
performed using the data properties defined. Executable jobs can be packaged
for use on other DataStage systems.
Stages
In DataStage there are two types of stages:
1. Built-in stages: Supplied with DataStage and used for extracting, aggregating,
transforming, or writing data. These stages can be either passive or active.
2. Plug-in stages: Additional stages defined in the DataStage Manager to
perform tasks that the built-in stages do not support.
A stage usually has at least one data input and one data output. However, some
stages can accept more than one data input, and output to more than one stage.
Stages can be passive or active:
Passive stages define read and write access to data sources and
repositories. Types of passive stages are:
– Sequential
– UniVerse®
– ODBC
– Hashed
Active stages define how data is transformed and filtered. Types of active
stages are:
– Transformation
– Aggregation
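The distinction between passive and active stages can be made concrete with a short Python sketch. It is a simplified model rather than DataStage itself: the function names are hypothetical, an in-memory string stands in for a sequential file, and the sample data is invented. The two passive functions only read or write data, while the two active functions filter, convert, and aggregate it.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List


def read_sequential(text: str) -> List[Dict[str, str]]:
    """Passive: read access to a sequential (CSV-like) source."""
    return list(csv.DictReader(io.StringIO(text)))


def write_sequential(rows: Iterable[Dict[str, object]], fieldnames: List[str]) -> str:
    """Passive: write access to a sequential target, returned as text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def transform(rows: Iterable[Dict[str, str]]) -> Iterable[Dict[str, object]]:
    """Active: drop rows without a region and convert the remaining values."""
    for row in rows:
        if row["region"]:
            yield {"region": row["region"].upper(), "amount": float(row["amount"])}


def aggregate(rows: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """Active: sum the amount per region."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return [{"region": region, "amount": total} for region, total in sorted(totals.items())]


# Invented sample input; the last row has no region and is filtered out.
SOURCE = "region,amount\neast,10.5\nwest,4.0\neast,1.5\n,9.9\n"

output = write_sequential(
    aggregate(transform(read_sequential(SOURCE))),
    ["region", "amount"],
)
```

Keeping data access separate from transformation logic means the source or target could be swapped without touching the active functions, which is one plausible motivation for the passive and active split.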
Stages and links can be grouped to form a container. A container is represented
by a container stage. Grouping stages together into containers can make the
design simpler and easier to read.
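The container idea resembles the composite pattern, which the following Python sketch illustrates. It is an analogy only and does not reflect how DataStage implements containers: the Container class and the stage functions are hypothetical. A container wraps several stages and is itself usable wherever a single stage is expected, so the outer job design stays short.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Stage = Callable[[Iterable[Row]], Iterable[Row]]


class Container:
    """Hypothetical container: a named group of stages used as one stage."""

    def __init__(self, name: str, stages: List[Stage]) -> None:
        self.name = name
        self.stages = list(stages)

    def __call__(self, rows: Iterable[Row]) -> Iterable[Row]:
        # Run the grouped stages in order, as if they were linked together.
        for stage in self.stages:
            rows = stage(rows)
        return rows


def trim_names(rows: Iterable[Row]) -> Iterable[Row]:
    for row in rows:
        yield {**row, "name": str(row["name"]).strip()}


def drop_blank_names(rows: Iterable[Row]) -> Iterable[Row]:
    return (row for row in rows if row["name"])


def add_name_length(rows: Iterable[Row]) -> Iterable[Row]:
    for row in rows:
        yield {**row, "length": len(str(row["name"]))}


# Two cleansing stages are grouped, then used as a single stage in the job.
cleanse = Container("cleanse_names", [trim_names, drop_blank_names])
job = Container("job", [cleanse, add_name_length])

result = list(job([{"name": " Ada "}, {"name": "  "}, {"name": "Grace"}]))
```

The top-level job lists two stages instead of three, and the cleansing detail is visible only when the container is opened.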