538 IBM WebSphere Information Analyzer and Data Quality Assessment
inline validation and transformation of complex data types, such as the data
formats defined under the U.S. Health Insurance Portability and
Accountability Act (HIPAA), along with high-speed joins and sorts of
heterogeneous data. IBM Information Server
also provides high-volume, complex data transformation and movement
functionality that can be used for stand-alone extract/transform/load (ETL)
scenarios or as a real-time data processing engine for applications or
processes.
The WebSphere DataStage product modules currently provide this
functionality.
• Deliver your information
IBM Information Server provides the ability to virtualize, synchronize, or move
information to the people, processes, or applications that need it. Information
can be delivered through federation or through time-based or event-based
processing, moved in large bulk volumes from location to location, or
accessed in place when it cannot be consolidated. IBM Information Server
provides direct, native access to a wide variety of information sources, both
mainframe and distributed. It provides access to databases, files, services,
and packaged applications and to content repositories and collaboration
systems. Companion products allow high-speed replication, synchronization,
and distribution across databases, change data capture, and event-based
publishing of information.
The WebSphere Federation Server product module currently provides this
functionality.
A.2.4 Unified parallel processing
Much of the work that IBM Information Server does takes place within the parallel
processing engine. The engine handles data processing needs as diverse as
performing analysis of large databases for WebSphere Information Analyzer,
data cleansing for WebSphere QualityStage, and complex transformations for
WebSphere DataStage. This parallel processing engine is designed to deliver:
• Parallelism and pipelining to complete increasing volumes of work in
decreasing time windows.
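Data partitioning is an approach to parallelism that involves breaking the
record set into partitions, or subsets of records, that can be processed
independently and at the same time. As a result, data partitioning generally
provides near-linear increases in application performance as more
partitions and processing resources are added.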
IBM Information Server partitions data automatically based on the type of
partitioning that each stage requires. In a well-designed, scalable architecture,
the developer does not need to be concerned about the number of
partitions that will run, the ability to increase the number of partitions, or
re-partitioning data.
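The partition-then-process idea can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not how IBM Information Server is implemented: the names `PartitionedData`, `round_robin_partition`, `hash_partition`, `ensure_partitioned`, and `run_count_stage` are hypothetical, hash partitioning stands in for whatever partitioning type a given stage requires, and Python threads only stand in for the engine's parallel processes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


class PartitionedData:
    """A record set split into partitions (hypothetical stand-in for an engine data set)."""

    def __init__(self, partitions: List[List[Record]], key: Optional[str]) -> None:
        self.partitions = partitions
        self.key = key  # column the data is hash-partitioned on; None if not keyed


def round_robin_partition(records: List[Record], num_partitions: int) -> PartitionedData:
    """Deal records out evenly across partitions, ignoring their contents."""
    partitions: List[List[Record]] = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        partitions[i % num_partitions].append(rec)
    return PartitionedData(partitions, key=None)


def hash_partition(records: List[Record], key: str, num_partitions: int) -> List[List[Record]]:
    """Place every record with the same key value in the same partition."""
    partitions: List[List[Record]] = [[] for _ in range(num_partitions)]
    for rec in records:
        # hash() is stable within one process run, which is all co-location needs here.
        partitions[hash(rec[key]) % num_partitions].append(rec)
    return partitions


def ensure_partitioned(data: PartitionedData, key: str, num_partitions: int) -> PartitionedData:
    """Repartition only if the data is not already hash-partitioned the way the stage needs."""
    if data.key == key and len(data.partitions) == num_partitions:
        return data  # already suitable: no records move
    records = [rec for part in data.partitions for rec in part]
    return PartitionedData(hash_partition(records, key, num_partitions), key)


def count_by_key(partition: List[Record], key: str) -> Dict[Any, int]:
    """Per-partition work: count the records for each key value."""
    counts: Dict[Any, int] = defaultdict(int)
    for rec in partition:
        counts[rec[key]] += 1
    return dict(counts)


def run_count_stage(data: PartitionedData, key: str) -> Dict[Any, int]:
    """Hypothetical aggregation stage that requires hash partitioning on `key`."""
    data = ensure_partitioned(data, key, len(data.partitions))
    # Threads stand in for parallel processes (CPython's GIL gives no real CPU speed-up).
    with ThreadPoolExecutor(max_workers=len(data.partitions)) as pool:
        partials = list(pool.map(lambda part: count_by_key(part, key), data.partitions))
    result: Dict[Any, int] = {}
    for partial in partials:
        result.update(partial)  # key values never span partitions, so nothing collides
    return result


if __name__ == "__main__":
    orders = [{"customer": c, "amount": a} for c, a in [(1, 10), (2, 5), (1, 7), (3, 2), (2, 1)]]
    data = round_robin_partition(orders, num_partitions=3)
    print(sorted(run_count_stage(data, "customer").items()))  # [(1, 2), (2, 2), (3, 1)]
```

Because `ensure_partitioned` returns its input unchanged when the data is already hash-partitioned on the required key, records move only when a stage needs a different partitioning. The caller of `run_count_stage` never chooses partition counts or triggers repartitioning by hand, which mirrors the point that the developer should not have to manage these details.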