Chapter 2. Financial services business scenario 435
򐂰 Baseline Analysis to determine whether any significant changes have
occurred to the structure and content of the source from the time the data
profiling effort commenced to just prior to the execution of the plan to migrate
the data sources to the target.
2.5.3 IBM WebSphere AuditStage features used
We will be using IBM WebSphere AuditStage’s Data Filters functionality to
perform business rule compliance for data elements within the same table and
across multiple tables in the source.
2.5.4 North American Bank analysis
As mentioned earlier, North American Bank’s core services are to be migrated to
those of the Northern California Bank.
Using metadata information available in dictionaries, documentation and the
relational catalogs, the keys, code fields, indicator fields, and referential integrity
relationships (both implicit and explicit) to be profiled were identified as shown in
Table 2- 3.
We include only selected portions of the generated reports (for some of the fields
shown in Table 2-3) of the data profiling effort in this section. You can download
the complete reports from the IBM Redbooks Web site:
ftp://www.redbooks.ibm.com/redbooks/SG247508/
Business rules involving data elements in one or more tables are shown in
Table 2-4. These are verified using IBM WebSphere AuditStage.
Note: For convenience, we chose to include all the columns in all the tables in
our data profiling effort. In the real world, however, to avoid report and
information overload, you would apply the 80/20 rule and focus on only the
critical master data as described in 1.2.1, “Data assessment approach” on
page 5.
Note: We describe the process of generating reports in 1.14, “Reports” on
page 394 and do not repeat that information here. We show only the relevant
portions of reports here.
436 IBM WebSphere Information Analyzer and Data Quality Assessment
Table 2-3 Main keys, codes, indicators in North American Bank core services
Table 2-4 Business rules in North American Bank core services
Column Analysis reports
These reports provide information about column values for completeness,
validity, structure, and format as described in 1.7, “Column analysis” on
page 109. These reports are used to review the columns for domain, structure
and format integrity.
Figure 2-6 on page 437 shows the Column Inferences “Type & Length Properties
(Defined/Inferred/Chosen)” report for the CUSTOMER table. It identifies the data
type, length, precision and scale of each column as defined in the metadata and
inferred from the data content. Any selected status of each of these categories is
also shown here. This information (along with those relating to other columns in
Table Key column(s) Code column Indicator column RI relationships
CUSTOMER CUSTOMER_ID 򐂰 TITLE
򐂰 LEVEL_CD
򐂰 GENDER_IND
򐂰 CHURN_IND
LEVEL_CD to LEVEL_REF table
CONTACT_INFO CUSTOMER_ID, ACCOUNT_ID 򐂰 HOME_ZIP
򐂰 WORK_ZIP
CUSTOMER_ID to CUSTOMER table
ACCOUNT ACCOUNT_ID 򐂰 ACTIVE_IND
򐂰 TYPE_IND
ACCOUNT_ID to ACCOUNT table
LOAN ACCOUNT_ID, LOAN_ID AUTOMAT_DEBIT_IND ACCOUNT_ID to ACCOUNT table
LOAN_TRANSACTION ACCOUNT_ID, LOAN_ID,
TRANSACTION_ID
TRANS_TYPE_ID AUTOMAT_DEBIT_IND 򐂰 ACCOUNT_ID to ACCOUNT table
򐂰 TRANS_TYPE_CD to
TRANSACTION_TYPE_REF table
TRANSACTION ACCOUNT_ID,
TRANSACTION_ID
TRANS_TYPE_CD 򐂰 ACCOUNT_ID to ACCOUNT table
򐂰 TRANS_TYPE_CD to
TRANSACTION_TYPE_REF table
BRANCH BRANCH_ID WORK_ZIP
CUST_ACC CUSTOMER_ID, ACCOUNT_ID 򐂰 CUSTOMER_ID to CUSTOMER
table
򐂰 ACCOUNT_ID to ACCOUNT table
TRANSACTION_TYPE_REF TRANS_TYPE_CD TRANS_TYPE_CD
Serial number Description
1 An account (in the ACCOUNT table) can only have a transaction if
the ACTIVE_IND column is set to “Y”
2 The OVERDRAFT column value cannot exceed the
OVERDRAFT_LIMIT column value in the ACCOUNT table
3 When the OVERDRAFT column has a value, the
OVERDRAFT_RATE and OVERDRAFT_FEE columns in the
ACCOUNT table must be filled
Chapter 2. Financial services business scenario 437
other tables) is recorded in Table 2-13 on page 475. Our focus is on keys, codes,
and indicators as highlighted.
Figure 2-7 on page 438 shows the Column Frequency “Frequency By
Frequency” report for the GENDER_IND column in the CUSTOMER table. It
shows some unexpected values such as spaces, in addition to the valid values
M, F, U, and NULL.
Figure 2-8 on page 439 shows the Column Domain Values “Completeness and
Validity Summary” report for the CUSTOMER table. It shows 41 occurrences of
incomplete (NULL) values, and 41 occurrences of invalid values in the
GENDER_IND column.
Document all the valid and invalid values for a column in a table as shown in
Table 2-5 on page 439.
Figure 2-6 Column Inferences Type & Length Properties (Defined/Inferred/Chosen) report for the
CUSTOMER table
438 IBM WebSphere Information Analyzer and Data Quality Assessment
Figure 2-7 Column Frequency “Frequency By Frequency” report for the GENDER_IND column in the
CUSTOMER table

Get IBM WebSphere Information Analyzer and Data Quality Assessment now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.