2.6. Data Profiling

Data profiling is becoming an increasingly important facet to a DBA or warehouse architect's job title. Data profiling can shave weeks off a few-month project by speeding up the requirements process and letting the business unit know what type of issues they're going to experience prior to starting the project. It also is extremely important if you want to do data validation or cleansing as data moves through the data flow.

Data profiling is the DBA's way of checking inconsistent user-entered data that falls through the cracks of being validated from the source that saved the data to the database. DBAs can use SSIS to assist in data profiling so that the format of data (such as credit card numbers, telephone numbers, e-mail addresses, residential addresses, and so on) follows the formatted guidelines for which the data is intended to be stored. You especially see this type of data cleansing with companies that perform mailings. The cost of mailing is much cheaper if you're able to presort and standardize the addresses.

Data profiling with SSIS provides the flexibility to direct bad data to text logs or other database tables, and to provide custom information or metadata, so that there is enough information for corrective action to be taken on such data.

The following demo profiles the EmailAddress and Phone columns from the AdventureWorks Person.Contact table. The data flow will separate good and bad records by placing them into two different flat files.

Get Expert SQL Server™ 2005 Integration Services now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.