5

Data Extraction Best Practices

A natural first discussion point for ETL is the extraction, the E in ETL. This chapter applies the concepts of data extraction using SSIS. ETL applies to a broad spectrum of applications beyond just data warehousing and data integration. Therefore, the discussion of this topic includes both generalized extraction concepts and data warehouse–specific concepts.

Data extraction is the process of moving data off of a source system, potentially to a staging environment, or into the transformation phase of the ETL. Figure 5-1 shows the extraction process separated out on the left. An extraction process may pull data from a variety of sources, including files or database systems, as this figure highlights.

Figure 5-1

Image

Following are a few common objectives of data extraction:

  • Consistency in how data is extracted across source systems
  • Performance of the extraction
  • Minimal impact on the source to avoid contention with critical source processes
  • Flexibility to handle source system changes
  • The capability to target only new or changed records

This chapter is structured into three sections related to the Problem-Design-Solution of ETL extraction processes using SSIS:

  • The “Problem” section elaborates on some of the ETL extraction objectives listed earlier. This section is general to ETL, and is not directly related to SSIS implementation details.
  • The “Design” section ...

Get Microsoft® SQL Server® 2008 Integration Services: Problem-Design-Solution now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.