12.7. CHAPTER SUMMARY

  • ETL functions in a data warehouse are most important, challenging, time-consuming, and labor-intensive.

  • Data extraction is complex because of the disparate source systems; data transformation is difficult because of the wide range of tasks; data loading is challenging because of the volume of data.

  • Several data extraction techniques are available, each with its advantages and disadvantages. Choose the right technique based on the conditions in your environment.

  • The data transformation function encompasses data conversion, cleansing, consolidation, and integration. Implement the transformation function using a combination of specialized tools and in-house developed software.

  • The data loading function relates to the initial load, regular periodic incremental loads, and full refreshes from time to time. Four methods to apply data are: load, append, destructive merge, and constructive merge.

  • Tools for ETL functions fall into three broad functional categories: data transformation engines, data capture through replication, and code generators

Get DATA WAREHOUSING FUNDAMENTALS: A Comprehensive Guide for IT Professionals now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.