O'Reilly logo

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

SQL Server 2012 Data Integration Recipes: Solutions for Integration Services and Other ETL Tools

Book Description

SQL Server 2012 Data Integration Recipes provides focused and practical solutions to real world problems of data integration. Need to import data into SQL Server from an outside source? Need to export data and send it to another system? SQL Server 2012 Data Integration Recipes has your back. You'll find solutions for importing from Microsoft Office data stores such as Excel and Access, from text files such as CSV files, from XML, from other database brands such as Oracle and MySQL, and even from other SQL Server databases. You'll learn techniques for managing metadata, transforming data to meet the needs of the target system, handling exceptions and errors, and much more.

What DBA or developer isn't faced with the need to move data back and forth? Author Adam Aspin brings 10 years of extensive ETL experience involving SQL Server, and especially satellite products such as Data Transformation Services and SQL Server Integration Services. Extensive coverage is given to Integration Services, Microsoft's flagship tool for data integration in SQL Server environments. Coverage is also given to the broader range of tools such as OPENDATASOURCE, linked servers, OPENROWSET, Migration Assistant for Access, BCP Import, and BULK INSERT just to name a few. If you're looking for a resource to cover data integration and ETL across the gamut of Microsoft's SQL Server toolset, SQL Server 2012 Data Integration Recipes is the one book that will meet your needs.

  • Provides practical and proven solutions towards creating resilient ETL environments

  • Clearly answers the tough questions which professionals ask

  • Goes beyond the tools to a thorough discussion of the underlying techniques

  • Covers the gamut of data integration, beyond just SSIS

  • Includes example databases and files to allow readers to test the recipes

  • What you'll learn

  • Import and export to and from CSV files, XML files, and other text-based sources.

  • Move data between SQL databases, including SQL Server and others such as Oracle Database and MySQL.

  • Discover and manage metadata held in various database systems.

  • Remove duplicates and consolidate from multiple sources.

  • Transform data to meet the needs of target systems.

  • Profile source data as part of the discovery process.

  • Log and manage errors and exceptions during an ETL process.

  • Improve efficiency by detecting and processing only changed data.

  • Who this book is for

    SQL Server 2012 Data Integration Recipes is written for developers wishing to find fast and reliable solutions for importing and exporting to and from SQL Server. The book appeals to DBAs as well, who are often tasked with implementing ETL processes. Developers and DBAs moving to SQL Server from other platforms will find the succinct, example-based approach ideal for quickly applying their general ETL knowledge to the specific tools provided as part of a SQL Server environment.

    Table of Contents

    1. Title Page
    2. Dedication
    3. Contents at a Glance
    4. Contents
    5. About the Author
    6. About the Technical Reviewers
    7. Acknowledgments
    8. Introduction
    9. Chapter 1: Sourcing Data from MS Office Applications
      1. 1-1. Ensuring Connectivity to Access and Excel
      2. 1-2. Importing Data from Excel
      3. 1-3. Modifying Excel Data During a Load
      4. 1-4. Specifying the Excel Data to Load During an Ad-Hoc Import
      5. 1-5. Planning for Future Use of a Linked Server
      6. 1-6. Reading Data Automatically from an Excel Worksheet
      7. 1-7. Loading Excel Data as Part of a Structured ETL Process
      8. 1-8. Importing Excel 2007/2010 Data Using SSIS 2005
      9. 1-9. Handling Source Data Issues When Importing Excel Worksheets Using SSIS
      10. 1-10. Pushing Access Data into SQL Server
      11. 1-11. Importing Multiple Access Tables
      12. 1-12. Ad Hoc Imports of Access Data
      13. 1-13. Obtaining Access Data Without Regular Imports
      14. 1-14. Importing Access Data as Part of a Regular ETL Process
      15. 1-15. Convert a Complex Access Database to SQL Server
      16. 1-16. Resolving Complex Data Migration Problems During an Access to SQL Server Upgrade
      17. Summary
    10. Chapter 2: Flat File Data Sources
      1. 2-1. Importing Data From a Text File
      2. 2-2. Importing a Delimited Text File
      3. 2-3. Automatically Determining Data Types
      4. 2-4. Importing Fixed-Width Text Files
      5. 2-5. Importing Text Files Using T-SQL
      6. 2-6. Mapping a Source File
      7. 2-7. Importing Data Using T-SQL in Anticipation of Using a Linked Server
      8. 2-8. Accessing a Text File As a Linked Server Table
      9. 2-9. Importing Flat Files from the Command Line
      10. 2-10. Importing Large Text Files Using T-SQL and Putting the Emphasis on Speed
      11. 2-11. Creating a Format File for Complex Flat File Loads Using Bulk Insert or BCP
      12. 2-12. Performing a BULK INSERT with a Format File
      13. 2-13. Loading a Text File Fast Using T-SQL
      14. 2-14. Executing BULK INSERT from SSIS
      15. 2-15. Handling Complex Flat File Formats with a Row Prefix in SSIS
      16. 2-16. Pre-Parsing and Staging File Subsets in SSIS
      17. 2-17. Handling Irregular Numbers of Columns in the Source File Using SQL Server 2012
      18. 2-18. Handling Embedded Qualifiers in SQL Server 2012
      19. 2-19. Handling Irregular Numbers of Columns in the Source File in SQL Server 2005 and 2008
      20. 2-20. Determining the Number of Columns in a Source File
      21. 2-21. Preparing CSV Files for Import
      22. Summary
    11. Chapter 3: XML Data Sources
      1. 3-1. Loading XML Files for Storage in SQL Server
      2. 3-2. Loading XML Data into Rows and Columns
      3. 3-3. Shredding an XML File into an SQL Server Table
      4. 3-4. Importing XML Data as Part of an ETL Process
      5. 3-6. Importing Multiple XML Files into SQL Server
      6. 3-7. Flattening an XML File to Prepare It for Import
      7. 3-8. Importing XML Data from Very Large Files, Putting a Priority on Speed
      8. 3-9. Loading Multiple Tables at Once from a Single XML Source File
      9. 3-10. Loading and Shredding Relational Tables from an XML Source File
      10. 3-11. Overcoming Bulk Loading Challenges for XML Files
      11. 3-12. Running SQLXML Bulk Loader As Part of a Regular ETL Process
      12. 3-13. Validating an XML Document Against a Schema File As Part of a Scripted Solution
      13. 3-14. Validating an XML Document Against a Schema File in SSIS
      14. Summary
    12. Chapter 4: SQL Databases
      1. Preamble: Installing and Configuring OLEDB and ODBC Providers
      2. 4-1. Configuring Your Server to Connect to Oracle
      3. 4-2. Importing Data from Oracle As a Regular Process
      4. 4-3. Accelerating Oracle Data Import
      5. 4-4. Importing Oracle Data on an “Ad Hoc” Basis
      6. 4-5. Migrating Multiple Oracle Tables and Views
      7. 4-6. Loading DB2 Data on a Regular Basis
      8. 4-7. Importing DB2 Data Without SSIS
      9. 4-8. Sourcing Data from MySQL on a Regular Basis
      10. 4-9. Importing MySQL Data on an “Ad Hoc” Basis
      11. 4-10. Importing MySQL Data Using SSIS 2005 and 2008
      12. 4-11. Migrating Complete Tables from MySQL
      13. 4-12. Loading Data from Sybase Adaptive Server Enterprise (ASE)
      14. 4-14. Importing Sybase ASE Data on an “Ad Hoc” Basis
      15. 4-15. Importing Sybase ASE Data on a Regular Basis
      16. 4-16. Loading Teradata Data
      17. 4-17. Sourcing Data from PostgreSQL
      18. Summary
    13. Chapter 5: SQL Server Sources
      1. 5-1. Loading Ad Hoc Data from Other SQL Server Instances
      2. 5-2. Reading Data from Another SQL Server Instance over a Permanent Connection
      3. 5-3. Loading Large Data Sets Using T-SQL
      4. 5-4. Load Data Exported from SQL Server from the Command Line
      5. 5-5. Loading SQL Server Data from a Native SQL Server File
      6. 5-6. Transferring Data Between SQL Server Databases Regularly
      7. 5-7. Porting a Tiny Amount of Data Between SQL Server Databases
      8. 5-8. Copying and Pasting Between Tables
      9. 5-9. Loading Data as Fast as Possible in an ETL Process
      10. 5-10. Importing and Exporting All the Tables in a Database
      11. 5-11. Loading Data from Different Versions of SQL Server
      12. 5-12. Copying Entire Databases
      13. 5-13. Transferring a Complex Subset of Data Between Databases
      14. 5-14. Loading Data into SQL Server Azure Interactively
      15. 5-15. Loading Data into SQL Server Azure as Part of a Regular ETL Process
      16. 5-16. Loading Data into SQL Server Azure from the Command Line
      17. 5-17. Loading Ad Hoc Data into SQL Server Azure
      18. 5-18. Transferring Tables Between Databases
      19. Summary
    14. Chapter 6: Miscellaneous Data Sources
      1. 6-1. Importing Data from SQL Server Analysis Services
      2. 6-2. Importing Data from an Analysis Services Cube on a Regular Basis
      3. 6-3. Querying an OLAP Source on an Ad Hoc Basis
      4. 6-4. Loading Images and Documents into an SQL Server Table
      5. 6-5. Importing Multiple Files into an SQL Server Table
      6. 6-6. Importing Files into SQL Server on a Regular Basis
      7. 6-7. Importing Files with Their Attributes into SQL Server
      8. 6-8. Loading Visual FoxPro Files
      9. 6-9. Importing Data from dBase
      10. 6-10. Loading Data from Web Services
      11. 6-11. Importing Windows Management Instrumentation Data
      12. 6-12. Importing Data over ODBC
      13. 6-13. Linking to 32-bit data sources from a 64-bit SQL Server
      14. Summary
    15. Chapter 7: Exporting Data from SQL Server
      1. 7-1. Exporting Data Occasionally in Various Formats
      2. 7-2. Exporting Data As a Delimited Text File
      3. 7-3. Exporting Data to a Fixed-Width Text File
      4. 7-4. Exporting Text Files from the Command Line
      5. 7-5. Exporting Data Using BCP Without Recourse to the Command Line
      6. 7-6. Exporting Data As Text Files from T-SQL
      7. 7-7. Exporting Data to Another SQL Server Database
      8. 7-8. Exporting Text Files Regularly from T-SQL
      9. 7-9. Exporting and Compressing Multiple Tables
      10. 7-10. Exporting a Tiny Dataset As XML
      11. 7-11. Exporting a Larger Dataset As XML
      12. 7-12. Exporting Large Datasets As XML
      13. 7-13. Creating an XML Schema
      14. 7-14. Shaping XML Export Data
      15. 7-15. Exporting XML Data on a Regular Basis
      16. 7-16. Routinely Exporting Small XML Datasets
      17. 7-17. Exporting Data to Excel Using T-SQL
      18. 7-18. Exporting Data to Access Using T-SQL
      19. 7-19. Exporting Data Securely to Excel from T-SQL
      20. 7-20. Exporting Data Securely to Access from T-SQL
      21. 7-21. Exporting Data to Excel Using SSIS
      22. 7-22. Exporting Data to Access Using SSIS
      23. 7-23. Pulling Data from Office Applications
      24. 7-24. Exporting Files Stored in SQL Server Using T-SQL
      25. 7-25. Exporting Files Stored in SQL Server on a Regular Basis
      26. 7-26. Exporting Data from SSAS Using T-SQL on an Occasional Basis
      27. 7-27. Exporting Data from SSAS Using T-SQL on a More Regular Basis
      28. 7-28. Exporting an SSAS Dimension Using SSIS
      29. 7-29. Exporting the Result of an MDX Query in SSIS
      30. 7-30. Exporting Data to Other Relational Databases Using T-SQL
      31. 7-31. Exporting Data to Other Relational Databases Using SSIS
      32. 7-32. Exporting Data from SQL Server Azure
      33. Summary
    16. Chapter 8: Metadata
      1. 8-1. Listing Available Tables from a Linked Server
      2. 8-2. Listing the Columns Available When Using a Linked Server
      3. 8-3. Discovering Flat File Metadata
      4. 8-4. Returning Simple SQL Server Metadata
      5. 8-5. Gathering Tailored SQL Server Metadata
      6. 8-6. Analyzing SQL Server Table Metadata
      7. 8-7. Analyzing SQL Server Column Metadata
      8. 8-8. Displaying Microsoft Access Metadata
      9. 8-9. Reading MySQL Metadata
      10. 8-10. Displaying Oracle Metadata from SSIS
      11. 8-11. Querying Oracle Metadata Using T-SQL
      12. 8-12. Understanding the Oracle Data Dictionary
      13. 8-13. Displaying DB2 Metadata
      14. 8-14. Obtaining SQL Server Metadata Using .NET
      15. Summary
    17. Chapter 9: Data Transformation
      1. 9-1. Converting Data Types
      2. 9-2. Removing Duplicates from Data
      3. 9-3. Deduplicating Large Recordsets
      4. 9-4. Deduplicating Data in an ETL Data Flow
      5. 9-5. Subsetting Column Data Using T-SQL
      6. 9-6. Generating Fixed-Length Column Data Subsets Using SSIS
      7. 9-7. Generating Variable-Length Column Data Subsets
      8. 9-8. Concatenating Source Data Using T-SQL
      9. 9-9. Concatenating Data Using SSIS
      10. 9-10. Duplicating Columns
      11. 9-11. Converting Strings to Uppercase or Lowercase
      12. 9-12. Converting Strings to Title Case
      13. 9-13. PIVOTing Data in T-SQL
      14. 9-14. Pivoting Data in SSIS with SQL Server 2012
      15. 9-15. Pivoting Data in SSIS with SQL Server 2005 and 2008
      16. 9-16. Consolidating Multiple Identical Data Sources in T-SQL
      17. 9-17. Consolidating Multiple Identical Data Sources in SSIS
      18. 9-18. Normalizing Data Inside a Single Table Using T-SQL
      19. 9-19. Normalizing Data Inside a Single Table Using SSIS
      20. 9-20. Normalizing Data into Multiple Relational Tables Using T-SQL
      21. 9-21. Normalizing Data into Multiple Relational Tables Using SSIS
      22. 9-22. Denormalizing Data by Referencing Lookup Tables in T-SQL
      23. 9-23. Denormalizing Data by Referencing Lookup Tables in SSIS
      24. 9-24. Processing Type 1 Slowly Changing Dimensions (SCDs) Using T-SQL
      25. 9-25. Handling Type 2 Slowly Changing Dimensions in T-SQL
      26. 9-26. Handling Type 2 Slowly Changing Dimensions with SSIS
      27. 9-27. Handling Type 3 Slowly Changing Dimensions Using T-SQL
      28. 9-28. Handling Type 4 Slowly Changing Dimensions Using T-SQL
      29. 9-29. Handling Type 4 Slowly Changing Dimensions with SSIS
      30. 9-30. Cleansing Data As Part of an ETL Process
      31. Summary
    18. Chapter 10: Data Profiling
      1. 10-1. Profiling Data Attributes
      2. 10-2. Profiling Domain and Value Distribution
      3. 10-3. Profiling External Data
      4. 10-4. Profiling External Data Faster
      5. 10-5. Running and Logging a Complete DataProfile
      6. 10-6. Profiling Data Quickly in SSIS
      7. 10-7. Creating Custom Data Profiles with SSIS
      8. 10-8. Using the SSIS Data Profiling Task on non-SQL Server Data Sources
      9. 10-9. Reading Profile Data
      10. 10.10. Storing SSIS Profile Data in a Database
      11. 10-11. Tailoring Specific Source Data Profiles in SSIS
      12. 10-12. Domain Analysis in SSIS
      13. 10-13. Performing Multiple Domain Analyses
      14. 10-14. Pattern Profiling in a Data Flow
      15. 10-15. Pattern Profiling Using T-SQL
      16. 10-16. Profiling Data Types
      17. 10-17. Controlling Data Flow via Profile Metadata
      18. Summary
    19. Chapter 11: Delta Data Management
      1. Preamble: Why Bother with Delta Data?
      2. Delta Data Approaches
      3. 11-1. Loading Delta Data as Part of a Structured ETL Process
      4. 11-2. Loading Data Changes Using a Linked Server
      5. 11-3. Loading Data Changes From a Small Source Table as Part of a Structured ETL Process
      6. 11-4. Detecting and Loading Delta Data Only
      7. 11-5. Performing Delta Data Upserts with Other SQL Databases
      8. 11-6. Handling Data Changes Without Writing to the Source Server
      9. 11-7. Detecting Data Changes with Limited Source Database Access
      10. 11-8. Detecting and Loading Delta Data Using T-SQL and a Linked Server When MERGE Is Not Practical
      11. 11-9. Detecting, Logging, and Loading Delta Data
      12. 11-10. Detecting Differences in Rowcounts, Metadata, and Column Data
      13. Summary
    20. Chapter 12: Change Tracking and Change Data Capture
      1. 12-1. Detecting Source Table Changes with Little Overhead and No Custom Framework
      2. 12-2. Pulling Changes into a Destination Table with Change Tracking
      3. 12-3. Using Change Tracking as Part of a Structured ETL Process
      4. 12-4. Detecting Changes to Source Data Using the SQL Server Transaction Log
      5. 12-5. Applying Change Data Capture with SSIS
      6. 12-6. Using Change Data Capture with Oracle Source Data
      7. Summary
    21. Chapter 13: Organising And Optimizing Data Loads
      1. 13-1. Loading Multiple Files
      2. 13-2. Selecting Multiple Text Files to Import
      3. 13-3. Loading Multiple Files Using Complex Selection Criteria
      4. 13-4. Ordering and Filtering File Loads
      5. 13-5. Loading Multiple Flat Files in Parallel
      6. 13-6. Loading Source Files with Load Balancing
      7. 13-7. Loading Data to Parallel Destinations
      8. 13-8. Using a Single Data File As a Multiple Data Source for Parallel Destination Loads
      9. 13-9. Reading and Writing Data from a Database Source in Parallel
      10. 13-10. Inserting Records in Parallel and in Bulk
      11. 13-11. Creating Self-Optimizing Parallel Bulk Inserts
      12. 13-12. Loading Files in Controlled Batches
      13. 13-13. Executing SQL Statements and Procedures in Parallel Using SSIS
      14. 13-14. Executing SQL Statements and Procedures in Parallel Without SSIS
      15. 13-15. Executing SQL Statements and Procedures in Parallel Using SQL Server Agent
      16. Summary
    22. Chapter 14: ETL Process Acceleration
      1. 14-1. Accelerating SSIS Lookups
      2. 14-2. Disabling and Rebuilding Nonclustered Indexes in a Destination Table
      3. 14-3. Persisting Destination Database Index Metadata
      4. 14-4. Scripting and Executing DROP Statements for Destination Database Indexes
      5. 14-5. Scripting and Executing CREATE Statements for Destination Database Indexes
      6. 14-6. Storing Metadata, and Then Scripting and Executing DROP and CREATE Statements for Destination Database XML Indexes
      7. 14-7. Finding Missing Indexes
      8. 14-8. Managing Check Constraints
      9. 14-9. Managing Foreign Key Constraints
      10. 14-10. Optimizing Bulk Loads
      11. Summary
    23. Chapter 15: Logging and Auditing
      1. 15-1. Logging Events from T-SQL
      2. 15-2. Logging Data from SSIS
      3. 15-3. Customizing SSIS Logging
      4. 15-4. Saving and Applying Complex SSIS Logging Details
      5. 15-5. Extending SSIS Logging to an SQL Server Destination
      6. 15-6. Logging Information from an SSIS Script Task
      7. 15-7. Logging from T-SQL to the SSIS Log Table
      8. 15-8. Handling Errors in T-SQL
      9. 15-9. Handling Errors in SSIS
      10. 15-10. Creating a Centralized Logging Framework
      11. 15-11. Logging to a Centralized Framework When Using SSIS Containers
      12. 15-12. Logging to a Centralized Framework When Using SSIS Script Tasks and Components
      13. 15-13. Logging to a Text or XML File from T-SQL
      14. 15-14. Logging Counters in T-SQL
      15. 15-15. Logging Counters from SSIS
      16. 15-16. Creating an SSIS Catalog
      17. 15-17. Reading Logged Events and Counters from the SSIS Catalog
      18. 15-18. Analyzing Events and Counters In-Depth via the SSIS Catalog
      19. 15-19. Creating a Process Control Framework
      20. 15-20. Linking the SSIS Catalog to T-SQL Logging
      21. 15-21. Baselining ETL Processes
      22. 15-22. Auditing an ETL Process
      23. 15-23. Logging Audit Data
      24. Summary
    24. Appendix A: Data Types
      1. SQL Server Data Types
      2. SSIS Data Types
      3. Default Data Mapping in the Import/Export Wizard
      4. MSSQL9 to MSSQL8
      5. MSSQL to DB2
      6. MSSQL to IBMDB2
      7. MSSQL to Jet4
      8. MSSQL to SSIS11
      9. OracleClient to MSSQL
      10. OracleClient to MSSQL11
      11. OracleClient to SSIS11
      12. Oracle to MSSQL
      13. Oracle to MSSQL11
      14. Oracle to SSIS11
      15. SQLClient9 to MSSQL8
      16. SQLClient to DB2
      17. SQLClient to IBMDB2
      18. SQLClient to MSSQL11
      19. SQLClient to Oracle
      20. SQLClient to SSIS
      21. SSIS11 to DB2
      22. SSIS11 to IBMDB2
      23. SSIS11 to MSSQL
      24. DB2 to MSSQL
      25. SSIS to Jet
      26. DB2 to MSSQL11
      27. SSIS to Oracle
      28. DB2 to SSIS11
      29. IBMDB2 to MSSQL
      30. IBMDB2 to MSSQL11
      31. IBMDB2 to SSIS11
      32. Jet to MSSQL8
      33. Jet to SSIS
      34. ACE to SSIS
      35. Excel to SQL Server and SSIS Data Mapping
      36. Access to SQL Server and SSIS Data Mapping
      37. Oracle to SQL Server and SSIS Data Mapping
      38. Oracle to SQL Server Replication Data Type Mapping
      39. MySQL Data Types
      40. Sybase to SQL Server Data Type Conversion
    25. Appendix B: Sample Databases and Scripts
      1. Sample Databases and Files
      2. Directory Structure for the Sample Files
    26. Index