Chapter 24

Joining Data with the Lookup Transform

Are you looking for a way to join data from a new source into your Data Flow pipeline? The Lookup Transform in SQL Server Integration Services (SSIS) enables you to perform the equivalent of an inner and outer hash join. The only difference is that the operations occur outside the realm of the database engine.

This transform is used in many different situations, but would typically be found in an ETL process that populates a data warehouse. For example, you may want to populate a table by joining data from two separate source systems on different database platforms. The component can join only two data sets at a time, so to join three or more data sets you would need to string multiple Lookup Transforms together.

The Lookup Transform is a synchronous transform; therefore, it does not block the pipeline’s flow of data. As new data enters the transform, rows that have been joined leave through one of the possible outputs. The caveat to this is that in certain caching modes, the component will initially block the package’s execution for a period of time while it charges its internal caches.

Sometimes rows will not join successfully. For example, you may have a product that has no purchase history and its identifier in the product table would have no matches in the sales table. SSIS supports this by having multiple outputs on the Lookup Transform; in the simplest (default/legacy) configuration, you would have one output for matched ...

Get Knight's Microsoft SQL Server 2012 Integration Services 24-Hour Trainer now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.