Chapter 22

Rolling Up Data with the Aggregate Transform

Do you have a large amount of data that you want to roll up to a different granularity? The Aggregate Transform enables you to essentially apply a GROUP BY statement on data that are entering it. Aggregate Transforms are one of the more expensive operations you can perform against data, much like a GROUP BY statement is in T-SQL, and they can be very memory intensive.

NOTE The Aggregate Transform is an asynchronous transform and is fully blocking. This means that every row must enter the transform prior to sending the first row out. Because of this, your transform will need as much RAM as the source retrieves. For example, if your Data Flow is reading a 1 GB file, your Aggregate Transform will require at least 1 GB of memory.

Once you drag the transform over, simply check the columns in the Aggregations tab that you want to aggregate or sum. The Operation drop-down box enables you to select what type of aggregation function you want to apply to the data. The most important operation is a Group By operation, which enables you to roll the data up to that grain. For example, if you have a dozen sales of three products, and you grouped them by the ProductID, you’d have only three rows come out of the transform. You can see a list of all the operations allowed in the following table.

Data TypeOperations Allowed
StringGroup by, Count, Count distinct
NumericGroup by, Count, Count distinct, Minimum, Maximum
DateGroup by, ...

Get Knight's Microsoft SQL Server 2012 Integration Services 24-Hour Trainer now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.