Chapter 18. Rolling Up Data with The Aggregate Transform
Do you have a large amount of data that you want to roll up to a different granularity? The Aggregate Transform allows you to essentially apply a GROUP BY statement on data entering it. Aggregate Transforms are one of the more expensive operations you can perform against data, much like a GROUP BY statement is in T-SQL, and they can be very memory intensive.
Note
The Aggregate Transform is an asynchronous transform and is fully blocking. This means that every row must enter the transform prior to sending the first row out. Because of this, your transform will need as much RAM as the source retrieves. For example, if your Data Flow is reading a 1GB file, your Aggregate Transform will require at least 1GB of memory.
Once you drag the transform over, simply check the columns in the Aggregations tab that you want to aggregate or sum. The Operation drop-down box allows you to select what type of aggregation function you want to apply to the data. The most important operation is a Group By operation, which allows you to roll the data up to that grain. For example, if you have a dozen sales of three products, and you grouped by the ProductID, you'd have only three rows come out of the transform. You can see a list of all the operations allowed in the following table.
Data Type | Operations Allowed |
---|---|
String | Group by, Count, Count distinct |
Numeric | Group by, Count, Count distinct, Minimum, Maximum |
Date | Group by, Count, Count distinct, Minimum, Maximum, ... |
Get Knight's 24-Hour Trainer: Microsoft® SQL Server® 2008 Integration Services now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.