The aggregation pattern

The aggregation design pattern explores how to use Pig to transform data by applying summarization or aggregation operations.

Background

Aggregation provides a summarized, high-level view of the data. It combines more than one attribute into a single attribute, which reduces the total number of records, either by treating a set of records as a single record or by ignoring subsets of unimportant records. Data aggregation can be performed at different levels of granularity.
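As a rough illustration of aggregating at two levels of granularity, the following Pig Latin sketch groups a set of sales records first by region and product, and then by region alone. The input file sales.csv and the fields region, product, and amount are hypothetical names chosen for this example, not data taken from the book.

```pig
-- Hypothetical input: one sale per line, comma-separated (region, product, amount).
sales = LOAD 'sales.csv' USING PigStorage(',')
        AS (region:chararray, product:chararray, amount:double);

-- Finer granularity: one output record per (region, product) pair.
by_region_product = GROUP sales BY (region, product);
product_summary = FOREACH by_region_product GENERATE
    FLATTEN(group) AS (region, product),
    COUNT(sales) AS num_sales,
    SUM(sales.amount) AS total_amount;

-- Coarser granularity: one output record per region.
by_region = GROUP sales BY region;
region_summary = FOREACH by_region GENERATE
    group AS region,
    COUNT(sales) AS num_sales,
    SUM(sales.amount) AS total_amount;

DUMP product_summary;
DUMP region_summary;
```

In both cases every group of input records collapses into a single summary record, so the output is smaller than the input. Grouping by region alone yields fewer, more general records than grouping by region and product.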

Data aggregation retains the integrity of the data, even though the resulting dataset is smaller in volume than the original dataset.

Motivation

Data aggregation plays a key role in Big Data, as it is inherently difficult for huge volumes ...
