The data normalization pattern

The data normalization design pattern discusses ways to perform normalization or standardization of data values.

Background

Normalization of data implies fitting, adjusting, or scaling the data values measured on different scales to a notionally common range. As a simplistic example, joining datasets that consist of different units for distance measurement, such as kilometers and miles, can provide varying results when they are not normalized. Hence, they are normalized to bring them back to one common unit, such as kilometers or miles, so that the effect of different units of measurement is not felt by the analytics.

Motivation

In Big Data, it is common to encounter varying values in the same data attribute while integrating ...

Get Pig Design Patterns now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.