The basic statistical profiling pattern

This section describes the basic statistical profiling design pattern, in which Pig scripts apply statistical functions to capture key information about data quality.


The previous design pattern described one way to infer data types. The next logical step in data profiling is to evaluate quality metrics for the values themselves, by collecting the data and analyzing it with statistical methods. The resulting statistics give a high-level view of how suitable the data is for a particular analytical problem, and they uncover potential problems early in the data lifecycle.
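
As a rough illustration of the kind of statistics meant here, the following minimal Pig sketch computes a few summary metrics for one numeric column. The input path `transactions.tsv`, the relation names, and the `id` and `amount` fields are hypothetical and do not come from the book; only the built-in functions `COUNT_STAR`, `COUNT`, `MIN`, `MAX`, and `AVG` are standard Pig.

```pig
-- Hypothetical input: a tab-separated file with an id and a numeric amount.
records = LOAD 'transactions.tsv' USING PigStorage('\t')
          AS (id:chararray, amount:double);

-- Put every record into a single group so the aggregates cover the whole dataset.
grouped = GROUP records ALL;

-- COUNT_STAR counts all rows; COUNT skips rows where amount is null,
-- so the difference between the two is the number of missing values.
stats = FOREACH grouped GENERATE
            COUNT_STAR(records)   AS total_rows,
            COUNT(records.amount) AS non_null_rows,
            MIN(records.amount)   AS min_amount,
            MAX(records.amount)   AS max_amount,
            AVG(records.amount)   AS avg_amount;

DUMP stats;
```

Comparing `total_rows` with `non_null_rows` gives a quick completeness check, while the minimum, maximum, and average can hint at out-of-range or skewed values before any deeper analysis is attempted.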


The basic statistical profiling design pattern helps to create data quality ...
