The data type inference pattern

This section describes the data type inference design pattern in which we use Pig scripts to capture important information about data types.

Background

Most of the ingested data in Hadoop has some associated metadata, which is a description of its characteristics. This metadata includes important information on the types of fields, their length, constraints, and uniqueness. We can also know if a field is mandatory. This metadata is also used in interpretation of the values by examining the scale, units of measurement, meaning of labels, and so on. Understanding the intended structure of a dataset helps in expounding its meaning, description, semantics, and the data quality. This analysis of data types helps us to ...

Get Pig Design Patterns now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.