The constraint validation and cleansing design pattern

The constraint validation and cleansing pattern deals with validating the data against a set of rules and techniques and then cleansing the invalid data.

Background

Constraints tell us about the properties that the data should comply with. They can be applied to the entire database, a table, a column, or an entire schema. These constraints are rules created at design time to prevent the data from getting corrupt and reduce the overhead of processing wrong data; they dictate what values are valid for a data.

Constraints, such as null checks and range checks, can be used to know if the data ingested in Hadoop is valid or not. Often, constraint validation and cleansing on the data in Hadoop can ...

Get Pig Design Patterns now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.