Erasing

As the name suggests, erasing causes data loss when applied to the input data. Whether to apply it depends on how sensitive the data is. A typical example is setting every record in a column to NULL. Because nothing meaningful can be inferred from the NULL values, this technique ensures that confidential data is not passed on to later phases of data processing.
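As a rough illustration of the idea, the following minimal Python sketch erases chosen columns by replacing their values with None (Python's stand-in for NULL). The function name erase_columns, the field names, and the sample records are all hypothetical and are not taken from the book; in a real pipeline the same step would more likely run inside the processing engine itself.

```python
from typing import Any, Dict, Iterable, List


def erase_columns(
    records: Iterable[Dict[str, Any]], columns: Iterable[str]
) -> List[Dict[str, Any]]:
    """Return copies of the records with the given columns set to None (NULL)."""
    to_erase = set(columns)
    return [
        # Keep every key so the schema is unchanged; only the values are lost.
        {key: (None if key in to_erase else value) for key, value in record.items()}
        for record in records
    ]


# Hypothetical sample records, invented for this sketch.
employees = [
    {"name": "Ravi", "salary_inr": 1000, "mobile": "0123456789"},
    {"name": "Asha", "salary_inr": 1500, "mobile": "0987654321"},
]

# Erase the confidential columns before handing the data to later phases.
erased = erase_columns(employees, ["salary_inr", "mobile"])
for row in erased:
    print(row)
```

The sketch returns new records rather than modifying the input, so the original data stays intact for any phase that is still authorized to see it.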

Let's take a few examples of erasing:

| Input Data | Output Data | What's erased |
|---|---|---|
| NULL earns 1000 INR per month | Ravi earns NULL per month | Salary and name |
| NULL mobile number is 0123456789 | Ravi's mobile number is NULL | Mobile number and name |

From the examples, you might be ...
