Shuffling

Shuffling is another standard technique for anonymizing data. It is most applicable when the data consists of records with several attributes (columns, in database terminology). In this technique, the values within a column are shuffled across the records, so that the record-level information changes. Statistically, however, the set of values in that column, and therefore its distribution, remains the same.

Example: When analyzing the salary ranges of an organization, we can shuffle the entire salary column, so that the salary shown for an employee no longer reliably corresponds to that employee's real salary. We can still use this data to analyze the ranges, because the values in the column are unchanged.
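
The salary example can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a production anonymizer: the function name shuffle_column, the employees records, and the seed parameter are all hypothetical and do not come from the text. It uses only the standard library.

```python
import random
import statistics


def shuffle_column(records, column, seed=None):
    """Return copies of the records with one column's values shuffled.

    Hypothetical helper for illustration; the input list is not modified.
    """
    rng = random.Random(seed)  # seed is optional, used here for repeatability
    values = [row[column] for row in records]
    rng.shuffle(values)  # in-place shuffle of the extracted values only
    # Reattach the shuffled values to the records in their original order.
    return [{**row, column: value} for row, value in zip(records, values)]


# Made-up sample data (an assumption, not taken from the text).
employees = [
    {"name": "Asha", "salary": 52000},
    {"name": "Ben", "salary": 61000},
    {"name": "Chen", "salary": 75000},
    {"name": "Dana", "salary": 88000},
    {"name": "Eli", "salary": 120000},
]

shuffled = shuffle_column(employees, "salary", seed=7)

# Column-level statistics are unchanged, so range analysis still works.
print(min(r["salary"] for r in shuffled), max(r["salary"] for r in shuffled))
print(statistics.mean(r["salary"] for r in shuffled))
```

Note that a plain random shuffle can leave some records holding their original value by chance, so on its own it does not guarantee that every record changes.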

Complex methods can also be employed in this case, where we can do a shuffle ...

