Arguably the most obvious and well-known method of protecting data is encryption. We use it whether our data is in transit or at rest, which is virtually all of the time; the only exception is while the data is actually being processed in memory. The mechanics of encryption differ depending on the state of the data.

Data at rest

Our data always has to be stored somewhere, whether that is HDFS, S3, or local disk. Even if we have taken every precaution to ensure that users are authenticated and authorized, the data still exists as plain text on the disk. With direct access to the disk, either physically or through a lower level in the OSI stack, it is fairly trivial to stream the entire contents ...
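
The risk described above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a method taken from the text: the helper names (`write_plaintext_record`, `find_in_raw_bytes`, `demo`) and the sample record are hypothetical, and a temporary local file stands in for whatever storage layer is actually in use. The point is that anything able to read the raw bytes can recover a plain-text value without ever passing through the application's authentication or authorization checks.

```python
import os
import tempfile


def write_plaintext_record(path: str, record: str) -> None:
    """Store a record the way an unencrypted application would: as plain UTF-8 bytes."""
    with open(path, "wb") as f:
        f.write(record.encode("utf-8"))


def find_in_raw_bytes(path: str, needle: bytes, chunk_size: int = 4096) -> bool:
    """Stream a file in fixed-size chunks and report whether `needle` appears.

    This stands in for someone reading the storage directly; no
    application-level access check is involved at any point.
    """
    overlap = len(needle) - 1  # bytes to carry over so matches spanning chunks are found
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if needle in window:
                return True
            tail = window[-overlap:] if overlap > 0 else b""


def demo() -> bool:
    """Write a hypothetical sensitive record, then recover it from the raw bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Hypothetical sample data, used purely for illustration.
        write_plaintext_record(path, "name=Alice;card=4111-1111-1111-1111")
        return find_in_raw_bytes(path, b"4111-1111-1111-1111")
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Because the record is stored as plain text, the scan finds the value immediately. Had the bytes been encrypted before they were written, the same scan would see only ciphertext, which is the motivation for encrypting data at rest.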
