Chapter 9. Data Protection

So far, we have covered how Hadoop can be configured to enforce standard AAA controls. In this chapter, we will see how these controls, along with the CIA principles discussed in Chapter 1, provide the foundation for protecting data. Data protection is a broad concept that spans topics ranging from data privacy to acceptable use. This chapter focuses specifically on one of those topics: encryption.

Encryption is a common method of protecting data. There are two primary flavors of data encryption: data-at-rest encryption and data-in-transit encryption, the latter also referred to as over-the-wire encryption. Data at rest is data that remains stored even after machines are powered off. This includes data on hard drives, flash drives, USB sticks, memory cards, CDs, DVDs, and even old floppy disks or tapes in storage boxes. Data in transit, as its name implies, is data on the move, such as data traveling over the Internet, a USB cable, a coffee shop's Wi-Fi network, cell phone towers, or a link from a remote space station to Earth.

Encryption Algorithms

Before diving into the two flavors of data encryption, we’ll briefly discuss encryption algorithms. Encryption algorithms define the mathematical technique used to encrypt data. A common encryption algorithm is the Advanced Encryption Standard, or AES. It is a specification established by the U.S. National Institute of Standards and Technology (NIST) in FIPS-197.
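To make this concrete, here is a minimal sketch of encrypting and decrypting a short message with AES using the standard Java `javax.crypto` API. The class name `AesSketch`, its helper methods, the 128-bit key size, and the choice of GCM mode are our own assumptions for illustration; the text above specifies only the AES algorithm, not a mode of operation or key length.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

// Hypothetical illustration class; not part of Hadoop.
public class AesSketch {
    static final int IV_BYTES = 12;   // IV length commonly used with GCM (assumption)
    static final int TAG_BITS = 128;  // authentication tag length in bits (assumption)

    // Generate a random 128-bit AES key.
    static SecretKey newKey() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    // Encrypt plaintext and return IV || ciphertext || tag.
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv); // a fresh IV for every message
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        byte[] message = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, message, 0, iv.length);
        System.arraycopy(ciphertext, 0, message, iv.length, ciphertext.length);
        return message;
    }

    // Split off the IV, then decrypt and verify the authentication tag.
    static byte[] decrypt(SecretKey key, byte[] message) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
        return cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = newKey();
        byte[] encrypted = encrypt(key, "data at rest".getBytes(StandardCharsets.UTF_8));
        byte[] decrypted = decrypt(key, encrypted);
        System.out.println(new String(decrypted, StandardCharsets.UTF_8));
    }
}
```

The same key is used to encrypt and to decrypt, so in practice the hard part is protecting and distributing that key rather than calling the cipher.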

Describing how AES encryption works is beyond ...
