Chapter 7

Security, Compliance, Auditing, and Protection

The sheer size of a Big Data repository brings with it a major security challenge, raising the age-old question posed to IT: How can the data be protected? However, that is a trick question. The answer has many caveats, which dictate how security must be conceived as well as deployed. Proper security entails more than just keeping the bad guys out; it also means backing up data and protecting data from corruption.

The first caveat is access. Data can be easily protected, but only if you eliminate access to the data. That’s not a pragmatic solution, to say the least. The key is to control access, but even then, knowing the who, what, when, and where of data access is only a start.
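To make "controlling access" and "knowing the who, what, when, and where" concrete, here is a minimal sketch under stated assumptions: the policy is a hypothetical in-memory mapping from users to the datasets they may read, and `AccessController` and `AccessRecord` are illustrative names, not part of any particular Big Data platform. Every request, whether granted or denied, is recorded so it can be reviewed later.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessRecord:
    """One audit entry capturing the who, what, when, and where of a request."""
    who: str        # requesting user or service account
    what: str       # dataset or object requested
    when: datetime  # UTC timestamp of the request
    where: str      # origin of the request, e.g. a client host address
    granted: bool   # whether the policy allowed the request


@dataclass
class AccessController:
    # Hypothetical policy: user name -> set of datasets that user may read
    policy: dict[str, set[str]]
    audit_log: list[AccessRecord] = field(default_factory=list)

    def request(self, who: str, what: str, where: str) -> bool:
        """Check the policy and log the attempt regardless of the outcome."""
        granted = what in self.policy.get(who, set())
        self.audit_log.append(
            AccessRecord(who, what, datetime.now(timezone.utc), where, granted)
        )
        return granted


if __name__ == "__main__":
    ctrl = AccessController(policy={"analyst1": {"sales_2023"}})
    print(ctrl.request("analyst1", "sales_2023", "10.0.0.5"))   # True
    print(ctrl.request("analyst1", "hr_salaries", "10.0.0.5"))  # False
    # Denied attempts remain visible for later review
    for rec in ctrl.audit_log:
        if not rec.granted:
            print(rec.who, rec.what, rec.where)
```

Logging denied requests alongside granted ones is the design choice that matters here: access control alone answers only "may this happen?", while the audit trail is what later lets you reconstruct who touched (or tried to touch) which data, from where, and when.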

The second caveat is availability: controlling where the data are stored and how the data are distributed. The more control you have, the better you are positioned to protect the data.

The third caveat is performance. Stronger encryption, more complex security methodologies, and additional security layers can all improve security. However, each of these techniques carries a processing burden that can severely affect performance.

The fourth caveat is liability. Accessible data carry liability with them, arising from the sensitivity of the data, the legal requirements attached to the data, privacy issues, and intellectual property concerns.

Adequate security in the Big Data realm becomes a strategic balancing act among these caveats along with ...
