Hadoop Security

by Ben Spivey, Joey Echeverria

June 2015

Intermediate to advanced

340 pages

8h 43m

English

O'Reilly Media, Inc.

Foreword
Preface
Audience; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments; From Joey; From Ben; From Eddie; Disclaimer
1. Introduction
Security Overview; Confidentiality; Integrity; Availability; Authentication, Authorization, and Accounting; Hadoop Security: A Brief History; Hadoop Components and Ecosystem; Apache HDFS; Apache YARN; Apache MapReduce; Apache Hive; Cloudera Impala; Apache Sentry (Incubating); Apache HBase; Apache Accumulo; Apache Solr; Apache Oozie; Apache ZooKeeper; Apache Flume; Apache Sqoop; Cloudera Hue; Summary
I. Security Architecture
2. Securing Distributed Systems
Threat Categories; Unauthorized Access/Masquerade; Insider Threat; Denial of Service; Threats to Data; Threat and Risk Assessment; User Assessment; Environment Assessment; Vulnerabilities; Defense in Depth; Summary
3. System Architecture
Operating Environment; Network Security; Network Segmentation; Network Firewalls; Intrusion Detection and Prevention; Hadoop Roles and Separation Strategies; Master Nodes; Worker Nodes; Management Nodes; Edge Nodes; Operating System Security; Remote Access Controls; Host Firewalls; SELinux; Summary
4. Kerberos
Why Kerberos?; Kerberos Overview; Kerberos Workflow: A Simple Example; Kerberos Trusts; MIT Kerberos; Server Configuration; Client Configuration; Summary
II. Authentication, Authorization, and Accounting
5. Identity and Authentication
Identity; Mapping Kerberos Principals to Usernames; Hadoop User to Group Mapping; Provisioning of Hadoop Users; Authentication; Kerberos; Username and Password Authentication; Tokens; Impersonation; Configuration; Summary
6. Authorization
HDFS Authorization; HDFS Extended ACLs; Service-Level Authorization; MapReduce and YARN Authorization; MapReduce (MR1); YARN (MR2); ZooKeeper ACLs; Oozie Authorization; HBase and Accumulo Authorization; System, Namespace, and Table-Level Authorization; Column- and Cell-Level Authorization; Summary
7. Apache Sentry (Incubating)
Sentry Concepts; The Sentry Service; Sentry Service Configuration; Hive Authorization; Hive Sentry Configuration; Impala Authorization; Impala Sentry Configuration; Solr Authorization; Solr Sentry Configuration; Sentry Privilege Models; SQL Privilege Model; Solr Privilege Model; Sentry Policy Administration; SQL Commands; SQL Policy File; Solr Policy File; Policy File Verification and Validation; Migrating From Policy Files; Summary
8. Accounting
HDFS Audit Logs; MapReduce Audit Logs; YARN Audit Logs; Hive Audit Logs; Cloudera Impala Audit Logs; HBase Audit Logs; Accumulo Audit Logs; Sentry Audit Logs; Log Aggregation; Summary
III. Data Security
9. Data Protection
Encryption Algorithms; Encrypting Data at Rest; Encryption and Key Management; HDFS Data-at-Rest Encryption; MapReduce2 Intermediate Data Encryption; Impala Disk Spill Encryption; Full Disk Encryption; Filesystem Encryption; Important Data Security Consideration for Hadoop; Encrypting Data in Transit; Transport Layer Security; Hadoop Data-in-Transit Encryption; Data Destruction and Deletion; Summary
10. Securing Data Ingest
Integrity of Ingested Data; Data Ingest Confidentiality; Flume Encryption; Sqoop Encryption; Ingest Workflows; Enterprise Architecture; Summary
11. Data Extraction and Client Access Security
Hadoop Command-Line Interface; Securing Applications; HBase; HBase Shell; HBase REST Gateway; HBase Thrift Gateway; Accumulo; Accumulo Shell; Accumulo Proxy Server; Oozie; Sqoop; SQL Access; Impala; Hive; WebHDFS/HttpFS; Summary
12. Cloudera Hue
Hue HTTPS; Hue Authentication; SPNEGO Backend; SAML Backend; LDAP Backend; Hue Authorization; Hue SSL Client Configurations; Summary
IV. Putting It All Together
13. Case Studies
Case Study: Hadoop Data Warehouse; Environment Setup; User Experience; Summary; Case Study: Interactive HBase Web Application; Design and Architecture; Security Requirements; Cluster Configuration; Implementation Notes; Summary
Afterword
Unified Authorization; Data Governance; Native Data Protection; Final Thoughts
Index

Content preview from Hadoop Security

Chapter 8. Accounting

So far in this part of the book, we’ve described how to properly identify and authenticate users and services, as well as how authorization controls limit what users and services can do in the cluster. While these controls do a good job of defining and enforcing a security model for a Hadoop cluster, they leave out a fundamental component of any security model: accounting. Also referred to as auditing, accounting is the mechanism for keeping track of what users and services are doing in the cluster. This is a critical piece of the security puzzle because, without it, security breaches can occur without anybody noticing. Accounting rounds out a security model by providing a record of what happened, which can be used for:

Active auditing: This type of auditing is used in conjunction with an alerting mechanism. For example, if a user tries to access a resource on the cluster and is denied, active auditing could generate an email alerting security administrators to the event.
Passive auditing: This refers to auditing that does not generate an alert. Passive auditing is often a bare-minimum requirement in a business so that designated auditors and security administrators can query the audit logs for specific events. For example, if there is a security breach in the cluster, a security administrator can query the audit logs to find the data that was accessed during the breach.
Security compliance: A business ...
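
The difference between active and passive auditing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the key=value log format, the sample records, and the names parse_events, active_audit, and passive_audit are all hypothetical and are not taken from the book or from any Hadoop component. The notify callback stands in for whatever alerting mechanism (such as email) a real deployment would use.

```python
import re
from dataclasses import dataclass

# Made-up sample records in a hypothetical key=value audit format.
SAMPLE_LOG = """\
2015-06-01 10:02:45 allowed=true user=alice cmd=open src=/data/sales/q1.csv
2015-06-01 10:03:10 allowed=false user=mallory cmd=open src=/data/hr/salaries.csv
2015-06-01 10:04:02 allowed=true user=mallory cmd=open src=/data/sales/q1.csv
"""

# Pattern for the hypothetical format above (an assumption, not a real log spec).
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+ \S+) allowed=(?P<allowed>true|false) "
    r"user=(?P<user>\S+) cmd=(?P<cmd>\S+) src=(?P<src>\S+)$"
)


@dataclass(frozen=True)
class AuditEvent:
    timestamp: str
    allowed: bool
    user: str
    cmd: str
    src: str


def parse_events(text):
    """Turn raw log text into AuditEvent records, skipping unparseable lines."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        events.append(
            AuditEvent(m["timestamp"], m["allowed"] == "true",
                       m["user"], m["cmd"], m["src"])
        )
    return events


def active_audit(events, notify):
    """Active auditing: raise an alert for every denied access attempt.

    Returns the number of alerts sent.
    """
    denied = [e for e in events if not e.allowed]
    for e in denied:
        notify(f"DENIED: user={e.user} cmd={e.cmd} src={e.src} at {e.timestamp}")
    return len(denied)


def passive_audit(events, user):
    """Passive auditing: query, after the fact, what a user successfully accessed."""
    return sorted({e.src for e in events if e.user == user and e.allowed})


if __name__ == "__main__":
    events = parse_events(SAMPLE_LOG)
    active_audit(events, notify=print)       # alerting path
    print(passive_audit(events, "mallory"))  # investigation path
```

The two functions read the same records. What separates them is whether a denied event triggers a notification as it is processed, or whether the records are simply kept so that someone can query them later.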

Publisher Resources

ISBN: 9781491900970
