
Hadoop Security

by Ben Spivey, Joey Echeverria

June 2015

Intermediate to advanced

340 pages

8h 43m

English

O'Reilly Media, Inc.


Foreword
Preface
Audience; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments; From Joey; From Ben; From Eddie; Disclaimer
1. Introduction
Security Overview; Confidentiality; Integrity; Availability; Authentication, Authorization, and Accounting; Hadoop Security: A Brief History; Hadoop Components and Ecosystem; Apache HDFS; Apache YARN; Apache MapReduce; Apache Hive; Cloudera Impala; Apache Sentry (Incubating); Apache HBase; Apache Accumulo; Apache Solr; Apache Oozie; Apache ZooKeeper; Apache Flume; Apache Sqoop; Cloudera Hue; Summary
I. Security Architecture
2. Securing Distributed Systems
Threat Categories; Unauthorized Access/Masquerade; Insider Threat; Denial of Service; Threats to Data; Threat and Risk Assessment; User Assessment; Environment Assessment; Vulnerabilities; Defense in Depth; Summary
3. System Architecture
Operating Environment; Network Security; Network Segmentation; Network Firewalls; Intrusion Detection and Prevention; Hadoop Roles and Separation Strategies; Master Nodes; Worker Nodes; Management Nodes; Edge Nodes; Operating System Security; Remote Access Controls; Host Firewalls; SELinux; Summary
4. Kerberos
Why Kerberos?; Kerberos Overview; Kerberos Workflow: A Simple Example; Kerberos Trusts; MIT Kerberos; Server Configuration; Client Configuration; Summary
II. Authentication, Authorization, and Accounting
5. Identity and Authentication
Identity; Mapping Kerberos Principals to Usernames; Hadoop User to Group Mapping; Provisioning of Hadoop Users; Authentication; Kerberos; Username and Password Authentication; Tokens; Impersonation; Configuration; Summary
6. Authorization
HDFS Authorization; HDFS Extended ACLs; Service-Level Authorization; MapReduce and YARN Authorization; MapReduce (MR1); YARN (MR2); ZooKeeper ACLs; Oozie Authorization; HBase and Accumulo Authorization; System, Namespace, and Table-Level Authorization; Column- and Cell-Level Authorization; Summary
7. Apache Sentry (Incubating)
Sentry Concepts; The Sentry Service; Sentry Service Configuration; Hive Authorization; Hive Sentry Configuration; Impala Authorization; Impala Sentry Configuration; Solr Authorization; Solr Sentry Configuration; Sentry Privilege Models; SQL Privilege Model; Solr Privilege Model; Sentry Policy Administration; SQL Commands; SQL Policy File; Solr Policy File; Policy File Verification and Validation; Migrating From Policy Files; Summary
8. Accounting
HDFS Audit Logs; MapReduce Audit Logs; YARN Audit Logs; Hive Audit Logs; Cloudera Impala Audit Logs; HBase Audit Logs; Accumulo Audit Logs; Sentry Audit Logs; Log Aggregation; Summary
III. Data Security
9. Data Protection
Encryption Algorithms; Encrypting Data at Rest; Encryption and Key Management; HDFS Data-at-Rest Encryption; MapReduce2 Intermediate Data Encryption; Impala Disk Spill Encryption; Full Disk Encryption; Filesystem Encryption; Important Data Security Consideration for Hadoop; Encrypting Data in Transit; Transport Layer Security; Hadoop Data-in-Transit Encryption; Data Destruction and Deletion; Summary
10. Securing Data Ingest
Integrity of Ingested Data; Data Ingest Confidentiality; Flume Encryption; Sqoop Encryption; Ingest Workflows; Enterprise Architecture; Summary
11. Data Extraction and Client Access Security
Hadoop Command-Line Interface; Securing Applications; HBase; HBase Shell; HBase REST Gateway; HBase Thrift Gateway; Accumulo; Accumulo Shell; Accumulo Proxy Server; Oozie; Sqoop; SQL Access; Impala; Hive; WebHDFS/HttpFS; Summary
12. Cloudera Hue
Hue HTTPS; Hue Authentication; SPNEGO Backend; SAML Backend; LDAP Backend; Hue Authorization; Hue SSL Client Configurations; Summary
IV. Putting It All Together
13. Case Studies
Case Study: Hadoop Data Warehouse; Environment Setup; User Experience; Summary; Case Study: Interactive HBase Web Application; Design and Architecture; Security Requirements; Cluster Configuration; Implementation Notes; Summary
Afterword
Unified Authorization; Data Governance; Native Data Protection; Final Thoughts
Index

Content preview from Hadoop Security

Chapter 6. Authorization

In “Authentication”, we saw how the various Hadoop ecosystem projects support strong authentication to ensure that users are who they claim to be. However, authentication is only part of the overall security story; you also need a way to model which actions an authenticated user can perform and which data that user can access. Protecting resources in this manner is called authorization, and it is probably one of the most complex topics in Hadoop security. Each service offers different functionality, and therefore supports a different authorization model. This chapter is accordingly divided into sections based on how each service implements authorization.

We start by looking at HDFS and its support for POSIX-style file permissions, as well as its support for service-level authorization to restrict user access to specific HDFS functions. Next, we turn our attention to MapReduce and YARN, which support a similar style of service-level authorization as well as a queue-based model controlling access to system resources. In the case of MapReduce and YARN, authorization is useful for both security and resource management/multitenancy (for more information on resource management, we recommend Hadoop Operations by Eric Sammer [O’Reilly]). Finally, we cover the authorization features of the popular BigTable clones, Apache HBase and Apache Accumulo, including a discussion of the pros and cons of role-based and attribute-based security as well as a discussion of cell-level ...


Publisher Resources

ISBN: 9781491900970
