book

Hadoop Security

by Ben Spivey, Joey Echeverria

June 2015

Intermediate to advanced

340 pages

8h 43m

English

O'Reilly Media, Inc.

Read now

Unlock full access

Foreword
Preface
AudienceConventions Used in This BookUsing Code ExamplesSafari® Books OnlineHow to Contact UsAcknowledgmentsFrom JoeyFrom BenFrom EddieDisclaimer
1. Introduction
Security OverviewConfidentialityIntegrityAvailabilityAuthentication, Authorization, and AccountingHadoop Security: A Brief HistoryHadoop Components and EcosystemApache HDFSApache YARNApache MapReduceApache HiveCloudera ImpalaApache Sentry (Incubating)Apache HBaseApache AccumuloApache SolrApache OozieApache ZooKeeperApache FlumeApache SqoopCloudera HueSummary
I. Security Architecture
2. Securing Distributed Systems
Threat CategoriesUnauthorized Access/MasqueradeInsider ThreatDenial of ServiceThreats to DataThreat and Risk AssessmentUser AssessmentEnvironment AssessmentVulnerabilitiesDefense in DepthSummary
3. System Architecture
Operating EnvironmentNetwork SecurityNetwork SegmentationNetwork FirewallsIntrusion Detection and PreventionHadoop Roles and Separation StrategiesMaster NodesWorker NodesManagement NodesEdge NodesOperating System SecurityRemote Access ControlsHost FirewallsSELinuxSummary
4. Kerberos
Why Kerberos?Kerberos OverviewKerberos Workflow: A Simple ExampleKerberos TrustsMIT KerberosServer ConfigurationClient ConfigurationSummary
II. Authentication, Authorization, and Accounting
5. Identity and Authentication
IdentityMapping Kerberos Principals to UsernamesHadoop User to Group MappingProvisioning of Hadoop UsersAuthenticationKerberosUsername and Password AuthenticationTokensImpersonationConfigurationSummary
6. Authorization
HDFS AuthorizationHDFS Extended ACLsService-Level AuthorizationMapReduce and YARN AuthorizationMapReduce (MR1)YARN (MR2)ZooKeeper ACLsOozie AuthorizationHBase and Accumulo AuthorizationSystem, Namespace, and Table-Level AuthorizationColumn- and Cell-Level AuthorizationSummary

7. Apache Sentry (Incubating)
Sentry ConceptsThe Sentry ServiceSentry Service ConfigurationHive AuthorizationHive Sentry ConfigurationImpala AuthorizationImpala Sentry ConfigurationSolr AuthorizationSolr Sentry ConfigurationSentry Privilege ModelsSQL Privilege ModelSolr Privilege ModelSentry Policy AdministrationSQL CommandsSQL Policy FileSolr Policy FilePolicy File Verification and ValidationMigrating From Policy FilesSummary
8. Accounting
HDFS Audit LogsMapReduce Audit LogsYARN Audit LogsHive Audit LogsCloudera Impala Audit LogsHBase Audit LogsAccumulo Audit LogsSentry Audit LogsLog AggregationSummary
III. Data Security
9. Data Protection
Encryption AlgorithmsEncrypting Data at RestEncryption and Key ManagementHDFS Data-at-Rest EncryptionMapReduce2 Intermediate Data EncryptionImpala Disk Spill EncryptionFull Disk EncryptionFilesystem EncryptionImportant Data Security Consideration for HadoopEncrypting Data in TransitTransport Layer SecurityHadoop Data-in-Transit EncryptionData Destruction and DeletionSummary
10. Securing Data Ingest
Integrity of Ingested DataData Ingest ConfidentialityFlume EncryptionSqoop EncryptionIngest WorkflowsEnterprise ArchitectureSummary
11. Data Extraction and Client Access Security
Hadoop Command-Line InterfaceSecuring ApplicationsHBaseHBase ShellHBase REST GatewayHBase Thrift GatewayAccumuloAccumulo ShellAccumulo Proxy ServerOozieSqoopSQL AccessImpalaHiveWebHDFS/HttpFSSummary
12. Cloudera Hue
Hue HTTPSHue AuthenticationSPNEGO BackendSAML BackendLDAP BackendHue AuthorizationHue SSL Client ConfigurationsSummary
IV. Putting It All Together
13. Case Studies
Case Study: Hadoop Data WarehouseEnvironment SetupUser ExperienceSummaryCase Study: Interactive HBase Web ApplicationDesign and ArchitectureSecurity RequirementsCluster ConfigurationImplementation NotesSummary
Afterword
Unified AuthorizationData GovernanceNative Data ProtectionFinal Thoughts
Index

Content preview from Hadoop Security

Chapter 4. Kerberos

Kerberos often intimidates even experienced system administrators and developers at the first mention of it. Applications and systems that rely on Kerberos often have many support calls and trouble tickets filed to fix problems related to it. This chapter will introduce the basic Kerberos concepts that are necessary to understand how strong authentication works, and explain how it plays an important role with Hadoop authentication in Chapter 5.

So what exactly is Kerberos? From a mythological point of view, Kerberos is the Greek word for Cerberus, a multiheaded dog that guards the entrance to Hades to ensure that nobody who enters will ever leave. Kerberos from a technical (and more pleasant) point of view is the term given to an authentication mechanism developed at Massachusetts Institute of Technology (MIT). Kerberos evolved to become the de facto standard for strong authentication for computer systems large and small, with varying implementations ranging from MIT’s Kerberos distribution to the authentication component of Microsoft’s Active Directory.

Why Kerberos?

Playing devil’s advocate here (pun intended), why does Hadoop need Kerberos at all? The reason becomes apparent when looking at the default model for Hadoop authentication. When presented with a username, Hadoop happily believes whatever you tell it, and ensures that every machine in the entire cluster believes it, too.

To use an analogy, if a person at a party approached you and introduced himself ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.

Read now

Unlock full access

More than 5,000 organizations count on O’Reilly

O’Reilly covers everything we've got, with content to help us build a world-class technology community, upgrade the capabilities and competencies of our teams, and improve overall team performance as well as their engagement.

Julian F.

Head of Cybersecurity

I wanted to learn C and C++, but it didn't click for me until I picked up an O'Reilly book. When I went on the O’Reilly platform, I was astonished to find all the books there, plus live events and sandboxes so you could play around with the technology.

Addison B.

Field Engineer

I’ve been on the O’Reilly platform for more than eight years. I use a couple of learning platforms, but I'm on O'Reilly more than anybody else. When you're there, you start learning. I'm never disappointed.

Amir M.

Data Platform Tech Lead

I'm always learning. So when I got on to O'Reilly, I was like a kid in a candy store. There are playlists. There are answers. There's on-demand training. It's worth its weight in gold, in terms of what it allows me to do.

Mark W.

Embedded Software Engineer

Publisher Resources

ISBN: 9781491900970Errata Page

Cloud Computing

Data Engineering

Data Science

AI & ML

Programming Languages

Software Architecture

IT/Ops

Security

Design

Business

Soft Skills

Hadoop Security

by Ben Spivey, Joey Echeverria

Chapter 4. Kerberos

Why Kerberos?

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.