Chapter 5. Identity and Authentication

The first step in securing data in any system is to give each user a unique identity and to authenticate each user’s claim to that identity. Authentication and identity are essential because no authorization scheme can control access to data if it can’t trust that users are who they claim to be.

In this chapter, we’ll take a detailed look at how authentication and identity are managed for core Hadoop services. We start by looking at identity and how Hadoop integrates information from Kerberos KDCs and from LDAP and Active Directory domains to provide a unified view of distributed identity. We’ll also look at how Hadoop represents users internally and the options for mapping external, global identities to those internal representations. Next, we revisit Kerberos and go into more detail about how Hadoop uses it for strong authentication. From there, we’ll look at how some core components use username/password–based authentication schemes and at the role of distributed authentication tokens in the overall architecture. We finish the chapter with a discussion of user impersonation and a deep dive into the configuration of Hadoop authentication.

Identity

In the context of the Hadoop ecosystem, identity is a relatively complex topic because Hadoop goes to great lengths to be loosely coupled from authoritative identity sources. In Chapter 4, we introduced the Kerberos ...
