Chapter 5. Identity and Authentication
The first step necessary for any system securing data is to provide each user with a unique identity and to authenticate a user’s claim of a particular identity. The reason authentication and identity are so essential is that no authorization scheme can control access to data if the scheme can’t trust that users are who they claim to be.
In this chapter, we’ll take a detailed look at how authentication and identity are managed for core Hadoop services. We start by looking at identity and how Hadoop integrates information from Kerberos KDCs and from LDAP and Active Directory domains to provide an integrated view of distributed identity. We’ll also look at how Hadoop represents users internally and the options for mapping external, global identities to those internal representations. Next, we revisit Kerberos and go into more detail on how Hadoop uses Kerberos for strong authentication. From there, we’ll take a look at how some core components use username/password-based authentication schemes and the role of distributed authentication tokens in the overall architecture. We finish the chapter with a discussion of user impersonation and a deep dive into the configuration of Hadoop authentication.
Identity
In the context of the Hadoop ecosystem, identity is a relatively complex topic because Hadoop goes to great lengths to be loosely coupled from authoritative identity sources. In Chapter 4, we introduced ...