Chapter 10. Integration with Identity Management Providers

In Chapter 9, we covered how cluster services provide authentication, authorization, and confidentiality. These security mechanisms rely heavily on a common understanding between clients, services, and operating systems of which users and groups exist. Cluster architects need to be familiar with how cluster services use identity services for authentication and authorization and what providers are available, in order to decide how best to configure the clusters within the enterprise context. In this chapter, we examine these interactions and outline some common integration architectures.

Integration Areas

We need identity management providers in the following areas:


Kerberos

As we have seen, integration with a KDC is essential to secure authentication in most Hadoop services. Every user wishing to use the cluster must have a principal in one of the trusted realms, and ideally this principal maps to an existing enterprise user account with the same password. Each server in the cluster must be configured to allow users and servers to authenticate to a KDC.
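The requirement that every principal belong to a trusted realm can be illustrated with a minimal sketch. It assumes the common primary[/instance]@REALM form of a Kerberos principal and ignores escaped characters. The realm names and the helpers parse_principal and is_trusted are hypothetical, chosen only for illustration, and are not part of Hadoop or any Kerberos library.

```python
from typing import NamedTuple, Optional, Set

# Hypothetical realm names, used only for illustration.
TRUSTED_REALMS: Set[str] = {"CLUSTER.EXAMPLE.COM", "CORP.EXAMPLE.COM"}


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "alice" or "hdfs"
    instance: Optional[str]  # optional second component, often a hostname
    realm: str               # Kerberos realm, conventionally uppercase


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts.

    Simplified: escaped '/' or '@' characters are not handled.
    """
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"principal needs a name and a realm: {name!r}")
    primary, _, instance = local.partition("/")
    if not primary:
        raise ValueError(f"principal has an empty primary: {name!r}")
    return Principal(primary, instance or None, realm)


def is_trusted(name: str, trusted_realms: Set[str] = TRUSTED_REALMS) -> bool:
    """Return True if the principal's realm is in the trusted set."""
    return parse_principal(name).realm in trusted_realms
```

With these assumed realms, is_trusted("alice@CORP.EXAMPLE.COM") is true, while a principal from any realm outside the set is rejected. A real cluster makes this decision through its Kerberos and service configuration rather than through application code like this.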

User accounts and groups

Cluster services will use users and groups when making authentication and authorization decisions and for execution. For example, YARN requires that users exist on every node, to ensure security isolation between running jobs. We therefore need a way of resolving enterprise user accounts on each cluster node, and furthermore these need to correspond ...
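As a rough illustration of what resolving a user account on a node means, the following sketch asks the local operating system for a user's numeric ID and group memberships using Python's standard pwd, grp, and os modules. On Linux these calls consult whatever account sources the operating system is configured to use. The function name resolve_account and the shape of its return value are assumptions made for this example, and it runs only on Unix-like systems.

```python
import grp
import os
import pwd
from typing import Dict, List, Optional


def resolve_account(username: str) -> Optional[Dict[str, object]]:
    """Look up a user on this node; return None if the OS cannot resolve it."""
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        # The account is unknown to this node's operating system.
        return None

    groups: List[str] = []
    # getgrouplist returns the IDs of all groups the user belongs to,
    # including the primary group passed as the second argument.
    for gid in os.getgrouplist(username, entry.pw_gid):
        try:
            groups.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A group ID with no resolvable name; keep the number.
            groups.append(str(gid))

    return {"uid": entry.pw_uid, "gid": entry.pw_gid, "groups": groups}
```

Running the same lookup on every node and comparing the results is one simple way to spot an account that resolves on some nodes but not others, or that resolves to different IDs or groups.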
