
Hadoop Security by Joey Echeverria, Ben Spivey

Chapter 4. Kerberos

Kerberos often intimidates even experienced system administrators and developers at the first mention of it. Applications and systems that rely on Kerberos often generate many support calls and trouble tickets. This chapter introduces the basic Kerberos concepts needed to understand how strong authentication works, and lays the groundwork for Chapter 5, which explains the role Kerberos plays in Hadoop authentication.

So what exactly is Kerberos? From a mythological point of view, Kerberos is the Greek word for Cerberus, a multiheaded dog that guards the entrance to Hades to ensure that nobody who enters will ever leave. Kerberos from a technical (and more pleasant) point of view is the term given to an authentication mechanism developed at Massachusetts Institute of Technology (MIT). Kerberos evolved to become the de facto standard for strong authentication for computer systems large and small, with varying implementations ranging from MIT’s Kerberos distribution to the authentication component of Microsoft’s Active Directory.

Why Kerberos?

Playing devil’s advocate here (pun intended), why does Hadoop need Kerberos at all? The reason becomes apparent when looking at the default model for Hadoop authentication. When presented with a username, Hadoop happily believes whatever you tell it, and ensures that every machine in the entire cluster believes it, too.

To use an analogy, if a person at a party approached you and introduced himself as “Bill,” you naturally would believe that he is, in fact, Bill. How do you know that he really is Bill? Well, because he said so and you believed him without question. Hadoop without Kerberos behaves in much the same way, except that, to take the analogy a step further, Hadoop not only believes “Bill” is who he says he is but makes sure that everyone else believes it, too. This is a problem.

Hadoop by design is meant to store and process petabytes of data. As the old adage goes, with great power comes great responsibility. Hadoop in the enterprise can no longer get by with simplistic means for identifying (and trusting) users. Enter Kerberos. In the previous analogy, “Bill” introduces himself to you. Upon doing so, what if you responded by asking to see a valid passport and upon receiving it (naturally, because everyone brings a passport to a party…), checked the passport against a database to verify validity? This is the type of identity verification that Hadoop introduced by adding Kerberos authentication.

Kerberos Overview

The stage is now set and it is time to dig in and understand just how Kerberos works. Kerberos implementation is, as you might imagine, a client/server architecture. Before breaking down the components in detail, a bit of Kerberos terminology is needed.

First, identities in Kerberos are called principals. Every user and service that participates in the Kerberos authentication protocol requires a principal to uniquely identify itself. Principals are classified into two categories: user principals and service principals. User principal names, or UPNs, represent regular users. This closely resembles usernames or accounts in the operating system world. Service principal names, or SPNs, represent services that a user needs to access, such as a database on a specific server. The relationship between UPNs and SPNs will become more apparent when we work through an example later.

The next important Kerberos term is realm. A Kerberos realm is an authentication administrative domain. All principals are assigned to a specific Kerberos realm. A realm establishes a boundary, which makes administration easier.

Now that we have established what principals and realms are, the natural next step is to understand what stores and controls all of this information. The answer is a key distribution center (KDC). The KDC consists of three components: the Kerberos database, the authentication service (AS), and the ticket-granting service (TGS). The Kerberos database stores all the information about the principals and the realm they belong to, among other things. Kerberos principals in the database are identified with a naming convention that looks like the following:


alice@EXAMPLE.COM
A UPN that uniquely identifies the user (also called the short name): alice in the Kerberos realm EXAMPLE.COM. By convention, the realm name is always uppercase.

bob/admin@EXAMPLE.COM
A variation of a regular UPN in that it identifies an administrator bob for the realm EXAMPLE.COM. The slash (/) in a UPN separates the short name and the admin distinction. The admin component convention is regularly used, but it is configurable as we will see later.

hdfs/node1.example.com@EXAMPLE.COM
This principal represents an SPN for the hdfs service, on the host node1.example.com, in the Kerberos realm EXAMPLE.COM. The slash (/) in an SPN separates the short name hdfs and the hostname node1.example.com.


The entire principal name is case sensitive! For instance, hdfs/Node1.Example.com@EXAMPLE.COM is a different principal than hdfs/node1.example.com@EXAMPLE.COM, the one in the third example. Typically, it is best practice to use all lowercase for the principal, except for the realm component, which is uppercase. The caveat here is, of course, that the underlying hostnames referred to in SPNs must also be lowercase, which is also a best practice for host naming and DNS.
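
To make the naming convention concrete, here is a minimal Python sketch that splits a principal name into its parts and checks the lowercase-name, uppercase-realm convention. The names `Principal`, `parse_principal`, and `is_conventional` are hypothetical helpers invented for this illustration, and the parser ignores the escaping rules that real Kerberos libraries support.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # short name, e.g. "alice" or "hdfs"
    instance: Optional[str]  # "admin", a hostname, or None
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts (no escape handling)."""
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"expected primary[/instance]@REALM, got {name!r}")
    primary, _, instance = local.partition("/")
    return Principal(primary, instance or None, realm)


def is_conventional(p: Principal) -> bool:
    """True when the name parts are lowercase and the realm is uppercase."""
    name_is_lower = p.primary == p.primary.lower() and (
        p.instance is None or p.instance == p.instance.lower()
    )
    return name_is_lower and p.realm == p.realm.upper()


spn = parse_principal("hdfs/node1.example.com@EXAMPLE.COM")
mixed = parse_principal("hdfs/Node1.Example.com@EXAMPLE.COM")
# Comparison is an exact string match, so these are two distinct principals.
print(spn == mixed, is_conventional(spn), is_conventional(mixed))
```

Because the comparison is a plain string comparison, the mixed-case name is treated as a different principal, which mirrors the case-sensitivity warning above.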

The second component of the KDC, the AS, is responsible for issuing a ticket-granting ticket (TGT) to a client when they initiate a request to the AS. The TGT is used to request access to other services.

The third component of the KDC, the TGS, is responsible for validating TGTs and granting service tickets. Service tickets allow an authenticated principal to use the service provided by the application server, identified by the SPN. The process flow of obtaining a TGT, presenting it to the TGS, and obtaining a service ticket is explained in the next section. For now, understand that, alongside its database, the KDC has two services, the AS and the TGS, which handle requests for authentication and access to services.


There is a special principal of the form krbtgt/<REALM>@<REALM> within the Kerberos database, such as krbtgt/EXAMPLE.COM@EXAMPLE.COM. This principal is used internally by both the AS and the TGS. The key for this principal is actually used to encrypt the content of the TGT that is issued to clients, thus ensuring that the TGT issued by the AS can only be validated by the TGS.

Table 4-1 provides a summary of the Kerberos terms and abbreviations introduced in this chapter.

Table 4-1. Kerberos term abbreviations
| Term | Name | Description |
| --- | --- | --- |
| UPN | User principal name | A principal that identifies a user in a given realm, with the format <shortname>@<REALM> or <shortname>/admin@<REALM> |
| SPN | Service principal name | A principal that identifies a service on a specific host in a given realm, with the format <shortname>/<hostname>@<REALM> |
| TGT | Ticket-granting ticket | A special ticket type granted to a user after successfully authenticating to the AS |
| KDC | Key distribution center | A Kerberos server that contains three components: Kerberos database, AS, and TGS |
| AS | Authentication service | A KDC service that issues TGTs |
| TGS | Ticket-granting service | A KDC service that validates TGTs and grants service tickets |

What has been presented thus far are a few of the basic Kerberos components needed to understand authentication at a high level. Kerberos in its own right is a very in-depth and complex topic that warrants an entire book on the subject. Thankfully, that has already been done. If you wish to dive far deeper than what is presented here, take a look at Jason Garman’s excellent book, Kerberos: The Definitive Guide (O’Reilly).

Kerberos Workflow: A Simple Example

Now that the terminology and components have been introduced, we can work through an example workflow showing how it all fits together at a high level. First, we will identify all of the components in play:


EXAMPLE.COM
The Kerberos realm

Alice
A user of the system, identified by the UPN alice@EXAMPLE.COM

myservice
A service that will be hosted on server1.example.com, identified by the SPN myservice/server1.example.com@EXAMPLE.COM

kdc.example.com
The KDC for the Kerberos realm EXAMPLE.COM

In order for Alice to use myservice, she needs to present a valid service ticket to myservice. The following list of steps shows how she does this (some details omitted for brevity):

  1. Alice needs to obtain a TGT. To do this, she initiates a request to the AS at kdc.example.com, identifying herself as the principal alice@EXAMPLE.COM.

  2. The AS responds by providing a TGT that is encrypted using the key (password) for the principal alice@EXAMPLE.COM.

  3. Upon receipt of the encrypted message, Alice is prompted to enter the correct password for the principal alice@EXAMPLE.COM in order to decrypt the message.

  4. After successfully decrypting the message containing the TGT, Alice now requests a service ticket from the TGS at kdc.example.com for the service identified by myservice/server1.example.com@EXAMPLE.COM, presenting the TGT along with the request.

  5. The TGS validates the TGT and provides Alice a service ticket, encrypted with the myservice/server1.example.com@EXAMPLE.COM principal’s key.

  6. Alice now presents the service ticket to myservice, which can then decrypt it using the myservice/server1.example.com@EXAMPLE.COM key and validate the ticket.

  7. The service myservice permits Alice to use the service because she has been properly authenticated.

This shows how Kerberos works at a high level. Obviously this is a greatly simplified example and many of the underlying details have not been presented. See Figure 4-1 for a sequence diagram of this example.

Figure 4-1. Kerberos workflow example
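
The seven steps of the workflow can also be traced in code. The following Python sketch is a toy model only: the "encryption" is an XOR keystream with an HMAC tag built from the standard library, the passwords are made up, and session keys, timestamps, and authenticators from the real protocol are omitted. All function names (`seal`, `unseal`, `as_request`, `tgs_request`, `service_accept`, `authenticate`) are hypothetical. What it does show is which key opens which message.

```python
import hashlib
import hmac
import json


def key_from_password(principal: str, password: str) -> bytes:
    """Derive a long-term key from a password (toy stand-in for string-to-key)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), principal.encode(), 1000)


def _keystream(key: bytes, length: int) -> bytes:
    blocks = (hashlib.sha256(key + i.to_bytes(4, "big")).digest()
              for i in range(length // 32 + 1))
    return b"".join(blocks)[:length]


def seal(key: bytes, payload: dict) -> str:
    """Toy 'encryption': XOR keystream plus an HMAC tag. NOT secure."""
    plain = json.dumps(payload).encode()
    body = bytes(a ^ b for a, b in zip(plain, _keystream(key, len(plain))))
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return (tag + body).hex()


def unseal(key: bytes, blob: str) -> dict:
    """Reverse seal(); raises ValueError if the key is wrong."""
    raw = bytes.fromhex(blob)
    tag, body = raw[:32], raw[32:]
    if not hmac.compare_digest(tag, hmac.new(key, body, hashlib.sha256).digest()):
        raise ValueError("wrong key or tampered message")
    plain = bytes(a ^ b for a, b in zip(body, _keystream(key, len(body))))
    return json.loads(plain)


ALICE = "alice@EXAMPLE.COM"
KRBTGT = "krbtgt/EXAMPLE.COM@EXAMPLE.COM"
SERVICE = "myservice/server1.example.com@EXAMPLE.COM"

# The KDC database holds a long-term key per principal (passwords are made up).
KDC_DB = {
    ALICE: key_from_password(ALICE, "alice-secret"),
    KRBTGT: key_from_password(KRBTGT, "krbtgt-secret"),
    SERVICE: key_from_password(SERVICE, "service-secret"),
}


def as_request(client: str) -> str:
    """Steps 1-2: the AS wraps a TGT in a reply only the client's key can open."""
    tgt = seal(KDC_DB[KRBTGT], {"client": client})
    return seal(KDC_DB[client], {"tgt": tgt})


def tgs_request(tgt: str, service: str) -> str:
    """Steps 4-5: the TGS validates the TGT and issues a service ticket."""
    client = unseal(KDC_DB[KRBTGT], tgt)["client"]
    return seal(KDC_DB[service], {"client": client, "service": service})


def service_accept(service_key: bytes, ticket: str) -> str:
    """Steps 6-7: the service decrypts the ticket with its own key."""
    return unseal(service_key, ticket)["client"]


def authenticate(password: str) -> str:
    reply = as_request(ALICE)
    tgt = unseal(key_from_password(ALICE, password), reply)["tgt"]  # step 3
    ticket = tgs_request(tgt, SERVICE)
    return service_accept(KDC_DB[SERVICE], ticket)


print(authenticate("alice-secret"))
```

Note that Alice's password never travels to the KDC or to the service in this model; a wrong password simply fails to open the AS reply in step 3.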

Kerberos Trusts

So far, Kerberos has been introduced under the implicit expectation that all users and services are contained within a single Kerberos realm. While this works well for introductory material, it is often not realistic given how large enterprises work. Over time, large enterprises end up with multiple Kerberos realms from things like mergers, acquisitions, or just simply wanting to segregate different parts of the enterprise. However, by default, a KDC only knows about its own realm and the principals in its own database. What if a user from one realm wants to use a service that is controlled by another realm? In order to make this happen, a Kerberos trust is needed between the two realms.

For example, suppose that Example is a very large corporation and has decided to create multiple realms to identify different lines of business, including HR.EXAMPLE.COM and MARKETING.EXAMPLE.COM. Because users in both realms might need to access services from both realms, the KDC for HR.EXAMPLE.COM needs to trust information from the MARKETING.EXAMPLE.COM realm and vice versa.

On the surface this seems pretty straightforward, except that there are actually two different types of trusts: one-way trust and two-way trust (sometimes called bidirectional trust or full trust). The example we just looked at represents a two-way trust.

What if there is also a DEV.EXAMPLE.COM realm where developers have principals that need to access the DEV.EXAMPLE.COM and MARKETING.EXAMPLE.COM realms, but marketing users should not be able to access the DEV.EXAMPLE.COM realm? This scenario requires a one-way trust. A one-way trust is very common in Hadoop deployments when a KDC is installed and configured to contain all the information about the SPNs for the cluster nodes, but all UPNs for end users exist in a different realm, such as Active Directory. Oftentimes, Active Directory administrators or corporate policies prohibit full trusts for a variety of reasons.

So how does a Kerberos trust actually get established? Earlier in the chapter it was noted that a special principal is used internally by the AS and TGS, and it is of the form krbtgt/<REALM>@<REALM>. This principal becomes increasingly important for establishing trusts. With trusts, the principal instead takes the form of krbtgt/<TRUSTING_REALM>@<TRUSTED_REALM>. A key concept of this principal is that it exists in both realms. For example, if the HR.EXAMPLE.COM realm needs to trust the MARKETING.EXAMPLE.COM realm, the principal krbtgt/HR.EXAMPLE.COM@MARKETING.EXAMPLE.COM needs to exist in both realms.


The password for the krbtgt/<TRUSTING_REALM>@<TRUSTED_REALM> principal and the encryption types used must be the same in both realms in order for the trust to be established.

The previous example shows what is required for a one-way trust. In order to establish a full trust, the principal krbtgt/MARKETING.EXAMPLE.COM@HR.EXAMPLE.COM also needs to exist in both realms. To summarize, for the HR.EXAMPLE.COM realm to have a full trust with the MARKETING.EXAMPLE.COM realm, both realms need the principals krbtgt/MARKETING.EXAMPLE.COM@HR.EXAMPLE.COM and krbtgt/HR.EXAMPLE.COM@MARKETING.EXAMPLE.COM.
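
The naming rule for trust principals is mechanical enough to express as a small helper. This Python sketch assumes a hypothetical function `trust_principals`; it only builds the principal names described above and does not talk to any KDC.

```python
def trust_principals(trusting_realm: str, trusted_realm: str, two_way: bool = False) -> list:
    """Return the krbtgt principals that must exist in BOTH realms."""
    names = [f"krbtgt/{trusting_realm}@{trusted_realm}"]
    if two_way:
        # A full trust adds the mirror-image principal.
        names.append(f"krbtgt/{trusted_realm}@{trusting_realm}")
    return names


# HR.EXAMPLE.COM trusts MARKETING.EXAMPLE.COM (one-way), then a full trust.
print(trust_principals("HR.EXAMPLE.COM", "MARKETING.EXAMPLE.COM"))
print(trust_principals("HR.EXAMPLE.COM", "MARKETING.EXAMPLE.COM", two_way=True))
```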

MIT Kerberos

As mentioned in the beginning of this chapter, Kerberos was first created at MIT. Over the years, it has undergone several revisions and the current version is MIT Kerberos V5, or krb5 as it is often called. This section covers some of the components of the MIT Kerberos distribution to put some real examples into play with the conceptual examples introduced thus far.


For the most up-to-date definitive resource on the MIT Kerberos distribution, consult the excellent documentation at the official project website.

In the earlier example, we glossed over the fact that Alice initiated an authentication request. In practice, Alice does this by using the kinit tool (Example 4-1).

Example 4-1. kinit using the default user
[alice@server1 ~]$ kinit
Enter password for alice@EXAMPLE.COM:
[alice@server1 ~]$

This example pairs the current Linux username alice with the default realm to come up with the suggested principal alice@EXAMPLE.COM. The default realm is explained later when we dive into the configuration files. The kinit tool also allows the user to explicitly identify the principal to authenticate as (Example 4-2).

Example 4-2. kinit using a specified user
[alice@server1 ~]$ kinit alice/admin@EXAMPLE.COM
Enter password for alice/admin@EXAMPLE.COM:
[alice@server1 ~]$

Explicitly providing a principal name is often necessary to authenticate as an administrative user, as the preceding example depicts. Another option for authentication is by using a keytab file. A keytab file stores the actual encryption key that can be used in lieu of a password challenge for a given principal. Keytab files are useful for noninteractive principals, such as SPNs, which are often associated with long-running processes like Hadoop daemons. A keytab file does not have to be a 1:1 mapping to a single principal. Multiple different principal keys can be stored in a single keytab file. A user can use kinit with a keytab file by specifying the keytab file location and the principal name to authenticate as (again, because multiple principal keys may exist in the keytab file), as shown in Example 4-3.

Example 4-3. kinit using a keytab file
[alice@server1 ~]$ kinit -kt alice.keytab alice/admin@EXAMPLE.COM
[alice@server1 ~]$

The keytab file allows a user to authenticate without knowledge of the password. Because of this fact, keytabs should be protected with appropriate controls to prevent unauthorized users from authenticating with them. This is especially important when keytabs are created for administrative principals!
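
One simple control is restrictive file permissions. The following Python sketch checks that a keytab is a regular file readable only by its owner; the function name `keytab_is_private` and the owner-only policy are assumptions made for illustration, not a requirement stated by Kerberos itself.

```python
import os
import stat


def keytab_is_private(path: str) -> bool:
    """Return True if the file is regular and has no group/other permission bits."""
    mode = os.stat(path).st_mode
    group_or_other = stat.S_IRWXG | stat.S_IRWXO
    return stat.S_ISREG(mode) and (mode & group_or_other) == 0
```

A deployment script could call `keytab_is_private("alice.keytab")` before starting a daemon and refuse to continue if it returns False.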

Another useful utility that is part of the MIT Kerberos distribution is called klist. This utility allows users to see what, if any, Kerberos credentials they have in their credentials cache. The credentials cache is the place on the local filesystem where, upon successful authentication to the AS, TGTs are stored. By default, this location is usually the file /tmp/krb5cc_<uid> where <uid> is the numeric user ID on the local system. After a successful kinit, alice can view her credentials cache with klist, as shown in Example 4-4.

Example 4-4. Viewing the credentials cache with klist
[alice@server1 ~]$ kinit
Enter password for alice@EXAMPLE.COM:
[alice@server1 ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_5000
Default principal: alice@EXAMPLE.COM

Valid starting     Expires            Service principal
02/13/14 12:00:27  02/14/14 12:00:27  krbtgt/EXAMPLE.COM@EXAMPLE.COM
        renew until 02/20/14 12:00:27
[alice@server1 ~]$
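
As a rough illustration of where the credentials cache lives, this Python sketch computes the cache name shown in the klist output. The helper `default_ccache` is hypothetical; it honors the `KRB5CCNAME` environment variable, which overrides the default location, and otherwise falls back to the `/tmp/krb5cc_<uid>` convention described above. It deliberately ignores any cache location configured in krb5.conf.

```python
import os
from typing import Mapping, Optional


def default_ccache(uid: Optional[int] = None, env: Mapping[str, str] = os.environ) -> str:
    """Return the credentials cache name a client would likely use."""
    override = env.get("KRB5CCNAME")
    if override:
        return override
    if uid is None:
        uid = os.getuid()  # numeric user ID of the current process
    return f"FILE:/tmp/krb5cc_{uid}"


# With uid 5000 and no override, this matches the cache in the klist output.
print(default_ccache(5000, env={}))
```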

If a user tries to look at the credentials cache without having authenticated first, no credentials will be found (see Example 4-5).

Example 4-5. No credentials cache found
[alice@server1 ~]$ klist
No credentials cache found (ticket cache FILE:/tmp/krb5cc_5000)
[alice@server1 ~]$

Another useful tool in the MIT Kerberos toolbox is kdestroy. As the name implies, this allows users to destroy credentials in their credentials cache. This is useful for switching users, or when trying out or debugging new configurations (see Example 4-6).

Example 4-6. Destroying the credentials cache with kdestroy
[alice@server1 ~]$ kinit
Enter password for alice@EXAMPLE.COM:
[alice@server1 ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_5000
Default principal: alice@EXAMPLE.COM

Valid starting     Expires            Service principal
02/13/14 12:00:27  02/14/14 12:00:27  krbtgt/EXAMPLE.COM@EXAMPLE.COM
        renew until 02/20/14 12:00:27
[alice@server1 ~]$ kdestroy
[alice@server1 ~]$ klist
No credentials cache found (ticket cache FILE:/tmp/krb5cc_5000)
[alice@server1 ~]$

So far, all of the MIT Kerberos examples shown “just work.” Hidden away in these examples is the fact that there is a fair amount of configuration necessary to make it all work, both on the client and server side. The next two sections present basic configurations to tie together some of the concepts that have been presented thus far.

Server Configuration

Kerberos server configuration is primarily specified in the kdc.conf file, which is shown in Example 4-7. This file lives in /var/kerberos/krb5kdc/ on Red Hat/CentOS systems.

Example 4-7. kdc.conf
[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
 EXAMPLE.COM = {
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  supported_enctypes = aes256-cts:normal aes128-cts:normal arcfour-hmac-md5:normal
  max_renewable_life = 7d
 }

The first section, kdcdefaults, contains configurations that apply to all the realms listed, unless the specific realm configuration has values for the same configuration items. The configurations kdc_ports and kdc_tcp_ports specify the UDP and TCP ports the KDC should listen on, respectively. The next section, realms, contains all of the realms that the KDC is the server for. A single KDC can support multiple realms. The realm configuration items from this example are as follows:


acl_file
This specifies the file location to be used by the admin server for access controls (more on this later).

dict_file
This specifies the file that contains words that are not allowed to be used as passwords because they are easily cracked/guessed.

supported_enctypes
This specifies all of the encryption types supported by the KDC. When interacting with the KDC, clients must support at least one of the encryption types listed here. Be wary of using weak encryption types, such as DES, because they are easily exploitable.

max_renewable_life
This specifies the maximum amount of time that a ticket can be renewable. Clients can request a renewable lifetime up to this length. A typical value is seven days, denoted by 7d.


By default, encryption settings in MIT Kerberos are often set to a variety of encryption types, including weak choices such as DES. When possible, remove weak encryption types to ensure the best possible security. Weak encryption types are easily exploitable and well documented as such. When using AES-256, Java Cryptographic Extensions need to be installed on all nodes in the cluster to allow for unlimited strength encryption types. It is important to note that some countries prohibit the usage of these encryption types. Always follow the laws governing encryption strength for your country. A more detailed discussion of encryption is provided in Chapter 9.
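
A quick audit of a supported_enctypes value can catch weak entries before they reach production. The Python sketch below is an assumption-laden illustration: the function `weak_enctypes` is hypothetical, and it flags only single-DES names (the bare `des` alias and anything beginning with `des-`), because DES is the weak type named in this chapter. A real policy may want to flag more.

```python
def weak_enctypes(supported_enctypes: str) -> list:
    """Return the single-DES encryption types found in a supported_enctypes value."""
    weak = []
    for entry in supported_enctypes.split():
        # Each entry looks like "<enctype>:<salt type>", e.g. "aes256-cts:normal".
        enctype = entry.split(":", 1)[0].lower()
        if enctype == "des" or enctype.startswith("des-"):
            weak.append(enctype)
    return weak


configured = "aes256-cts:normal aes128-cts:normal arcfour-hmac-md5:normal"
print(weak_enctypes(configured))
```

For the value used in Example 4-7, the list is empty under this DES-only assumption.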

The acl_file location (typically the file kadm5.acl) is used to control which users have privileged access to administer the Kerberos database. Administration of the Kerberos database is controlled by two different, but related, components: kadmin.local and kadmin. The first is a utility that allows the root user of the KDC server to modify the Kerberos database. As the name implies, it can only be run by the root user on the same machine where the Kerberos database resides. Administrators wishing to administer the Kerberos database remotely must use the kadmin server.

The kadmin server is a daemon process that allows remote connections to administer the Kerberos database. This is where the kadm5.acl file (shown in Example 4-8) comes into play. The kadmin utility uses Kerberos authentication, and the kadm5.acl file specifies which UPNs are allowed to perform privileged functions.

Example 4-8. kadm5.acl
*/admin@EXAMPLE.COM      *
cloudera-scm@EXAMPLE.COM *     hdfs/*@EXAMPLE.COM
cloudera-scm@EXAMPLE.COM *     mapred/*@EXAMPLE.COM

This allows any principal from the EXAMPLE.COM realm with the /admin distinction to perform any administrative action. While it is certainly acceptable to change the admin distinction to some other arbitrary name, it is recommended to follow the convention for simplicity and maintainability. Administrative users should only use their admin credentials for specific privileged actions, much in the same way administrators should not use the root user in Linux for everyday nonadministrative actions.

The example also shows how the ACL can be defined to restrict privileges to a target principal. It demonstrates that the user cloudera-scm can perform any action but only on SPNs that start with hdfs and mapred. This type of syntax is useful to grant access to a third-party tool to create and administer Hadoop principals, but not grant access to all of the admin functions.
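
To see how these ACL lines combine, the following Python sketch evaluates them with shell-style wildcard matching. It is an approximation under stated assumptions: `parse_acl` and `may_administer` are hypothetical names, only the `*` (all privileges) permission is understood, and `fnmatchcase` stands in for the ACL file's own wildcard rules, which are more restrictive than shell globbing.

```python
from fnmatch import fnmatchcase

EXAMPLE_ACL = """\
*/admin@EXAMPLE.COM      *
cloudera-scm@EXAMPLE.COM *     hdfs/*@EXAMPLE.COM
cloudera-scm@EXAMPLE.COM *     mapred/*@EXAMPLE.COM
"""


def parse_acl(text: str) -> list:
    """Parse lines of 'principal permissions [target]' into tuples."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        target = fields[2] if len(fields) > 2 else None
        rules.append((fields[0], fields[1], target))
    return rules


def may_administer(rules: list, actor: str, target: str) -> bool:
    """True if the first rule matching actor and target grants all privileges."""
    for principal, permissions, target_pattern in rules:
        if not fnmatchcase(actor, principal):
            continue
        if target_pattern is None or fnmatchcase(target, target_pattern):
            return permissions == "*"
    return False


rules = parse_acl(EXAMPLE_ACL)
print(may_administer(rules, "cloudera-scm@EXAMPLE.COM", "hdfs/node1.example.com@EXAMPLE.COM"))
print(may_administer(rules, "cloudera-scm@EXAMPLE.COM", "alice@EXAMPLE.COM"))
```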

As mentioned earlier, the kadmin tool allows for administration of the Kerberos database. This tool brings users to a shell-like interface where various commands can be entered to perform operations against the Kerberos database (see Examples 4-9 through 4-12).

Example 4-9. Adding a new principal to the Kerberos database
kadmin: addprinc alice@EXAMPLE.COM
WARNING: no policy specified for alice@EXAMPLE.COM; defaulting to no policy
Enter password for principal "alice@EXAMPLE.COM":
Re-enter password for principal "alice@EXAMPLE.COM":
Principal "alice@EXAMPLE.COM" created.

Example 4-10. Displaying the details of a principal in the Kerberos database
kadmin: getprinc alice@EXAMPLE.COM
Principal: alice@EXAMPLE.COM
Expiration date: [never]
Last password change: Tue Feb 18 20:48:15 EST 2014
Password expiration date: [none]
Maximum ticket life: 1 day 00:00:00
Maximum renewable life: 7 days 00:00:00
Last modified: Tue Feb 18 20:48:15 EST 2014 (root/admin@EXAMPLE.COM)
Last successful authentication: [never]
Last failed authentication: [never]
Failed password attempts: 0
Number of keys: 2
Key: vno 1, aes256-cts-hmac-sha1-96, no salt
Key: vno 1, aes128-cts-hmac-sha1-96, no salt
MKey: vno 1
Policy: [none]

Example 4-11. Deleting a principal from the Kerberos database
kadmin: delprinc alice@EXAMPLE.COM
Are you sure you want to delete the principal "alice@EXAMPLE.COM"? (yes/no): yes
Principal "alice@EXAMPLE.COM" deleted.
Make sure that you have removed this principal from all ACLs before reusing.

Example 4-12. Listing all the principals in the Kerberos database
kadmin: listprincs

Client Configuration

The default Kerberos client configuration file is typically named krb5.conf, and lives in the /etc/ directory on Unix/Linux systems. This configuration file is read whenever client applications need to use Kerberos, including the kinit utility. The krb5.conf configuration file shown in Example 4-13 is minimally modified from the default that comes with Red Hat/CentOS 6.4.

Example 4-13. krb5.conf
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = DEV.EXAMPLE.COM
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 default_tkt_enctypes = aes256-cts aes128-cts
 default_tgs_enctypes = aes256-cts aes128-cts
 udp_preference_limit = 1

[realms]
 EXAMPLE.COM = {
  kdc = kdc.example.com
  admin_server = kdc.example.com
 }

 DEV.EXAMPLE.COM = {
  kdc = kdc.dev.example.com
  admin_server = kdc.dev.example.com
 }

[domain_realm]
 .example.com = EXAMPLE.COM
 example.com = EXAMPLE.COM
 .dev.example.com = DEV.EXAMPLE.COM
 dev.example.com = DEV.EXAMPLE.COM

In this example, there are several different sections. The first, logging, is self-explanatory. It defines where logfiles are stored for the various Kerberos components that generate log events. The second section, libdefaults, contains general default configuration information. Let’s take a closer look at the individual configurations in this section:


default_realm
This defines what Kerberos realm should be assumed if no realm is provided. This is right in line with the earlier kinit example when a realm was not provided.

dns_lookup_realm
DNS can be used to determine what Kerberos realm to use.

dns_lookup_kdc
DNS can be used to find the location of the KDC.

ticket_lifetime
This specifies how long a ticket lasts for. This can be any length of time up to the maximum specified by the KDC. A typical value is 24 hours, denoted by 24h.

renew_lifetime
This specifies how long a ticket can be renewed for. Tickets can be renewed by the KDC without having a client reauthenticate. This must be done prior to tickets expiring.

forwardable
This specifies that tickets can be forwardable, which means that if a user has a TGT already but logs into a different remote system, the KDC can automatically reissue a new TGT without the client having to reauthenticate.

default_tkt_enctypes
This specifies the encryption types to use for session keys when making requests to the AS. Preference from highest to lowest is left to right.

default_tgs_enctypes
This specifies the encryption types to use for session keys when making requests to the TGS. Preference from highest to lowest is left to right.

udp_preference_limit
This specifies the maximum packet size to use before switching to TCP instead of UDP. Setting this to 1 forces TCP to always be used.
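
The lifetime settings interact: a client may ask for any ticket lifetime, but the KDC caps it at its own maximum. The Python sketch below illustrates that rule with two hypothetical helpers, `parse_lifetime` and `granted_lifetime`. The parser is an assumption that understands only simple values such as 24h, 7d, or "1d 12h", not every duration format a Kerberos implementation accepts.

```python
import re
from datetime import timedelta

_UNIT_NAMES = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_lifetime(value: str) -> timedelta:
    """Convert values such as '24h', '7d', or '1d 12h' to a timedelta."""
    text = value.strip().lower()
    if not re.fullmatch(r"(?:\d+[smhd]\s*)+", text):
        raise ValueError(f"unsupported lifetime: {value!r}")
    total = timedelta()
    for amount, unit in re.findall(r"(\d+)([smhd])", text):
        total += timedelta(**{_UNIT_NAMES[unit]: int(amount)})
    return total


def granted_lifetime(requested: str, kdc_maximum: str) -> timedelta:
    """A requested lifetime is honored only up to the KDC's maximum."""
    return min(parse_lifetime(requested), parse_lifetime(kdc_maximum))


# A 48-hour request against a one-day maximum is capped at one day.
print(granted_lifetime("48h", "1d"))
```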

The next section, realms, lists all the Kerberos realms that the client is aware of. The kdc and admin_server configurations tell the client which server is running the KDC and kadmin processes, respectively. These configurations can specify the port along with the hostname. If no port is specified, the client assumes port 88 for the KDC and port 749 for the admin server. In this example, two realms are shown. This is a common configuration where a one-way trust exists between two realms, and clients need to know about both realms. In this example, perhaps the EXAMPLE.COM realm contains all of the end-user principals and DEV.EXAMPLE.COM contains all of the Hadoop service principals for a development cluster. Setting up Kerberos in this fashion allows users of this dev cluster to use their existing credentials in EXAMPLE.COM to access it.
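
The default-port rule can be sketched in a few lines of Python. The helper `endpoint` and the constant names are hypothetical, and the sketch assumes plain hostnames or IPv4 addresses (it does not handle bracketed IPv6 literals).

```python
KDC_PORT = 88       # default port for the KDC
KADMIN_PORT = 749   # default port for the admin server


def endpoint(value: str, default_port: int) -> tuple:
    """Split 'host[:port]' and fall back to the default port when none is given."""
    host, sep, port = value.partition(":")
    return host, int(port) if sep else default_port


print(endpoint("kdc.example.com", KDC_PORT))
print(endpoint("kdc.dev.example.com:1749", KADMIN_PORT))
```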

The last section, domain_realm, maps DNS names to Kerberos realms. The first entry says all hosts under the example.com domain map to the EXAMPLE.COM realm, while the second entry says that example.com itself maps to the EXAMPLE.COM realm. This is similarly the case with dev.example.com and DEV.EXAMPLE.COM. If no matching entry is found in this section, the client will try to use the domain portion of the DNS name (converted to all uppercase) as the realm name.
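
The lookup described above can be approximated as follows. This Python sketch uses a hypothetical function `realm_for_host` and assumes the search order is: an exact hostname entry first, then the most specific leading-dot domain entry, then the uppercase domain portion as a fallback. A real client library may differ in details.

```python
from typing import Mapping, Optional

DOMAIN_REALM = {
    ".example.com": "EXAMPLE.COM",
    "example.com": "EXAMPLE.COM",
    ".dev.example.com": "DEV.EXAMPLE.COM",
    "dev.example.com": "DEV.EXAMPLE.COM",
}


def realm_for_host(hostname: str, domain_realm: Mapping[str, str]) -> Optional[str]:
    """Map a DNS hostname to a realm using domain_realm-style entries."""
    host = hostname.lower()
    if host in domain_realm:            # entry for the name itself
        return domain_realm[host]
    labels = host.split(".")
    for i in range(1, len(labels)):     # most specific parent domain first
        suffix = "." + ".".join(labels[i:])
        if suffix in domain_realm:
            return domain_realm[suffix]
    # Fallback: the domain portion of the name, converted to uppercase.
    return ".".join(labels[1:]).upper() or None


print(realm_for_host("node1.dev.example.com", DOMAIN_REALM))
print(realm_for_host("server1.example.com", DOMAIN_REALM))
```

Checking the more specific `.dev.example.com` entry before `.example.com` is what keeps development hosts out of the EXAMPLE.COM realm in this model.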


The important takeaway from this chapter is that Kerberos authentication is a multistep client/server process to provide strong authentication of both users and services. We took a look at the MIT Kerberos distribution, which is a popular implementation choice. While this chapter covered some of the details of configuring the MIT Kerberos distribution, we strongly encourage you to refer to the official MIT Kerberos documentation, as it is the most up-to-date reference for the latest distribution; in addition, it serves as a more detailed guide about all of the configuration options available to a security administrator for setting up a Kerberos environment.

In the next chapter, the Kerberos concepts covered thus far will be taken a step further by putting them into the context of core Hadoop and the extended Hadoop ecosystem.
