No cluster is an island: users and applications need to access APIs and services, and data needs to flow in and out. In an enterprise context, it is essential that data is stored, processed, and accessed securely. Security is usually broken down into four domains: authentication, authorization, auditing, and confidentiality. In this chapter, we discuss how these four domains intersect with services running in the cluster. Confidentiality controls often play an important role in protecting the network exchanges used by authentication and authorization mechanisms, so we start by looking at in-flight encryption. We then cover authentication and authorization and finish with a discussion of the available options for at-rest encryption.
The Hadoop documentation and the general literature cover Hadoop security extensively, but, in the spirit of keeping this book as self-contained as possible, we cover the essentials here. If you are already well versed in the area, feel free to skip to the next chapter, in which we examine how to integrate the available security mechanisms into the wider enterprise context.
For more detailed coverage of all the concepts discussed in this chapter, we strongly recommend that you read Hadoop Security by Joey Echeverria and Ben Spivey (O’Reilly).
Hadoop clusters are heavy users of the network (see “How Services Use a Network”), with both data and metadata regularly being transferred between distributed components. ...