Kerberos authentication

Many Apache Spark installations use Kerberos to secure and authenticate access to services such as HDFS and Kafka. It's also especially common when integrating with third-party databases and legacy systems. As a commercial data scientist, you'll probably have to work with data in a Kerberized environment at some point, so in this part of the chapter we'll cover the basics of Kerberos: what it is, how it works, and how to use it.

Kerberos is an authentication protocol that relies on a trusted third party. It's particularly useful where the primary form of communication is over a network, which makes it well suited to Apache Spark. It's used in preference to alternative methods of authentication, for ...
