To understand Hive security, we have to backtrack and understand Hadoop security and the history of Hadoop. Hadoop started out as a subproject of Apache Nutch. At that time and through its early formative years, features were prioritized over security. Security is more complex in a distributed system because multiple components across different machines need to communicate with each other.
Unsecured Hadoop, such as the versions before the v0.20.205 release, derived the username by forking a call to the whoami program. Users are free to change this identity by setting the hadoop.job.ugi property for FSShell (filesystem) commands. Map and reduce tasks all run under the same system user (usually mapred) on TaskTracker nodes.
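The weakness here is that the "authenticated" identity is whatever the client reports. The following Python sketch illustrates the idea; the function names are illustrative, not Hadoop's actual code:

```python
import subprocess

def current_os_user():
    """Derive the user the way unsecured Hadoop did: fork the
    whoami program and trust whatever it prints."""
    result = subprocess.run(["whoami"], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()

def effective_user(conf):
    """If the client set hadoop.job.ugi, that value wins -- so any
    user can claim any identity (the impersonation problem)."""
    override = conf.get("hadoop.job.ugi")
    if override:
        # The property's format is "user,group1,group2,..."
        return override.split(",")[0]
    return current_os_user()
```

For example, a client that sets `hadoop.job.ugi` to `hdfs,supergroup` is treated as the `hdfs` superuser, regardless of which OS account actually issued the command.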
Also, Hadoop components typically listen on high-numbered ports, and they are typically launched by nonprivileged users (i.e., users other than root).
The recent efforts to secure Hadoop involved several changes, primarily the incorporation of Kerberos authentication support, but also other changes to close vulnerabilities.
Kerberos allows mutual authentication between client and server: the client obtains a ticket and passes it along with each request, so the server can verify who is calling. Tasks on the TaskTracker are run as the user who launched the job. Users are no longer able to impersonate other users by setting the hadoop.job.ugi property. For this to work, all Hadoop components must use Kerberos security from end to end.
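Enabling this end-to-end security involves, among other steps, switching Hadoop's authentication mode from "simple" (trust the reported OS user) to Kerberos in core-site.xml. A minimal fragment might look like the following; exact requirements vary by Hadoop version:

```xml
<!-- core-site.xml: enable Kerberos authentication and
     service-level authorization checks -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```

A full secure deployment additionally requires keytabs and Kerberos principals for each Hadoop daemon, which is why partial adoption does not close the vulnerabilities.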
Hive was created before any of this Kerberos ...