In this recipe, we will look at how to set up a Hive server and use it to query data stored in a distributed system.
Apache Hive is a client-side library that provides a data warehouse solution: it represents data on HDFS in a structured format and lets you query it using SQL. The table definitions and mappings are stored in a metastore, which is a combination of a service and a database.
The Hive metastore can run in one of three modes: standalone (embedded), local metastore, and remote metastore. Standalone, or embedded, mode is not used in production because it allows only one connection at a time and runs everything inside a single JVM.
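To make the difference between the three modes concrete, the following minimal sketch guesses which mode a given set of hive-site.xml properties would select. The property names hive.metastore.uris and javax.jdo.option.ConnectionURL are real Hive settings, but the function metastore_mode, the host names, and the classification rule itself are illustrative assumptions, not something Hive ships or how Hive decides internally.

```python
def metastore_mode(props):
    """Guess the Hive metastore mode from hive-site.xml style properties.

    Simplified rule of thumb (an assumption made for illustration):
      - hive.metastore.uris is set         -> remote (separate Thrift service)
      - JDBC URL is missing or file-based
        Derby (jdbc:derby: without //)     -> embedded (standalone)
      - any other JDBC URL                 -> local (metastore code runs in the
                                              Hive JVM, database is external)
    """
    uris = props.get("hive.metastore.uris", "").strip()
    if uris:
        return "remote"

    url = props.get("javax.jdo.option.ConnectionURL", "").strip()
    is_embedded_derby = url.startswith("jdbc:derby:") and not url.startswith("jdbc:derby://")
    if not url or is_embedded_derby:
        return "embedded"
    return "local"


# Hypothetical example configurations; host names are placeholders.
embedded = {"javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=metastore_db;create=true"}
local = {"javax.jdo.option.ConnectionURL": "jdbc:mysql://dbhost:3306/hive"}
remote = {"hive.metastore.uris": "thrift://metastore-host:9083"}

for name, props in [("embedded", embedded), ("local", local), ("remote", remote)]:
    print(name, "->", metastore_mode(props))
```

In embedded mode the database lives inside the same JVM as Hive, which is why only one connection is possible; pointing the JDBC URL at an external database, or at a separate metastore service, removes that limit.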
The Hive driver, metastore interface, and database are the three things ...