July 2017
We now move on from file storage formats to data processing and retrieval components. Apache Hive is a project in the Hadoop ecosystem that fits within the analysis interaction area. It lets you write queries in a SQL-like language, the Hive Query Language (HiveQL), which it compiles into processing jobs that run over data stored in HDFS. HiveQL is very similar to SQL and should feel immediately familiar to any SQL developer.
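To make the idea of turning a query into processing commands more concrete, the following is a minimal Python sketch of how a simple aggregation could be expressed as map, shuffle, and reduce phases. It is a conceptual illustration only, not the plan Hive actually generates. The query text, the `employees` table, the sample rows, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical HiveQL query, shown for reference only (it is not executed here).
HIVEQL = "SELECT dept, COUNT(*) FROM employees GROUP BY dept"

# Hypothetical (name, dept) rows standing in for a file stored in HDFS.
ROWS = [
    ("alice", "sales"),
    ("bob", "engineering"),
    ("carol", "sales"),
]


def map_phase(rows):
    """Emit one (dept, 1) pair per input row."""
    for _name, dept in rows:
        yield dept, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key to produce the per-department counts."""
    return {key: sum(values) for key, values in grouped.items()}


def run_group_by_count(rows):
    """Chain the three phases to mimic the GROUP BY / COUNT(*) query."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(HIVEQL)
    print(run_group_by_count(ROWS))
```

The point of the sketch is that the developer only writes the declarative query; the grouping and counting logic is derived from it rather than coded by hand.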
The Hive architecture consists of a user interface (UI), a metastore database, a driver, a compiler, and an execution engine. These components work together to translate a user query into a directed acyclic graph (DAG) of actions that orchestrate execution and return results, usually by running Hadoop MapReduce jobs over data in HDFS. ...
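As a rough illustration of what a DAG of actions means in practice, the sketch below models a query plan as a set of stages with dependencies and derives an order in which they could run. The stage names and the plan itself are hypothetical and do not reflect Hive's internal stage naming. The ordering uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical plan: each stage maps to the set of stages it depends on.
PLAN = {
    "scan_employees": set(),
    "scan_departments": set(),
    "join": {"scan_employees", "scan_departments"},
    "aggregate": {"join"},
    "fetch_results": {"aggregate"},
}


def execution_order(plan):
    """Return one valid order in which the stages could be executed."""
    return list(TopologicalSorter(plan).static_order())


if __name__ == "__main__":
    for stage in execution_order(PLAN):
        print(stage)
```

Because the graph is acyclic, every stage can be scheduled after all of its inputs are ready, and independent stages (here, the two scans) have no required order relative to each other.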