© Deepak Vohra 2016

Deepak Vohra, Practical Hadoop Ecosystem, 10.1007/978-1-4842-2199-0_3

3. Apache Hive

Deepak Vohra

Apt 105, White Rock, British Columbia, Canada

Apache Hive is a data warehouse framework for querying and managing large datasets stored in the Hadoop Distributed File System (HDFS). Hive also provides a SQL-like query language called HiveQL, and HiveQL queries can be run in the Hive CLI shell. By default, Hive stores data in HDFS, but it also supports the Amazon S3 filesystem.
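The Hive CLI can also run a single HiveQL statement non-interactively with `hive -e "<query>"`. The following is a minimal sketch, not the book's own code, that wraps that invocation from Python. The table name `wlslog`, its columns, and the helper functions are hypothetical and serve only as illustration. The statement runs only when a `hive` executable is on the PATH.

```python
from __future__ import annotations

import shutil
import subprocess


def hive_cli_command(query: str) -> list[str]:
    """Return the argv for running one HiveQL statement with the Hive CLI.

    `hive -e "<query>"` executes the quoted statement(s) and then exits.
    """
    return ["hive", "-e", query]


def run_hiveql(query: str) -> str | None:
    """Run the statement if a `hive` executable is on the PATH; otherwise return None."""
    if shutil.which("hive") is None:
        return None
    result = subprocess.run(
        hive_cli_command(query), capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical table used only for illustration. With no LOCATION clause,
# Hive stores the table's data under its warehouse directory, which is in HDFS by default.
CREATE_STMT = (
    "CREATE TABLE IF NOT EXISTS wlslog "
    "(time_stamp STRING, category STRING, msg STRING) "
    "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
)
SELECT_STMT = "SELECT category, COUNT(*) FROM wlslog GROUP BY category"

if __name__ == "__main__":
    for stmt in (CREATE_STMT, SELECT_STMT):
        # Only print the command here; pass the statement to run_hiveql()
        # to execute it on a machine where Hive is installed.
        print(hive_cli_command(stmt))
```

Passing the command as an argument list rather than a shell string means the quotes and commas inside the HiveQL statement need no extra shell escaping.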

Hive stores data in tables. A Hive table is an abstraction, and the metadata for a Hive table is stored in an embedded Derby database called the Derby metastore. Other databases, such as MySQL and Oracle Database, can also be configured as the Hive metastore ...
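The metastore database is selected through the JDO connection properties in `hive-site.xml`. The following sketch generates that file's `<configuration>` block for the default embedded Derby metastore or for a MySQL metastore. The host, database name, user, and password are placeholders. The `com.mysql.jdbc.Driver` class name is an assumption that depends on the MySQL Connector/J version in use, and the driver jar must also be on Hive's classpath.

```python
import xml.etree.ElementTree as ET

# Default embedded Derby metastore: a local metastore_db directory created on first use.
DERBY_METASTORE = {
    "javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=metastore_db;create=true",
    "javax.jdo.option.ConnectionDriverName": "org.apache.derby.jdbc.EmbeddedDriver",
}


def mysql_metastore(host: str, database: str, user: str, password: str) -> dict:
    """Return JDO properties for a MySQL-backed metastore (placeholder credentials)."""
    return {
        "javax.jdo.option.ConnectionURL": (
            f"jdbc:mysql://{host}:3306/{database}?createDatabaseIfNotExist=true"
        ),
        # Assumed driver class; newer Connector/J releases use com.mysql.cj.jdbc.Driver.
        "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": user,
        "javax.jdo.option.ConnectionPassword": password,
    }


def hive_site_xml(props: dict) -> str:
    """Serialize properties as a Hadoop-style <configuration> document."""
    configuration = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    print(hive_site_xml(mysql_metastore("localhost", "metastore", "hive", "hive-password")))
```

Using ElementTree instead of string templates ensures that characters such as `&` in a JDBC URL are escaped correctly in the generated XML.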
