Apache Hive Essentials - Second Edition

by Dayong Du
June 2018
Content level: Beginner to intermediate
210 pages
5h 12m
English
Packt Publishing
Content preview from Apache Hive Essentials - Second Edition

Partitions

By default, a simple HQL query scans the whole table, which slows down queries against big tables. Partitions address this problem, and they are very similar to partitions in an RDBMS. In Hive, each partition corresponds to a value (or combination of values) of the predefined partition column(s) and maps to a subdirectory in the table's directory in HDFS. When a query filters on the partition columns, only the required partitions (directories) of data in the table are read, so the I/O and time of the query are greatly reduced. Using partitions is an easy and effective way to improve performance in Hive.
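
To make the mapping between partitions and directories concrete, here is a minimal sketch. The table `page_views` and its columns are hypothetical names chosen for illustration and do not come from the book; the field delimiter is likewise an assumption.

```sql
-- Hypothetical table for illustration only (not the book's example).
-- The partition columns are declared separately from the data columns.
CREATE TABLE page_views (
  user_id STRING,
  url     STRING
)
PARTITIONED BY (view_year INT, view_month INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';

-- Each partition added here becomes a subdirectory under the table's
-- directory, of the form .../page_views/view_year=2018/view_month=11/
ALTER TABLE page_views ADD IF NOT EXISTS PARTITION (view_year = 2018, view_month = 11);
ALTER TABLE page_views ADD IF NOT EXISTS PARTITION (view_year = 2018, view_month = 12);

-- List the partitions that the table currently has.
SHOW PARTITIONS page_views;

-- Because the filter is on the partition columns, Hive only needs to read
-- the view_year=2018/view_month=12 directory rather than the whole table.
SELECT url
FROM page_views
WHERE view_year = 2018 AND view_month = 12;
```

Note that the partition columns are not stored in the data files; their values are taken from the directory names, yet they can be queried like ordinary columns.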

The following is an example of partition creation in HQL:

> CREATE TABLE employee_partitioned (
> name STRING,
> work_place ARRAY<STRING>,
> gender_age ...

Publisher Resources

ISBN: 9781788995092