Chapter 5. HiveQL: Data Manipulation

This chapter continues our discussion of HiveQL, the Hive query language, focusing on the data manipulation language parts that are used to put data into tables and to extract data from tables to the filesystem.

This chapter uses SELECT ... WHERE clauses extensively when we discuss populating tables with data queried from other tables. So, why aren’t we covering SELECT ... WHERE clauses first, instead of waiting until the next chapter, Chapter 6?

Since we just finished discussing how to create tables, we wanted to cover the next obvious topic: how to get data into these tables so you’ll have something to query! We assume you already understand the basics of SQL, so these clauses won’t be new to you. If they are, please refer to Chapter 6 for details.

Loading Data into Managed Tables

Since Hive has no row-level insert, update, and delete operations, the only way to put data into an table is to use one of the “bulk” load operations. Or you can just write files in the correct directories by other means.

We saw an example of how to load data into a managed table in Partitioned, Managed Tables, which we repeat here with an addition, the use of the OVERWRITE keyword:

LOAD DATA LOCAL INPATH '${env:HOME}/california-employees'
OVERWRITE INTO TABLE employees
PARTITION (country = 'US', state = 'CA');

This command will first create the directory for the partition, if it doesn’t already exist, then copy the data to it.

If the target table is not partitioned, you ...

Get Programming Hive now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.