Chapter 6. HiveQL: Queries

After learning the many ways we can define and format tables, let’s learn how to run queries. Of course, we have assumed all along that you have some prior knowledge of SQL. We’ve used some queries already to illustrate several concepts, such as loading query data into other tables in Chapter 5. Now we’ll fill in most of the details. Some special topics will be covered in subsequent chapters.

We’ll move quickly through details that are familiar to users with prior SQL experience and focus on what’s unique to HiveQL, including syntax and feature differences, as well as performance implications.

SELECT … FROM Clauses

SELECT is the projection operator in SQL. The FROM clause identifies from which table, view, or nested query we select records (see Chapter 7).

For a given record, SELECT specifies the columns to keep, as well as the outputs of function calls on one or more columns (e.g., the aggregation functions like count(*)).

Recall again our partitioned employees table:

CREATE TABLE employees (
  name         STRING,
  salary       FLOAT,
  subordinates ARRAY<STRING>,
  deductions   MAP<STRING, FLOAT>,
  address      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (country STRING, state STRING);

Let’s assume we have the same contents we showed in Text File Encoding of Data Values for four employees in the US state of Illinois (abbreviated IL). Here are queries of this table and the output they produce:

hive> SELECT name, salary FROM employees;
John Doe    100000.0

Get Programming Hive now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.