Chapter 6. HiveQL: Queries
After learning the many ways we can define and format tables, let’s learn how to run queries. Of course, we have assumed all along that you have some prior knowledge of SQL. We’ve used some queries already to illustrate several concepts, such as loading query data into other tables in Chapter 5. Now we’ll fill in most of the details. Some special topics will be covered in subsequent chapters.
We’ll move quickly through details that are familiar to users with prior SQL experience and focus on what’s unique to HiveQL, including syntax and feature differences, as well as performance implications.
SELECT … FROM Clauses
SELECT is the
projection operator in SQL. The FROM clause identifies from which table, view,
or nested query we select records (see Chapter 7).
For a given record, SELECT
specifies the columns to keep, as well as the outputs of function calls on
one or more columns (e.g., the aggregation functions
like count(*)).
Recall again our partitioned employees table:
CREATETABLEemployees(nameSTRING,salaryFLOAT,subordinatesARRAY<STRING>,deductionsMAP<STRING,FLOAT>,addressSTRUCT<street:STRING,city:STRING,state:STRING,zip:INT>)PARTITIONEDBY(countrySTRING,stateSTRING);
Let’s assume we have the same contents we showed in Text File Encoding of Data Values for four employees in the US state of Illinois (abbreviated IL). Here are queries of this table and the output they produce:
hive>SELECTname,salaryFROMemployees;JohnDoe100000.0
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access