Time for action – performing a join
Joins are a very frequently used tool in SQL, though sometimes appear a little intimidating to those new to the language. Essentially a join allows rows in multiple tables to be logically combined together based on a conditional statement. Hive has rich support for joins which we will now examine.
- Create the following as
join.hql
:SELECT t1.sighted, t2.full_name FROM ufodata t1 JOIN states t2 ON (LOWER(t2.abbreviation) = LOWER(SUBSTR( t1.sighting_location, (LENGTH(t1.sighting_location)-1)))) LIMIT 5 ;
- Execute the query:
$ hive -f join.hql
You will receive the following response:
OK 20060930 Alaska 20051018 Alaska 20050707 Alaska 20100112 Alaska 20100625 Alaska Time taken: 33.255 seconds
What just happened?
The ...
Get Hadoop: Data Processing and Modelling now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.