April 2016
In this recipe, you will learn how to use map-side joins in Hive.
When joining tables in Hive, it is common for one table to be small in terms of rows while the other is large. To handle this case efficiently, Hive supports map-side joins. In a map-side join, the smaller table is cached in memory while the larger table is streamed through the mappers. Hive therefore completes the join on the mapper side alone, with no reduce phase, which avoids the shuffle and can improve performance considerably.
There are two ways of using map-side joins in Hive.
One is to use the /*+ MAPJOIN(<table_name>)*/ hint just after the select keyword. table_name has to be the table that is smaller ...
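As a minimal sketch of the hint form described above, the query below joins a large table to a small lookup table. The table names (orders, countries), their columns, and the aliases are hypothetical and chosen only for illustration. When a table is aliased, the hint refers to the alias. Note also that Hive 0.11 and later ignore the hint by default, so the sketch assumes hive.ignore.mapjoin.hint has to be set to false for the hint to take effect.

```sql
-- Hypothetical tables: orders (large) and countries (small lookup table).
-- Assumption: on Hive 0.11+ the MAPJOIN hint is ignored unless this is false.
SET hive.ignore.mapjoin.hint=false;

-- The hint goes immediately after SELECT and names the smaller table
-- (here by its alias c), which Hive loads into memory on each mapper.
SELECT /*+ MAPJOIN(c) */
       o.order_id,
       o.amount,
       c.country_name
FROM orders o
JOIN countries c
  ON o.country_code = c.country_code;
```

Running EXPLAIN on such a query should show a Map Join Operator in the plan and no reduce stage for the join, which is one way to check that the hint was honored.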