Chapter 7. Joins and Join Optimization

This chapter covers the following recipes:

  • Understanding the joins concept
  • Using a left/right/full outer join
  • Using a left semi join
  • Using a cross join
  • Using a map-side join
  • Using a bucket map join
  • Using a bucket sort merge map join
  • Using a skew join

Understanding the joins concept

A join in Hive serves the same purpose as in a traditional RDBMS: it combines data from two or more tables based on a common value or field, so that meaningful results can be fetched across them. A join is performed whenever multiple tables are specified inside the FROM clause.
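To make the idea concrete, here is a minimal sketch of a join on a common field. It is not taken from the book: it uses SQLite from Python's standard library as a stand-in for Hive so that it can be run anywhere, and the tables (customers, orders) and their columns are hypothetical. The SELECT statement sticks to plain equi-join syntax, so a similar query should be expressible in HiveQL, but treat that as an assumption to verify against your Hive version.

```python
import sqlite3

# SQLite (standard library) stands in for Hive here; this is an assumption
# made so the example is runnable. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO orders VALUES (100, 1, 25.0), (101, 1, 40.0), (102, 2, 15.5);
""")

# Two tables appear in the FROM clause and are combined on a common field:
# customers.id matches orders.customer_id (an equality condition).
JOIN_QUERY = """
    SELECT c.name, o.order_id, o.amount
    FROM customers c
    JOIN orders o ON (c.id = o.customer_id)
    ORDER BY o.order_id
"""

rows = conn.execute(JOIN_QUERY).fetchall()
for row in rows:
    print(row)

conn.close()
```

Only customers with at least one matching order appear in the result. Carol has no orders, so this inner join drops her row.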

At the time of writing, Hive supports only joins based on equality conditions. It does not support any join condition that is based on non-equality ...
