Chapter 10. Hive Tuning

This chapter covers the following recipes:

  • Enabling predicate pushdown optimizations in Hive
  • Optimizations to reduce the number of map
  • Sampling

Enabling predicate pushdown optimizations in Hive

In this recipe, you will learn how to use predicate pushdown in Hive.

Getting ready

Predicate pushdown is a traditional RDBMS term; in Hive, it works as predicate pushup. The goal is to evaluate expressions such as filters as early as possible, so that later stages of the query process fewer rows. For example, consider the following query, which includes a join condition as well as a filter condition:

SELECT a.*, b.* FROM Sales a JOIN Sales_orc b ON a.id = b.id
WHERE a.id > 100 AND b.id > 300;

In the preceding ...
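
One way to see where Hive evaluates the two filters in the query above is to print its plan. The following is a minimal sketch, assuming the Sales and Sales_orc tables from the example exist in the current database; hive.optimize.ppd is Hive's predicate pushdown setting, and EXPLAIN prints the plan without running the query.

```sql
-- Turn predicate pushdown on for this session
-- (it is normally already enabled by default).
SET hive.optimize.ppd=true;

-- Print the plan only; no data is read or joined.
EXPLAIN
SELECT a.*, b.* FROM Sales a JOIN Sales_orc b ON a.id = b.id
WHERE a.id > 100 AND b.id > 300;
```

With the setting enabled, the filter operators for a.id > 100 and b.id > 300 would be expected to appear next to each table scan, ahead of the join operator. Running the same EXPLAIN after SET hive.optimize.ppd=false; gives a plan to compare against; the exact plan text varies by Hive version and execution engine.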
