Performance Considerations

Although Hive is built for big data processing, we still cannot ignore performance. Most of the time, a query can rely on Hive's query optimizer, together with the default settings and established best practices, to find a good execution strategy. However, experienced users should learn more about the theory and practice of performance tuning, especially when working on a performance-sensitive project or in a performance-sensitive environment.

In this chapter, we will start by using the utilities available in HQL to find potential causes of poor performance. Then, we will introduce best practices for performance in the areas of design, file format, compression, storage, queries, and jobs. In this chapter, ...