This chapter is about how to aggregate and sample data in Hive. It first covers the usage of several aggregate functions, analytic functions working with GROUP BY and PARTITION BY, and windowing clauses. It then introduces different ways of sampling data in Hive.
In this chapter, we will cover the following topics:
Data aggregation is any process that gathers and expresses data in summary form to reveal more information about particular groups based on specific conditions. Hive offers several built-in aggregate functions, such as AVG. Hive also supports advanced aggregation by ...