Analytics functions in Hive
Hive provides the following set of analytical functions:
RANK
DENSE_RANK
ROW_NUMBER
PERCENT_RANK
CUME_DIST
NTILE
Common and useful sets of analytical functions are ranking functions where rows from resultset are ranked according to a scheme.
How to do it…
Let's analyze each function in detail. We will be using the same sales
dataset and applying analytical functions to it:
ROW_NUMBER
: This function will provide a unique number to each row in resultset based on theORDER BY
clause within thePARTITION
. For example, if we want to assignrow_number
to eachfname
, which is also partitioned by IP address in thesales
dataset, the query would be:hive> select fname,ip,ROW_NUMBER() OVER (ORDER BY ip ) as rownum from sales;
RANK
: It ...
Get Apache Hive Cookbook now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.