April 2016
Beginner
268 pages
5h 32m
English
It is the mechanism of collecting the top K column values of a Hive table. In this, the top K values of the most skewed column are stored in the partition. This is applicable for both existing and newly created tables.
Top K statistics computation is disabled by default. The following are some of the properties that could be set to compute and store top K statistics:
hive.stats.topk.collectThis would enable computing top K and putting it into skewed information:
falsetrue, falsehive.stats.topk.numhive.stats.topk.minpercentfloat value between ...