Apache Hive Cookbook

by Hanish Bansal, Saurabh Chauhan, Shrey Mehrotra
April 2016
Content level: Beginner
268 pages
5h 32m
English
Packt Publishing

Top K statistics in Hive

Top K statistics collect the top K column values of a Hive table, that is, the K values that occur most often in a column. The top K values of the most skewed column are stored in the partition. This applies to both existing and newly created tables.

How to do it…

Top K statistics computation is disabled by default. The following properties can be set to compute and store top K statistics:

  • hive.stats.topk.collect
    • Enables computing the top K values and storing them in the skewed information
    • Default value: false
    • Valid values: true, false
  • hive.stats.topk.num
    • Specifies the value of K for the top K result
  • hive.stats.topk.minpercent
    • The minimum percentage of rows a value must account for to be included in the top K result
    • It could be any float value between ...
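
As a minimal sketch of how the properties above might be set in a Hive session, the following fragment enables collection and chooses a value for K. The value 10 is only an illustrative assumption, not a documented default, and hive.stats.topk.minpercent is left unset because its valid range is not given in this excerpt. Whether a particular Hive release recognizes these properties is worth checking against that release's configuration documentation.

```sql
-- Enable top K statistics collection (disabled by default, per the list above).
SET hive.stats.topk.collect=true;

-- Choose K for the top K result; 10 is an illustrative value, not a documented default.
SET hive.stats.topk.num=10;

-- Print the current value of a property to confirm the session setting.
SET hive.stats.topk.collect;
```

Settings made with SET apply only to the current session, so they would need to be repeated, or placed in the site configuration, to take effect for other sessions.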
ISBN: 9781782161080