Apache Hive Cookbook

by Hanish Bansal, Saurabh Chauhan, Shrey Mehrotra
April 2016
Content level: Beginner
268 pages
5h 32m
English
Packt Publishing

Column statistics in Hive

In addition to table and partition statistics, Hive also supports column statistics. The following statistics are captured by Hive when a column or a set of columns is analyzed:

  • The number of distinct values
  • The number of NULL values
  • The minimum or maximum K values, where K can be specified by the user
  • Histograms: frequency and height-balanced
  • Average size of the column
  • Average or sum of all values in the column if their type is numerical
  • Percentiles of the values

How to do it…

As discussed in the previous recipe, Hive provides the analyze command to compute table or partition statistics. The same command can be used to compute statistics for one or more columns of a Hive table or partition. The HiveQL in order to compute ...
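
As a rough illustration of the analyze command applied to columns, the following is a minimal sketch rather than the recipe's own listing. The table name sales, the partition column ds, and the columns customer_id and amount are hypothetical, and the DESCRIBE FORMATTED form shown for inspecting column statistics assumes Hive 0.14 or later.

```sql
-- Hypothetical table: sales(customer_id INT, amount DOUBLE) partitioned by ds (STRING).

-- Compute column statistics for two named columns of one partition.
ANALYZE TABLE sales PARTITION (ds = '2016-04-01')
COMPUTE STATISTICS FOR COLUMNS customer_id, amount;

-- For an unpartitioned table, the PARTITION clause is simply left out.
-- Hypothetical table: customers(customer_id INT, name STRING).
ANALYZE TABLE customers
COMPUTE STATISTICS FOR COLUMNS customer_id, name;

-- Inspect the stored statistics for a single column (assumes Hive 0.14+).
DESCRIBE FORMATTED customers customer_id;
```

Which statistics appear in the DESCRIBE FORMATTED output depends on the Hive version and the column type, so the output may not cover every item in the list above.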



Publisher Resources

ISBN: 9781782161080