Apache Hive Essentials by Dayong Du

Hive buckets

Besides partitioning, bucketing is another technique for clustering datasets into more manageable parts to optimize query performance. Unlike a partition, a bucket corresponds to a segment of files in HDFS. For example, the employee_partitioned table from the previous section uses the year and month as the top-level partitions. If there is a further request to use employee_id as a third level of partitioning, it leads to many deep and small partitions and directories. Instead, we can bucket the employee_partitioned table using employee_id as the bucket column. The value of this column is hashed into a user-defined number of buckets. The records with the same employee_id will always be stored in the same bucket (segment of ...
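
The idea of hashing a column value into a fixed number of buckets can be illustrated with a small Python sketch. This is a simplified model, not Hive's implementation: it assumes an integer employee_id whose hash is the value itself, and a hypothetical bucket count of 2. The hash function Hive actually applies depends on the column type and the Hive version. The function names bucket_for and assign_buckets are made up for this illustration.

```python
from collections import defaultdict

NUM_BUCKETS = 2  # hypothetical, user-defined number of buckets


def bucket_for(employee_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Simplified model of bucket assignment for an integer column.

    Assumption: the hash of an integer is the integer itself. The mask
    keeps the value non-negative before the modulo is taken.
    """
    return (employee_id & 0x7FFFFFFF) % num_buckets


def assign_buckets(employee_ids, num_buckets: int = NUM_BUCKETS) -> dict:
    """Group employee IDs by the bucket each one hashes into."""
    buckets = defaultdict(list)
    for emp_id in employee_ids:
        buckets[bucket_for(emp_id, num_buckets)].append(emp_id)
    return dict(buckets)


if __name__ == "__main__":
    # Both records with employee_id 101 land in the same bucket.
    print(assign_buckets([100, 101, 102, 103, 101]))
    # prints {0: [100, 102], 1: [101, 103, 101]}
```

Because the bucket number is a pure function of the column value and the bucket count, two records with the same employee_id can never be placed in different buckets, which is the property the paragraph above relies on.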
