Grouping and counting column values with Hive

Once a schema is defined in Hive, you can run many different ad hoc queries against it. For each submitted query, Hive generates an execution plan that may consist of one or more MapReduce jobs. This recipe shows how to group rows by the values of a specified column and count each group using Hive.

How to do it...

  1. Insert some entries, ensuring that the favorite_movie column is populated:
    [default@ks33] set cf33['ed']['favorite_movie']='memento'; 
    [default@ks33] set cf33['stacey']['favorite_movie']='drdolittle'; 
    [default@ks33] set cf33['bob']['favorite_movie']='memento';      
  2. Write an HQL query that counts how many times each favorite_movie value occurs and sorts the counts in ascending order:
    hive> SELECT favorite_movie,count(1) as ...
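
The result of a group-and-count over the three entries from step 1 can be sketched in plain Python. This is only an illustration of what such a query computes, not the recipe's HQL and not how Hive executes it; the `rows` dictionary and the `group_and_count` helper are hypothetical names introduced for this example.

```python
from collections import Counter

# Hypothetical in-memory copy of the three entries inserted in step 1
# (row key -> favorite_movie value).
rows = {
    "ed": "memento",
    "stacey": "drdolittle",
    "bob": "memento",
}


def group_and_count(values):
    """Return (value, count) pairs sorted by ascending count."""
    counts = Counter(values)
    # Sort by count first, then by value so that ties have a fixed order.
    return sorted(counts.items(), key=lambda pair: (pair[1], pair[0]))


result = group_and_count(rows.values())
for movie, count in result:
    print(movie, count)
# prints:
# drdolittle 1
# memento 2
```

With this sample data, memento is counted twice and drdolittle once, so drdolittle comes first when the counts are sorted in ascending order.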
