Grouping and counting column values with Hive
Once a schema is defined inside Hive, many different ad hoc queries can be run against it. Based on the submitted query, Hive generates a plan consisting of one or more MapReduce jobs. This recipe shows how to group rows by the values of a specified column and count each group using Hive.
How to do it...
- Insert some entries, ensuring that the favorite_movie column is populated:
[default@ks33] set cf33['ed']['favorite_movie']='memento';
[default@ks33] set cf33['stacey']['favorite_movie']='drdolittle';
[default@ks33] set cf33['bob']['favorite_movie']='memento';
- Create an HQL query that counts the values of the favorite_movie column and then sorts the counts in ascending order:
hive> SELECT favorite_movie,count(1) as ...
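To make the grouping and counting concrete, the following is a minimal Python sketch of what such a query computes over the three sample entries. It is not Hive's actual execution plan. The function names map_phase, reduce_phase, and group_and_count are hypothetical, chosen only to mirror the MapReduce stages loosely, and the tie-break on movie name is an assumption added to keep the output deterministic.

```python
from collections import defaultdict

# Sample rows mirroring the three entries inserted earlier
# (row key -> favorite_movie value).
rows = {"ed": "memento", "stacey": "drdolittle", "bob": "memento"}


def map_phase(rows):
    # Emit one (movie, 1) pair per row, as a mapper might for a grouped count.
    for _key, movie in rows.items():
        yield movie, 1


def reduce_phase(pairs):
    # Sum the ones per movie; this stands in for the shuffle and reduce steps.
    counts = defaultdict(int)
    for movie, one in pairs:
        counts[movie] += one
    return dict(counts)


def group_and_count(rows):
    counts = reduce_phase(map_phase(rows))
    # Sort by count ascending; ties are broken by movie name (an assumption).
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))


if __name__ == "__main__":
    for movie, count in group_and_count(rows):
        print(movie, count)
```

With the sample data, drdolittle (one row) sorts ahead of memento (two rows), which is the ordering an ascending sort on the count would be expected to produce.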