Example 2

Let's consider a few more examples.

As already mentioned, HiveQL is very similar to standard SQL, and it is worth taking the time to explore some additional data manipulations with it.

A key point is that Hive is intended as a convenient interface for querying large amounts of data stored in HDFS, whereas a conventional SQL database is intended more for online operations requiring many reads and writes. The two languages are therefore very similar, but they serve somewhat different objectives.

The following query can be used to identify the unique websites viewed in a particular month (June) using the HiveQL DISTINCT keyword:

select distinct mysite from dabigdatatable where mydate = 'Jun';

This would yield a (partial) list of the unique website names viewed in June.
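
To make the effect of DISTINCT concrete, here is a minimal sketch that runs the same query against a tiny in-memory table. It assumes Python's built-in sqlite3 module as a stand-in for Hive (a simple query like this one reads the same in both), and the sample rows are invented purely for illustration; only the table and column names come from the query above.

```python
import sqlite3

# Stand-in for the Hive table: an in-memory SQLite table that reuses the
# table and column names from the query above. The rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dabigdatatable (mysite TEXT, mydate TEXT)")
conn.executemany(
    "INSERT INTO dabigdatatable (mysite, mydate) VALUES (?, ?)",
    [
        ("site-a.example", "Jun"),
        ("site-b.example", "Jun"),
        ("site-a.example", "Jun"),  # repeat visit, collapsed by DISTINCT
        ("site-c.example", "Jul"),  # different month, removed by WHERE
    ],
)

# Same query as in the text, run against the stand-in table.
rows = conn.execute(
    "SELECT DISTINCT mysite FROM dabigdatatable WHERE mydate = 'Jun'"
).fetchall()

# DISTINCT does not guarantee any ordering, so sort before printing.
unique_sites = sorted(row[0] for row in rows)
print(unique_sites)

conn.close()
```

With these sample rows, the script prints ['site-a.example', 'site-b.example']: the repeated site is collapsed into one row and the July row is filtered out.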

Sorting

Just like with ...
