Example 2

Let's consider a few more examples.

As already mentioned, the HiveQL language is very similar to standard SQL, and it's worthy of the time taken to explore some additional data manipulations using HiveQL.

A key point is that while Hive is intended as a convenience/interface for querying large amounts of data stored in HDFS, SQL is more intended for online operations requiring many reads and writes, which is very similar with somewhat different objectives.

The following script can be used to identify the unique websites viewed in a particular month (the month of June) using the DISTINCT HiveQL function:

select distinct(mysite) from dabigdatatable where mydate = 'Jun' 

This would yield the following (partial) output:


Just like with ...

Get Big Data Visualization now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.