July 2015
Intermediate to advanced
480 pages
13h 43m
English
Data is the new oil. No: Data is the new soil.
—David McCandless
One of the biggest decisions in designing a Hadoop ecosystem is selecting the SQL engine for each use case. For each type of application and project, you have to ask: should we use Hive on Tez, Impala, Spark SQL, Phoenix for HBase, or something else? The decision gets harder as each new release adds functionality that overlaps with other SQL engines. In this chapter we discuss Hadoop SQL engines and two of the primary tools that use these engines, Hive and Pig.
In the early days of computing, everything was file-based, and only geeks could parse and process such data. With RDBMSs, SQL became the universal language of data ...