Working with Hive tables

In this section, we will discuss the integration of Spark SQL with Hive tables. We will walk through executing Hive queries from Spark SQL, which lets us create and analyze Hive tables stored in HDFS.

Spark SQL provides the flexibility to execute Hive queries directly from our Spark SQL codebase. The best part is that the Hive queries are executed on the Spark cluster itself; we only need HDFS for reading and storing the Hive tables. In other words, there is no need to set up a complete Hadoop cluster with services such as ResourceManager or NodeManager. We just need the HDFS services, which are available as soon as we start the NameNode and DataNode.
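To illustrate the point above, starting only the HDFS services looks roughly like the following. This is a sketch assuming a standard Hadoop 3.x installation with `HADOOP_HOME` on the `PATH`; no YARN daemons (ResourceManager/NodeManager) are started.

```shell
# One-time: format the NameNode metadata directory
hdfs namenode -format

# Start only the two HDFS daemons Spark needs for Hive table storage
hdfs --daemon start namenode
hdfs --daemon start datanode

# Verify that HDFS is up and the DataNode has registered
hdfs dfsadmin -report
```

Because Spark executes the Hive queries itself, no MapReduce or YARN services are involved; HDFS merely acts as the storage layer for the table data and metadata.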

Perform the following steps to create Hive tables ...
