Chapter 9. SQL on Hadoop

This chapter covers

  • Learning the Hadoop specifics of Hive, including user-defined functions and performance-tuning tips
  • Learning about Impala and how you can write user-defined functions
  • Embedding SQL in your Spark code to intertwine the two languages and play to their strengths

Let’s say that it’s nine o’clock in the morning and you’ve been asked to generate a report on the top 10 countries by visitor traffic over the last month, and it needs to be done by noon. Your log data is sitting in HDFS, ready to be used. Are you going to break out your IDE and start writing Java MapReduce code? Not likely. This is where high-level tools such as Hive, Impala, and Spark come into play. With their SQL syntax, ...
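To make the report in this scenario concrete, here is a minimal sketch of the kind of query involved. It is not taken from the book: the table name page_views, its columns country and visit_date, and the sample rows are all hypothetical, and the query runs against an in-memory SQLite database as a self-contained stand-in rather than against Hive, Impala, or Spark. The SELECT statement uses only common SQL constructs (GROUP BY, ORDER BY, LIMIT), though it has not been checked against any of those engines, and the :start/:end parameter syntax is specific to SQLite.

```python
import sqlite3

# Hypothetical schema: one row per page view, holding the visitor's country
# and the visit date as an ISO-8601 string. Real log data in HDFS would
# almost certainly look different.
TOP_COUNTRIES_SQL = """
SELECT country, COUNT(*) AS visit_count
FROM page_views
WHERE visit_date >= :start AND visit_date < :end
GROUP BY country
ORDER BY visit_count DESC, country ASC
LIMIT 10
"""


def top_countries(rows, start, end):
    """Return up to 10 (country, visit_count) pairs for start <= date < end."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not a Hadoop engine
    try:
        conn.execute("CREATE TABLE page_views (country TEXT, visit_date TEXT)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
        cursor = conn.execute(TOP_COUNTRIES_SQL, {"start": start, "end": end})
        return cursor.fetchall()
    finally:
        conn.close()


# Made-up sample data for illustration only.
sample = [
    ("US", "2015-01-03"), ("US", "2015-01-10"), ("US", "2015-01-20"),
    ("DE", "2015-01-05"), ("DE", "2015-01-06"),
    ("JP", "2015-01-07"),
    ("FR", "2014-12-31"),  # falls outside the reporting window
]

result = top_countries(sample, "2015-01-01", "2015-02-01")
print(result)
```

The whole report fits in a single declarative statement, which is the contrast with hand-written MapReduce that the scenario is drawing. Ties are broken alphabetically by country so that the output order is deterministic.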
