Chapter 9. SQL on Hadoop

This chapter covers

  • Learning the Hadoop specifics of Hive, including user-defined functions and performance-tuning tips
  • Learning about Impala and how you can write user-defined functions
  • Embedding SQL in your Spark code to intertwine the two languages and play to their strengths

Let’s say that it’s nine o’clock in the morning and you’ve been asked to generate a report on the top 10 countries that generated visitor traffic over the last month. And it needs to be done by noon. Your log data is sitting in HDFS ready to be used. Are you going to break out your IDE and start writing Java MapReduce code? Not likely. This is where high-level languages such as Hive, Impala, and Spark come into play. With their SQL syntax, ...

Get Hadoop in Practice, Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.