O'Reilly logo

Hadoop in Practice, Second Edition by Alex Holmes

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 9. SQL on Hadoop

This chapter covers

  • Learning the Hadoop specifics of Hive, including user-defined functions and performance-tuning tips
  • Learning about Impala and how you can write user-defined functions
  • Embedding SQL in your Spark code to intertwine the two languages and play to their strengths

Let’s say that it’s nine o’clock in the morning and you’ve been asked to generate a report on the top 10 countries that generated visitor traffic over the last month. And it needs to be done by noon. Your log data is sitting in HDFS ready to be used. Are you going to break out your IDE and start writing Java MapReduce code? Not likely. This is where high-level languages such as Hive, Impala, and Spark come into play. With their SQL syntax, ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required