Hadoop in Practice

Chapter 10. Hacking with Hive

This chapter covers

Learning how serialization and deserialization work in Hive
Writing a UDF to use the distributed cache
Optimizing your joins for faster query execution times
Using the EXPLAIN command to understand how Hive is planning your work

Working with MapReduce is nontrivial and has a steep learning curve, even for Java programmers. Over the course of the next three chapters, we’ll look at technologies that lower the barrier to entry for MapReduce.

Let’s say that it’s nine o’clock in the morning and you’ve been asked to generate a report on the top ten countries that generated visitor traffic over the last month. And it needs to be done by noon. Your log data is sitting in HDFS ready to be used. ...

Hadoop in Practice by Alex Holmes
