
Hadoop Cluster Deployment by Danil Zburivsky


Hive

If you were curious enough to explore the source code of the WordCount MapReduce job example from Chapter 2, Installing and Configuring Hadoop, or tried to write some code yourself, you have probably realized by now that this is a very low-level way of processing data in Hadoop. Indeed, if writing MapReduce jobs were the only way to access data in Hadoop, its usability would be pretty limited.

Hive was designed to solve this particular problem. It turns out that much of the MapReduce code that deals with data filtering, aggregation, and grouping can be generated automatically. This makes it possible to design a high-level data processing language that can then be compiled into native Java MapReduce code. Actually, there is no need to design a new language ...
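To get a sense of how much boilerplate this removes, here is a minimal HiveQL sketch of the kind of filter-group-aggregate query Hive compiles into MapReduce. The table and column names (page_views, user_id, view_date) are hypothetical, chosen only for illustration:

    -- Count page views per user for one day, keeping only heavy users.
    -- Hive translates the WHERE clause into map-side filtering and the
    -- GROUP BY into the shuffle and reduce phases of a MapReduce job.
    SELECT user_id, COUNT(*) AS views
    FROM page_views
    WHERE view_date = '2013-09-01'
    GROUP BY user_id
    HAVING COUNT(*) > 10;

A hand-written Java MapReduce job for the same task would need a mapper, a reducer, and job configuration code; with Hive, all of that is generated from a few lines of query text.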
