Pentaho for Big Data Analytics by Feris Thia, Manoj R Patil

Summary

This chapter begins with an introduction to Hadoop that gives a deeper understanding of its distributed architecture for storage and processing, why and when to use it, how it works, and how the distributed job and task trackers operate.

The introduction is followed by a walkthrough of Pentaho Data Integration working with Hortonworks Sandbox, a Hadoop distribution that is well suited to learning Hadoop. The chapter shows you how to read and write a data file in HDFS, import it into Hive, and query the data using a SQL-like language.

In the following chapters, we will discuss how to extend the use of Hadoop with the help of other Pentaho tools and present the data visually using CTools, a community-driven ...
