CHAPTER 11

Reporting with Hadoop

Because the potential storage capacity of a Hadoop cluster is so large, you need some means to track both the data held on the cluster and the data feeds moving data into and out of it. You also need to consider where data might reside on the cluster: in HDFS, Hive, HBase, or Impala. Knowing that you should track your data only raises more questions, however: What type of reporting might be required, and in what format? Is a dashboard needed to post the status of data at any given moment? Are graphs or tables helpful to show the state of a data source for a given time period, ...

From Big Data Made Easy: A Working Guide to the Complete Hadoop Toolset.
