Chapter 2. What the Hadoop Ecosystem Offers

Apache Hadoop and related technologies are evolving rapidly, and in the process they are spawning a large array of new tools. As people see growing value and expanding use cases in this area, the number of tools that address significant needs also grows. This trend is good news in that it provides a wide range of functions to support the activities you may want to carry out in this new arena. However, the wealth of new and unfamiliar tools can feel a bit overwhelming.

To help guide you through the choices offered in the Hadoop ecosystem, we look here at some of the key actions commonly desired in Hadoop and NoSQL use cases and describe some of the tools widely used to carry them out. This is by no means a full catalog of what’s available, nor is it a how-to manual for using the tools. Instead, our focus is on the issues associated with functions common to many Hadoop-based projects. This high-level view of what various Hadoop ecosystem tools are used for is intended to help you assess tools of interest to you, whether or not they’re included in our list, in terms of how they may be helpful for your projects.

To get started, we’ve put together a chart that lists some major needs common to many use cases and shows a few of the tools associated with each one. In Table 2-1, a selection of the most prominent tools in the Hadoop ecosystem is broken down ...
