This book is for you if you are interested in how Apache Hadoop and related technologies can address problems involving large-scale data in cost-effective ways. Whether you are new to Hadoop or a seasoned user, you should find the content in this book both accessible and helpful.
Here we speak to business team leaders, CIOs, business analysts, and technical developers to explain in basic terms how NoSQL Apache Hadoop and Apache HBase–related technologies work to meet big data challenges and the ways in which people are using them, including using Hadoop in production. Detailed knowledge of Hadoop is not a prerequisite for this book. We do assume you are rougly familiar with what Hadoop and HBase are, and we focus mainly on how best to use them to advantage. The book includes some suggestions for best practice, but it is intended neither as a technical reference nor a comprehensive guide to how to use these technologies, and people can easily read it whether or not they have a deeply technical background. That said, we think that technical adepts will also benefit, not so much from a review of tools, but from a sharing of experience.
Based on real-world situations and experience, in this book we aim to describe how Hadoop-based systems and new NoSQL database technologies such as Apache HBase have been used to solve a wide variety of business and research problems. These tools have grown to be very effective and production-ready. Hadoop and associated tools are being used successfully in a variety of use cases and sectors. To choose to move into these new approaches is a big decision, and the first step is to recognize how these solutions can be an advantage to achieve your own specific goals. For those just getting started, we describe some of the pre-planning and early decisions that can make the process easier and more productive. People who are already using Hadoop and NoSQL-based technologies will find suggestions for new ways to gain the full range of benefits possible from employing Hadoop well.
In order to help inform the choices people make as they consider these new solutions, we’ve put together:
- An overview of the reasons people are turning to these technologies
- A brief review of what the Hadoop ecosystem tools can do for you
- A collection of tips for success
- A description of some widely applicable prototypical use cases
- Stories from the real world to show how people are already using Hadoop and NoSQL successfully for experimentation, development, and in production
This book is a selection of various examples that should help guide decisions and spark your ideas for how best to employ these technologies. The examples we describe are based on how customers use the Hadoop distribution from MapR Technologies to solve their big data needs in many situations across a range of different sectors. The uses for Hadoop we describe are not, however, limited to MapR. Where a particular capability is MapR-specific, we call that to your attention and explain how this would be handled by other Hadoop distributions. Regardless of the Hadoop distribution you choose, you should be able to see yourself in these examples and gain insights into how to make the best use of Hadoop for your own purposes.
How to Use This Book
If you are inexperienced with Apache Hadoop and NoSQL non-relational databases, you will find basic advice to get you started, as well as suggestions for planning your use of Hadoop going forward.
If you are a seasoned Hadoop user and have familiarity with Hadoop-based tools, you may want to mostly skim or even skip Chapter 2 except as a quick review of the ecosystem.
For all readers, when you reach Chapter 5 and Chapter 6, consider them together. The former explains some of the most rewarding of prototypical Hadoop use cases so that you see what your options are. Chapter 6 then shows you how Hadoop users are putting those options together in real-world settings to address many different problems.
We hope you find this approach helpful.
—Ted Dunning and Ellen Friedman, January 2015