Chapter 6. Integrating Storm and Hadoop

In this chapter, we will cover:

  • Implementing TF-IDF in Hadoop
  • Persisting documents from Storm
  • Integrating the batch and real-time views

Introduction

In Chapter 4, Distributed Remote Procedure Calls, we implemented the Speed layer for a Lambda architecture instance using Storm. In this chapter, we will implement the Batch and Service layers to complete the architecture.

There are some key concepts underlying this big data architecture:

  • Immutable state
  • Abstraction and composition
  • Constrained complexity

Immutable state is the most important of these because it gives the architecture true fault tolerance. If a failure occurs at any level, we can always rebuild the derived data from the original immutable data. This is in contrast to ...
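
As a rough illustration of the rebuild property described above, the following minimal sketch keeps an append-only log of facts and recomputes a derived view from it after the view is lost. It is not code from this chapter: the names Event, append, and build_view are hypothetical, and an in-memory tuple stands in for the immutable master data set that a real deployment would keep in distributed storage.

```python
from collections import Counter
from typing import NamedTuple, Tuple


class Event(NamedTuple):
    """An immutable fact (hypothetical shape): a term observed in a document."""
    doc_id: str
    term: str


def append(log: Tuple[Event, ...], event: Event) -> Tuple[Event, ...]:
    """Return a new log with the event added; the existing log is never modified."""
    return log + (event,)


def build_view(log: Tuple[Event, ...]) -> Counter:
    """Recompute the derived view (term counts) from the complete log."""
    return Counter(event.term for event in log)


# Record three facts in the append-only log.
log: Tuple[Event, ...] = ()
for doc_id, term in [("d1", "storm"), ("d1", "hadoop"), ("d2", "storm")]:
    log = append(log, Event(doc_id, term))

view = build_view(log)

# Simulate a failure that destroys the derived view.
view = None

# Because the log itself was never mutated, the view can be rebuilt from it.
rebuilt = build_view(log)
```

The design point is that only the log is treated as the source of truth. The view is disposable, so a failure or a bug in the view logic is repaired by recomputing rather than by patching state in place.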
