Chapter 6. Integrating Storm and Hadoop
In this chapter, we will cover:
- Implementing TF-IDF in Hadoop
- Persisting documents from Storm
- Integrating the batch and real-time views
In Chapter 4, Distributed Remote Procedure Calls, we implemented the Speed layer for a Lambda architecture instance using Storm. In this chapter, we will implement the Batch and Service layers to complete the architecture.
Three key concepts underlie this big data architecture:
- Immutable state
- Abstraction and composition
- Constrained complexity
Immutable state is the most important of these, because it provides true fault tolerance for the architecture. If a failure occurs at any level, we can always rebuild the derived data from the original immutable data. This is in contrast to ...