Introducing parallelism in Storm
Recall from the introduction that Storm allows a computation to scale horizontally across multiple machines by dividing it into multiple independent tasks that execute in parallel across a cluster. In Storm, a task is simply an instance of a spout or bolt running somewhere on the cluster.
To understand how parallelism works, we must first explain the four main components involved in executing a topology in a Storm cluster:
- Nodes (machines): These are simply machines configured to participate in a Storm cluster and execute portions of a topology. A Storm cluster contains one or more nodes that perform work.
- Workers (JVMs): These are independent JVM processes running on a node. Each node can be configured to run one or more workers, and a topology may request that one or more workers be assigned to it.
- Executors (threads): These are Java threads running within a worker JVM process. Multiple tasks can be assigned to a single executor; unless explicitly overridden, Storm assigns one task per executor.
- Tasks (bolt/spout instances): Tasks are instances of spouts and bolts whose nextTuple() and execute() methods are called by an executor thread.
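To see how these four components map onto the API, here is a minimal sketch of configuring parallelism when building a topology. It assumes spout and bolt classes named SentenceSpout and SplitSentenceBolt (stand-ins for any implementations, such as those from a word count example) and uses the pre-Apache backtype.storm package names from this era of Storm.

```java
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.utils.Utils;

public class ParallelismTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Parallelism hint of 2: run the spout as 2 executors (threads)
        builder.setSpout("sentence-spout", new SentenceSpout(), 2);

        // 4 executors running 8 tasks: each executor thread hosts
        // 2 instances (tasks) of the bolt
        builder.setBolt("split-bolt", new SplitSentenceBolt(), 4)
               .setNumTasks(8)
               .shuffleGrouping("sentence-spout");

        Config config = new Config();
        // Request 2 worker processes (JVMs) for this topology
        config.setNumWorkers(2);

        // Run locally for demonstration purposes
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("parallelism-demo", config,
                builder.createTopology());
        Utils.sleep(10000);
        cluster.shutdown();
    }
}
```

With these settings, the topology asks for 2 workers and declares 6 executors in total (2 for the spout, 4 for the bolt), so Storm distributes roughly 3 executor threads per worker JVM. The setNumTasks(8) call decouples the number of bolt instances from the number of threads executing them, which allows the executor count to be rebalanced later without changing the task count.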