Apache Hadoop's MapReduce works well for offline batch jobs but was not designed to process in-memory datasets, which demand low-latency computation. Apache Ignite offers APIs to perform MapReduce or Java-style ForkJoin processing on in-memory datasets. The Apache Ignite architecture provides two APIs for job distribution: distributed closures and MapReduce. The MapReduce API gives you finer-grained control over job-to-node mapping and error handling; for example, you can implement your own failover logic.
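To make the fork/join style of in-memory processing concrete before turning to Ignite's own API, the following is a minimal sketch using only the standard JDK `ForkJoinPool` and `RecursiveTask` classes. The class name `ForkJoinSum` and the threshold value are hypothetical choices for illustration; the pattern is the same split-compute-combine life cycle that Ignite's MapReduce API distributes across cluster nodes instead of local threads.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: recursively split an in-memory array, sum the
// halves in parallel, and combine the partial results (fork/join).
public class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // below this, sum sequentially

    private final long[] data;
    private final int from, to; // half-open range [from, to)

    public ForkJoinSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++)
                sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        ForkJoinSum left = new ForkJoinSum(data, from, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, to);
        left.fork();                        // schedule left half asynchronously
        long rightResult = right.compute(); // compute right half in this thread
        return left.join() + rightResult;   // combine (reduce) partial sums
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++)
            data[i] = i + 1;
        long total = ForkJoinPool.commonPool()
                                 .invoke(new ForkJoinSum(data, 0, data.length));
        System.out.println(total); // sum of 1..10000 = 50005000
    }
}
```

The same three-phase shape, splitting work, executing the pieces, and joining the partial results, maps directly onto the Ignite MapReduce life cycle described next, with cluster nodes taking the place of pool threads.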
The ComputeTask interface is the gateway to the Ignite MapReduce framework. The life cycle of an Ignite MapReduce task consists of the following phases:
- STEP 1 (MAP): The initial phase is to map the jobs to the worker nodes. The map(List<ClusterNode> ...