The simplest way to build a cluster is to dedicate some nodes to storage and others to processing. This configuration is attractive because it does not require a complex framework to manage it, and many small clusters are indeed built exactly this way: a couple of servers hold the data (plus its replicas) while another set of servers processes it. Although this may look like a great solution, it is rarely used in practice, for several reasons:
- It only works for embarrassingly parallel algorithms. If an algorithm requires a region of memory shared among the processing servers, this approach cannot be used.
- If one or more storage nodes die, the data is not guaranteed to be consistent. ...