MapReduce Family of Large-Scale Data-Processing Systems
2.3.5 Effective Data Placement
In the basic implementation of the Hadoop project, the objective of the data
placement policy is to achieve good load balance by distributing the data evenly across
the data servers, independently of the intended use of the data. This simple data
placement policy works well with most Hadoop applications that access just a
single file. However, some applications process data from multiple
files and can get a significant performance boost from customized placement strategies.
In these applications, the absence of data colocation increases the data-shuffling
costs, increases the network overhead, and reduces the effectiveness of data parti ...
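The contrast between load-balanced placement and colocation can be illustrated with a small toy model. This is only a sketch under stated assumptions: it does not use any Hadoop or HDFS API, and the names `place_evenly`, `place_colocated`, and `split_partitions`, the node names, and the two example files are all hypothetical. Each block is modeled as a (file, partition) pair, and a partition counts as "split" when its blocks from different files land on different nodes, which is the case that would require moving data over the network before the files can be processed together.

```python
from collections import defaultdict


def place_evenly(blocks, nodes):
    """Round-robin placement: balances load but ignores which blocks are related."""
    return {block: nodes[i % len(nodes)] for i, block in enumerate(blocks)}


def place_colocated(blocks, nodes):
    """Hypothetical colocation policy: blocks sharing a partition id share a node."""
    placement = {}
    node_of_partition = {}
    for file_name, partition in blocks:
        if partition not in node_of_partition:
            # First time this partition is seen: assign the next node in turn.
            next_index = len(node_of_partition) % len(nodes)
            node_of_partition[partition] = nodes[next_index]
        placement[(file_name, partition)] = node_of_partition[partition]
    return placement


def split_partitions(placement):
    """Count partitions whose blocks are spread over more than one node."""
    nodes_per_partition = defaultdict(set)
    for (_, partition), node in placement.items():
        nodes_per_partition[partition].add(node)
    return sum(1 for used in nodes_per_partition.values() if len(used) > 1)


# Illustrative data only: two related files, four partitions each, three nodes.
nodes = ["node0", "node1", "node2"]
blocks = [(f, p) for f in ("orders", "customers") for p in range(4)]

even = place_evenly(blocks, nodes)
colocated = place_colocated(blocks, nodes)

print(split_partitions(even), split_partitions(colocated))  # prints "4 0"
```

In this toy run, the even policy spreads every partition over two nodes, while the colocating policy keeps each partition on one node at the cost of a less uniform block count per node.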