June 2018
Intermediate to advanced
372 pages
8h 44m
English
Hadoop and MapReduce are standard, cookie-cutter ways of taking complicated jobs that parallelize the jobs that run on a number of machines and get the results back for your convenience. The power and versatility of the MapReduce programming paradigm and the great design underlying Hadoop have spawned an entire ecosystem of its own. One side effect of this Hadoop ecosystem and popularity is the rise of clustered or distributed computing. But it is a simple fact that configuring a cluster of distributed machines is quite complicated and expensive. Ask yourself how many companies or organizations do you know that run Hadoop in a fully distributed mode with raw Hadoop, without making use of nice company versions such as Cloudera. ...