Chapter 11. Implementation of HBase as a Master Data Management Tool
In Chapter 10, we reviewed the implementation of a customer 360 solution. In addition to HBase, it uses several other applications, including MapReduce and Hive. On the HBase side, we described how MapReduce is used to do lookups or to generate HBase files. In addition, as discussed in the previous chapter, Collective plans to improve its architecture by using Kafka. None of this should be new to you, as we covered it in detail in previous chapters. However, Collective is also planning to use Spark, and this is where things start to get interesting. Over the last several years, applications that needed to process HBase data have usually used MapReduce or the Java API. With Spark becoming more and more popular, however, we are seeing people start to implement solutions using Spark on top of HBase.
Because we already covered Kafka, MapReduce, and the Java API in previous chapters, we will not go over those technologies again only to provide very similar examples. Instead, we will focus here on Spark over HBase. The example we are going to implement will still put the customer 360 description into action but, as with the other implementation examples, it can be reused for any other use case.
MapReduce Versus Spark
Before we continue, we should establish the pros and cons of using Spark versus using MapReduce. Although we will not provide a lengthy discussion on this topic, we will briefly highlight ...