CHAPTER 7
Big Data Analytics, Mining, and Machine Learning

One of the main goals of efficient big data processing is to perform some sort of analytics, which allows useful insights to be gained from the data and makes the data actionable from a business perspective. Typically, three types of analytics can be distinguished:

  • Descriptive – summarizes the available data so that you can understand what has happened
  • Predictive – makes predictions about the future, taking past events into account
  • Prescriptive – helps you choose the best of the possible actions in order to achieve the desired outcome
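
To make the distinction concrete, the following minimal Python sketch applies all three types of analytics to the same small data set. Everything in it is an assumption made for illustration: the sales figures, the action names, the uplift multipliers, and the costs are hypothetical and do not come from any real system.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative data only).
sales = [100.0, 110.0, 120.0, 130.0]

# Descriptive: summarize what has already happened.
average_sales = mean(sales)


def fit_line(ys):
    """Least-squares fit of y = slope * x + intercept for x = 0, 1, 2, ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


# Predictive: extrapolate the fitted trend one month ahead.
slope, intercept = fit_line(sales)
forecast = slope * len(sales) + intercept

# Prescriptive: pick the action with the best expected outcome.
# The uplift multipliers and costs below are assumed values.
uplift = {"no_promotion": 1.00, "discount": 1.10, "ad_campaign": 1.05}
cost = {"no_promotion": 0.0, "discount": 20.0, "ad_campaign": 5.0}


def expected_profit(action):
    return forecast * uplift[action] - cost[action]


best_action = max(uplift, key=expected_profit)

print(average_sales, forecast, best_action)
```

Note that the prescriptive step builds on the predictive one: it needs a forecast before it can rank the candidate actions.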

Advances in parallel computing architectures and computational models make it attractive to consider distributing ML-DM (Machine Learning–Data Mining) algorithms. Given that the data resources are themselves explicitly distributed, decentralizing the computations often becomes obligatory. Yet not all algorithms can be distributed in a straightforward way (Ghoting et al. [2011]).

Firstly, taking a sequential algorithm and throwing it into a generic parallelization framework typically creates a lot of trouble with communication and data management. Ideally, then, dedicated distributed versions of the algorithms should be implemented. On the other hand, researchers would like to be able to write code in the easy way they are used to.
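
As a rough illustration of what a dedicated distributed version can look like, the sketch below computes a mean and a population variance over partitioned data. Each partition is reduced locally to a small summary, and only those summaries are merged, so the raw data never has to be moved. The partitions are simulated here with plain Python lists; the function names and the data are hypothetical, and a real system would run `summarize` on the node that holds each partition.

```python
from functools import reduce


def summarize(partition):
    """Local step: reduce one partition to (count, sum, sum of squares)."""
    return (
        len(partition),
        sum(partition),
        sum(x * x for x in partition),
    )


def merge(a, b):
    """Combine two summaries; associative, so the merge order is irrelevant."""
    return tuple(x + y for x, y in zip(a, b))


def mean_and_variance(partitions):
    """Global step: merge the small summaries, then finish the computation."""
    n, total, total_sq = reduce(merge, map(summarize, partitions), (0, 0.0, 0.0))
    mu = total / n
    return mu, total_sq / n - mu * mu  # population variance


# Simulated partitions, standing in for data held on three nodes.
partitions = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]
mu, var = mean_and_variance(partitions)
print(mu, var)
```

The sum-of-squares formula is used only because it keeps the sketch short; it can lose precision on large or badly scaled data, where a numerically stable merge of partial means and variances would be the safer choice.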

Secondly, the specifics of ML-DM require several iterations and a good deal of prototyping. Often a wide variety of algorithms needs to be tried. Rapid prototyping in a distributed ...
