CHAPTER 5 Programming Models

Programming a parallel data-intensive application is a complex undertaking. Data-intensive frameworks provide programming abstractions that make it easier to develop applications at scale. The programming models and data structures a framework uses have a large impact on its performance and generality. Here, we will look at a few popular programming models and APIs for data-intensive applications.


A parallel computing model provides an easy-to-use abstraction for expressing an algorithm and its composition so that problems can be solved using many computers [1]. The effectiveness of a model is determined by how broadly it can express problems in the application domain and by the efficiency of the programs developed with it.

Algorithms are expressed using abstract concepts such as vectors, matrices, graphs, tables, and tensors, along with the operations defined on them. These abstract concepts are represented in computers using data structures such as arrays, lists, trees, and hash maps, and the operations are implemented on top of those data structures.
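As a minimal illustration of this mapping, the sketch below represents two abstract concepts with ordinary data structures: a vector as a Python list and an undirected graph as a hash map of adjacency lists. The function names (dot, build_graph, degree) are hypothetical and chosen only for this example; they do not come from any particular framework.

```python
from collections import defaultdict


def dot(u, v):
    """Dot product of two vectors, each represented as a list of numbers."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def build_graph(edges):
    """Build an undirected graph as a hash map from vertex to adjacency list."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
        adjacency[dst].append(src)
    return adjacency


def degree(graph, vertex):
    """Number of neighbors of a vertex; 0 if the vertex is not in the graph."""
    return len(graph.get(vertex, []))


vector_result = dot([1, 2, 3], [4, 5, 6])
graph = build_graph([("a", "b"), ("a", "c")])
degree_of_a = degree(graph, "a")
```

The same abstract concept can usually be backed by more than one data structure (a graph could equally be stored as an edge list or an adjacency matrix), and that choice affects which operations are cheap.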

A parallel programming API is a combination of a computing model, data structures, and operations. In other words, the data structures and operations form a domain-specific parallel programming API built on a parallel programming model. The more operations and data structures an API supports, the wider the variety of problems it can solve easily. Also, if the data structures are generic enough ...
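To make the three ingredients concrete, here is a small sketch of a toy API, assuming a data-parallel computing model in which the same function is applied to each partition independently. The class PartitionedList and its map and reduce methods are hypothetical names invented for this example, not the API of any real framework. A thread pool stands in for the "many computers"; in CPython it gives little actual speedup for CPU-bound work and is used here only to show the structure.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce as fold


class PartitionedList:
    """Hypothetical data structure: a list split into contiguous partitions."""

    def __init__(self, items, num_partitions=4):
        if num_partitions < 1:
            raise ValueError("num_partitions must be at least 1")
        items = list(items)
        # Ceiling division so every item lands in some partition.
        size = max(1, -(-len(items) // num_partitions))
        self.partitions = [items[i:i + size] for i in range(0, len(items), size)]

    def map(self, fn):
        """Operation: apply fn to every element, one task per partition."""
        with ThreadPoolExecutor() as pool:
            mapped = list(
                pool.map(lambda part: [fn(x) for x in part], self.partitions)
            )
        result = PartitionedList([], 1)
        result.partitions = mapped
        return result

    def reduce(self, fn, initial):
        """Operation: combine all elements with fn.

        Assumes fn is associative and initial is its identity element
        (for example, addition with 0), because initial is used once
        per partition and once more when merging the partial results.
        """
        with ThreadPoolExecutor() as pool:
            partials = list(
                pool.map(lambda part: fold(fn, part, initial), self.partitions)
            )
        return fold(fn, partials, initial)


data = PartitionedList(range(1, 11), num_partitions=3)
sum_of_squares = data.map(lambda x: x * x).reduce(lambda a, b: a + b, 0)
```

In this sketch the computing model decides how work is split and scheduled, the data structure decides how data is laid out across workers, and the operations are what the user actually calls. Adding further operations (filter, join, and so on) would widen the set of problems the toy API can express without changing the underlying model.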
