High Performance Visualization by E. Wes Bethel, Charles Hansen, Hank Childs

Chapter 2
Parallel Visualization Frameworks
Hank Childs
Lawrence Berkeley National Laboratory
2.1 Introduction
2.2 Background
2.2.1 Parallel Computing
2.2.2 Data Flow Networks
2.3 Parallelization Strategy
2.4 Usage
2.5 Advanced Processing Techniques
2.5.1 Contracts
2.5.2 Data Subsetting
2.5.3 Parallelization Artifacts
2.5.4 Scheduling
2.6 Conclusion
References
Parallelization is the most common way to deal with the large data sets regu-
larly generated by simulations or captured through experiments. This chapter
seeks to answer key questions about frameworks for parallelizing visualiza-
tion algorithms: What is the nature of these frameworks? How are they used?
How do they parallelize processing? What problems result from paralleliza-
tion? And how can optimizations be incorporated?
2.1 Introduction
Parallel visualization frameworks exist to deal with “large data.” Of course,
the notion of what is “large” is relative. Here, “large data” is defined as data
that is too large to be processed, in its entirety, all at one time, because it exceeds
the available memory. This definition has three important criteria: (1) in its
entirety, (2) all at one time, and (3) exceeds the available memory. It is not
surprising that the approaches dealing with large data address one or more of
these three criteria. Parallel visualization frameworks approach this problem
through the third criterion; they use parallel resources with enough memory
to store the data, as well as any derived data generated from it. This approach
is popular because it has proven capable of dealing with virtually all visual-
ization use cases, from applying simple algorithms to complex combinations
of algorithms and from exploration to presentation.
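The third criterion can be made concrete with a rough sizing estimate. The sketch below is illustrative only: the function name and parameters are hypothetical, and it assumes that memory is divided evenly across nodes and ignores per-node overheads such as the operating system, ghost data, and communication buffers.

```python
def min_nodes_needed(data_bytes, derived_bytes, memory_per_node_bytes):
    """Smallest node count whose combined memory holds data plus derived data.

    Hypothetical helper for illustration; real frameworks must also budget
    for overheads that this estimate ignores.
    """
    if memory_per_node_bytes <= 0:
        raise ValueError("memory_per_node_bytes must be positive")
    total = data_bytes + derived_bytes
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return max(1, -(-total // memory_per_node_bytes))


TIB = 2**40
GIB = 2**30

# Assumed example: 10 TiB of input, 5 TiB of derived data, 64 GiB per node.
nodes = min_nodes_needed(10 * TIB, 5 * TIB, 64 * GIB)
```

Under these assumed sizes the estimate is 240 nodes, since 15 TiB divided by 64 GiB is exactly 240.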
Alternatives to parallelization address the large data problem through the
other two criteria. The first criterion—processing the data in its entirety—
can be approached in multiple ways. One technique is data subsetting: to
process only the salient portions of the data set and ignore the portions that
do not affect the final picture. An example of such a technique is query-
driven visualization, which is discussed in Chapter 7. Another technique is
multiresolution processing, which views coarse versions of the data, by default,
and only processes data at finer resolutions when necessary (see Chap. 8).
The streaming technique attacks the problem through the second criterion,
processing all of the data at one time. This technique, instead, treats the
data set as being composed of multiple pieces and processes data one piece at
a time (see Chap. 10). Note that these techniques and parallel visualization
frameworks are not mutually exclusive. Parallel visualization frameworks are
flexible; they do not require that data be read in its entirety or that all data be
processed at one time. As discussed in 2.5, parallel visualization frameworks
can be used to provide a parallel foundation for any of the data subsetting,
streaming, or multiresolution techniques.
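The idea behind streaming, processing one piece at a time rather than all at once, can be sketched in a few lines. This is a minimal illustration and not the method of any particular framework: the function names are hypothetical, and the pieces are cut from an in-memory list purely for demonstration, whereas a real streaming system would read each piece from disk and discard it before loading the next.

```python
def iter_pieces(values, piece_size):
    """Yield consecutive pieces of at most piece_size values (hypothetical helper)."""
    for start in range(0, len(values), piece_size):
        yield values[start:start + piece_size]


def streaming_range(pieces):
    """Compute the (min, max) of a data set while holding one piece at a time."""
    lo, hi = None, None
    for piece in pieces:
        if not piece:
            continue  # skip empty pieces
        piece_lo, piece_hi = min(piece), max(piece)
        lo = piece_lo if lo is None else min(lo, piece_lo)
        hi = piece_hi if hi is None else max(hi, piece_hi)
    return lo, hi


data_range = streaming_range(iter_pieces([3, -1, 7, 4, 0], piece_size=2))
```

Only the running minimum and maximum persist between pieces, so the memory footprint is bounded by the size of one piece rather than by the size of the whole data set.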
A parallel visualization framework is like any software framework. It pro-
vides abstractions for key concepts, such as visualization algorithms or data
representations, that are easily extended. It provides infrastructure code that
dictates how modules in the framework interact and manages the flow of control
within the framework. These approaches have been borne out by a lineage of
data flow networks (discussed further in 2.2.2).
Data flow networks have played a major role in visualization and analysis
software since the early 1990s [11, 9, 1, 7], as they are so effective in rapid
application development for solving a variety of visualization problems. They
provide an execution model, a data model (i.e., a way to represent data), and
algorithms to transform data. It is somewhat surprising that these frameworks
can solve such a wide range of problems, since the data access patterns and na-
ture of visualization algorithms vary widely. Parallel visualization frameworks
extend data flow networks to operate in a parallel setting. It is even more sur-
prising that their extension to the parallel world also has been so successful,
since data access patterns are even more varied in a parallel setting. At the
heart of this success is the commonality between visualization algorithms: data
loading, data transformation, and data presentation. By focusing on these ab-
stractions, parallel visualization frameworks are able to successfully support
myriad algorithms.
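The three shared abstractions, data loading, data transformation, and data presentation, can be sketched as a toy serial data flow network. The class names Source, Filter, and Sink and the pull-style execute() method are assumptions made for illustration; they are not the API of any specific visualization framework, and plain Python lists stand in for a real data model.

```python
class Source:
    """Data loading: produces data when asked (hypothetical module type)."""

    def __init__(self, loader):
        self._loader = loader

    def execute(self):
        return self._loader()


class Filter:
    """Data transformation: pulls from its upstream module and transforms the result."""

    def __init__(self, upstream, transform):
        self._upstream = upstream
        self._transform = transform

    def execute(self):
        return self._transform(self._upstream.execute())


class Sink:
    """Data presentation: pulls from upstream and renders a final result."""

    def __init__(self, upstream, present):
        self._upstream = upstream
        self._present = present

    def execute(self):
        return self._present(self._upstream.execute())


# Assemble a small network: load -> threshold -> scale -> present.
source = Source(lambda: [0.5, 1.5, 2.5, 3.5])
threshold = Filter(source, lambda data: [v for v in data if v > 1.0])
scale = Filter(threshold, lambda data: [2 * v for v in data])
sink = Sink(scale, lambda data: "values: " + ", ".join(f"{v:g}" for v in data))

result = sink.execute()
```

Calling execute() on the sink pulls data through every upstream module in turn. Because each module sees only its upstream neighbor, filters can be rearranged or replaced without changing the rest of the network.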
This chapter gives an overview of parallel visualization frameworks. It
introduces concepts for parallel computing (2.2.1) and data flow networks
(2.2.2). It also describes the basic approach for data parallel processing (2.3),
