4
Graphics Processing Unit
Programming and Applications
Oscar Montiel,* Juan J. Tapia, Francisco Javier
Díaz-Delgadillo and Nataly Medina Rodríguez
ABSTRACT
In this chapter, we introduce the reader to the field of graphics
processing unit (GPU) programming and its applications. The main goal
is to show the advantages of using GPUs to solve scientific problems
that require intensive computation. Specifically, this chapter gives
an introduction to the Compute Unified Device Architecture (CUDA),
the parallel computing platform and programming model developed by
NVIDIA. CUDA is one of the best-known and most widely used GPU
programming frameworks. As a case study, we present the
computationally intensive task of generating the Mandelbrot fractal,
first programmed sequentially and then decomposed with CUDA, with
the aim of showing the benefits of using GPUs.
4.1 Introduction
The use of Graphics Processing Units (GPUs) for general parallel computation
is an ongoing research field. Parallel algorithms running on GPUs can often
achieve a speedup of hundreds of times over their respective sequential CPU
algorithms. GPUs can be used in applications for physics simulations,
signal processing, financial modeling, neural networks, image processing
and many other fields. The characteristic that allows the use of GPUs
in such high-performance scientific computations is their very large scale
of parallelization; some GPUs have hundreds and even thousands of
embedded cores.
Instituto Politécnico Nacional, CITEDI, Tijuana, México.
Email: oross@ipn.mx
* Corresponding author
The success of GPUs in general computing applications rests on two main
facts. The first is their cost/performance ratio when compared with other
processor types, such as multicore architectures, and the second is the
wide availability of programming frameworks for general computing, such
as NVIDIA CUDA, from the main GPU developers around the world. Hence,
GPUs constitute a relatively cheap high-performance computing tool that
offers a theoretical peak number of GFLOPS nearly six times greater than
that of CPUs and a memory bandwidth practically five times higher than
that of CPUs (García-Risueño and Ibáñez 2012), but at a fraction of the
price of these other architectures. The reason behind this gap in
computational capacity between a GPU and a CPU comes from their design:
a GPU devotes more transistors and resources to data processing, rather
than to data caching and flow control, as a CPU does. All these factors,
in addition to the available programming frameworks, are the reasons
behind the wide use of GPU computing by a diverse scientific community
spanning many research areas. In their textbook, Sanders and Kandrot
(2010) give a detailed explanation of the concepts of CUDA programming
and the main characteristics of GPUs, with very intuitive examples. Kirk
and Hwu (2013) present the history of GPU computing as well as the main
concepts of parallel programming on GPUs, with some applications that
illustrate the development process when using GPUs.
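To make the data-parallel pattern behind these speedups concrete, the following is a minimal sequential sketch in C of the escape-time Mandelbrot computation mentioned as this chapter's case study. It is not the authors' code: the function names `mandel_iters` and `mandel_image`, the viewing window [-2, 1] x [-1, 1], and the escape radius of 2 are assumptions chosen for illustration.

```c
#include <assert.h>

/* Escape-time count for the point c = (cx, cy): the number of iterations
 * of z <- z*z + c (starting at z = 0) performed while |z| <= 2,
 * capped at max_iter. */
static int mandel_iters(double cx, double cy, int max_iter)
{
    double zx = 0.0, zy = 0.0;
    int i = 0;
    while (i < max_iter && zx * zx + zy * zy <= 4.0) {
        double t = zx * zx - zy * zy + cx;  /* real part of z*z + c */
        zy = 2.0 * zx * zy + cy;            /* imaginary part of z*z + c */
        zx = t;
        ++i;
    }
    return i;
}

/* Fill out[width * height] with escape-time counts over the assumed
 * window [-2, 1] x [-1, 1]. Each pixel is computed independently of
 * every other pixel. */
static void mandel_image(int *out, int width, int height, int max_iter)
{
    assert(width > 1 && height > 1);  /* avoid division by zero below */
    for (int py = 0; py < height; ++py) {
        for (int px = 0; px < width; ++px) {
            double cx = -2.0 + 3.0 * px / (width - 1);
            double cy = -1.0 + 2.0 * py / (height - 1);
            out[py * width + px] = mandel_iters(cx, cy, max_iter);
        }
    }
}
```

Because no iteration of the two nested loops reads a value written by another, the work is a natural candidate for a GPU: a port would typically replace the loops by one thread per pixel, each calling the same per-point routine.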
In the field of mathematical optimization, many authors are working to
bring some of the most widely used algorithms to the CUDA platform.
Robilliard et al. (2009) presented one of the earliest works on Genetic
Programming on GPUs, in which the authors provided a faster evaluation
process and thus improved the general performance of the algorithm.
Weiss (2011) shows the methodology needed to parallelize the Ant Colony
Optimization algorithm.
Pallipuram et al. (2012) compare the two most popular GPU
programming models (CUDA and OpenCL) and two GPU architectures
from different vendors (NVIDIA's Fermi and AMD/ATI's Radeon 5870),
using a two-level character recognition network built with four
spiking neural network models.
Another field of application is the real-time simulation and graphical
representation of physical and chemical processes. Venetillo and Celes
(2007) is one of the earliest works that simulates particles in confined
environments, including support for inter-particle collisions, constraints,
