High Performance Programming for Soft Computing

Graphics Processing Unit Programming and Applications

Oscar Montiel,* Juan J. Tapia, Francisco Javier

Díaz-Delgadillo and Nataly Medina Rodríguez

ABSTRACT

In this chapter, we introduce the reader to the field of graphics processing unit (GPU) programming and its applications. The main goal is to show the advantages of using GPUs to solve scientific problems that require intensive computation. Specifically, the chapter gives an introduction to the Compute Unified Device Architecture (CUDA™), a parallel computing platform and programming model from NVIDIA. CUDA is one of the best-known and most widely used GPU programming frameworks. As a case study, we present the computationally intensive task of generating the Mandelbrot fractal, first programmed sequentially and then broken down with CUDA™, with the aim of showing the benefits of using GPUs.

4.1 Introduction

The use of graphics processing units (GPUs) for general parallel computation is an ongoing research field. Parallel algorithms running on GPUs can often achieve a speedup of hundreds of times over their sequential CPU counterparts. GPUs are used in physics simulations, signal processing, financial modeling, neural networks, image processing and many other fields. The characteristic that makes GPUs suitable for such high-performance scientific computation is their very large scale of parallelization; some GPUs have hundreds or even thousands of embedded cores.

Instituto Politécnico Nacional, CITEDI, Tijuana, México.
Email: oross@ipn.mx
* Corresponding author

The success of GPUs in general computing applications rests on two main factors. The first is their cost/performance ratio compared with other processor types, e.g., multicore architectures; the second is the wide availability of general-purpose programming frameworks, such as NVIDIA CUDA, from the main GPU developers around the world. Hence, GPUs constitute a relatively cheap high-performance computing tool that offers a theoretical peak number of GFLOPS nearly six times greater than that of CPUs, and a memory bandwidth practically five times higher (García-Risueño and Ibáñez 2012), at a fraction of the price of those other architectures. The reason for this gap in computational capacity lies in the design: a GPU devotes more transistors and resources to data processing, rather than to data caching and flow control as a CPU does. All these factors, together with the available programming frameworks, explain the wide use of GPU computing by a diverse scientific community spanning many research areas. The textbook by Sanders and Kandrot (2010) gives a detailed explanation of the concepts of CUDA programming and the main characteristics of GPUs, with very intuitive examples. Kirk and Hwu (2013) present the history of GPU computing as well as the main concepts of parallel programming on GPUs, along with some applications that show the development process when using GPUs.
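As a rough illustration of the data-parallel style these references describe, the following C sketch computes the Mandelbrot fractal named in the abstract sequentially, using the standard escape-time method. It is a minimal sketch and not the authors' code: the function names `mandelbrot_iterations` and `mandelbrot_image`, the pixel-centre mapping, and all parameters are assumptions made for illustration.

```c
#include <assert.h>

/* Escape-time count for one point c = (cx, cy): iterate z <- z^2 + c
 * from z = 0 until |z| > 2 or max_iter iterations have been performed.
 * Returns the number of iterations actually executed. */
int mandelbrot_iterations(double cx, double cy, int max_iter)
{
    double zx = 0.0, zy = 0.0;
    int i = 0;

    assert(max_iter > 0);
    while (i < max_iter && zx * zx + zy * zy <= 4.0) {
        double tmp = zx * zx - zy * zy + cx; /* real part of z^2 + c */
        zy = 2.0 * zx * zy + cy;             /* imaginary part of z^2 + c */
        zx = tmp;
        i++;
    }
    return i;
}

/* Sequential version: visit every pixel of a width x height image in
 * turn and store its iteration count in out[row * width + col].
 * Each pixel is mapped to the centre of its cell in the rectangle
 * [x_min, x_max] x [y_min, y_max] of the complex plane. */
void mandelbrot_image(int *out, int width, int height,
                      double x_min, double x_max,
                      double y_min, double y_max, int max_iter)
{
    assert(width > 0 && height > 0);
    for (int row = 0; row < height; row++) {
        for (int col = 0; col < width; col++) {
            double cx = x_min + (x_max - x_min) * (col + 0.5) / width;
            double cy = y_min + (y_max - y_min) * (row + 0.5) / height;
            out[row * width + col] = mandelbrot_iterations(cx, cy, max_iter);
        }
    }
}
```

Each call to `mandelbrot_iterations` depends only on the coordinates of its own pixel, so the two loops carry no dependencies between iterations. This independence is what makes the problem a natural fit for a GPU: in a CUDA version, each pixel could be assigned to its own thread instead of being visited in turn.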

In the field of mathematical optimization, many authors are dedicating their efforts to bringing some of the most widely used algorithms to the CUDA platform. Robilliard et al. (2009) presented one of the earlier works on Genetic Programming on GPUs; the authors provided a faster evaluation process and thus improved the overall performance of the algorithm. Weiss (2011) shows the methodology needed to parallelize the Ant Colony Optimization algorithm.

Pallipuram et al. (2012) compare the two most popular GPU programming models (CUDA and OpenCL) and two GPU architectures from different vendors (NVIDIA's Fermi and AMD/ATI's Radeon 5870), using a two-level character recognition network developed with four spiking neural network models.

Another field of application is the real-time simulation and graphical representation of physical and chemical processes. Venetillo and Celes (2007) is one of the earliest works that simulates particles in confined environments, including support for inter-particle collisions, constraints,


High Performance Programming for Soft Computing by Oscar Montiel Ross, Roberto Sepulveda
