Designing Scientific Applications on GPUs

by Raphael Couturier
November 2013
Content level: Intermediate to advanced
498 pages
17h 6m
English
Chapman and Hall/CRC
Content preview from Designing Scientific Applications on GPUs
where $C_{ins}$ denotes the number of cycles to execute instruction $ins \in \{add, div, mult, cmp\}$.
Each thread has to load $N_{el}$ variables to compute its partial sum of squared variables. The thread computing the division also loads the coefficient $c_j$. This must be done for the $N_{col}$ columns with which a block has to deal. We must also take into account that the scheduler hides some latency by swapping the warps, so the total latency $C_{latency}$ must be divided by the number of warps $N_W$. Thus, the number of cycles relative to memory accesses is given by

$$C_{Accesses} = \frac{N_{col} \cdot (N_{el} + 1) \cdot C_{latency}}{N_W} \qquad (10.6)$$
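
As a quick numeric check of Equation (10.6), the following minimal Python sketch evaluates the memory-access cycle estimate. The function name, parameter names, and sample values are hypothetical and chosen only for illustration; they are not taken from the book, and real latency figures depend on the GPU.

```python
def memory_access_cycles(n_col, n_el, c_latency, n_warps):
    """Estimate memory-access cycles following Equation (10.6).

    n_col     -- number of columns handled by a block (N_col)
    n_el      -- variables loaded per thread for its partial sum (N_el)
    c_latency -- total memory latency in cycles (C_latency)
    n_warps   -- number of warps the scheduler can swap between (N_W)
    """
    if n_warps <= 0:
        raise ValueError("n_warps must be positive")
    # The +1 accounts for the extra load of the coefficient c_j.
    loads_per_column = n_el + 1
    # Latency is divided by the number of warps because warp swapping
    # hides part of it.
    return n_col * loads_per_column * c_latency / n_warps


# Hypothetical values, for illustration only.
cycles = memory_access_cycles(n_col=4, n_el=7, c_latency=400, n_warps=8)
print(cycles)  # 1600.0
```

Under this model, doubling the number of warps halves the estimate, which reflects the latency hiding described in the text.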
At the end of the execution

Publisher Resources

ISBN: 9781466571648