GPU Pro 5
by Wolfgang Engel
May 2014
522 pages
A K Peters/CRC Press
7. Optimizing OpenCL Kernels for the ARM Mali-T600 GPUs
This notation allows us to write a single iteration over k as
$$
\bigcup_{i=32n}^{32n+31} A[i,k] \times 4,\quad
\bigcup_{j=4m}^{4m+3} B[4k+0,j] \times 32,\quad
\bigcup_{j=4m}^{4m+3} B[4k+1,j] \times 32,\quad
\bigcup_{j=4m}^{4m+3} B[4k+2,j] \times 32,\quad
\bigcup_{j=4m}^{4m+3} B[4k+3,j] \times 32.
$$
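To make the multiplicities concrete, the following Python sketch enumerates the reads of one iteration over k. It reads the notation X × c as "element X is read c times" and assumes a hypothetical mapping in which work-item (i, j) of a 32 × 4 block computes one float4 element of C and, per iteration, reads A[i, k] and B[4k + r, j] for r = 0 to 3. The function name reads_for_one_k and the sample block indices are illustrative only.

```python
from collections import Counter

# Hypothetical mapping, for illustration: work-item (i, j) in a 32 x 4 block
# reads A[i, k] and the four float4 elements B[4k + r, j], r = 0..3.
def reads_for_one_k(n, m, k):
    """Return a Counter mapping (matrix, row, col) -> number of reads."""
    reads = Counter()
    for i in range(32 * n, 32 * n + 32):       # i = 32n .. 32n+31
        for j in range(4 * m, 4 * m + 4):      # j = 4m .. 4m+3
            reads[("A", i, k)] += 1
            for r in range(4):
                reads[("B", 4 * k + r, j)] += 1
    return reads

reads = reads_for_one_k(n=1, m=2, k=3)
a_reads = {key: c for key, c in reads.items() if key[0] == "A"}
b_reads = {key: c for key, c in reads.items() if key[0] == "B"}
print(len(a_reads), set(a_reads.values()))  # 32 {4}: 32 A elements, each read 4 times
print(len(b_reads), set(b_reads.values()))  # 16 {32}: 16 B elements, each read 32 times
```

Each A element is shared by the four work-items along j, and each B element by the 32 work-items along i, which is where the factors 4 and 32 come from under this assumed mapping.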
As a cache line has space for four float4 elements, we see that the reads from
A touch only the first quarter of 32 cache lines (one line for each of the 32
consecutive rows, taking k to be a multiple of four), while the reads from B fill
four full cache lines. To read full cache lines instead, we consider four consecutive
iterations in k together: those four iterations read 32 full cache lines from A
and 16 full cache lines from B. For the moment, we restrict ourselves ...
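The cache-line counts can be checked with a similar sketch. It assumes 64-byte cache lines (four 16-byte float4 elements), row-major storage in which every row is a multiple of four float4 elements long and starts on a cache-line boundary, and a hypothetical row length of 64 float4 elements for both A and B. The helper names (line_usage, summarize, a_elems, b_elems) and the block position are illustrative only.

```python
from collections import defaultdict

FLOAT4_PER_LINE = 4  # a 64-byte cache line holds four 16-byte float4 elements

def line_usage(elements, row_len):
    """Map each touched cache line to the set of float4 slots read in it.

    elements: (row, col) float4 indices into a row-major matrix whose rows
    are row_len float4 elements long (assumed a multiple of 4, with the
    matrix starting on a cache-line boundary).
    """
    usage = defaultdict(set)
    for row, col in elements:
        flat = row * row_len + col
        usage[flat // FLOAT4_PER_LINE].add(flat % FLOAT4_PER_LINE)
    return usage

def summarize(usage):
    """Return (cache lines touched, cache lines read in full)."""
    full = sum(1 for slots in usage.values() if len(slots) == FLOAT4_PER_LINE)
    return len(usage), full

n, m, row_len = 1, 2, 64  # hypothetical block position and row length

def a_elems(ks):
    return [(i, k) for i in range(32 * n, 32 * n + 32) for k in ks]

def b_elems(ks):
    return [(4 * k + r, j) for k in ks for r in range(4)
            for j in range(4 * m, 4 * m + 4)]

# One iteration (k = 0): A touches 32 lines, one quarter each; B touches 4 full lines.
print(summarize(line_usage(a_elems([0]), row_len)))       # (32, 0)
print(summarize(line_usage(b_elems([0]), row_len)))       # (4, 4)
# Four consecutive iterations (k = 0..3): 32 full lines of A, 16 full lines of B.
print(summarize(line_usage(a_elems(range(4)), row_len)))  # (32, 32)
print(summarize(line_usage(b_elems(range(4)), row_len)))  # (16, 16)
```

Grouping four iterations in k turns every partially used line of A into a fully used one, without increasing the number of distinct lines touched in A.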

ISBN: 9781482208641