book

GPU Pro 5

by Wolfgang Engel

May 2014

Intermediate to advanced

522 pages

16h 1m

English

A K Peters/CRC Press

Content preview from GPU Pro 5

7. Optimizing OpenCL Kernels for the ARM Mali-T600 GPUs

This notation allows us to write a single iteration over k as



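\[
\bigcup_{i=32n}^{32n+31} A[i,k] \times 4, \quad
\Bigl( \bigcup_{j=4m}^{4m+3} B[4k+0,j] \Bigr) \times 32, \quad
\Bigl( \bigcup_{j=4m}^{4m+3} B[4k+1,j] \Bigr) \times 32, \quad
\Bigl( \bigcup_{j=4m}^{4m+3} B[4k+2,j] \Bigr) \times 32, \quad
\Bigl( \bigcup_{j=4m}^{4m+3} B[4k+3,j] \Bigr) \times 32.
\]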

As a cache line has space for four float4 elements, we see that the reads from A read the first quarter of 32 consecutive cache lines and the reads from B read four full cache lines. To get full cache lines instead, we consider four consecutive iterations in k together, and we see that those four iterations read 32 full cache lines from A and 16 full cache lines from B. For the moment, we restrict ourselves ...

Publisher Resources

ISBN: 9781482208641
