Loop unrolling
Loop unrolling is a technique that spreads the fixed overhead of each loop iteration over a reasonable number of data operations. Take the following code:
{
    for (i = 0; i < 100; i++)
        q[i] = i;
}
In terms of assembly code, this will generate:
• A load of a register with 0 for the loop counter i.
• A test of the register with 100.
• A branch to either exit or execute the loop.
• An increment of the register holding the loop counter.
• An address calculation of array q indexed by i.
• A store of i to the calculated address.
Only the last of these instructions, the store, actually does any real work. The rest of the instructions are overhead.
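The steps listed above can be sketched in C itself by writing the loop with explicit labels and jumps. This is only an illustration, not real compiler output: the function name `fill_explicit`, the macro `N`, and the pointer `addr` are hypothetical, and the statements follow execution order rather than the order of the list.

```c
#define N 100

/* Hypothetical helper: fills q[0..N-1], one "instruction" per statement. */
static void fill_explicit(int *q)
{
    int i;
    int *addr;

    i = 0;              /* load a register with 0 for the loop counter */
test:
    if (!(i < N))       /* test the counter against 100 */
        goto done;      /* branch: exit the loop, or fall through to the body */
    addr = q + i;       /* address calculation of q indexed by i */
    *addr = i;          /* store i: the only statement doing real work */
    i = i + 1;          /* increment the loop counter */
    goto test;          /* jump back to the test */
done:
    return;
}
```

Written this way, each element stored costs one test, one branch, one increment, and one jump in addition to the address calculation and the store.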
We can rewrite this C code so that each pass through the loop handles four elements:
{
    for (i = 0; i < 100; i += 4)
    {
        q[i]   = i;
        q[i+1] = i + 1;
        q[i+2] = i + 2;
        q[i+3] = i + 3;
    }
}
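As a minimal sketch of how a four-way unrolled loop might be packaged and checked against the simple version, consider the two functions below. The names `fill_rolled` and `fill_unrolled4` are hypothetical. The cleanup loop for lengths that are not a multiple of four is an assumption added here; with a length of 100 it never runs.

```c
/* Reference version: one element per iteration. */
static void fill_rolled(int *q, int n)
{
    for (int i = 0; i < n; i++)
        q[i] = i;
}

/* Unrolled by a factor of four: one test, branch, and increment per four stores. */
static void fill_unrolled4(int *q, int n)
{
    int i;
    int limit = n - (n % 4);    /* largest multiple of 4 not exceeding n, for n >= 0 */

    for (i = 0; i < limit; i += 4)
    {
        q[i]     = i;
        q[i + 1] = i + 1;
        q[i + 2] = i + 2;
        q[i + 3] = i + 3;
    }

    /* Cleanup (assumption): handle the 0 to 3 elements left over. */
    for (; i < n; i++)
        q[i] = i;
}
```

Calling `fill_unrolled4(q, 100)` runs the main loop 25 times instead of 100, so the loop test, branch, and increment are each executed a quarter as often for the same 100 stores.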