
158 GPGPU Programming for Games and Science
should be unrolled by the compiler. Modify the compute shader as shown
next:
[unroll]
for (int r = 0; r < 3; ++r)
{
    [unroll]
    for (int c = 0; c < 3; ++c)
    {
        result += weight[r][c] * input[t + offset[r][c]];
    }
}
which tells the compiler to unroll both loops, if possible. In this case, the
number of loop iterations is known at compile time, and we expect to obtain
nine occurrences of the inner-loop body, say,
result += 0.0625f * input[int2(t.x - 1, t.y - 1)];
result += 0.1250f * input[int2(t.x, t.y - 1)];
result += 0.0625f * input[int2(t.x + 1, t.y - 1)];
result += 0.1250f * input[int2 ...
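
One way to convince yourself that the unrolled statements compute the same value as the nested loops is to emulate both forms on the CPU. The following is a minimal C++ sketch, not the shader itself. The types Image and Int2 and the functions blurRolled and blurUnrolled are hypothetical names introduced only for this illustration. The weights 0.0625 and 0.1250 and the first three offsets come from the listing above; the center weight 0.25 and the remaining offsets are assumptions based on a standard 3x3 blur kernel whose weights sum to one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for the shader's int2 type.
struct Int2 { int x, y; };

// Hypothetical CPU stand-in for the shader's 2D input texture.
struct Image
{
    int width, height;
    std::vector<float> texels;  // row-major, width * height entries

    float at(int x, int y) const
    {
        // The caller must stay at least one texel away from the border.
        assert(0 <= x && x < width && 0 <= y && y < height);
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// Assumed kernel: only 0.0625 and 0.1250 appear in the text. The center
// weight 0.25 is an assumption chosen so that the weights sum to 1.
constexpr float weight[3][3] = {
    { 0.0625f, 0.1250f, 0.0625f },
    { 0.1250f, 0.2500f, 0.1250f },
    { 0.0625f, 0.1250f, 0.0625f } };

// Assumed offsets: the first row matches the text; the other rows follow
// the same left-to-right, top-to-bottom pattern.
constexpr Int2 offset[3][3] = {
    { {-1, -1}, {0, -1}, {1, -1} },
    { {-1,  0}, {0,  0}, {1,  0} },
    { {-1,  1}, {0,  1}, {1,  1} } };

// Loop form, mirroring the shader code before unrolling.
float blurRolled(const Image& input, Int2 t)
{
    float result = 0.0f;
    for (int r = 0; r < 3; ++r)
    {
        for (int c = 0; c < 3; ++c)
        {
            const Int2 o = offset[r][c];
            result += weight[r][c] * input.at(t.x + o.x, t.y + o.y);
        }
    }
    return result;
}

// Hand-unrolled form: nine copies of the loop body with the weights and
// offsets replaced by their compile-time constant values.
float blurUnrolled(const Image& input, Int2 t)
{
    float result = 0.0f;
    result += 0.0625f * input.at(t.x - 1, t.y - 1);
    result += 0.1250f * input.at(t.x,     t.y - 1);
    result += 0.0625f * input.at(t.x + 1, t.y - 1);
    result += 0.1250f * input.at(t.x - 1, t.y);
    result += 0.2500f * input.at(t.x,     t.y);
    result += 0.1250f * input.at(t.x + 1, t.y);
    result += 0.0625f * input.at(t.x - 1, t.y + 1);
    result += 0.1250f * input.at(t.x,     t.y + 1);
    result += 0.0625f * input.at(t.x + 1, t.y + 1);
    return result;
}
```

Both functions accumulate the nine terms in the same order, so they should agree at any interior texel. The sketch says nothing about what the HLSL compiler actually emits; that still has to be checked in the generated assembly.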