
68 Designing Scientific Applications on GPUs
    output[idx++] = outval1;
    output[idx++] = outval2;
    output[idx++] = outval3;
    output[idx++] = outval4;
    output[idx++] = outval5;
    output[idx++] = outval6;
    output[idx] = outval7;
}
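The shared-memory kernel of Listing 5.8 is easier to check against a plain CPU reference for the vertical 1D convolution it computes. The sketch below is an illustration, not the chapter's code. It assumes a row-major 8-bit image of height `h` and width `w`, a mask of k = 2r+1 float coefficients, clamp-to-edge handling at the top and bottom borders, and rounding to nearest with saturation to [0, 255]. The kernel may treat any of these differently. `convoVerticalRef` and `clampRow` are hypothetical names. Each "worker" copies the 8 + 2r input pixels it needs into a local tile, standing in for shared memory, and then produces 8 consecutive outputs from that tile alone. This mirrors the `outval0`…`outval7` accumulators in the listings.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: clamp a row index into [0, h-1] (border replication,
// an assumption; the GPU kernel may handle borders differently).
static int clampRow(int i, int h) { return std::min(std::max(i, 0), h - 1); }

// Hypothetical CPU reference for a vertical 1D convolution of an 8-bit image.
// in, out: row-major h x w images; mask: k = 2r+1 coefficients, where
// mask[p] weights the input pixel at vertical offset p - r.
void convoVerticalRef(const std::vector<unsigned char>& in,
                      std::vector<unsigned char>& out,
                      int h, int w, const std::vector<float>& mask, int r) {
    assert(r >= 0 && (int)mask.size() == 2 * r + 1);
    assert((int)in.size() == h * w && (int)out.size() == h * w);
    const int PIX = 8;                        // outputs per worker
    std::vector<float> tile(PIX + 2 * r);     // stands in for shared memory
    for (int j = 0; j < w; ++j) {
        for (int i0 = 0; i0 < h; i0 += PIX) {
            // 1) Preload rows i0-r .. i0+PIX-1+r of column j into the tile.
            for (int t = 0; t < PIX + 2 * r; ++t)
                tile[t] = in[clampRow(i0 - r + t, h) * w + j];
            // 2) Compute up to PIX outputs, reading only from the tile.
            for (int q = 0; q < PIX && i0 + q < h; ++q) {
                float acc = 0.0f;
                for (int p = 0; p < 2 * r + 1; ++p)
                    acc += mask[p] * tile[q + p];
                long v = std::lround(acc);    // round to nearest
                out[(i0 + q) * w + j] =
                    (unsigned char)std::min(std::max(v, 0L), 255L);
            }
        }
    }
}
```

As a quick check, take a vertical ramp (row i filled with 10·i, for h = 10), r = 1, and mask {0.25, 0.5, 0.25}. Column 0 of the output is then 3 10 20 30 40 50 60 70 80 88. This mask preserves a linear ramp, so the interior rows come through unchanged. Only the two border rows are affected by the clamping. The choice h = 10 also exercises a partial last block of 8 rows.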
Listing 5.8. CUDA kernel performing a vertical 1D convolution after preloading data into shared memory
__global__ void kernel_convoSepShx8pH(unsigned char *output, int j_dim, int r)
{
    int ic, jc, p;
    int k = 2*r + 1;  // mask width for radius r
    float outval0 = 0.0, outval1 = 0.0, outval2 = 0.0, outval3 = 0.0