198 Designing Scientific Applications on GPUs
in order to verify termination conditions. The experiments performed on standard
QAP benchmarks showed that the GPU implementation using MATA
obtained a speedup of 19× over the CPU implementation, whereas the
speedup was only 5× when MATA was not used.
Particle swarm optimization
In [40], Zhou and Tan propose a full GPU implementation of a standard
PSO algorithm. All the data (velocities, positions, swarm population, etc.) is
stored in global memory, and each thread copies only its working data to
shared memory. The four steps of the PSO have been parallelized on the GPU: fitness
evaluation of the swarm, update of local best and global best of each particle,
and update of velocity and position of each