6 Vectorization: FLOPs for free

This chapter covers

  • The importance of vectorization
  • The kind of parallelization provided by a vector unit
  • Different ways you can access vector parallelization
  • Performance benefits you can expect

Processors have special vector units that can load and operate on more than one data element at a time. Vectorization is the process of grouping operations together so that more than one can be done at a time. If we’re limited by floating-point operations, we must use vectorization to reach peak hardware capabilities. But adding more flops to the hardware capability has limited benefit when an application is memory bound, and most applications are memory bound. Compilers can be powerful, but as ...
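
To make the idea of grouping operations concrete, here is a minimal sketch in C. The function names `add_scalar` and `add_grouped` and the constant `VLEN` are hypothetical, chosen only for illustration. The value 4 assumes a vector unit wide enough to hold four double-precision values (256 bits); real hardware varies. The grouped loop is written by hand only to show the pattern; it is still ordinary scalar C, and whether a compiler turns either loop into vector instructions depends on the compiler, its flags, and the target processor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vector length: four doubles, as on an assumed 256-bit unit. */
#define VLEN 4

/* Scalar form: one addition per loop iteration. The restrict qualifiers
   promise that the arrays do not overlap, which helps a compiler decide
   that the iterations are independent. */
void add_scalar(size_t n, const double *restrict a,
                const double *restrict b, double *restrict c)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Grouped form: VLEN independent additions per iteration. This is the
   shape of work that a vector unit can perform as a single operation. */
void add_grouped(size_t n, const double *restrict a,
                 const double *restrict b, double *restrict c)
{
    assert(a && b && c);
    size_t i = 0;
    size_t nvec = n - n % VLEN;   /* largest multiple of VLEN not above n */

    for (; i < nvec; i += VLEN) {
        c[i]     = a[i]     + b[i];
        c[i + 1] = a[i + 1] + b[i + 1];
        c[i + 2] = a[i + 2] + b[i + 2];
        c[i + 3] = a[i + 3] + b[i + 3];
    }

    /* Remainder loop: handles the last n % VLEN elements one at a time. */
    for (; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```

Both functions produce the same result. The difference is only in how the work is arranged: the grouped version exposes four independent additions at once, plus a remainder loop for array lengths that are not a multiple of the vector length.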

