Florian Wende*; Michael Klemm†; Thomas Steinke*; Alexander Reinefeld*
* Zuse Institute Berlin, Germany
† Intel, Germany
This chapter describes the principle of concurrent kernel offloading to the coprocessor and the aspects that need to be considered when optimizing performance. Concurrent kernel offload targets application scenarios with many small-scale workloads, none of which can fully use the provided resources on its own. The chapter explains how the computational throughput of multiple small-scale workloads can be improved on the Intel Xeon Phi coprocessor through concurrent kernel execution using the offload programming model. Each of the optimization steps is elaborated and illustrated by ...
Get High Performance Parallelism Pearls Volume One now with O’Reilly online learning.