Chapter 8

Optimizing Gather/Scatter Patterns

Simon J. Pennycook*; Christopher J. Hughes; Mikhail Smelyanskiy
* Intel Corporation, UK; Intel Corporation, USA

Abstract

Many modern microarchitectures rely on single-instruction, multiple-data (SIMD) execution to provide high compute capabilities in an energy-efficient manner. Such microarchitectures, including those employed by the most recent Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors, are better suited to contiguous loads and stores than to non-contiguous loads (i.e., gathers) and stores (i.e., scatters). Gather and scatter behavior is more complex than that of contiguous loads and stores (e.g., it may depend on how close together the data items being read/written ...
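To make the distinction between the access patterns concrete, the following is a minimal scalar sketch in C. It is not code from the chapter: the function names (`copy_contiguous`, `gather`, `scatter`) and the index-array layout are hypothetical, chosen only for illustration. A vectorizing compiler or hand-written SIMD code would map the first loop to contiguous vector loads and stores, while the other two loops require per-element address computation.

```c
#include <assert.h>
#include <stddef.h>

/* Contiguous access: element i of src is copied to element i of dst. */
void copy_contiguous(float *dst, const float *src, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

/* Gather: non-contiguous loads. Each output element is read from an
 * arbitrary position in src, selected by the index array idx. */
void gather(float *dst, const float *src, size_t src_len,
            const size_t *idx, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        assert(idx[i] < src_len);   /* illustrative bounds check */
        dst[i] = src[idx[i]];
    }
}

/* Scatter: non-contiguous stores. Each input element is written to an
 * arbitrary position in dst. If idx contains duplicates, the last write
 * wins in this scalar loop. */
void scatter(float *dst, size_t dst_len, const float *src,
             const size_t *idx, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        assert(idx[i] < dst_len);   /* illustrative bounds check */
        dst[idx[i]] = src[i];
    }
}
```

For example, calling `gather` with `idx = {6, 1, 3, 1}` reads elements 6, 1, 3, and 1 of `src` in that order. The positions touched are determined by the index values at run time, which is what distinguishes this pattern from the contiguous loop.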
