Memory access in GPUs
By now, it should be clear that fast, local memory is key to the performance of the kinds of workloads we offload to our processor when doing deep learning. It is not just the quantity and proximity of memory that matters, however, but also how this memory is accessed. Think of sequential versus random access performance on hard drives: the principle is the same.
Why does this matter for DNNs? Put simply, they are high-dimensional structures that ultimately have to be embedded in the 1D address space of the memory that feeds our ALUs. Modern (vector) GPUs, built for graphics workloads, assume that they will be accessing adjacent memory, which is where one part of a 3D scene will be stored ...
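To make the idea of embedding a high-dimensional structure in 1D memory concrete, here is a minimal sketch in plain Python. It assumes a row-major (C-order) layout, which is one common convention and not necessarily the one a given framework or GPU kernel uses. The helper names `row_major_strides` and `flat_offset` and the example shape are illustrative assumptions, not taken from the text.

```python
from itertools import product


def row_major_strides(shape):
    """Return per-axis strides, in elements, for a row-major layout.

    Assumption: the last axis is contiguous, so its stride is 1.
    """
    strides = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return tuple(strides)


def flat_offset(index, strides):
    """Map a multi-dimensional index to its offset in 1D memory."""
    return sum(i * s for i, s in zip(index, strides))


# Hypothetical 3D tensor shape, chosen only for illustration.
shape = (2, 3, 4)
strides = row_major_strides(shape)

# Traversal with the last axis varying fastest: offsets are adjacent.
offsets_last_axis_fastest = [
    flat_offset(idx, strides)
    for idx in product(*(range(n) for n in shape))
]

# Traversal with the first axis varying fastest: offsets jump around.
offsets_first_axis_fastest = [
    flat_offset((i, j, k), strides)
    for k in range(shape[2])
    for j in range(shape[1])
    for i in range(shape[0])
]

print(strides)
print(offsets_last_axis_fastest[:6])
print(offsets_first_axis_fastest[:6])
```

Both traversals visit the same 24 elements, but only the first one walks through memory in address order. The second strides across memory on every step, which is the kind of non-adjacent access pattern that hardware built around adjacent reads tends to handle poorly.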