Memory Handling with CUDA
In the conventional CPU model, memory is linear, or flat: any CPU core can access any memory location without restriction. In practice, CPU hardware typically places a level one (L1), level two (L2), and level three (L3) cache between the cores and main memory. Anyone who has optimized CPU code or comes from a high-performance computing (HPC) background will be all too familiar with these caches. For most programmers, however, the cache hierarchy is something they can easily abstract away.
Abstraction has been a trend in modern programming languages, where the programmer is further and further removed from the underlying hardware. While this can lead to higher levels of productivity, as problems ...