We will now look at how to write our own wrappers for functions in a pre-packaged binary CUDA library using ctypes. In particular, we will write wrappers for the CUDA Driver API, which will allow us to perform all of the operations needed for basic GPU usage, including GPU initialization, memory allocation/transfers/deallocation, kernel launching, and context creation/synchronization/destruction. This is a very powerful piece of knowledge: it will allow us to use our GPU without going through PyCUDA, and also without writing any cumbersome host-side C-function wrappers.
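To make the idea concrete, here is a minimal sketch of wrapping a single Driver API function, `cuInit`, with ctypes. The helper names `load_cuda_driver`, `check`, and `wrap_cuInit` are hypothetical and chosen only for illustration, and the library file names (`nvcuda.dll` on Windows, `libcuda.so` on Linux) are assumptions about a typical driver installation. The sketch returns `None` rather than failing when no driver library is found.

```python
import ctypes
import sys

CUDA_SUCCESS = 0  # CUresult value that the Driver API returns on success


def load_cuda_driver():
    """Return a ctypes handle to the CUDA Driver library, or None if absent."""
    # Assumed file names for a typical installation; adjust for your system.
    if sys.platform.startswith("win"):
        candidates = ["nvcuda.dll"]
    else:
        candidates = ["libcuda.so", "libcuda.so.1"]
    for name in candidates:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue  # not found under this name; try the next one
    return None


def check(result, func_name):
    """Raise if a Driver API call returned anything other than CUDA_SUCCESS."""
    if result != CUDA_SUCCESS:
        raise RuntimeError(f"{func_name} failed with CUresult code {result}")


def wrap_cuInit(cuda):
    """Declare the C signature `CUresult cuInit(unsigned int Flags)`."""
    cuInit = cuda.cuInit
    cuInit.argtypes = [ctypes.c_uint]
    cuInit.restype = ctypes.c_int  # CUresult is a C enum, returned as an int
    return cuInit


if __name__ == "__main__":
    cuda = load_cuda_driver()
    if cuda is None:
        print("CUDA Driver library not found; nothing to wrap on this machine.")
    else:
        cuInit = wrap_cuInit(cuda)
        try:
            check(cuInit(0), "cuInit")  # the Flags argument must be 0
            print("CUDA driver initialized.")
        except RuntimeError as err:
            print(err)
```

Setting `argtypes` and `restype` explicitly lets ctypes convert and validate the arguments on each call, and routing every return code through one checking function keeps error handling consistent as more Driver API functions are wrapped in the same way.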
To do this, we will write a small module that acts as a wrapper library for the CUDA Driver API. Let's talk about ...