Chapter 2. GPUs: A Breakthrough Technology
The foundation for affordable and scalable high-performance data analytics already exists, built on steady advances in CPU, memory, storage, and networking technologies. As noted in Chapter 1, these evolutionary changes have shifted the performance bottleneck from memory I/O to compute.
To meet the need for faster processing at scale, CPUs now contain as many as 32 cores. But even with multicore CPUs, the large server clusters required can make sophisticated analytical applications unaffordable for all but a handful of organizations.
A far more cost-effective way to address the compute performance bottleneck today is the graphics processing unit (GPU). GPUs are capable of processing data up to 100 times faster than configurations containing CPUs alone. The reason for such a dramatic improvement is their massively parallel processing capabilities: some GPUs contain nearly 6,000 cores, roughly 190 to 375 times as many as the 16 to 32 cores found in today’s most powerful CPUs. For example, the Tesla V100, powered by the latest NVIDIA Volta GPU architecture and equipped with 5,120 NVIDIA CUDA cores and 640 NVIDIA Tensor cores, offers the performance of up to 100 CPUs in a single GPU.
The GPU’s small, efficient cores are also better suited to performing similar, repeated instructions in parallel, making it ideal for accelerating the processing-intensive workloads common in today’s data analysis applications.
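To make "similar, repeated instructions in parallel" concrete, the following is a minimal CPU-only sketch in Python of the data-parallel pattern. It does not run on a GPU and is not taken from any GPU library: the names `saxpy_kernel` and `launch` are hypothetical, chosen only for illustration. The point is that each call handles exactly one data element and no call depends on another, which is what lets a GPU spread the same work across thousands of cores at once.

```python
# CPU-only illustration of the data-parallel pattern; nothing here runs on a GPU.
# saxpy_kernel and launch are hypothetical names, not part of any GPU library.


def saxpy_kernel(i, a, x, y, out):
    """Work done for one element: the same instruction for every index i."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Stand-in for a GPU launch: call the kernel once per index.

    A GPU would run these n independent calls across many cores at the
    same time; this loop simply runs them one after another.
    """
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
result = [0.0] * len(x)

# Compute result[i] = 2.0 * x[i] + y[i] for every element.
launch(saxpy_kernel, len(x), 2.0, x, y, result)
print(result)  # [12.0, 24.0, 36.0, 48.0]
```

Because every element is computed independently, the order in which the calls run does not affect the result. Workloads with this shape, such as filtering, aggregating, or transforming the rows of a large table, are the ones that benefit most from a GPU's many small cores.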