Edoardo Aprà*, Karol Kowalski*, Jeff R. Hammond†, Michael Klemm‡
* Pacific Northwest National Laboratory, USA
† Intel, USA
‡ Intel, Germany
In this chapter, we describe the excellent performance of NWChem’s CCSD(T) method on a large-scale hybrid cluster of 460 dual-socket Xeon E5-2600 series nodes, each equipped with two Intel Xeon Phi 5110P coprocessor cards (62.5k hybrid cores in total). We describe how offload transfers and compute kernels were optimized without any low-level programming. NWChem demonstrates that high-level Fortran code can be brought to the machine productively while maintaining high performance and scalability. This makes this ...
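The 62.5k hybrid-core figure can be checked with a short worked calculation. This is a sketch under stated assumptions: each Xeon Phi 5110P contributes its 60 cores, and each host socket is assumed to hold an 8-core E5-2600 series processor (for example, an E5-2670). The abstract does not name the exact host SKU, so the 8-cores-per-socket term is an assumption that happens to reproduce the quoted total.

```latex
% Assumption: 8 cores per host socket (host SKU not stated in the abstract)
\begin{align*}
N_{\text{Phi}}  &= 460 \times 2 \times 60 = 55{,}200 \\
N_{\text{host}} &= 460 \times 2 \times 8  = 7{,}360 \\
N_{\text{total}} &= N_{\text{Phi}} + N_{\text{host}} = 62{,}560 \approx 62.5\text{k}
\end{align*}
```

Under this assumption, coprocessor cores make up roughly 88% of the total (55,200 of 62,560).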