Edoardo Aprà*, Karol Kowalski*, Jeff R. Hammond†, Michael Klemm‡
* Pacific Northwest National Laboratory, USA
† Intel, USA
‡ Intel, Germany
In this chapter, we describe the excellent performance of NWChem’s CCSD(T) method on a large-scale hybrid cluster of 460 dual-socket Xeon E5-2600 series nodes, each equipped with two Intel Xeon Phi 5110P coprocessor cards (a total of 62.5k hybrid cores). We describe how offload transfers and compute kernels were optimized without any low-level programming. NWChem shows that high-level Fortran code can be brought to the machine at high productivity while maintaining high performance and scalability. This makes this ...
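As a rough sanity check on the 62.5k figure quoted above, the core count can be reproduced with a few lines of arithmetic. This is only a sketch: the node count, socket count, and number of coprocessor cards come from the text, and the Xeon Phi 5110P has 60 cores, but the value of 8 cores per Xeon socket is an assumption (the text names only the E5-2600 series, not a specific model). The function name `hybrid_cores` is hypothetical.

```python
# Core-count sanity check for the hybrid cluster described in the abstract.
NODES = 460                  # from the text
SOCKETS_PER_NODE = 2         # from the text ("dual-socket")
PHI_CARDS_PER_NODE = 2       # from the text
PHI_CORES_PER_CARD = 60      # Intel Xeon Phi 5110P core count
XEON_CORES_PER_SOCKET = 8    # ASSUMPTION: not stated in the text


def hybrid_cores(nodes=NODES,
                 xeon_cores_per_socket=XEON_CORES_PER_SOCKET):
    """Return the total number of host plus coprocessor cores."""
    host_cores = SOCKETS_PER_NODE * xeon_cores_per_socket
    phi_cores = PHI_CARDS_PER_NODE * PHI_CORES_PER_CARD
    return nodes * (host_cores + phi_cores)


if __name__ == "__main__":
    total = hybrid_cores()
    print(f"{total} cores (about {total / 1000:.1f}k)")
```

Under the 8-cores-per-socket assumption, each node contributes 16 host cores and 120 coprocessor cores, which is consistent with the roughly 62.5k total stated in the abstract.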
Get High Performance Parallelism Pearls Volume One now with the O’Reilly learning platform.