IBM High-Performance Computing Insights with IBM Power System AC922 Clustered Solution

Book description

This IBM® Redbooks® publication describes how to set up a complete infrastructure environment and tune applications to use the IBM POWER9™ hardware architecture with the technical computing software stack.

This publication is driven by a CORAL project solution. It explores, tests, and documents how to implement an IBM High-Performance Computing (HPC) solution on a POWER9 processor-based system by using IBM technical innovations to help solve challenging scientific, technical, and business problems.
This book documents the HPC clustering solution with InfiniBand on IBM Power Systems™ AC922 8335-GTH and 8335-GTX servers with NVIDIA Tesla V100 SXM2 graphics processing units (GPUs) with NVLink, software components, and the IBM Spectrum™ Scale parallel file system.

This solution includes recommendations about the components that are used to provide a cohesive clustering environment that includes job scheduling, parallel application tools, scalable file systems, administration tools, and a high-speed interconnect.

This book is divided into three parts: Part 1 focuses on the planners of the solution, Part 2 focuses on the administrators, and Part 3 focuses on the developers.

This book targets technical professionals (consultants, technical support staff, IT architects, and IT specialists) who are responsible for delivering cost-effective HPC solutions that help uncover insights in clients' data so that clients can act to optimize business results, product development, and scientific discoveries.

Table of contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. Preface
    1. Authors
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  4. Part 1 Planning
  5. Chapter 1. Introduction to IBM high-performance computing
    1. 1.1 Overview of HPC
    2. 1.2 Reasons for implementing HPC on IBM POWER9 processor-based systems
    3. 1.3 Overview of the POWER9 processor-based CORAL Project
  6. Chapter 2. IBM Power System AC922 server for HPC overview
    1. 2.1 Power AC922 server for HPC
    2. 2.2 Functional component description
      1. 2.2.1 Power AC922 models
      2. 2.2.2 POWER9 processor
      3. 2.2.3 Memory subsystem
      4. 2.2.4 PCI adapters
      5. 2.2.5 IBM CAPI2
      6. 2.2.6 NVLink 2.0
      7. 2.2.7 Baseboard management controller
  7. Chapter 3. Software stack
    1. 3.1 Red Hat Enterprise Linux
    2. 3.2 Device drivers
      1. 3.2.1 Mellanox OpenFabrics Enterprise Distribution
      2. 3.2.2 NVIDIA CUDA
    3. 3.3 Development environment software
      1. 3.3.1 IBM XL compilers
      2. 3.3.2 LLVM
      3. 3.3.3 IBM Engineering and Scientific Subroutine Library
      4. 3.3.4 IBM Parallel Engineering and Scientific Subroutine Library
      5. 3.3.5 IBM Spectrum MPI
      6. 3.3.6 IBM Parallel Performance Toolkit
    4. 3.4 Workload management
      1. 3.4.1 IBM Spectrum Load Sharing Facility
    5. 3.5 Cluster management software
      1. 3.5.1 Extreme Cluster and Cloud Administration Toolkit
      2. 3.5.2 Cluster Administration and Storage Tools
      3. 3.5.3 Mellanox Unified Fabric Manager
    6. 3.6 IBM Storage and file systems
      1. 3.6.1 IBM Spectrum Scale
      2. 3.6.2 IBM Elastic Storage Server
  8. Chapter 4. Reference architecture
    1. 4.1 Large HPC cluster architecture
    2. 4.2 Medium HPC cluster architecture
    3. 4.3 Software stack mapping
    4. 4.4 Generic sizing of nodes
    5. 4.5 Ethernet network layout
    6. 4.6 InfiniBand network
      1. 4.6.1 InfiniBand network topologies
      2. 4.6.2 Fat tree topology
    7. 4.7 Job launch overview
  9. Part 2 Deployment
  10. Chapter 5. Nodes and software deployment
    1. 5.1 Deployment overview
    2. 5.2 System management
      1. 5.2.1 The Intelligent Platform Management Interface tool
      2. 5.2.2 OpenBMC
      3. 5.2.3 Firmware update
      4. 5.2.4 Boot order configuration
    3. 5.3 xCAT deployment overview
      1. 5.3.1 xCAT database: Objects and tables
      2. 5.3.2 xCAT node booting
      3. 5.3.3 xCAT node discovery
      4. 5.3.4 xCAT baseboard management controller discovery
      5. 5.3.5 xCAT installation types: Disks and state
      6. 5.3.6 xCAT network interfaces: Primary and additional
      7. 5.3.7 xCAT software kits
      8. 5.3.8 xCAT synchronizing files
      9. 5.3.9 xCAT version
      10. 5.3.10 xCAT scenario for high-performance computing
    4. 5.4 Initial xCAT management node installation on an IBM Power System LC922 server
      1. 5.4.1 Red Hat Enterprise Linux server
      2. 5.4.2 xCAT packages and installation
      3. 5.4.3 Configuring more network interfaces
      4. 5.4.4 Host name and alias
      5. 5.4.5 xCAT networks
      6. 5.4.6 DNS server
      7. 5.4.7 DHCP server
      8. 5.4.8 Intelligent Platform Management Interface credentials
    5. 5.5 xCAT node discovery
      1. 5.5.1 Verification of network boot configuration and Genesis image files
      2. 5.5.2 Configuring the DHCP dynamic range
      3. 5.5.3 Configuring BMCs to DHCP mode
      4. 5.5.4 Definition of temporary BMC objects
      5. 5.5.5 Defining node objects
      6. 5.5.6 Configuring the host table, DNS, and DHCP servers
      7. 5.5.7 Booting into node discovery
    6. 5.6 xCAT compute nodes (stateless)
      1. 5.6.1 Network interfaces
      2. 5.6.2 Red Hat Enterprise Linux operating system images
      3. 5.6.3 NVIDIA CUDA Toolkit
      4. 5.6.4 Mellanox OpenFabrics Enterprise Distribution
      5. 5.6.5 IBM XL C/C++ runtime libraries
      6. 5.6.6 IBM XL Fortran runtime libraries
      7. 5.6.7 Advance Toolchain runtime libraries
      8. 5.6.8 IBM Spectrum MPI
      9. 5.6.9 IBM Parallel Performance Toolkit
      10. 5.6.10 IBM Engineering and Scientific Subroutine Library
      11. 5.6.11 IBM Parallel Engineering and Scientific Subroutine Library
      12. 5.6.12 IBM Spectrum Scale (formerly IBM GPFS)
      13. 5.6.13 PGI runtime libraries
      14. 5.6.14 IBM Spectrum LSF integration with Cluster Systems Management
      15. 5.6.15 Synchronizing the configuration files
      16. 5.6.16 Generating and packing the image
      17. 5.6.17 Node provisioning
      18. 5.6.18 Postinstallation verification
    7. 5.7 xCAT login nodes (stateful)
  11. Chapter 6. Cluster Administration and Storage Tools
    1. 6.1 Cluster Systems Management
    2. 6.2 Preparing CSM
      1. 6.2.1 Software dependencies
      2. 6.2.2 Installation
      3. 6.2.3 CSM RPMs overview
      4. 6.2.4 Installing CSM on to the management node
      5. 6.2.5 Installing CSM on to the service node
      6. 6.2.6 Installing CSM in the login, launch, and workload manager nodes
      7. 6.2.7 Installing CSM in the compute nodes
      8. 6.2.8 Configuration
      9. 6.2.9 Configuring the CSM database
      10. 6.2.10 Default configuration files
      11. 6.2.11 Configuring SSL
      12. 6.2.12 Heartbeat interval
      13. 6.2.13 Environmental buckets
      14. 6.2.14 Prolog and epilog scripts
      15. 6.2.15 CSM Pluggable Authentication Module
      16. 6.2.16 Starting the CSM daemons
      17. 6.2.17 Running the infrastructure health check
      18. 6.2.18 Setting up the environment for job launch
      19. 6.2.19 Installing and configuring the CSM REST daemon
      20. 6.2.20 Uninstalling the CSM daemons
      21. 6.2.21 Diskless images
    3. 6.3 Burst Buffer
      1. 6.3.1 Installing Burst Buffer
      2. 6.3.2 Using the Burst Buffer
      3. 6.3.3 Troubleshooting
  12. Part 3 Application development
  13. Chapter 7. Compilation, execution, and application development
    1. 7.1 Compiler options
      1. 7.1.1 IBM XL compiler options
      2. 7.1.2 GNU Compiler Collection compiler options
    2. 7.2 Porting applications to IBM Power Systems servers
    3. 7.3 IBM Engineering and Scientific Subroutine Library
      1. 7.3.1 IBM ESSL Compilation in Fortran, IBM XL C/C++, and GCC/G++
      2. 7.3.2 IBM ESSL example
    4. 7.4 IBM Parallel Engineering and Scientific Subroutine Library
      1. 7.4.1 Program development
      2. 7.4.2 Using GPUs with the IBM Parallel ESSL
      3. 7.4.3 Compilation
    5. 7.5 Using POWER9 vectorization
      1. 7.5.1 AltiVec operations with GCC
      2. 7.5.2 AltiVec operations with IBM XL
    6. 7.6 Development models
      1. 7.6.1 OpenMP programs with IBM Parallel Environment
      2. 7.6.2 CUDA C programs with the NVIDIA CUDA Toolkit
      3. 7.6.3 OpenACC
      4. 7.6.4 IBM XL C/C++ and Fortran offloading
      5. 7.6.5 MPI programs with IBM Parallel Environment V2.3
      6. 7.6.6 Hybrid MPI and CUDA programs with IBM Parallel Environment
      7. 7.6.7 OpenSHMEM programs in IBM Parallel Environment
      8. 7.6.8 Parallel Active Messaging Interface programs
      9. 7.6.9 MPI programs that use IBM Spectrum MPI
      10. 7.6.10 Migrating from an IBM Parallel Environment Runtime Edition environment to IBM Spectrum MPI
      11. 7.6.11 Using IBM Spectrum MPI
  14. Chapter 8. Running parallel software, performance enhancement, and scalability testing
    1. 8.1 Controlling the running of multithreaded applications
      1. 8.1.1 Running OpenMP applications
      2. 8.1.2 Setting and retrieving process affinity at run time
      3. 8.1.3 Controlling the NUMA policy for processes and shared memory
    2. 8.2 Performance enhancements and scalability tests
      1. 8.2.1 IBM ESSL execution in multiple CPUs and GPUs
      2. 8.2.2 OpenACC execution and scalability
    3. 8.3 Using IBM Parallel Environment V2.3
      1. 8.3.1 Running applications
      2. 8.3.2 Managing applications
      3. 8.3.3 Running OpenSHMEM programs
    4. 8.4 Using IBM Spectrum LSF
      1. 8.4.1 Submit jobs
      2. 8.4.2 Managing jobs
    5. 8.5 IBM Spectrum LSF Job Step Manager
    6. 8.6 Running tasks with IBM Spectrum MPI
  15. Chapter 9. Measuring and tuning applications
    1. 9.1 Effects of basic performance tuning techniques
      1. 9.1.1 Performance effect of a rational choice of SMT mode
      2. 9.1.2 Effect of optimization options on performance
      3. 9.1.3 Favorable modes and options for applications from the NPB suite
      4. 9.1.4 Importance of binding threads to logical processors
    2. 9.2 General methodology of performance benchmarking
      1. 9.2.1 Defining the purpose of performance benchmarking
      2. 9.2.2 Benchmarking plans
      3. 9.2.3 Defining the performance metric and constraints
      4. 9.2.4 Defining the success criteria
      5. 9.2.5 Correctness and determinacy
      6. 9.2.6 Keeping the log of benchmarking
      7. 9.2.7 Probing scalability
      8. 9.2.8 Evaluation of performance on a favorable number of cores
      9. 9.2.9 Evaluation of scalability
      10. 9.2.10 Conclusions
      11. 9.2.11 Summary
    3. 9.3 Sample code for the construction of thread affinity strings
    4. 9.4 IBM Engineering and Scientific Subroutine Library performance results
    5. 9.5 GPU tuning
      1. 9.5.1 GPU processing modes
      2. 9.5.2 CUDA Multi-Process Service
    6. 9.6 Application development and tuning tools
      1. 9.6.1 Parallel Performance Toolkit
      2. 9.6.2 Parallel application debuggers
      3. 9.6.3 Eclipse for Parallel Application Developers
      4. 9.6.4 NVIDIA Nsight Eclipse Edition for CUDA C/C++
      5. 9.6.5 Command-line tools for CUDA C/C++
  16. Appendix A. Additional material
    1. Locating the web material
    2. Using the web material
  17. Related publications
    1. IBM Redbooks
    2. Online resources
    3. Help from IBM
  18. Back cover

Product information

  • Title: IBM High-Performance Computing Insights with IBM Power System AC922 Clustered Solution
  • Author(s): Dino Quintero, Miguel Gomez Gonzalez, Ahmad Y Hussein, Jan-Frode Myklebust
  • Release date: May 2019
  • Publisher(s): IBM Redbooks
  • ISBN: 9780738457451