POWER8 High-performance Computing Guide IBM Power System S822LC (8335-GTB) Edition

Book Description

Abstract

This IBM® Redbooks® publication provides step-by-step, customizable application and programming solutions for tuning applications and workloads to use the IBM Power Systems™ hardware architecture. It explores, tests, and documents how to use the architectural technologies and software solutions that are available from IBM to help solve challenging technical and business problems.

This publication also demonstrates and documents that the combination of IBM high-performance computing (HPC) hardware and software solutions delivers significant value to technical computing clients who need cost-effective, highly scalable, and robust solutions.

First, the book provides a high-level overview of the HPC solution, including all of the components that make up the HPC cluster: IBM Power System S822LC (8335-GTB), software components, interconnect switches, and the IBM Spectrum™ Scale parallel file system. The publication is then divided into three parts: Part 1 focuses on developers, Part 2 focuses on administrators, and Part 3 focuses on evaluators and planners of the solution.

This IBM Redbooks publication is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective HPC solutions that help uncover insights from vast amounts of client data so that clients can optimize business results, product development, and scientific discoveries.

Table of Contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. Preface
    1. Authors
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  4. Chapter 1. IBM Power System S822LC for HPC server overview
    1. 1.1 IBM Power System S822LC for HPC server
      1. 1.1.1 IBM POWER8 processor
      2. 1.1.2 NVLink
    2. 1.2 HPC system hardware components
      1. 1.2.1 Login nodes
      2. 1.2.2 Management nodes
      3. 1.2.3 Compute nodes
      4. 1.2.4 Compute racks
      5. 1.2.5 High-performance interconnect
      6. 1.2.6 Management and operating system
      7. 1.2.7 Parallel file system
    3. 1.3 HPC system software components
      1. 1.3.1 System software
      2. 1.3.2 Application development software
      3. 1.3.3 Application software
    4. 1.4 HPC system solution
      1. 1.4.1 Compute nodes
      2. 1.4.2 Management node
      3. 1.4.3 Login node
      4. 1.4.4 Combining the management and the login node
      5. 1.4.5 Parallel file system
      6. 1.4.6 High-performance interconnect switch
  5. Part 1 Developers guide
  6. Chapter 2. Compilation, execution, and application development
    1. 2.1 Compiler options
      1. 2.1.1 IBM XL compiler options
      2. 2.1.2 GCC compiler options
    2. 2.2 Porting applications to IBM Power Systems
    3. 2.3 IBM Engineering and Scientific Subroutine Library
      1. 2.3.1 ESSL Compilation in Fortran, XL C/C++, and GCC/G++
      2. 2.3.2 ESSL example
    4. 2.4 Parallel ESSL
      1. 2.4.1 Program development
      2. 2.4.2 Using GPUs with Parallel ESSL
      3. 2.4.3 Compilation
    5. 2.5 Using POWER8 vectorization
      1. 2.5.1 AltiVec operations with GNU GCC
      2. 2.5.2 AltiVec operations with IBM XL
    6. 2.6 Development models
      1. 2.6.1 OpenMP programs with the IBM Parallel Environment
      2. 2.6.2 CUDA C programs with the NVIDIA CUDA Toolkit
      3. 2.6.3 OpenACC
      4. 2.6.4 IBM XL C/C++ and Fortran offloading
      5. 2.6.5 MPI programs with IBM Parallel Environment v2.3
      6. 2.6.6 Hybrid MPI and CUDA programs with IBM Parallel Environment
      7. 2.6.7 OpenSHMEM programs with the IBM Parallel Environment
      8. 2.6.8 Parallel Active Messaging Interface programs
      9. 2.6.9 MPI programs with IBM Spectrum MPI
      10. 2.6.10 Migrating from IBM PE Runtime Edition to IBM Spectrum MPI
      11. 2.6.11 Using Spectrum MPI
  7. Chapter 3. Running parallel software, performance enhancement, and scalability testing
    1. 3.1 Controlling the running of multithreaded applications
      1. 3.1.1 Running OpenMP applications
      2. 3.1.2 Setting and retrieving process affinity at run time
      3. 3.1.3 Controlling NUMA policy for processes and shared memory
    2. 3.2 Performance enhancements and scalability tests
      1. 3.2.1 ESSL execution in multiple CPUs and GPUs
      2. 3.2.2 OpenACC execution and scalability
      3. 3.2.3 XL Offload execution and scalability
    3. 3.3 Using IBM Parallel Environment v2.3
      1. 3.3.1 Running applications
      2. 3.3.2 Managing applications
      3. 3.3.3 Running OpenSHMEM programs
    4. 3.4 Using the IBM Spectrum LSF
      1. 3.4.1 Submit jobs
      2. 3.4.2 Manage jobs
    5. 3.5 Running tasks with IBM Spectrum MPI
  8. Chapter 4. Measuring and tuning applications
    1. 4.1 Effects of basic performance tuning techniques
      1. 4.1.1 Performance effect of a rational choice of an SMT mode
      2. 4.1.2 Effect of optimization options on performance
      3. 4.1.3 Favorable modes and options for applications from the NPB suite
      4. 4.1.4 Importance of binding threads to logical processors
    2. 4.2 General methodology of performance benchmarking
      1. 4.2.1 Defining the purpose of performance benchmarking
      2. 4.2.2 Benchmarking plans
      3. 4.2.3 Defining the performance metric and constraints
      4. 4.2.4 Defining the success criteria
      5. 4.2.5 Correctness and determinacy
      6. 4.2.6 Keeping the log of benchmarking
      7. 4.2.7 Probing the scalability
      8. 4.2.8 Evaluation of performance on a favorable number of cores
      9. 4.2.9 Evaluation of scalability
      10. 4.2.10 Conclusions
      11. 4.2.11 Summary
    3. 4.3 Sample code for the construction of thread affinity strings
    4. 4.4 ESSL performance results
    5. 4.5 GPU tuning
      1. 4.5.1 Power Cap Limit
      2. 4.5.2 CUDA Multi-Process Service
    6. 4.6 Application development and tuning tools
      1. 4.6.1 Parallel Performance Toolkit
      2. 4.6.2 Parallel application debuggers
      3. 4.6.3 Eclipse for Parallel Application Developers
      4. 4.6.4 NVIDIA Nsight Eclipse Edition for CUDA C/C++
      5. 4.6.5 Command-line tools for CUDA C/C++
  9. Part 2 Administrator’s guide
  10. Chapter 5. Node and software deployment
    1. 5.1 Software stack
    2. 5.2 System management
      1. 5.2.1 Frequently used commands with the IPMItool
      2. 5.2.2 Boot order configuration
      3. 5.2.3 System firmware upgrade
    3. 5.3 xCAT overview
      1. 5.3.1 xCAT cluster: Nodes and networks
      2. 5.3.2 xCAT database: Objects and tables
      3. 5.3.3 xCAT node booting
      4. 5.3.4 xCAT node discovery
      5. 5.3.5 xCAT BMC discovery
      6. 5.3.6 xCAT OS installation types: Disks and state
      7. 5.3.7 xCAT network interfaces: Primary and additional
      8. 5.3.8 xCAT software kits
      9. 5.3.9 xCAT synchronizing files
      10. 5.3.10 xCAT version
      11. 5.3.11 xCAT scenario
    4. 5.4 Initial xCAT Management Node installation on S812LC
      1. 5.4.1 RHEL server
      2. 5.4.2 xCAT packages
      3. 5.4.3 Configuring more network interfaces
      4. 5.4.4 Host name and aliases
      5. 5.4.5 xCAT networks
      6. 5.4.6 DNS server
      7. 5.4.7 DHCP server
      8. 5.4.8 IPMI authentication credentials
    5. 5.5 xCAT node discovery
      1. 5.5.1 Verification of network boot configuration and genesis image files
      2. 5.5.2 Configuring the DHCP dynamic range
      3. 5.5.3 Configuring BMCs to DHCP mode
      4. 5.5.4 Definition of temporary BMC objects
      5. 5.5.5 Defining node objects
      6. 5.5.6 Configuring host table, DNS, and DHCP servers
      7. 5.5.7 Booting into Node discovery
    6. 5.6 xCAT Compute Nodes (stateless)
      1. 5.6.1 Network interfaces
      2. 5.6.2 RHEL server
      3. 5.6.3 CUDA Toolkit
      4. 5.6.4 Mellanox OFED
      5. 5.6.5 XL C/C++ runtime libraries
      6. 5.6.6 XL Fortran runtime libraries
      7. 5.6.7 Advance Toolchain runtime libraries
      8. 5.6.8 PGI runtime libraries
      9. 5.6.9 SMPI
      10. 5.6.10 PPT
      11. 5.6.11 ESSL
      12. 5.6.12 PESSL
      13. 5.6.13 Spectrum Scale (formerly GPFS)
      14. 5.6.14 IBM Spectrum LSF
      15. 5.6.15 Synchronize configuration files
      16. 5.6.16 Generating and packing the image
      17. 5.6.17 Node provisioning
      18. 5.6.18 Postinstallation verification
    7. 5.7 xCAT Login Nodes (stateful)
  11. Chapter 6. Cluster monitoring and health checking
    1. 6.1 Basic commands
    2. 6.2 IBM Spectrum LSF tools for job monitoring
      1. 6.2.1 General information about clusters
      2. 6.2.2 Getting information about hosts
      3. 6.2.3 Getting information about jobs and queues
      4. 6.2.4 Administering the cluster
    3. 6.3 Using the BMC for node monitoring
    4. 6.4 Using nvidia-smi tool for GPU monitoring
      1. 6.4.1 Information about jobs on GPU
      2. 6.4.2 All GPU details
      3. 6.4.3 Compute modes
      4. 6.4.4 Persistence mode
      5. 6.4.5 More information
    5. 6.5 Diagnostic and health check framework
      1. 6.5.1 Installation
      2. 6.5.2 Configuration
      3. 6.5.3 Usage
      4. 6.5.4 Adding tests
  12. Part 3 Evaluation and system planning guide
  13. Chapter 7. Hardware components
    1. 7.1 Server features
      1. 7.1.1 Minimum features
      2. 7.1.2 System cooling
    2. 7.2 NVIDIA Tesla P100
    3. 7.3 Operating environment
    4. 7.4 Physical package
    5. 7.5 System architecture
    6. 7.6 POWER8 processor
      1. 7.6.1 POWER8 processor overview
      2. 7.6.2 POWER8 processor core
      3. 7.6.3 Simultaneous multithreading
      4. 7.6.4 Memory access
      5. 7.6.5 On-chip L3 cache innovation and intelligent cache
      6. 7.6.6 L4 cache and memory buffer
      7. 7.6.7 Hardware transactional memory
    7. 7.7 Memory subsystem
      1. 7.7.1 Memory riser cards
      2. 7.7.2 Memory placement rules
      3. 7.7.3 Memory bandwidth
    8. 7.8 POWERAccel
      1. 7.8.1 PCIe
      2. 7.8.2 CAPI
      3. 7.8.3 NVLink
    9. 7.9 System bus
    10. 7.10 PCI adapters
      1. 7.10.1 Slot configuration
      2. 7.10.2 LAN adapters
      3. 7.10.3 Fibre Channel adapters
      4. 7.10.4 CAPI-enabled InfiniBand adapters
      5. 7.10.5 Compute intensive accelerator
      6. 7.10.6 Flash storage adapters
    11. 7.11 System ports
    12. 7.12 Internal storage
      1. 7.12.1 Disk and media features
    13. 7.13 External I/O subsystems
      1. 7.13.1 BMC
    14. 7.14 Mellanox InfiniBand
    15. 7.15 IBM System Storage
      1. 7.15.1 IBM Storwize family
      2. 7.15.2 IBM FlashSystem family
      3. 7.15.3 IBM XIV Storage System
      4. 7.15.4 IBM Elastic Storage Server
  14. Chapter 8. Software stack
    1. 8.1 System management
    2. 8.2 OPAL firmware
    3. 8.3 xCAT
    4. 8.4 RHEL server
    5. 8.5 NVIDIA CUDA Toolkit
    6. 8.6 Mellanox OFED for Linux
    7. 8.7 IBM XL compilers, GCC, and Advance Toolchain
      1. 8.7.1 XL compilers
      2. 8.7.2 GCC and Advance Toolchain
    8. 8.8 IBM Spectrum MPI
      1. 8.8.1 IBM Parallel Performance Toolkit for POWER
    9. 8.9 IBM Engineering and Scientific Subroutine Library and IBM Parallel ESSL
    10. 8.10 IBM Spectrum Scale (formerly IBM GPFS)
    11. 8.11 IBM Spectrum LSF (formerly IBM Platform LSF)
  15. Appendix A. ISV Applications
    1. Application software
  16. Appendix B. Additional material
    1. Locating the Web material
    2. Using the Web material
  17. Related publications
    1. IBM Redbooks
    2. Other publications
    3. Online resources
    4. Help from IBM
  18. Back cover