Implementing InfiniBand on IBM System p

Book description

This IBM Redbooks publication illustrates the installation procedures for InfiniBand on the IBM System p5 with Linux and AIX 5L. It describes InfiniBand adapters, switches, and network management software. The IBM HPC stack (Parallel Environment, LoadLeveler, GPFS, ESSL, and Parallel ESSL) is tested with InfiniBand, as are communication protocols such as MPI and LAPI, and the resulting observations are presented in the book.

This book is a complete guide to implementing InfiniBand on the IBM System p5. It is aimed at IT professionals who want to understand the technology behind InfiniBand, how to deploy it, and how the IBM solution incorporates it.

Table of contents

  1. Notices
    1. Trademarks
  2. Preface
    1. The team that wrote this book
    2. Become a published author
    3. Comments welcome
  3. Part 1: InfiniBand architecture
    1. Chapter 1: Introduction
      1. Introduction to InfiniBand
    2. Chapter 2: Introduction to InfiniBand technology
      1. A technical introduction to InfiniBand
      2. Markets
      3. Application clustering
      4. I/O architectures: fabric versus bus
        1. Shared bus architecture
        2. New interconnects complement InfiniBand
        3. Bandwidth out of the box
      5. InfiniBand technical overview
      6. InfiniBand layers (1/2)
      7. InfiniBand layers (2/2)
        1. Physical layer
        2. Link layer
        3. Network layer
        4. Transport layer
        5. Upper layers
        6. InfiniBand elements
      8. InfiniBand architecture
        1. Channel adapters
        2. The IB switch
      9. InfiniBand components
        1. Router
        2. Subnet manager
        3. Management infrastructure
      10. InfiniBand support for the Direct Access Programming Library (DAPL)
      11. Adapter sharing
      12. Summary
    3. Chapter 3: InfiniBand hardware overview and implementation
      1. Limitations and considerations
      2. Features and benefits of InfiniBand on System p
        1. AIX supported environments
        2. Linux on System p: SLES9 SP3 supported environments
      3. Hardware requirements
      4. Hardware Management Console (HMC)
        1. Cluster Ready Hardware Server (CRHS) mode
        2. Why move the DHCP server
      5. Supported System p servers
      6. Supported host channel adapters (HCA) (1/2)
      7. Supported host channel adapters (HCA) (2/2)
        1. Sharing the host channel adapter (HCA)
      8. Logical partitioning (LPAR)
        1. Make it smaller (micro partitions)
      9. Cisco InfiniBand switches
        1. Cisco SFS 7000P InfiniBand Server Switch
        2. Cisco SFS 7008P InfiniBand Server Switch
        3. Using the switch: user and passwords
      10. InfiniBand cables
        1. Cabling with octopus cables
      11. Management server
        1. A private network DHCP IP configuration versus a static IP configuration
      12. IBM Network Manager (IBM NM) (1/3)
      13. IBM Network Manager (IBM NM) (2/3)
      14. IBM Network Manager (IBM NM) (3/3)
  4. Part 2: Implementation
    1. Chapter 4: InfiniBand on AIX 5L
      1. InfiniBand on AIX (1/3)
      2. InfiniBand on AIX (2/3)
      3. InfiniBand on AIX (3/3)
        1. Hardware requirements
        2. AIX software requirements
        3. Overview of cluster software components
      4. InfiniBand on System p5 running AIX (1/2)
      5. InfiniBand on System p5 running AIX (2/2)
        1. Implementation of InfiniBand architecture (IBA) on System p5
        2. IP over InfiniBand (IPoIB) implementation
        3. AIX InfiniBand filesets and components
      6. Test cluster layout and description
        1. Planning for installation
        2. Our environment
      7. Installation and configuration of the AIX CSM Management server (1/2)
      8. Installation and configuration of the AIX CSM Management server (2/2)
        1. Installation of AIX
        2. Installing the AIX 5L management server
        3. NIM configuration
        4. Updating NIM
        5. Verify InfiniBand filesets
      9. Installation and configuration of AIX nodes (1/6)
      10. Installation and configuration of AIX nodes (2/6)
      11. Installation and configuration of AIX nodes (3/6)
      12. Installation and configuration of AIX nodes (4/6)
      13. Installation and configuration of AIX nodes (5/6)
      14. Installation and configuration of AIX nodes (6/6)
        1. Pre-installation tasks
        2. Get network adapter information
        3. Further configuration
        4. Preparing NIM for nodes (clients) installation
        5. Verification of the AIX installation
        6. Configuring InfiniBand adapters on AIX nodes
        7. Verification of the InfiniBand configuration
      15. GPFS installation and configuration (1/2)
      16. GPFS installation and configuration (2/2)
        1. Communication considerations for GPFS
        2. GPFS installation
        3. Monitoring GPFS over InfiniBand
    2. Chapter 5: IBM System p cluster with InfiniBand and SUSE SLES 9
      1. InfiniBand considerations for SLES 9 (1/3)
      2. InfiniBand considerations for SLES 9 (2/3)
      3. InfiniBand considerations for SLES 9 (3/3)
        1. Supported hardware
        2. Software components and versions
        3. InfiniBand implementation on SLES 9
      4. Introduction of cluster software components for SLES 9 (1/3)
      5. Introduction of cluster software components for SLES 9 (2/3)
      6. Introduction of cluster software components for SLES 9 (3/3)
        1. HPC cluster overview
        2. System management software for SLES9 clustering
        3. Software packages for High Performance Computing
      7. Installation and configuration (1/8)
      8. Installation and configuration (2/8)
      9. Installation and configuration (3/8)
      10. Installation and configuration (4/8)
      11. Installation and configuration (5/8)
      12. Installation and configuration (6/8)
      13. Installation and configuration (7/8)
      14. Installation and configuration (8/8)
        1. Planning
        2. Sample SLES 9 Cluster layout and description
        3. Installation steps for setting up a SLES 9 cluster
  5. Part 3: Support
    1. Chapter 6: Problem determination
      1. IB switch troubleshooting
        1. Physical layer issues
        2. InfiniBand switch firmware upgrade process
      2. System p troubleshooting
        1. HCA troubleshooting
        2. Logs available for troubleshooting
        3. HMC/IBM Network Manager troubleshooting
      3. AIX troubleshooting
      4. Troubleshooting IB on SLES 9 (1/4)
      5. Troubleshooting IB on SLES 9 (2/4)
      6. Troubleshooting IB on SLES 9 (3/4)
      7. Troubleshooting IB on SLES 9 (4/4)
        1. The dmesg tool
        2. eHCA device driver version
        3. Troubleshooting IP over InfiniBand issues in Linux
        4. Troubleshooting an HPC User Space issue under Linux
      8. Application layer troubleshooting (1/2)
      9. Application layer troubleshooting (2/2)
        1. LoadLeveler issues
        2. CSM issues
    2. Chapter 7: Best practices
      1. CSM
        1. CSM and NIM strategy
        2. Back up CSM data
        3. NIM and resolv.conf
        4. Nodegroups
      2. Automatic InfiniBand configuration for many nodes (1/2)
      3. Automatic InfiniBand configuration for many nodes (2/2)
        1. Configuring IB adapters in CSM for AIX
        2. SLES
      4. PowerPC productivity tools for SLES
      5. Physical server build considerations
    3. Chapter 8: Monitoring tools for InfiniBand adapter
      1. Monitoring tools for AIX 5L and SLES 9 (1/3)
      2. Monitoring tools for AIX 5L and SLES 9 (2/3)
      3. Monitoring tools for AIX 5L and SLES 9 (3/3)
        1. Useful commands for AIX 5L
        2. Monitoring tools for SLES 9
      4. Useful commands for LoadLeveler with InfiniBand (1/2)
      5. Useful commands for LoadLeveler with InfiniBand (2/2)
  6. Part 4: Appendixes
    1. Appendix A: InfiniBand security
      1. IB Protocol layer
      2. IP layer
    2. Appendix B: Cluster Ready Hardware Server
      1. Cluster Ready Hardware Server basics
      2. CRHS and InfiniBand switches
    3. Appendix C: Function cross table: Linux for AIX sysadmins
      1. Major features
      2. Common system files
      3. Task-specific command comparison
    4. Appendix D: Installing OFED and eHCA on Linux Kernel 2.6.16, 2.6.17, and 2.6.18
      1. Steps to install OFED and eHCA on Kernel 2.6.16, 2.6.17, and 2.6.18 (1/2)
      2. Steps to install OFED and eHCA on Kernel 2.6.16, 2.6.17, and 2.6.18 (2/2)
  7. Abbreviations and acronyms
  8. Related publications
    1. IBM Redbooks publications
    2. Other publications
    3. Online resources
    4. How to get IBM Redbooks publications
    5. Help from IBM
  9. Index (1/2)
  10. Index (2/2)
  11. Back cover

Product information

  • Title: Implementing InfiniBand on IBM System p
  • Author(s): Dino Quintero, Dr. Norbert Conrad, Rob Desjarlais, Marc-Eric Kahle, Jung-Hoon Kim, Hoang-Nam Nguyen, Tony Pirraglia, Fernando Pizzano, Shi Lei Yao, Octavian Lascu
  • Release date: September 2007
  • Publisher(s): IBM Redbooks
  • ISBN: 9780738486512