Building a Linux HPC Cluster with xCAT

Book description

This IBM Redbooks publication will guide system architects and systems engineers toward a basic understanding of cluster technology, terminology, and the installation of a Linux High-Performance Computing (HPC) cluster (a Beowulf type of cluster) into an IBM eServer Cluster 1300/Cluster 1350.

This book focuses on xCAT (Extreme Cluster Administration Toolkit) Version 1.1.0 for installation and administration. All nodes and components of the cluster, such as compute nodes and management nodes, are installed with xCAT. This toolkit is a collection of scripts, tables, and commands used to build and administer a Beowulf type of cluster or a farm of replicated nodes.

xCAT commands and configuration files are explained in the appendixes of the book. Detailed procedures for configuring the Red Hat Linux 7.3 operating system on the nodes of an HPC cluster are also presented.

Table of contents

  1. Figures
  2. Tables
  3. Notices
    1. Trademarks
  4. Preface
    1. The team that wrote this redbook
    2. Acknowledgements
    3. Become a published author
    4. Comments welcome
  5. Chapter 1: HPC clustering concepts
    1. What a cluster is
      1. High-Performance Computing cluster
      2. Beowulf clusters
    2. IBM Linux clusters
      1. xSeries custom-order cluster
      2. IBM eServer Cluster 1300
      3. The new IBM eServer Cluster 1350
    3. Making up an HPC cluster
      1. Logical functions that a node can provide
      2. xSeries models used in our cluster
      3. Other cluster components
    4. Software
      1. IBM Cluster Systems Management for Linux
  6. Chapter 2: xCAT introduction
    1. What xCAT is
      1. Download xCAT
      2. Directory structure
    2. Installing a Linux cluster with xCAT
      1. Planning
      2. Hardware preparation
      3. Management node installation
      4. Cluster installation
  7. Chapter 3: Hardware preparation
    1. Node hardware installation
    2. Populating the rack and cabling
    3. Cables in our cluster
  8. Chapter 4: Management node installation
    1. Resources to install Red Hat Linux
    2. Red Hat installation steps
    3. Post-installation steps
      1. Copy Red Hat install CD-ROMs
      2. Install Red Hat errata
      3. Updating third party drivers
  9. Chapter 5: Management node configuration
    1. Install xCAT
    2. Populate tables
      1. Site definition
      2. Hosts file
      3. List of nodes and groups
      4. Installation resources
      5. Node types
      6. Node hardware management
      7. MPN topology
      8. MPA configuration
      9. Power control with APC MasterSwitch
      10. MAC address collection using Cisco 3500-series
      11. Console server configuration
      12. Password table
    3. Configure management node services
      1. Turn off services you do not want
      2. Configure system logging
      3. Configure SNMP
      4. Configure TFTP
      5. Configure NFS
      6. Configure NTP
      7. Configure SSH
      8. Configure the console server
      9. Configure DNS
      10. Configure DHCP
    4. Final preparation
      1. Prepare the boot files for stages 2 and 3
      2. Prepare the Kickstart files
      3. Prepare the post installation directory structure
  10. Chapter 6: Cluster installation
    1. Stage 1: Hardware setup
      1. Network switch setup
      2. Management Processor Adapter setup
      3. Terminal server setup
      4. APC MasterSwitch setup
      5. BIOS and firmware updates
    2. Stage 2: MAC address collection
    3. Stage 3: Management processor setup
    4. Stage 4: Node installation
      1. Creating a template file
      2. Creating a custom kernel RPM image
      3. Creating a custom kernel tarball image
      4. Installing the nodes
      5. Post-installation
  11. Appendix A: xCAT commands
    1. Command reference
    2. addclusteruser - Add a cluster user
      1. Options
      2. Files
      3. Diagnostics
      4. Examples
      5. Bugs
      6. Author
    3. mpacheck - Check MPA and MPA settings
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    4. mpareset - Reset MPAs
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    5. mpascan - Scan MPA for RS485 chained nodes
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    6. mpasetup - Set MPA settings
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Author
      8. Bugs
      9. See also
    7. nodels - List node properties from tables
      1. Synopsis
      2. Description
      3. Options
      4. Author
    8. noderange - Generate a list of node names
      1. Synopsis
      2. Description
      3. Options
      4. Environmental variables
      5. Files
      6. Example
      7. Bugs/features
      8. Author
    9. nodeset - Set the boot state for a noderange
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    10. pping - Parallel ping
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    11. prcp - Parallel remote copy
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    12. prsync - Parallel rsync
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    13. psh - Parallel remote shell
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    14. rcons - Remote console
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    15. reventlog - Retrieve or clear remote hardware event logs
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    16. rinstall - Remote network install
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    17. rinv - Remote hardware inventory
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    18. rpower - Remote power control
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    19. rreset - Remote hard reset
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    20. rvid - Remote video (VGA)
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    21. rvitals - Remote hardware vitals
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    22. wcons - Windowed remote console
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    23. winstall - Windowed remote network install
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    24. wkill - Windowed remote console kill
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Examples
      7. Bugs
      8. Author
      9. See also
    25. wvid - Windowed remote video (VGA)
      1. Synopsis
      2. Description
      3. Options
      4. Files
      5. Diagnostics
      6. Example
      7. Bugs
      8. Author
      9. See also
  12. Appendix B: xCAT configuration tables
    1. site.tab
    2. nodelist.tab
    3. noderes.tab
    4. nodetype.tab
    5. nodehm.tab
    6. mpa.tab
    7. apc.tab
    8. apcp.tab
    9. mac.tab
    10. cisco3500.tab
    11. passwd.tab
    12. conserver.tab
    13. rtel.tab
    14. tty.tab
  13. Appendix C: Other hardware components
    1. IBM Advanced Systems Management Adapter
    2. Equinox ESP Terminal Servers
    3. iTouch Communications IR-8000 Terminal Servers
    4. Myrinet
      1. Myrinet switch layout
      2. Setting up the Myrinet switch
      3. Installing the Myrinet software
  14. Appendix D: Application examples
    1. User accounts
    2. MPICH
    3. Persistence of Vision Raytracer (POVray)
      1. Serial POVray
      2. Distributed POVray using MPI-POVray
    4. High Performance Linpack (HPL)
      1. Installing ATLAS
      2. Installing HPL
  15. Related publications
    1. IBM Redbooks
      1. Other resources
    2. Referenced Web sites
    3. How to get IBM Redbooks
      1. IBM Redbooks collections
  16. Glossary
  17. Index
  18. Back cover

Product information

  • Title: Building a Linux HPC Cluster with xCAT
  • Author(s): Luis Ferreira, Christopher Turcksin, Brad Elkin, Scott Denham, Benjamin Khoo, Matt Bohnsack, Egan Ford
  • Release date: September 2002
  • Publisher(s): IBM Redbooks
  • ISBN: None