InfiniBand Network Architecture

Book description

InfiniBand is a new networking specification that revolutionizes the interconnect between processors and I/O subsystems in the datacenter environment. InfiniBand delivers better performance, flexibility, and scalability than alternative network architectures. Copper and fiber-optic InfiniBand links offer data rates ranging from 2.5 Gb/s to 30 Gb/s. By providing kernel bypass and memory-to-memory transfer capabilities, this technology surpasses the performance of competing network infrastructures.

InfiniBand Network Architecture is a comprehensive guide to InfiniBand technology. It describes all hardware and software operational aspects of InfiniBand networking. Using the same building-block approach found in all of the books in the PC System Architecture Series, this book details important concepts relating to the design and implementation of data networks using this emerging standard. A broad overview of the InfiniBand specification is provided, as well as detailed descriptions of all the architecture's operational characteristics.

Specific topics of interest include:

  • Packet addressing, channel adapters, and the role of switches, routers, and repeaters

  • Queue Pair (QP) creation and operation

  • Transfer types, including connected, datagram, and raw datagram service

  • Send/receive operations, such as Send, RDMA Read and Write, and atomic Read/Modify/Write (RMW) operations

  • Link and physical layer descriptions

  • Subnet-local and global addressing

  • The Subnet Manager (SM) and the Subnet Administrator (SA)

  • Performance, communication, device, and baseboard managers

Created for hardware and software engineers and developers, InfiniBand Network Architecture is a critical resource for understanding and implementing this revolutionary technology.




Table of contents

  1. Copyright
  2. PC System Architecture Series
  3. Figures
  4. Tables
  5. Acknowledgments
  6. About This Book
    1. Who Needs This Book?
    2. The MindShare Architecture Series
    3. Cautionary Note
    4. Specifications This Book Is Based On
    5. Specification Is the Final Word
    6. Organization of This Book
    7. Documentation Conventions
    8. Visit Our Web Site
    9. We Want Your Feedback
  7. Core Concepts
    1. Basic Terms and Concepts
      1. Definition of the Acronym “IBA”
      2. Packet Field Documentation Convention
      3. InfiniBand Advantages
      4. Some Preliminary Terminology
      5. Definition of a Subnet
      6. Packet Addressing Basics
      7. Every Packet Contains a BTH
      8. Channel Adapters
      9. Role of Switches and Routers
      10. Repeater's Role
      11. It's All About Message Passing
    2. Intro to Attributes and Managers
      1. Why Talk About Attributes and Managers Now?
      2. Definition of an Attribute
      3. Who Accesses Attributes?
      4. MAs Handle Access Requests
      5. MA's Response
      6. Managers Use Special Packets Called MADs
      7. Attribute Format and Documentation Conventions
    3. QP: Message Transfer Mechanism
      1. Introduction
      2. A QP Is a Bi-Directional Message Transport Engine
      3. Verb Layer Is an OS-Independent API
      4. QP Context Defines QP's Operational Characteristics
      5. Sending a Message to a Destination CA
    4. Intro to Transport Types
      1. Four IBA Transfer Protocol Flavors
      2. Two Non-IBA Service Types
    5. Intro to Send/Receive Operations
      1. How Is a Message Transmitted or Received?
      2. SQ Operation Types
      3. RQ Operation Types
    6. Division of Labor
      1. Introduction
      2. Brief Layer Descriptions
      3. Layers in CAs, Switches and Routers
      4. Simplified Example Scenarios
      5. Physical Layer Overview
      6. Link Layer Overview
      7. Network Layer Overview
      8. Transport Layer Overview
      9. Verbs Overview
    7. Subnet-Local Addressing
      1. Port Numbering
      2. LID Address Space
      3. LID's Purpose: Packet Routing Within Subnet
      4. Assigning Port's Base LID Address
      5. Why Assign a LID Range to a Port?
      6. Port's Decode of DLID Address
      7. Port's Selection of SLID on Packet Transmit
      8. SM Path Database
      9. LID Rule Summary
    8. Global Addressing
      1. Global Routing: Source/Destination CAs in Different Subnets
      2. IPv4 Addresses Too Limiting
      3. IBA Global Address = IPv6 Address
      4. IPv6 Addressing
      5. Assignment of Port's Subnet ID and GUID(s)
      6. Each Device Has Device-Level Identifiers
    9. Intro to the Managers
      1. The SM
      2. General Services Managers
      3. Subnet Administrator's Role
      4. Introduction to Event Notification
    10. Intro to Connection Establishment
      1. Questions Addressed in This Chapter
      2. A CA Is a Provider of Services
      3. Locating a Specific CA
      4. Discovering Services a CA Provides
      5. RC/UC Connection Establishment
      6. RD Connection Establishment
      7. UD Connection Issues
    11. PSN Usage
      1. Overall Size of PSN Range
      2. Requester QP's SQ Logic PSN Generation and Verification
      3. Responder QP's RQ Logic Request PSN Verification
  8. QP Creation and Operation
    1. QP Verbs and QP State Machine
      1. QP-Related Verbs
      2. The QP State Machine
      3. QP Creation
      4. Software Control of QP State
      5. QP Setup Is Performed in a Defined Sequence
      6. Reset State
      7. Initialized State
      8. Ready to Receive State
      9. Ready to Send State
      10. SQ Drain (SQD) State
      11. SQ Error State
      12. Error State
    2. WRs, WQEs, and CQEs
      1. Once Posted to SQ or RQ, WR Is Called a WQE
      2. WRs
      3. WQE Execution and Completion Order
      4. RDMA Read Relaxed Ordering Rules
      5. Completion Queues (CQs) and CQEs
    3. Asynchronous Events and Errors
      1. Why Asynchronous?
      2. Registering a Handler
      3. Affiliated Asynchronous Events
      4. Affiliated Asynchronous Errors
      5. Unaffiliated Asynchronous Errors
  9. Protection Mechanisms
    1. Memory Protection
      1. The Problems
      2. The Solutions
      3. Virtual-to-Physical Page Mapping Background
      4. Memory Regions
      5. Memory Windows
      6. Protection Domains
    2. Other Protection Mechanisms
      1. The IBA Protection Mechanisms
      2. Memory Access Protection (PD, L_Key, and R_Key)
      3. PDs and UD Service
      4. Partition Key (P_Key)
      5. SM-Related Protection Mechanisms
      6. RD Domain
      7. Queue Key (Q_Key)
      8. Baseboard Management Key (B_Key)
  10. Detailed Description of the Transport Services
    1. RC Transport Service
      1. RC Support Requirement
      2. RC Basic Operational Characteristics
      3. RC Connection Establishment
      4. Packet Opcodes
      5. RC Message Transfer Primer
      6. Structure of This Discussion
      7. QP State before Any Messages Are Transferred
      8. Standard Operation in Fast, Error-Free Environment
      9. Traffic Reduction
      10. Packet Delivery Delays
      11. Packet Loss
      12. Nak Errors
      13. RQ Logic's Error Detecting and Handling
      14. End-to-End Flow Control
      15. SQ Logic Can Use MSN to Complete WQEs
      16. Additional Reference Information
    2. UC Transport Service
      1. UC Support Requirement
      2. In RC, Responses Are Expected
      3. UC Is a Subset of RC
      4. UC Transport Service Type's Basic Characteristics
      5. Requester QP's SQ Logic Operation
      6. Responder QP's RQ Logic Operation
    3. RD Transport Service
      1. RD Support Requirement
      2. Introduction
      3. Many Similarities to RC
      4. RD Basic Operational Characteristics
      5. The Major Differences from RC
      6. The Scheduler
      7. Keep RDC Operational If a QP Goes Down
      8. Additional Reference Material
    4. UD Transport Service
      1. UD Support Requirement
      2. No Responses Expected
      3. The Only Operation Supported Is Send
      4. Maximum Message Length Is One PMTU
      5. Basic Operational Characteristics
      6. Messaging with the Desired Remote Service
      7. Address Handles
      8. PD Check Performed before Message Is Processed
      9. SQ Logic Operation
      10. RQ Logic Operation
    5. Raw Transport Service Types
      1. Goal: Tunneling Non-IBA Packets through IBA Network
      2. Solution: Disguise It as Special-Purpose IBA Packet
      3. Raw QPs Are Used to Transmit/Receive Non-IBA Packets
      4. Raw QP Support Requirement
      5. Raw Transport Services Are Unreliable
      6. Send and Receive Are Only Supported Operation Types
      7. Basic Operational Description
      8. Raw Datagrams Do Not Have an ICRC
      9. Raw Datagrams Do Not Have a Destination QP
      10. LRH:LNH Indicates Packet Type
      11. Raw IPv6 Datagrams
      12. Raw EtherType Datagrams
      13. Additional Reference Material
    6. Multicasting
      1. Definition of Multicasting
      2. Support for Multicasting Is Optional
      3. Only UD and Raw QPs Can Participate in Multicasting
      4. UD Multicasting
      5. Raw Packet Multicasting
      6. Additional Reference Material
    7. Automatic Path Migration
      1. Definition of APM
      2. CA Support Is Optional
      3. Causes of a Path Migration
      4. APM-related Elements
      5. Normal Operation before APM Enabled
      6. Enabling APM
      7. Automatic Hardware Trigger of APM
      8. Loading a New Path or a Tertiary Path
      9. Additional Reference Information
    8. Static Rate Control
      1. How Fast Can a Port Transmit Packets?
      2. Problem: Fast CA Port Can Cause Problems
      3. Solution
      4. IPD Calculation and Source
      5. How IPD Is Provided to a QP or an EEC
  11. Link and Physical Layer Descriptions
    1. Detailed Description of the Link Layer
      1. Link Layer Functional Overview
      2. Link State Machine
      3. Detailed Description of LRH
      4. QoS within the Subnet: SL and VLs
      5. Detailed Description of VL Arbitration
      6. Link-Level Flow Control
      7. Packet CRCs
      8. Intro to the Packet Delimiters
      9. Packet Receive State Machine
      10. Data Packet Check
      11. Link Packet (Flow Control Packet) Check
      12. Switch Performs Packet Forwarding
      13. Overview of Router Port's Link Layer
    2. Detailed Physical Layer Description
      1. Module Basics
      2. General
      3. Port Types
      4. Signal Naming Conventions
      5. Electrical Signaling and Copper Cable
      6. Link Layer to Physical Layer Interface
      7. Transmit Logic Functions
      8. Receiver Logic Functions
      9. Link Training
      10. Physical Layer Error Handling
      11. Repeaters
      12. Performance Counters
  12. The SM and SA
    1. The SMI
      1. Purpose of the SMI (QP0)
      2. The SMI on a CA and Router
      3. The SMI on Switches
      4. Detailed Switch Handling of SMPs
      5. Detailed CA or Router Handling of SMPs
      6. SM Wishes to Access an Attribute in Its Local Device
      7. SM Can Reside in a Switch
      8. SMI Is a Privileged Resource
      9. SMI Only Communicates with Other SMIs
      10. Port States SMPs Can Be Sent and Received In
      11. SMPs Never Leave the Subnet
      12. Setting Up an HCA Port's SMI
      13. How the SM Sends a Message and Handles a Response
      14. SMP Source and Destination
      15. The SMI and the Q_Key
      16. The SMI and Partitions
      17. Additional Reference Material
    2. Detailed Description of MADs
      1. Definition of a MAD
      2. Software Times Return of MAD Response
      3. Basic MAD Contents
      4. SMP MADs
      5. GMP MADs
      6. Traps
      7. Event Subscription and Event Forwarding
      8. The Notice Queue
    3. SM Methods and Attributes
      1. SM MAD Formats
      2. SM Methods
      3. SM Attributes
      4. SM Traps
      5. SMA Notice Support
    4. Multiple SMs
      1. Many Issues Are Outside the Scope of the Specification
      2. Multi-Vendor SM Failover Is Not Supported
      3. A Subnet Can Have More Than One SM
      4. How SM Issues Requests
      5. Introduction to the SM States
      6. Are You Alive?
      7. SM Control Packets
      8. Multiple Master SMs
      9. SM States
    5. Discovery
      1. Who Performs Discovery?
      2. How Packets Are Normally Routed
      3. Packet Routing During Discovery
      4. Scenario: Sweep at Startup
      5. Accessing Device along Partially Configured Path
    6. The GSI
      1. The GSMs, GSAs, and GSIs
      2. QP1 Is the GSI
      3. Preparing an HCA GSI for Use
      4. Required/Optional GSAs
      5. The SA and the GSI
      6. GMPs Use VL Buffer Determined by the GMP's SL Value
      7. GMPs Can Transit Routers
      8. QP1 Is a Controlled-Access QP
      9. P_Key Insertion and Checking
      10. Additional Reference Material
    7. Detailed Description of SA
      1. Purpose of the SA
      2. SA Accessed Using GMPs
      3. Location of the SA
      4. Requester Access Authorization
      5. SA Methods and Attributes
      6. Record Identifier (RID) Definition
      7. SubnAdmGet() Operation
      8. SubnAdmSet() Operation
      9. Definition of a Table
      10. SubnAdmConfig() Operation
      11. Database Queries Using SubnAdmGetTable()
      12. Fetch Entire Database
      13. Reliable Multi-Packet Transaction Protocol
      14. Additional Reference Information
  13. General Services
    1. Baseboard Management
      1. Roles of the Other Managers
      2. The BM Reaches behind the IBA Front-End
      3. Chassis and Module
      4. Chassis Baseboard Management Elements
      5. Passively Managed Chassis
      6. Module BM Elements
      7. Non-Module IBA Devices
      8. BM MAD Format
      9. BM Methods
      10. BM Attributes
      11. BM Sending a Command to the MME
      12. CME Sends a Command to the BM
      13. BM-related Traps
    2. Performance Management
      1. The Role of Performance Management (PM)
      2. Required Features
      3. Optional Features
      4. Performance Management MAD Format
      5. Performance Methods
      6. Mandatory Performance Attributes
      7. Optional PM Attributes
    3. Communications Management
      1. Introduction
      2. CM MAD Format
      3. CM Methods
      4. CM Attributes
      5. CM MADs
      6. Definition of Client and Server
      7. Definition of Active and Passive CM
      8. Three Models Are Supported
      9. Stale Communications Channel
    4. Device Management
      1. Definition of IOU and IOCs
      2. IOU Contains a DMA
      3. DM MAD Format
      4. DM Methods
      5. DM Attributes
      6. Caveats
    5. Glossary
  14. Index

Product information

  • Title: InfiniBand Network Architecture
  • Author(s): Tom Shanley, MindShare, Inc.
  • Release date: October 2002
  • Publisher(s): Addison-Wesley Professional
  • ISBN: 9780321117656