Achieving the Highest Levels of Parallel Sysplex Availability

Book description

This IBM Redbooks publication provides an example of the "ideal" Parallel Sysplex environment (one that is configured to deliver the highest levels of application availability), and describes the features and functions used in this environment. For each function, we briefly describe what it does and what its benefit is, and refer you to the appropriate implementation documentation.

In this document, we discuss how to configure hardware and software, and how to manage systems processes for maximum availability in a Parallel Sysplex environment. We discuss the basic concepts of continuous availability and describe a structured approach to developing and implementing a continuous availability solution.

This document provides a list of items to consider when trying to achieve near-continuous application availability, and should be used as a guide when creating a high-availability Parallel Sysplex.

The information is presented as lists of recommendations that will help you configure and manage your IT environment to meet your availability requirements.

This publication is intended to help customers’ systems and operations personnel and IBM systems engineers plan, implement, and use a Parallel Sysplex in order to get closer to the goal of continuous availability. It is not intended to be a guide to implementing or using Parallel Sysplex as such; it covers only topics related to continuous availability.

Table of contents

  1. Notices
    1. Trademarks
  2. Preface
    1. The team that wrote this redbook
    2. Become a published author
    3. Comments welcome
  3. Chapter 1: Introduction
    1. Why availability is important to you
    2. Cost of an outage
      1. Component outage versus service outage
      2. Availability overview
    3. Continuous availability in a Parallel Sysplex
      1. Availability definitions
      2. Spectrum of availability factors
    4. What this book is all about
  4. Chapter 2: Hardware
    1. Environmental
      1. Power
      2. Cooling
      3. Geographic location
      4. Physical security
      5. Automation
      6. Physical configuration control
    2. Central Processing Complexes (CPCs)
      1. How many CPCs to have
      2. Availability features
      3. Concurrent upgrade
      4. Redundant capacity
      5. Hardware configuration
    3. Coupling Facilities
      1. Coupling Facility Capacity
      2. Failure isolation
      3. Recovering from CF failure
      4. How many CFs
      5. Coupling Facility Control Code Level considerations
      6. CF maintenance procedures
      7. CF volatility
      8. Nondisruptive Coupling Facilities hardware upgrades
    4. 9037 Sysplex Timers considerations
      1. Sysplex Timer® Models
      2. Recovering from loss of all timer signals
      3. Maximizing 9037 availability
      4. Message time ordering
    5. Intelligent Resource Director
      1. An IRD Illustration
      2. WLM LPAR CPU Management
      3. Dynamic Channel-path Management (DCM)
      4. Channel Subsystem I/O Priority Queueing
    6. Switches
      1. ESCON Directors
      2. FICON Switches
    7. DASD
      1. Peer to Peer Remote Copy (PPRC)
      2. Extended Remote Copy (XRC)
    8. Geographically Dispersed Parallel Sysplex™
      1. Data consistency
      2. The HyperSwap
    9. Other hardware equipment
      1. 3494 Tape library/VTS
      2. Stand-alone tape
      3. 3174, 2074
  5. Chapter 3: z/OS
    1. Configure software for high availability
      1. Couple Data Sets
      2. Other important data sets
      3. Sysres and master catalog sharing
    2. Consoles
      1. Addressing WTO and WTOR buffer shortages
      2. EMCS consoles
      3. Using the HMC as a console
      4. Hardware consoles
      5. Console setup recommendations
    3. Coupling Facility management
      1. Defining CFs and structures
      2. Structure placement
      3. Structure rebuild considerations
      4. Structure duplexing
      5. Structure monitoring
      6. Structure recommendations
    4. CF operations
    5. IBM Health Checker for z/OS and Sysplex
      1. Health Checker description
      2. IBM Health Checker recommendations
    6. z/OS msys for Operations
      1. Automated Recovery Actions
      2. Sysplex operation
      3. z/OS msys for Operations recommendations
    7. Sysplex Failure Management (SFM)
      1. Configuring for status update missing conditions
      2. Configuring for signaling connectivity failures
      3. Configuring for Coupling Facility failures
      4. SFM recommendations
    8. Automatic Restart Manager (ARM)
      1. Configuring for Automatic Restart Management
      2. ARMWRAP - The ARM JCL Wrapper
      3. ARM recommendations
    9. System Logger (LOGR)
      1. Logstream types
      2. CF structure considerations
      3. System-Managed CF Structure Duplexing
      4. DASD-based staging data set considerations (DASD-Only)
      5. DASD-based staging data set considerations (Coupling Facility)
      6. DASD-based log data set considerations
      7. Offload considerations
      8. Log data retention
      9. GMT considerations
      10. System Logger recovery
      11. System Logger recommendations
    10. Cross-system Coupling Facility (XCF)
      1. XCF systems, groups, and members
      2. XCF signaling paths
      3. XCF Transport Classes
      4. XCF signal path performance problems
      5. XCF message buffer length performance problems
      6. XCF message buffer space performance problems
      7. XCF Coupling Facility performance problems
      8. XCF recommendations
    11. GRS
      1. GRS start options
      2. Dynamic RNLs
      3. GRS Ring Availability considerations - Fully connected complex
      4. GRS Ring Availability considerations - Mixed complex
      5. GRS Star Availability considerations
      6. SYNCHRES option
      7. Resource Name Lists (RNLs)
      8. RNL design
      9. GRS monitor (ISGRUNAU)
      10. RNL syntax checking
      11. GRS recommendations
    12. Tape sharing
      1. IEFAUTOS
      2. ATS Star
      3. Coexistence between Dedicated, IEFAUTOS, and ATS Star
      4. Tape-sharing recommendations
    13. JES2
      1. JES2 SPOOL considerations
      2. JES2 Checkpoint considerations
      3. JES2 Checkpoint access
      4. JES2 Checkpoint performance
      5. JES2 Checkpoint management
      6. JES2 Health Monitor
      7. Scheduling environment
      8. WLM-managed initiators
      9. JESLOG SPIN data sets
      10. JES2 recommendations
    14. WLM
      1. Service classes
      2. WLM recommendations
    15. UNIX System Services
      1. Shared HFS
      2. Automove
      3. zFS
      4. BRLM issues
      5. UNIX System Services recommendations
    16. RACF
      1. RACF sysplex communication
      2. RACF non-data sharing mode
      3. RACF data sharing mode
      4. RACF read-only mode
      5. RACF recovery procedures
      6. PKI Services
      7. RACF recommendations
    17. DFSMShsm
      1. Common Recall Queue
      2. Hot standby (Secondary Host promotion)
      3. Use of record level sharing (RLS) for CDSs
      4. DFSMShsm recommendations
    18. Catalog
      1. VVDS mode catalog sharing
      2. Enhanced catalog sharing (ECS)
      3. Catalog integrity
      4. Catalog performance
      5. Catalog sizing
      6. Catalog backup and recovery
      7. Catalog security
      8. Catalog recommendations
    19. Software maintenance
      1. Types of maintenance
      2. Classification of maintenance
      3. Sources of maintenance
      4. Consolidated Service Test (CST)
      5. Enhanced Holddata
      6. Software maintenance recommendations
    20. Testing the sysplex
      1. Test Sysplex
      2. Sysplex testing recommendations
    21. Planned outages
      1. APPC/MVS configuration
      2. APPC/MVS Transaction Scheduler
      3. Authorized Program Facility (APF)
      4. Diagnostics
      5. Dump options
      6. Dump Analysis and Elimination (DAE)
      7. Console management
      8. Console group management
      9. Exits
      10. Global Resource Serialization (GRS)
      11. IODF management
      12. IOS
      13. LNKLST
      14. LOGREC error recording
      15. LPALST
      16. Message Processing Facility (MPF)
      17. MVS Message Service (MMS)
      18. Local Page Data Sets
      19. Parmlib concatenation
      20. Products
      21. Program properties table
      22. Run-time library services (RTLS)
      23. SLIP
      24. System Measurement Facility (SMF)
      25. Storage Management Subsystem (SMS)
      26. Subsystem Names (SSN)
      27. System Resources Manager (SRM)
      28. Time Sharing Option (TSO)
      29. UNIX System Services (USS)
      30. XCF
      31. Planned outages recommendations
    22. Unplanned outages
      1. Dump options
      2. ABEND dumps
      3. SVC dumps
      4. Stand-alone dump
      5. Dump suppression
      6. SLIP traps
      7. System Hardcopy log
      8. Environmental Record Editing and Printing (EREP)
      9. Unplanned outages recommendations
  6. Chapter 4: Systems Management
    1. Overall Availability Management processes
      1. Develop availability practices
      2. Develop standards
    2. Service Level Management
      1. Business requirements
      2. Negotiating objectives
      3. Documenting agreements - Managing expectations
      4. Building infrastructure - Technical and support
      5. Measuring availability
      6. Track and report availability
      7. Customer satisfaction
    3. Change Management
      1. Develop and prepare change
      2. Assess and minimize risk
      3. Testing
      4. Back-out planning
      5. Verify change readiness
      6. Schedule change
      7. Communicate change
      8. Implement and document
      9. Change record content
      10. Review quality
    4. Organization
      1. Skills
      2. Help desk activities
      3. Operations
      4. Automation
      5. Application testing
      6. Passive and active monitoring
    5. Recovery Management
      1. Terminology
      2. Recovery potential initiatives
      3. Recovery Management activities
      4. Event Management
      5. Incident Management
      6. Crisis Management
    6. Problem Management
      1. Data tracking and reporting
      2. Causal analysis
      3. Maintenance policies
    7. Performance Management
    8. Capacity planning
    9. Security Management
      1. Security policy
      2. Physical security
    10. Configuration Management
      1. Component Failure Impact Analysis
    11. Enterprise architecture
      1. Infrastructure simplification
      2. Ideal mainframe implementations
      3. Ideal BladeCenter implementation
      4. Examples and scenarios demonstrate real infrastructure simplification efforts
    12. IBM IGS High Availability Services
      1. More than just technology
    13. Summary
  7. Related publications
    1. IBM Redbooks
    2. Other publications
    3. Online resources
    4. How to get IBM Redbooks
    5. Help from IBM
  8. Index
  9. Back cover

Product information

  • Title: Achieving the Highest Levels of Parallel Sysplex Availability
  • Author(s): Frank Kyne, Christian Matthys, Uno Bengtsson, Andy Clifton, Steve Cox, Gary Hines, Dougie Lawson, Glenn McGeoch, Geoff Nicholls, David Raften, Tom Russell
  • Release date: December 2004
  • Publisher(s): IBM Redbooks
  • ISBN: None