WebSphere Application Server V8.5 Administration and Configuration Guide for the Full Profile
For more information about creating a cluster, go to the following website:
http://pic.dhe.ibm.com/infocenter/wasinfo/v8r5/index.jsp?topic=%2Fcom.ibm.websphere.zseries.doc%2Fae%2Ftrun_wlm_cluster_v61.html
18.2 High availability
The high availability framework that is provided with the product eliminates single points of
failure and provides peer-to-peer failover for applications and processes that are running
within the product environment.
In this section, we provide an overview of the high-availability features in WebSphere
Application Server for z/OS.
18.2.1 High availability manager
The high availability manager provides services for product components so that they can make
themselves highly available. By default, the high availability manager instance runs on every
application server, proxy server, node agent, and deployment manager in a cell.
A cell can be divided into multiple high-availability domains known as core groups. Each high
availability manager instance establishes network connectivity with all other high availability
manager instances in the same core group, using a specialized, dedicated, and configurable
transport channel. The transport channel provides mechanisms that allow the high availability
manager instance to detect when other members of the core group start, stop, or fail.
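
To illustrate why dividing a cell into core groups can matter, consider the number of connections involved. The following minimal sketch assumes a full mesh in which every member connects to every other member of its core group, as described above. The function name and the member counts are illustrative only, and any connections between core groups are ignored.

```python
def mesh_connections(members: int) -> int:
    """Number of pairwise connections in a full mesh of `members` processes."""
    return members * (members - 1) // 2


# One core group of 100 processes (hypothetical size).
one_group = mesh_connections(100)

# The same 100 processes split into two core groups of 50 each.
split = 2 * mesh_connections(50)

print(one_group, split)
```

Under these assumptions, splitting the processes into two core groups roughly halves the number of connections that must be kept open and monitored.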
Automatic restart manager (ARM), Tivoli System Automation, or other automation software
can be configured to restart failed WebSphere Application Server controllers.
Within a core group, high availability manager instances are elected to coordinate high
availability activities. An instance that is elected is known as a core group coordinator. The
coordinator is highly available, such that if a process that is serving as a coordinator stops or
fails, another instance is elected to assume the coordinator role, without loss of continuity.
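
The election and failover behavior can be pictured with a small simulation. This is a sketch under stated assumptions, not the product's election algorithm: the `elect_coordinator` function, the optional preference list, the lowest-name fallback rule, and the member names are all hypothetical.

```python
from typing import Iterable, Optional


def elect_coordinator(running_members: Iterable[str],
                      preferred: Iterable[str] = ()) -> Optional[str]:
    """Pick one coordinator from the members that are currently running."""
    running = set(running_members)
    # Assumed policy: honor an ordered preference list first.
    for name in preferred:
        if name in running:
            return name
    # Assumed fallback: a deterministic rule that every member can compute
    # on its own, here the lexically lowest member name.
    return min(running) if running else None


# Hypothetical core group members.
members = {"cell01/dmgr/dmgr", "cell01/nodeA/server1", "cell01/nodeB/server2"}

coordinator = elect_coordinator(members)

# Simulate the coordinator process stopping; the remaining members elect again.
members.discard(coordinator)
new_coordinator = elect_coordinator(members)

print(coordinator, "->", new_coordinator)
```

Because every surviving member applies the same rule to the same membership, they all arrive at the same new coordinator without further negotiation.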
The high availability manager periodically runs a number of background tasks, such as
checking the health of highly available singleton services that it is managing. Most of these
background tasks consume trivial amounts of CPU.
The exceptions are the regularly scheduled Discovery and Failure Detection Protocols. Together
with the View Synchrony Protocol, they work as follows:
- The Discovery Protocol discovers when other core group processes start and opens
  network connections to these other members.
- The View Synchrony Protocol establishes reliable messaging with other core group
  members after the connections are opened.
- The Failure Detection Protocol detects when other core group members stop or become
  unreachable because of a network partition.
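
The idea behind failure detection can be sketched as a heartbeat monitor. This is an illustration only and does not reflect the product's protocol: the class name, the heartbeat period, and the missed-heartbeat limit are hypothetical, and timestamps are passed in explicitly so that the example is deterministic.

```python
class HeartbeatFailureDetector:
    """Suspects a member after too many consecutive missed heartbeats."""

    def __init__(self, period_s: float, missed_limit: int):
        # A member is suspected once this much time passes with no heartbeat.
        self.timeout_s = period_s * missed_limit
        self.last_seen = {}

    def heartbeat(self, member: str, now: float) -> None:
        # The first heartbeat from a member also acts as its discovery.
        self.last_seen[member] = now

    def suspects(self, now: float) -> list:
        """Return the members whose last heartbeat is older than the timeout."""
        return sorted(m for m, t in self.last_seen.items()
                      if now - t > self.timeout_s)


# Hypothetical settings: a heartbeat every 30 s, suspect after 6 misses.
detector = HeartbeatFailureDetector(period_s=30, missed_limit=6)
detector.heartbeat("server1", now=0)
detector.heartbeat("server2", now=0)
detector.heartbeat("server1", now=150)   # server2 has gone silent

print(detector.suspects(now=100))
print(detector.suspects(now=200))
```

The period and the limit trade detection speed against background cost: shorter periods find failures sooner but send more heartbeats, which is consistent with these protocols being the non-trivial background tasks.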
Distribution and Consistency Services
The Distribution and Consistency Services (DCS) transport chain provides the underlying
group services framework for the high availability manager, such that each application server
process knows the health and status of JVMs and singleton services. DCS provides view
synchrony services to the high availability manager. DCS itself uses reliable multicast
messaging (RMM) as its publish and subscribe message framework.
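
One way to picture how each process learns the status of the others is a numbered membership view that is published to subscribers whenever the group changes. The sketch below is a single-process illustration under that assumption. It is not the DCS or RMM API, and the `View` and `GroupMembership` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class View:
    """An immutable snapshot of group membership."""
    view_id: int
    members: FrozenSet[str]


class GroupMembership:
    """Installs a new numbered view on every change and notifies subscribers."""

    def __init__(self) -> None:
        self.view = View(0, frozenset())
        self._listeners: List[Callable[[View], None]] = []

    def subscribe(self, listener: Callable[[View], None]) -> None:
        self._listeners.append(listener)

    def _install(self, members: FrozenSet[str]) -> None:
        # Every subscriber sees the same views in the same order.
        self.view = View(self.view.view_id + 1, frozenset(members))
        for listener in self._listeners:
            listener(self.view)

    def join(self, member: str) -> None:
        if member not in self.view.members:
            self._install(self.view.members | {member})

    def leave(self, member: str) -> None:
        if member in self.view.members:
            self._install(self.view.members - {member})


history: List[View] = []
group = GroupMembership()
group.subscribe(history.append)

group.join("server1")
group.join("server2")
group.leave("server1")   # for example, reported by failure detection

for view in history:
    print(view.view_id, sorted(view.members))
```

In a real distributed setting the hard part is getting all members to agree on each view, which this single-process sketch does not attempt.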
