66 Achieving the Highest Levels of Parallel Sysplex Availability
3.1 Configure software for high availability
One of the first things to consider is the definition, use, and placement of data sets critical to
the availability of the whole sysplex. As far as possible, you need to make sure that there are
no single points of failure that could affect these data sets. You also have to be careful to
define and place the data sets so that they achieve the required performance.
3.1.1 Couple Data Sets
z/OS requires every sysplex to have a sysplex Couple Data Set (CDS). Depending on which
sysplex features are in use, one or more policy CDSs may also be needed. These data sets
are shared by all members of the sysplex and are used to store information relating to the
status of the sysplex, or to manage certain aspects of the sysplex.
Any type of CDS may be defined as a primary data set only, or as a primary/alternate data
set pair. Updates are made to the primary and alternate data sets concurrently. This
allows XCF to switch automatically to the alternate CDS if the primary CDS fails or
suffers any type of transient or permanent I/O error.
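The duplexing and switch behavior described above can be pictured with a small toy model. This is only an illustrative sketch, not how XCF is implemented: the class name, the data set names, and the `primary_io_error` flag are all hypothetical, and the model assumes that after a switch the sysplex runs without an alternate until a new one is added.

```python
class CoupleDataSetPair:
    """Toy model of a primary/alternate CDS pair (not real XCF behavior)."""

    def __init__(self, primary, alternate=None):
        self.primary = primary
        self.alternate = alternate
        # One in-memory dict per data set stands in for its contents.
        self.contents = {primary: {}}
        if alternate is not None:
            self.contents[alternate] = {}

    def write(self, key, value, primary_io_error=False):
        """Apply an update to every active copy, switching first if the primary fails."""
        if primary_io_error:
            self._switch_to_alternate()
        self.contents[self.primary][key] = value
        if self.alternate is not None:
            self.contents[self.alternate][key] = value

    def _switch_to_alternate(self):
        """Promote the alternate to primary; fail if there is none."""
        if self.alternate is None:
            raise RuntimeError("primary CDS lost and no alternate is defined")
        del self.contents[self.primary]
        # Assumption: the old alternate becomes the only copy until a new one is added.
        self.primary, self.alternate = self.alternate, None


# Hypothetical data set names, used for illustration only.
pair = CoupleDataSetPair("SYS1.XCF.CDS01", "SYS1.XCF.CDS02")
pair.write("SYSA", "ACTIVE")
pair.write("SYSB", "ACTIVE", primary_io_error=True)
print(pair.primary, pair.contents[pair.primary])
```

Because every update reaches both copies before any failure occurs, the promoted alternate already holds the earlier `SYSA` record. The same model raises an error when a lone primary fails, which is the exposure a primary-only configuration carries.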
Sysplex Couple Data Sets
The sysplex CDS is used to store information about the sysplex, about the members of the
sysplex, and about the XCF groups and XCF group members currently active within the
sysplex. Heartbeat monitoring of systems in the sysplex is primarily done via the sysplex
CDS. The performance of the volumes containing the CDSs is therefore important, and you
should choose volumes with minimal exposure to lockouts (for example, RESERVEs). These
data sets are critical to the sysplex, as any member that loses access to the sysplex CDSs
will be placed in a non-restartable wait state.
Tip: There are various recommendations throughout this chapter to have just one copy of
many system elements (sysres, master catalog, and so on). Having just one copy
significantly eases systems management, results in fewer entities that can break, and
reduces the systems programmer’s workload. However, having just one copy can also
create a single point of failure.
Rather than addressing this issue over and over, we want to say here that the use of
HyperSwap, provided with GDPS/PPRC, removes the related DASD volumes as single
points of failure. GDPS/PPRC HyperSwap allows you to nondisruptively swap between
PPRC primary and secondary devices. If HyperSwap is enabled, a failure of a device or a
whole DASD subsystem results in a pause in processing of between 30 seconds and one
minute, rather than requiring an IPL. As a result, you can have, for example, a single sysres
volume shared between all members in the sysplex, and still not have a single point of
failure.
Important: Even though it is possible to run with just a primary CDS, you should never do
so, as this constitutes a single point of failure.
In this section, whenever we refer to a CDS, we always assume that you will actually have
a failure-isolated pair (primary/alternate) of each type of CDS you use.
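One way to audit this rule is a simple check over an inventory of your CDS definitions. The sketch below is a hypothetical helper, not an IBM-supplied tool. It assumes that "failure-isolated" means at least that the primary and alternate sit on different volumes and different DASD subsystems; the data set names, volume serials, and subsystem labels are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdsCopy:
    dsname: str
    volser: str
    subsystem: str  # hypothetical label for the DASD subsystem holding the volume


def check_cds_pairs(config):
    """Return a list of problems for a {cds_type: (primary, alternate_or_None)} mapping."""
    problems = []
    for cds_type in sorted(config):
        primary, alternate = config[cds_type]
        if alternate is None:
            problems.append(f"{cds_type}: no alternate CDS defined")
        elif alternate.volser == primary.volser:
            problems.append(
                f"{cds_type}: primary and alternate share volume {primary.volser}"
            )
        elif alternate.subsystem == primary.subsystem:
            problems.append(
                f"{cds_type}: primary and alternate share subsystem {primary.subsystem}"
            )
    return problems


# Invented inventory: only the SYSPLEX pair is failure-isolated.
inventory = {
    "SYSPLEX": (
        CdsCopy("SYS1.XCF.CDS01", "CDSP01", "DASD-A"),
        CdsCopy("SYS1.XCF.CDS02", "CDSA01", "DASD-B"),
    ),
    "CFRM": (CdsCopy("SYS1.CFRM.CDS01", "CDSP01", "DASD-A"), None),
    "WLM": (
        CdsCopy("SYS1.WLM.CDS01", "CDSP02", "DASD-A"),
        CdsCopy("SYS1.WLM.CDS02", "CDSP02", "DASD-A"),
    ),
}

for problem in check_cds_pairs(inventory):
    print(problem)
```

A real audit would also consider channel paths, control units, and other shared components. Those are left out here to keep the sketch short.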