58 WebSphere eXtreme Scale Best Practices for Operation and Management
4.1.6 Topology considerations for failure handling
Now, you must account for failures. This part is always a judgment call, but there are three
failure scenarios that are commonly considered:
► The topology must survive a single JVM failure and be able to reconstruct the full set of
primary and replica shards (maxSyncReplicas or maxAsyncReplicas). The simplest and
best way to allow for this reconstruction is to increase C by 1. For this reason, we did not
previously round C up so that it was evenly divisible by N, your number of servers. After
increasing C by 1, if the new C is not evenly divisible by N, you can further increase C until
it is. Extra memory never hurt any grid.
► The topology must survive a single server failure and be able to reconstruct the full set of
primary and replica shards (maxSyncReplicas or maxAsyncReplicas). This rule allows
any one server to be down for an extended period of time, such as for maintenance or
diagnostics. To ensure this capability, add one more server with the same amount of
memory as the others (if servers vary in memory size, add one more of the larger servers).
If you want to be less conservative, you can decide that your requirement is only that the
grid remain operational. If your topology to this point has allowed for one or more replicas,
one server going down results in the (we hope extremely brief) loss of a number of
replicas (and a warning message), but application operation continues because only
primaries are used in operation. Get that server running again immediately so that a
subsequent JVM failure does not trigger the loss of actual primary data.
► The topology must survive a single server failure followed by a single JVM failure. If you
have shut down a server for maintenance of the OS or another reason, a single JVM
failing must not disable your grid. You can determine whether you can survive this
situation with the following conditions:
– Normal state is C containers over N servers, or C/N containers per server.
– One server down means that you have C - C/N containers remaining.
– A subsequent JVM down means that you have C - C/N - 1 containers remaining.
– If and only if the following condition is true will you survive a double failure:
(C - C/N - 1) * Available heap @ 60% use > PrimaryDataSize
And now, you know why we included the “Available heap @ 60% use” column in Table
4-1 on page 54.
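The capacity rules above reduce to simple arithmetic. The following sketch expresses them in Java; the class and method names are illustrative, not part of eXtreme Scale, and the figures in main are an assumed example, not data from Table 4-1:

```java
// Sketch (not from the book): the two capacity rules as arithmetic.
// C = number of containers, N = number of servers; heap figures in bytes.
public class GridCapacityCheck {

    // Survive one JVM failure: add one container, then optionally round C
    // up to the next multiple of N so containers spread evenly per server.
    public static int containersWithJvmHeadroom(int c, int n) {
        int withSpare = c + 1;
        int remainder = withSpare % n;
        return remainder == 0 ? withSpare : withSpare + (n - remainder);
    }

    // Survive one server failure followed by one JVM failure:
    // (C - C/N - 1) * availableHeapAt60PctUse > PrimaryDataSize
    public static boolean survivesDoubleFailure(int c, int n,
            long availableHeapAt60PctBytes, long primaryDataSizeBytes) {
        int remaining = c - (c / n) - 1;
        return (long) remaining * availableHeapAt60PctBytes > primaryDataSizeBytes;
    }

    public static void main(String[] args) {
        // Assumed example: 12 containers on 4 servers, roughly 1.2 GB of
        // usable heap per container at 60% use, and 9 GB of primary data.
        System.out.println(containersWithJvmHeadroom(10, 4));   // prints 12
        System.out.println(survivesDoubleFailure(12, 4,
                1_200L * 1024 * 1024, 9_000L * 1024 * 1024));   // prints true
    }
}
```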
4.1.7 Determining the number of partitions
The value that you choose for numberOfPartitions in the deployment XML is only significant
if you choose a number that is too small or far too large. There are various subtle effects from
choosing a large or a small number, but the primary consideration results from the fact that a
given partition must live wholly in a single container (just as a given object, or entity tree if
using the WebSphere eXtreme Scale entity model, must live wholly in a single partition).
eXtreme Scale is extremely elastic in that you can grow or shrink the number of containers
(C) while the existing container servers (and the catalog servers) are running and in use.
Each new container simply uses the same XML file pair as the previous containers did. The
only limit on the number of containers you have running is the number of partitions; if
numberOfPartitions is P, you cannot have more than P containers started to host data.
Additional containers will start, but they will not receive any partition to host.
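For reference, numberOfPartitions is an attribute of the mapSet element in the deployment policy XML. A minimal sketch follows; the names Grid, mapSet, and Map1 are placeholders, and the replica settings are an assumed example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<deploymentPolicy xmlns="http://ibm.com/ws/objectgrid/deploymentPolicy"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <objectgridDeployment objectgridName="Grid">
    <!-- With P = 127, at most 127 containers can host data -->
    <mapSet name="mapSet" numberOfPartitions="127"
            minSyncReplicas="0" maxSyncReplicas="1">
      <map ref="Map1"/>
    </mapSet>
  </objectgridDeployment>
</deploymentPolicy>
```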
For certain applications, C (the number of containers calculated previously) is the number
that you expect to have started at all times. For other grids, you know that C is fine most of the time
but you periodically have spikes of load (quarterly, end of month, and so on) where you will