Configuring Failure Detection

Failure of the primary server in a cluster is detected by a service called ServerMgtFailureDetectorService. This service does its work only in a clustered environment; on a stand-alone server, it simply waits for the server to join a cluster. Options for configuring this service are contained in the file

Changes to configuration options must be made on the primary server in a cluster.

To change configuration options for failure detection
  1. In a text editor on the primary server, open the file, located in your ../mcs/WEB-INF/data/servermgt/settings directory.
    Note If you selected Secure the Integrity of Critical MCS Files on the Security Services Configuration page, you must clear that option before continuing. Otherwise, MCS may not function correctly after you modify the file.
  2. Adjust the values for the following options as necessary:
    FailureDetection.PingInterval The interval that each secondary server in the cluster waits between pings to the primary server. The default value is 10 seconds.

    Secondary servers ping the primary server to monitor its status. When a secondary server pings the primary server and the primary server responds, the primary server is assumed to be running. If the primary server does not respond and the number of allowed retries is exceeded, failover begins.

    Because every secondary server pings the primary server once per interval, setting this value too low in a cluster with many servers could flood the primary server with ping requests.

    FailureDetection.PingRetries The number of failed pings to the primary server that a secondary server tolerates before it starts the failover process. The default value is zero (0), which means that failover starts as soon as a secondary server fails to contact the primary server.
  3. Save and close the properties file.

  4. Wait several minutes for the data to be replicated to the other servers in the cluster, and then use the MCS console on the primary server to restart ServerMgtFailureDetectorService on all servers in the cluster.
    Note If you do not want to wait while the data is replicated, you can update the properties file on each server in the cluster, beginning with the primary server, and then restart the service on all the servers in the cluster.
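The two options in step 2 trade detection speed against load on the primary server. The following Python sketch estimates both effects for a given cluster size. It is a rough model under stated assumptions, not MCS code: it assumes that FailureDetection.PingInterval is expressed in seconds, that every secondary server pings the primary once per interval, and that each retry is sent at the next scheduled ping rather than immediately. The class and function names are hypothetical and are not part of MCS.

```python
from dataclasses import dataclass


@dataclass
class FailureDetectionSettings:
    """Hypothetical stand-in for the two options in the properties file.

    ping_interval_s mirrors FailureDetection.PingInterval (assumed to be in seconds);
    ping_retries mirrors FailureDetection.PingRetries.
    """
    ping_interval_s: float = 10.0  # documented default: 10 seconds
    ping_retries: int = 0          # documented default: 0


def pings_per_second_at_primary(settings: FailureDetectionSettings, cluster_size: int) -> float:
    """Approximate ping load on the primary, assuming each secondary pings once per interval."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if settings.ping_interval_s <= 0:
        raise ValueError("ping_interval_s must be positive")
    secondaries = cluster_size - 1  # the primary does not ping itself
    return secondaries / settings.ping_interval_s


def worst_case_detection_s(settings: FailureDetectionSettings) -> float:
    """Rough upper bound on the delay between the primary failing and failover starting.

    Assumes each retry is sent at the next scheduled ping, so a secondary must see
    PingRetries + 1 consecutive failed pings, the first of which can come up to one
    full interval after the primary stops responding. Ping timeouts are ignored.
    """
    if settings.ping_retries < 0:
        raise ValueError("ping_retries must be non-negative")
    return settings.ping_interval_s * (settings.ping_retries + 1)


if __name__ == "__main__":
    # Documented defaults in an 11-server cluster (10 secondaries).
    defaults = FailureDetectionSettings()
    print(pings_per_second_at_primary(defaults, cluster_size=11))
    print(worst_case_detection_s(defaults))

    # A hypothetical tuned configuration in a 21-server cluster.
    tuned = FailureDetectionSettings(ping_interval_s=5.0, ping_retries=2)
    print(pings_per_second_at_primary(tuned, cluster_size=21))
    print(worst_case_detection_s(tuned))
```

Under these assumptions, the defaults in an 11-server cluster send the primary about one ping per second, and failover can begin up to roughly 10 seconds after the primary stops responding. Raising PingRetries makes failover less sensitive to a single dropped ping at the cost of slower detection; lowering PingInterval speeds detection but increases the load on the primary.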
Related Topics
  Managing MCS Servers and Clusters, Overview
  Data Replication and Failover Protection
  Managed Services in MCS