Management & Control Services Help

Configuring Failure Detection

The ServerMgtFailureDetectorService service detects the failure of the primary server in a cluster. This service is used only in a clustered environment; in a stand-alone environment, it only waits for the server to join a cluster. Options for configuring this service are contained in a properties file.

To change configuration options for failover detection
  1. In a text editor on the primary server, open the properties file located in your ../mcs/WEB-INF/data/servermgt/settings directory.
    Note: Changes to configuration options must be made on the primary server in a cluster.
  2. Adjust the values for the following options as necessary:
    FailureDetection.PingInterval: The length of time that servers in a cluster wait between pings to the primary server. The default value is 10 seconds.

    Secondary servers ping the primary server to monitor its status. When a secondary server pings the primary server and the primary server responds, the primary server is assumed to be running. If the primary server does not respond and the number of allowed retries is exceeded, failover begins.

    If this value is set too low for a cluster with many servers, the primary server could be flooded with ping requests.

    FailureDetection.PingRetries: The number of failed pings to the primary server that a server tolerates before it starts the failover process. The default value is zero (0); with this value, failover starts as soon as a secondary server fails to contact the primary server.
  3. Save and close the properties file.

  4. Wait several minutes for the data to be replicated to the other servers in the cluster, and then use the MCS console on the primary server to restart the ServerMgtFailureDetectorService service on all servers in the cluster.
    Note: If you do not want to wait while the data is replicated, you can update the properties file on each server in the cluster, beginning with the primary server, and then restart the service on all the servers in the cluster.
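
How FailureDetection.PingInterval and FailureDetection.PingRetries interact can be sketched in a few lines of Python. This is an illustrative model only, not MCS code. The function names are hypothetical, and the model assumes that failed pings are counted consecutively, that a successful ping resets the count, and that retries are spaced one ping interval apart with negligible ping timeouts. This topic does not confirm any of those details.

```python
def failover_ping_index(ping_results, ping_retries=0):
    """Return the 1-based position of the ping at which failover would begin.

    ping_results: iterable of booleans, True when the primary server responded.
    Returns None if the allowed retries are never exceeded.
    Assumption: only consecutive failures count; a response resets the count.
    """
    failures = 0
    for index, responded in enumerate(ping_results, start=1):
        if responded:
            failures = 0
            continue
        failures += 1
        # One failed ping plus ping_retries failed retries exhausts the allowance.
        if failures > ping_retries:
            return index
    return None


def detection_window_seconds(ping_interval=10, ping_retries=0):
    """Return (earliest, latest) seconds from primary failure to failover start.

    Assumption: retries occur one ping interval apart and timeouts are ignored.
    """
    return (ping_retries * ping_interval, (ping_retries + 1) * ping_interval)


def pings_per_second(secondary_count, ping_interval=10):
    """Estimate the ping load on the primary server from all secondary servers."""
    return secondary_count / ping_interval


if __name__ == "__main__":
    # Default settings: the first failed ping starts failover.
    print(failover_ping_index([True, True, False]))
    # Two retries: three consecutive failed pings are needed.
    print(failover_ping_index([False, False, True, False, False, False], 2))
    print(detection_window_seconds(10, 2))
    print(pings_per_second(50, 10))
```

Under these assumptions, the default settings (a 10-second interval and zero retries) start failover within about 10 seconds of the primary server going down. Raising PingRetries makes failover less sensitive to a single lost ping but delays it by one interval per retry. Lowering PingInterval speeds up detection but increases the ping load on the primary server in proportion to the number of secondary servers.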

Related Topics
  Starting and Stopping Services
  Managed Services in MCS