Configuring automatic evacuation to start immediately when a compute node is lost

To configure automatic evacuation so that it starts immediately when a compute node is lost, do the following:

  1. Open the cloud_manager.conf configuration file and adjust the following settings:

    Section [host_tasks]:

    • allow_evacuate_host = True (enables or disables host evacuation; the default value is True)
    • evacuation_retries = 2 (the number of attempts to evacuate instances from the compute node; the default value is 2)

    Note

    Also make sure that the deny_evacuate parameter does not list nodes that should be evacuated. Nodes listed in this parameter are not evacuated automatically.

    Section [node_tracker]:

    • enabled = True (enables checking the status of compute nodes; the default value is True)
    • max_down_hosts = 1, 2, 3... (≥ 1) (the maximum allowed number of compute nodes in the down status, not counting backup nodes. If this number is exceeded, automatic evacuation is not performed for any node. Negative numbers are not allowed. The default value is 0, which means that automatic evacuation is not performed)
    • mutex = 3 (the number of attempts to determine the hypervisor status when it switches to the down status before the handler is started; the default value is 3)
    • loop_time = 30 (the interval, in seconds, between checks of compute node status; the default value is 30)
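
The settings above can be checked before the services are restarted. The following is a minimal sketch, not part of CloudManager: it parses a sample fragment with Python's standard configparser module and reports values that, according to the descriptions above, would disable automatic evacuation. The SAMPLE_CONF text and the check_evacuation_settings function are hypothetical names used only for illustration, and deny_evacuate is not checked because its format is not described here.

```python
import configparser

# Sample fragment of cloud_manager.conf with the values from the steps above.
SAMPLE_CONF = """
[host_tasks]
allow_evacuate_host = True
evacuation_retries = 2

[node_tracker]
enabled = True
max_down_hosts = 1
mutex = 3
loop_time = 30
"""


def check_evacuation_settings(conf_text):
    """Return a list of settings that would prevent automatic evacuation."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    problems = []

    # Missing options fall back to the documented default values.
    if not parser.getboolean("host_tasks", "allow_evacuate_host", fallback=True):
        problems.append("allow_evacuate_host must be True")
    if not parser.getboolean("node_tracker", "enabled", fallback=True):
        problems.append("enabled must be True")
    if parser.getint("node_tracker", "max_down_hosts", fallback=0) < 1:
        problems.append("max_down_hosts must be at least 1")
    return problems


print(check_evacuation_settings(SAMPLE_CONF))  # prints []
```

An empty result means that none of the checked parameters blocks automatic evacuation. With the default max_down_hosts = 0, the function reports that the value must be at least 1.
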
  2. Restart the CloudManager module services so that the changes made to the configuration file take effect:

    sudo -i    # switch to superuser mode
    systemctl restart aos-cloud-manager-*
    
  3. Create an instance in Dashboard with the “Fault Tolerance” checkbox selected:

    [Figure: Fault Tolerance (vm_high_availability.png)]

As a result, if a compute node is lost and automatic evacuation starts, instances created with the “Fault Tolerance” checkbox selected are evacuated first.
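
The way the parameters interact can be illustrated with a small model. This is only a sketch of the behavior described in this section, not the CloudManager implementation: the Instance class, the is_down and plan_evacuation functions, and the host and instance names are all hypothetical. It assumes that a host counts as down after mutex failed status checks in a row, it does not model backup nodes, and it assumes that instances without the “Fault Tolerance” checkbox are evacuated after those with it.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    fault_tolerant: bool  # state of the "Fault Tolerance" checkbox


def is_down(failed_checks, mutex=3):
    """Assumption: a host is down after `mutex` failed checks in a row."""
    return failed_checks >= mutex


def plan_evacuation(failed_checks_by_host, instances_by_host, max_down_hosts, mutex=3):
    """Return {host: [instance names in evacuation order]}, or {} if blocked."""
    down = [h for h, n in failed_checks_by_host.items() if is_down(n, mutex)]
    # If max_down_hosts is exceeded, no node is evacuated at all.
    if len(down) > max_down_hosts:
        return {}
    plan = {}
    for host in down:
        # Stable sort: fault-tolerant instances first, the rest keep their order.
        ordered = sorted(instances_by_host.get(host, []),
                         key=lambda inst: not inst.fault_tolerant)
        plan[host] = [inst.name for inst in ordered]
    return plan


checks = {"node-1": 3, "node-2": 0}
vms = {"node-1": [Instance("web", False), Instance("db", True)]}
print(plan_evacuation(checks, vms, max_down_hosts=1))  # {'node-1': ['db', 'web']}
print(plan_evacuation(checks, vms, max_down_hosts=0))  # {}
```

The second call shows why the default max_down_hosts = 0 disables automatic evacuation: a single down node already exceeds the limit.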