Installation and configuration

Installation

Important

First, set up the environment. All commands must be run as the superuser.

Switch to superuser mode:

sudo -i

Important

Installation is performed on the control node. The OpenStack Nova API service must be installed for the module to work.

  1. Save the list of previously installed packages before starting the installation. This allows you to restore the system painlessly if something goes wrong. To do this, run the following commands:

    mkdir -p /tmp/rollback/cloud_manager
    pip3 freeze > /tmp/rollback/cloud_manager/pip_before.txt
    

    After that, the directory /tmp/rollback/cloud_manager will contain the file pip_before.txt with the list of installed packages.

  2. Also save migration versions:

    openstack aos db list -n cloud_manager > /tmp/rollback/cloud_manager/migrations.txt
    

    Where:

    • /tmp/rollback/cloud_manager/ is the directory where the file is saved;
    • migrations.txt is the name of the file with migration versions.
  3. Install the CloudManager package:

    • from the Python package repository:

      pip3 install cloud-manager
      
  4. Save the list of installed packages after installation to be able to roll back changes:

    pip3 freeze > /tmp/rollback/cloud_manager/pip_after.txt
    
  5. Add user:

    useradd -m aos
    passwd aos
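
The snapshot files written in steps 1, 2 and 4 are what the recovery plan relies on, so it can be worth checking that they exist and are not empty before going further. The following is a minimal sketch, not part of CloudManager; the function name check_snapshots is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: verify that rollback snapshot files exist and are non-empty.
# $1 is the rollback directory, the remaining arguments are expected file names.
check_snapshots() {
    dir="$1"
    shift
    for name in "$@"; do
        # -s is true only for an existing file with a size greater than zero
        if [ ! -s "$dir/$name" ]; then
            echo "missing or empty: $dir/$name" >&2
            return 1
        fi
    done
    return 0
}
```

For example, `check_snapshots /tmp/rollback/cloud_manager pip_before.txt migrations.txt pip_after.txt` returns a non-zero status if any of the three files is absent or empty.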
    

Note

To install CloudManager on Astra Linux (Smolensk), do the following:

  1. Connect the provided repository with AccentOS packages.

  2. Install the package with the command:

    sudo apt install -y cloud-manager
    

Installation on two or more controllers

When installing CloudManager on two or more controllers, you need to:

  1. replicate the database to each of the controllers;
  2. install the module with the same parameters on each of the controllers;
  3. keep the processes of monitoring, management and synchronization of compute nodes active on only one controller. Access to the message broker server must be configured for that same controller;
  4. initialize compute nodes and power management devices on only one of the controllers.

Note

Removal and diagnostics of the module on each controller are performed in the same way as for a single controller.

Configuration

Note

This section describes launching the API service through the WSGI server supplied with the eventlet library. To launch the service through another WSGI server (Nginx + Gunicorn, Apache + mod_wsgi, etc.), see the documentation for the corresponding server. The path to the WSGI application is cloud_manager.api.cloud_manager_api.wsgi.

  1. Perform initial configuration of the module:

    openstack aos configure -n cloud_manager
    
  2. Create directory for logs with the required permissions:

    mkdir -p /var/log/aos/cloud-manager
    chown -R aos:aos /var/log/aos/cloud-manager
    
  1. Copy the sample configuration files. If you use non-standard parameters, edit them (for details, see Configuration file):

    cp /etc/aos/aos.conf.example /etc/aos/aos.conf
    cp /etc/aos/cloud_manager.conf.example /etc/aos/cloud_manager.conf
    
  2. Create the database (MySQL is used as an example) and set the access rights, database type and other parameters:

    # Login to the database using the root password
    mysql -uroot -p
    # Create cloud_manager database
    CREATE DATABASE cloud_manager;
    # Give permission to read, edit, perform any actions on all tables in cloud_manager database
    GRANT ALL PRIVILEGES ON cloud_manager.* TO 'aos'@'localhost' IDENTIFIED BY 'password';
    GRANT ALL PRIVILEGES ON cloud_manager.* TO 'aos'@'%' IDENTIFIED BY 'password';
    # Exit the database
    exit
    
  3. Edit the [database] section of the configuration file /etc/aos/cloud_manager.conf, for example:

    [database]
    url = mysql+pymysql://aos:password@tst.stand.loc:3306/cloud_manager?charset=utf8
    
  4. Migrate database:

    openstack aos db migrate -n cloud_manager
    
  5. Configure RabbitMQ Server message broker:

    rabbitmqctl add_user aos password
    rabbitmqctl add_vhost aos
    rabbitmqctl set_permissions -p aos aos ".*" ".*" ".*"
    rabbitmqctl set_permissions aos ".*" ".*" ".*"
    
  6. Create a user in OpenStack for API services:

    openstack user create --domain default --project service --project-domain default --password password --or-show aos
    
  7. Assign the service role to the user:

    openstack role add --user aos --user-domain default --project service --project-domain default service
    
  8. Enable and start systemd services:

    systemctl daemon-reload
    systemctl enable aos-cloud-manager-agent.service
    systemctl start aos-cloud-manager-agent.service
    systemctl enable aos-cloud-manager-api.service
    systemctl start aos-cloud-manager-api.service
    systemctl enable aos-cloud-manager-listener.service
    systemctl start aos-cloud-manager-listener.service
    systemctl enable aos-cloud-manager-beat.service
    systemctl start aos-cloud-manager-beat.service
    systemctl enable aos-cloud-manager-worker.service
    systemctl start aos-cloud-manager-worker.service
    
  9. Create CloudManager API service:

    openstack service create --name cloud-manager --description "Cloud Manager Service" cloud-manager
    
  10. Create endpoints:

    openstack endpoint create --region RegionOne cloud-manager internal http://controller:9362
    openstack endpoint create --region RegionOne cloud-manager admin http://controller:9362
    openstack endpoint create --region RegionOne cloud-manager public http://controller:9362
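
The service and endpoint steps above repeat the same command for five units and three interfaces, so they can be expressed as loops. The following is a minimal sketch under stated assumptions: the DRY_RUN variable and the function names run, enable_services and create_endpoints are hypothetical helpers, not part of CloudManager. With DRY_RUN=1 (the default here) the commands are only printed. The sketch uses `systemctl enable --now`, which enables and starts a unit in one call, in place of the separate enable and start commands.

```shell
#!/bin/sh
# DRY_RUN is a hypothetical switch: 1 prints the commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it as given.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Enable and start all CloudManager units listed in the step above.
enable_services() {
    run systemctl daemon-reload
    for name in agent api listener beat worker; do
        run systemctl enable --now "aos-cloud-manager-${name}.service"
    done
}

# Register the internal, admin and public endpoints.
# $1: endpoint URL, $2: region name
create_endpoints() {
    for iface in internal admin public; do
        run openstack endpoint create --region "$2" cloud-manager "$iface" "$1"
    done
}
```

Calling `create_endpoints http://controller:9362 RegionOne` in dry-run mode prints the three endpoint commands from the last step, which makes it easy to review them before running with DRY_RUN=0.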
    

Configuration file

Note

The module configuration file allows you to override sections and parameters of the common file aos.conf for a specific module.

Note

By default, the file cloud_manager.conf.example contains no lines that set the logging level; add them if necessary. The default logging level is set in the general configuration file. More information about the configuration files can be found in the corresponding section.

The configuration file is in ini format and consists of the following sections and parameters:

| Section | Parameter | Description | Default value |
|---|---|---|---|
| DEFAULT | dhcp_leases | Path to the directory with .leases files of the DHCP server. | /var/lib/dhcp/dhcpd/state/dhcpd.leases, /var/lib/dhcp/dhcpd.leases |
| agent | agent_confirm_timeout | Maximum waiting time for confirmation of a response from the aos-agent service. | 10 |
| agent | agent_response_timeout | Maximum time to wait for a response from the aos-agent service. | 60 |
| agent | backoff_factor | Increase of the wait time with each connection iteration. | 0.3 |
| agent | enabled | Enables or disables the node management functionality provided by the Agent module. | False |
| agent | logfile | Path to the log file of the aos-cloud-manager-agent service. | |
| api | audit_enabled | Flag enabling auditing of API requests. | True |
| api | host | IP address where the CloudManager API service will run. | 0.0.0.0 |
| api | logfile | Path to the log file of the aos-cloud-manager-api service. | |
| api | port | Port where the CloudManager API service will run. | 9362 |
| balancing | approx | Balancing accuracy. | 0.1 |
| balancing | enabled | Enables or disables node balancing. | False |
| balancing | migration_retries | Number of attempts to migrate instances. | 3 |
| balancing | periodic_time | Period between host aggregate balancing runs. Measured in hours. | 24 |
| database | url | Database connection settings. | mysql+pymysql://aos:password@localhost:3306/cloud_manager?charset=utf8 |
| drs | host | Host where the DRS system is running. | |
| drs | password | Password of the DRS system user. | |
| drs | port | Port of the server with the DRS system. | 80 |
| drs | username | User of the DRS system. | |
| drs_trigger | enabled | Enables automatic application of audit results. | True |
| drs_trigger | logfile | Path to the log file of automatic application of audit results. | |
| drs_trigger | trigger_interval | Interval for starting automatic application of audit results. Measured in seconds. | 600 |
| extra_availability_check | attempts | Number of attempts to read the file. | 2 |
| extra_availability_check | delay | Delay before retrying to read the compute node state file. Measured in seconds. | 60 |
| extra_availability_check | enabled | Enables or disables the additional availability check of compute nodes through storage. | False |
| extra_availability_check | instance_rate | Required percentage of running instances. | 100 |
| host_tasks | allow_evacuate_host | Enables or disables host evacuation. | True |
| host_tasks | deny_evacuate | Disables evacuation for the specified hosts. | |
| host_tasks | evacuation_retries | Number of attempts to evacuate instances. | 2 |
| host_tasks | prioritized_evacuation_timeout | Interval between evacuating instance groups with the same recovery priorities. Measured in seconds. | 60 |
| host_tasks | recovery_priority | Priority of restoring the instance during auto evacuation in case of problems on the hypervisor. | 5 |
| host_tasks | retries_wait_for_node_state | Maximum number of attempts to poll the state of hypervisors. | 240 |
| host_tasks | retries_wait_for_vm_status | Maximum number of attempts to poll the state of instances. | 60 |
| listener | durability | Durability of the RabbitMQ queue and exchange. | True |
| listener | logfile | Path to the log file of the aos-cloud-manager-listener service. | |
| listener | nova_rabbit_vhost | Virtual host of the Nova RabbitMQ service. | / |
| node_sync | enabled | Enables synchronization of compute nodes. | True |
| node_sync | logfile | Path to the compute node synchronization log file. | |
| node_sync | reserve_interval | Waiting time before a free hypervisor starts being transferred to the reserve. | 60 |
| node_sync | sync_interval | Compute node synchronization interval. Measured in seconds. | 60 |
| node_tracker | allow_host_auto_power_off | Enables or disables restarting a compute node when it transitions to the down status. | False |
| node_tracker | enabled | Enables checking the status of compute nodes. | True |
| node_tracker | host_restart_timeout | Compute node restart timeout. Measured in seconds. | 600 |
| node_tracker | logfile | Path to the log file of the compute node status check. | |
| node_tracker | loop_time | Time interval between checks of the status of compute nodes. Measured in seconds. | 30 |
| node_tracker | max_down_hosts | Maximum allowed number of compute nodes in the down status, excluding standby ones. If this number is exceeded, automatic evacuation is not performed for any of the nodes. Negative numbers are not allowed. | 0 |
| node_tracker | mutex | Number of attempts to determine the status of the hypervisor when switching to the down status before starting the handler. | 3 |
| node_tracker | mutex_up | Number of attempts to determine the status of the hypervisor when switching to the up status before starting the handler. | 1 |
| node_tracker | timeout_reserv_node_up | Waiting time for bringing up the standby compute node. Measured in minutes. | 15 |
| nova_database | url | Database connection settings. | |
| power | shutdown_interval | Waiting time before the next iteration of the compute node status check when the host is shut down with evacuation of virtual machines and the hypervisor is powered off by the console utility. Measured in seconds. | 30 |
| power | shutdown_max_tick | Maximum number of iterations of the compute node status check when the host is shut down with evacuation of instances and the hypervisor is powered off by the console utility. | 10 |
| pxe | conf_dir | Directory for PXE configuration files. | /var/lib/tftpboot/pxelinux.cfg/ |
| storage_sync | enabled | Enables storage synchronization. | True |
| storage_sync | logfile | Path to the storage synchronization log file. | |
| storage_sync | sync_interval | Storage synchronization start interval. Measured in seconds. | 60 |
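
Because the file is plain ini, the value of a parameter can be read with standard tools, for instance to confirm an override after editing. The following is a minimal sketch; the function name get_param is hypothetical and is not part of CloudManager. It reads a single file only and does not merge the module file with aos.conf, and it ignores ini features such as line continuations.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a parameter from an ini file.
# $1: config file, $2: section name, $3: parameter name
get_param() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*[#;]/ { next }            # skip comment lines
        /^[ \t]*\[/ {                     # section header such as [api]
            sub(/^[ \t]*\[/, "")
            sub(/\].*$/, "")
            current = $0
            next
        }
        current == section {
            pos = index($0, "=")          # split on the first "=" only
            if (pos == 0) next
            name = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) { print value; exit }
        }
    ' "$1"
}
```

For example, `get_param /etc/aos/cloud_manager.conf api port` prints the port set in the [api] section, or nothing if the parameter is not present in that file.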

Note

approx in the balancing section of the config file is the parameter used to determine whether a hypervisor is loaded. A node is considered loaded if the ratio of the average occupied RAM of the hypervisor to the total RAM of this hypervisor exceeds the ratio of the average occupied RAM of all nodes accepted for balancing to the total RAM of these nodes, plus approx.

Example:

Two compute nodes with 8192 MB of RAM in total, 4096 MB each, are accepted for balancing. 3072 MB of RAM are occupied on the first compute node and 256 MB on the second.

Total load of balanced nodes will be calculated as: (3072 + 256) / 8192 ≈ 0.41

The load of the first hypervisor will be calculated as 3072 / 4096 = 0.75

The load of the second hypervisor will be calculated as 256 / 4096 ≈ 0.06

approx = 0.1

Add approx to the total load and compare the load of each hypervisor with the result:

for first node: 0.75 > 0.41 + 0.1, so this node is loaded and needs to be balanced;

for second node: 0.06 < 0.41 + 0.1, so this node is not loaded and will not be accepted for balancing.
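
The comparison described in this note can be reproduced with a few lines of shell and awk. This is only an illustrative sketch of the rule as stated here, not the code CloudManager runs; the function name is_loaded and its argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a node counts as loaded.
# $1: RAM occupied on the node (MB)      $2: total RAM of the node (MB)
# $3: RAM occupied on all balanced nodes $4: total RAM of all balanced nodes
# $5: approx from the [balancing] section
is_loaded() {
    awk -v used="$1" -v total="$2" -v all_used="$3" -v all_total="$4" -v approx="$5" 'BEGIN {
        node_load = used / total                   # load of this hypervisor
        threshold = all_used / all_total + approx  # total load plus approx
        if (node_load > threshold) print "loaded"
        else print "not loaded"
    }'
}
```

With two nodes of 4096 MB each (8192 MB in total) and 3072 + 256 = 3328 MB occupied, `is_loaded 3072 4096 3328 8192 0.1` prints `loaded` and `is_loaded 256 4096 3328 8192 0.1` prints `not loaded`.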

Important

For changes to the configuration file parameters to take effect, perform the procedure described in the section “Updating the configuration file”.

Recovery plan

If CloudManager fails to install or update, roll back as follows:

  1. Compare the versions of the migrations in the file /tmp/rollback/cloud_manager/migrations.txt with the current ones. If they differ, migrate to the previous version. Migration example:

    openstack aos db list -n cloud_manager
    openstack aos db migrate -n cloud_manager --migration 27
    
  2. Revert to the previous state of the packages:

    cd /tmp/rollback/cloud_manager
    diff --changed-group-format='%>' --unchanged-group-format='' pip_before.txt pip_after.txt > pip_uninstall.txt
    diff --changed-group-format='%<' --unchanged-group-format='' pip_before.txt pip_after.txt > pip_install.txt
    pip3 uninstall -r pip_uninstall.txt
    pip3 install -r pip_install.txt
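
The two diff commands split the snapshots into lines that appear only after the installation (to uninstall) and lines that appear only before it (to reinstall). The same split can be sketched with sort and comm, which may be easier to read and review. This is an alternative illustration, not the documented procedure; the function name package_delta is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: derive the uninstall and install lists from two
# pip freeze snapshots.
# $1: snapshot before installation, $2: snapshot after, $3: output directory
package_delta() {
    before_sorted="$(mktemp)"
    after_sorted="$(mktemp)"
    # comm requires sorted input
    sort "$1" > "$before_sorted"
    sort "$2" > "$after_sorted"
    # Lines only in the "after" snapshot: packages or versions that were added
    comm -13 "$before_sorted" "$after_sorted" > "$3/pip_uninstall.txt"
    # Lines only in the "before" snapshot: packages or versions that were replaced
    comm -23 "$before_sorted" "$after_sorted" > "$3/pip_install.txt"
    rm -f "$before_sorted" "$after_sorted"
}
```

After `package_delta pip_before.txt pip_after.txt .` in /tmp/rollback/cloud_manager, the resulting pip_uninstall.txt and pip_install.txt can be inspected before being passed to the pip3 commands above.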