Installation and configuration¶
Installation¶
Important
Set up the environment first. All commands must be executed as the superuser.
Switch to superuser mode:
sudo -i
Important
Installation is performed on the control node. The OpenStack Nova API service must be installed for the module to work.
Before starting the installation, save the list of previously installed packages; this will allow you to painlessly restore the system in case of damage. Run the following commands:
mkdir -p /tmp/rollback/cloud_manager
pip3 freeze > /tmp/rollback/cloud_manager/pip_before.txt
After that, the directory /tmp/rollback/cloud_manager will contain the file pip_before.txt with the list of installed applications.
Also save the migration versions:
openstack aos db list -n cloud_manager > /tmp/rollback/cloud_manager/migrations.txt
Where:
/tmp/rollback/cloud_manager/ is the target directory;
migrations.txt is the name of the file with migration versions.
Install the CloudManager package from the Python package repository:
pip3 install cloud-manager
Save the list of installed packages after installation to be able to roll back changes:
pip3 freeze > /tmp/rollback/cloud_manager/pip_after.txt
Add the user:
useradd -m aos
passwd aos
Note
To install CloudManager on Astra Linux (Smolensk), do the following:
Connect the provided repository with AccentOS packages.
Install the package with the command:
sudo apt install -y cloud-manager
Installation on two or more controllers¶
When installing CloudManager on two or more controllers, it is necessary:
- to replicate the database for each controller;
- to install the module with the same parameters on each controller;
- to keep the processes of monitoring, management, and synchronization of compute nodes active on only one controller; access to the message broker server must be configured for that same controller (see the example after this list);
- to initialize compute nodes and power management devices on only one of the controllers.
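One possible way to keep these processes active on a single controller is to disable them on the secondary controllers through the corresponding enabled flags in /etc/aos/cloud_manager.conf. This is a sketch based on the parameters described in the Configuration file section, not an official prescription; which sections to disable depends on your deployment:
[node_sync]
enabled = False
[node_tracker]
enabled = False
[storage_sync]
enabled = False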
Note
Removal and diagnostics of the module on each controller are performed in the same way as in the case of a single controller.
Configuration¶
Note
This section covers setting up the launch of the API service through the WSGI server supplied with the eventlet library. To configure the launch of the service through another WSGI server (Nginx + Gunicorn, Apache + mod_wsgi, etc.), see the documentation for the corresponding server. The path to the WSGI application is cloud_manager.api.cloud_manager_api.wsgi.
Perform initial configuration of the module:
openstack aos configure -n cloud_manager
Create a directory for logs with the required permissions:
mkdir -p /var/log/aos/cloud-manager
chown -R aos:aos /var/log/aos/cloud-manager
Copy the sample configuration files and, if using non-standard parameters, edit them (for details, see Configuration file):
cp /etc/aos/aos.conf.example /etc/aos/aos.conf
cp /etc/aos/cloud_manager.conf.example /etc/aos/cloud_manager.conf
Create the database (using MySQL as an example) and set the access rights and other parameters:
# Log in to the database using the root password
mysql -uroot -p
# Create the cloud_manager database
CREATE DATABASE cloud_manager;
# Grant permission to read, edit, and perform any actions on all tables in the cloud_manager database
GRANT ALL PRIVILEGES ON cloud_manager.* TO 'aos'@'localhost' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON cloud_manager.* TO 'aos'@'%' IDENTIFIED BY 'password';
# Exit the database
exit
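The granted privileges can be verified from the MySQL prompt (an optional sanity check, not required by the installation):
SHOW GRANTS FOR 'aos'@'localhost';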
Edit the [database] section of the configuration file /etc/aos/cloud_manager.conf, for example:
[database]
url = mysql+pymysql://aos:password@tst.stand.loc:3306/cloud_manager?charset=utf8
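Connectivity with these credentials can be checked before proceeding, for example (assuming the host from the example above):
mysql -u aos -p -h tst.stand.loc -e "SELECT 1;" cloud_manager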
Migrate the database:
openstack aos db migrate -n cloud_manager
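The current migration state can then be checked with the same command used earlier when saving the rollback data:
openstack aos db list -n cloud_manager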
Configure the RabbitMQ Server message broker:
rabbitmqctl add_user aos password
rabbitmqctl add_vhost aos
rabbitmqctl set_permissions -p aos aos ".*" ".*" ".*"
rabbitmqctl set_permissions aos ".*" ".*" ".*"
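The created user and permissions can be verified with standard rabbitmqctl commands:
rabbitmqctl list_users
rabbitmqctl list_permissions -p aos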
Create a user in OpenStack for the API services:
openstack user create --domain default --project service --project-domain default --password password --or-show aos
Assign the service role to the user:
openstack role add --user aos --user-domain default --project service --project-domain default service
Enable and start systemd services:
systemctl daemon-reload
systemctl enable aos-cloud-manager-agent.service
systemctl start aos-cloud-manager-agent.service
systemctl enable aos-cloud-manager-api.service
systemctl start aos-cloud-manager-api.service
systemctl enable aos-cloud-manager-listener.service
systemctl start aos-cloud-manager-listener.service
systemctl enable aos-cloud-manager-beat.service
systemctl start aos-cloud-manager-beat.service
systemctl enable aos-cloud-manager-worker.service
systemctl start aos-cloud-manager-worker.service
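After startup, the state of the services can be verified, for example:
systemctl status aos-cloud-manager-api.service
systemctl status aos-cloud-manager-worker.service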
Create the CloudManager API service:
openstack service create --name cloud-manager --description "Cloud Manager Service" cloud-manager
Create endpoints:
openstack endpoint create --region RegionOne cloud-manager internal http://controller:9362
openstack endpoint create --region RegionOne cloud-manager admin http://controller:9362
openstack endpoint create --region RegionOne cloud-manager public http://controller:9362
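The created endpoints can be verified with the standard OpenStack CLI:
openstack endpoint list --service cloud-manager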
Configuration file¶
Note
The config file allows overriding sections and parameters of the common file aos.conf for a specific module.
Note
There are no logging level lines in the file cloud_manager.conf.example by default; specify them if necessary. The logging level is set by default in the general configuration file. More information about the configuration files can be found in the corresponding section.
The configuration file is presented in ini format and consists of the following sections and parameters:
Section | Parameter | Description | Default value |
---|---|---|---|
DEFAULT | dhcp_leases | Path to the directory with .leases files of the DHCP server. | /var/lib/dhcp/dhcpd/state/dhcpd.leases, /var/lib/dhcp/dhcpd.leases |
agent | agent_confirm_timeout | Maximum waiting time for confirmation of a response from the aos-agent service. | 10 |
agent | agent_response_timeout | Maximum time to wait for a response from the aos-agent service. | 60 |
agent | backoff_factor | Increase of the wait time with each connection iteration. | 0.3 |
agent | enabled | Enables or disables the node management functionality provided by the Agent module. | False |
agent | logfile | Path to the log file of the aos-cloud-manager-agent service. | |
api | audit_enabled | Flag enabling auditing of API requests. | True |
api | host | IP address where the CloudManager API service will run. | 0.0.0.0 |
api | logfile | Path to the log file of the aos-cloud-manager-api service. | |
api | port | Port where the CloudManager API service will run. | 9362 |
balancing | approx | Balancing accuracy. | 0.1 |
balancing | enabled | Enables or disables node balancing. | False |
balancing | migration_retries | Number of attempts to migrate instances. | 3 |
balancing | periodic_time | Period between host aggregate balancing runs. It is measured in hours. | 24 |
database | url | Database connection string. | mysql+pymysql://aos:password@localhost:3306/cloud_manager?charset=utf8 |
drs | host | Host where the DRS system is running. | |
drs | password | Password of the DRS system user. | |
drs | port | Port of the DRS system server. | 80 |
drs | username | User of the DRS system. | |
drs_trigger | enabled | Enables automatic application of audit results. | True |
drs_trigger | logfile | Path to the log file of automatic application of audit results. | |
drs_trigger | trigger_interval | Interval for starting automatic application of audit results. It is measured in seconds. | 600 |
extra_availability_check | attempts | Number of attempts to read the file. | 2 |
extra_availability_check | delay | Delay between retries of reading the compute node state file. It is measured in seconds. | 60 |
extra_availability_check | enabled | Enables or disables the additional check of compute node availability through storage. | False |
extra_availability_check | instance_rate | Required percentage of running instances. | 100 |
host_tasks | allow_evacuate_host | Enables or disables host evacuation. | True |
host_tasks | deny_evacuate | Disables evacuation for the specified hosts. | |
host_tasks | evacuation_retries | Number of attempts to evacuate instances. | 2 |
host_tasks | prioritized_evacuation_timeout | Interval between evacuating instance groups with the same recovery priorities. It is measured in seconds. | 60 |
host_tasks | recovery_priority | Priority of instance restoration during auto evacuation in case of problems on the hypervisor. | 5 |
host_tasks | retries_wait_for_node_state | Maximum number of hypervisor state polling attempts. | 240 |
host_tasks | retries_wait_for_vm_status | Maximum number of instance state polling attempts. | 60 |
listener | durability | Durability of the RabbitMQ queue and exchange. | True |
listener | logfile | Path to the log file of the aos-cloud-manager-listener service. | |
listener | nova_rabbit_vhost | Virtual host of the Nova RabbitMQ service. | / |
node_sync | enabled | Enables synchronization of compute nodes. | True |
node_sync | logfile | Path to the compute node synchronization log file. | |
node_sync | reserve_interval | Waiting time before a free hypervisor is transferred to the reserve. | 60 |
node_sync | sync_interval | Compute node synchronization interval in seconds. | 60 |
node_tracker | allow_host_auto_power_off | Enables or disables restarting a compute node when it transitions to the down status. | False |
node_tracker | enabled | Enables checking the status of compute nodes. | True |
node_tracker | host_restart_timeout | Compute node restart timeout. It is measured in seconds. | 600 |
node_tracker | logfile | Path to the compute node status check log file. | |
node_tracker | loop_time | Time interval between compute node status checks (in seconds). | 30 |
node_tracker | max_down_hosts | Maximum allowed number of compute nodes in the down status, excluding standby ones. If this number is exceeded, automatic evacuation is not performed for any of the nodes. Negative numbers are not allowed. | 0 |
node_tracker | mutex | Number of attempts to determine the status of the hypervisor when it switches to the down status before starting the handler. | 3 |
node_tracker | mutex_up | Number of attempts to determine the status of the hypervisor when it switches to the up status before starting the handler. | 1 |
node_tracker | timeout_reserv_node_up | Waiting time for a standby compute node to come up, in minutes. | 15 |
nova_database | url | Database connection string. | |
power | shutdown_interval | Waiting time until the next status check iteration when a host is shut down with evacuation of instances by powering off the hypervisor with the console utility. It is measured in seconds. | 30 |
power | shutdown_max_tick | Maximum number of status check iterations when a host is shut down with evacuation of instances by powering off the hypervisor with the console utility. | 10 |
pxe | conf_dir | Directory for PXE configuration files. | /var/lib/tftpboot/pxelinux.cfg/ |
storage_sync | enabled | Enables storage synchronization. | True |
storage_sync | logfile | Path to the storage synchronization log file. | |
storage_sync | sync_interval | Storage synchronization start interval. It is measured in seconds. | 60 |
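For illustration, a fragment of /etc/aos/cloud_manager.conf overriding several of the parameters above might look as follows (the values here are arbitrary examples, not recommendations):
[api]
host = 0.0.0.0
port = 9362
[node_tracker]
loop_time = 30
max_down_hosts = 1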
Note
The approx parameter in the balancing section of the config file determines whether a hypervisor is considered loaded. A node is considered loaded if the ratio of its occupied RAM to its total RAM exceeds the ratio of the occupied RAM of all nodes accepted for balancing to the total RAM of those nodes, plus approx.
Example:
Two compute nodes with a total of 8192 MB of RAM, 4096 MB each, are accepted for balancing; 3072 MB of RAM is occupied on the first compute node and 256 MB of RAM on the second.
The total load of the balanced nodes is calculated as: (3072 + 256) / 8192 = 0.40
The load of the first hypervisor is calculated as: 3072 / 4096 = 0.75
The load of the second hypervisor is calculated as: 256 / 4096 = 0.06
With approx = 0.1, the load of each hypervisor is compared to the total load plus approx:
for the first node: 0.75 > 0.40 + 0.1, so this node is loaded and needs to be balanced;
for the second node: 0.06 < 0.40 + 0.1, so this node is not loaded and will not be accepted for balancing.
Important
When changing the parameters of the configuration file, perform the procedure described in the section “Updating the configuration file” for them to take effect.
Recovery plan¶
If CloudManager fails to install or update, roll back as follows:
Compare the migration versions in the file /tmp/rollback/cloud_manager/migrations.txt with the current ones. If there are differences, migrate to the previous version. Migration example:
openstack aos db list -n cloud_manager
openstack aos db migrate -n cloud_manager --migration 27
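To spot the differences, the saved list can be diffed against the current output (the migrations_current.txt name below is arbitrary):
openstack aos db list -n cloud_manager > /tmp/rollback/cloud_manager/migrations_current.txt
diff /tmp/rollback/cloud_manager/migrations.txt /tmp/rollback/cloud_manager/migrations_current.txt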
Revert to the previous state of the packages:
cd /tmp/rollback/cloud_manager
diff --changed-group-format='%>' --unchanged-group-format='' pip_before.txt pip_after.txt > pip_uninstall.txt
diff --changed-group-format='%<' --unchanged-group-format='' pip_before.txt pip_after.txt > pip_install.txt
pip3 uninstall -r pip_uninstall.txt
pip3 install -r pip_install.txt