Installation and configuration¶
Installation¶
Important
First, set up the environment. All commands are executed only as the superuser. Switch to superuser mode:
sudo -i
Important
Installation is performed on the controller node. The OpenStack Nova API service must be installed for the module to work.
Before starting the installation, save the list of previously installed packages; this will allow you to painlessly restore the system in case of damage. Run the following commands to do this:
mkdir -p /tmp/rollback/cloud_manager
pip3 freeze > /tmp/rollback/cloud_manager/pip_before.txt
After that, the directory /tmp/rollback/cloud_manager will contain the file pip_before.txt with the list of installed packages. Also save the migration versions:
openstack aos db list -n cloud_manager > /tmp/rollback/cloud_manager/migrations.txt
Where:
/tmp/rollback/cloud_manager/ is the target directory;
migrations.txt is the name of the file with migration versions.
Install the CloudManager package from the Python package repository:
pip3 install cloud-manager
Save the list of installed packages after installation to be able to roll back changes:
pip3 freeze > /tmp/rollback/cloud_manager/pip_after.txt
Add a user:
useradd -m aos
passwd aos
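As a quick optional check that the account was created, the standard id utility can be used:
id aos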
Note
To install CloudManager on Astra Linux (Smolensk), do the following:
Connect the provided repository with AccentOS packages.
Install the package with the command:
sudo apt install -y cloud-manager
Installation on two or more controllers¶
When installing CloudManager on two or more controllers, it is necessary:
- to replicate the database for each controller;
- to install the module with the same parameters on each controller;
- to keep the processes of monitoring, management, and synchronization of compute nodes active on only one controller, and to configure access to the message broker server for that same controller;
- to initialize compute nodes and power management devices on only one of the controllers.
Note
Deletion and diagnostics of the module on each controller are performed in the same way as in the case of a single controller.
Configuration¶
Note
This section covers launching the API service through the WSGI server supplied with the eventlet library. To configure launching the service through another WSGI server (Nginx + Gunicorn, Apache + mod_wsgi, etc.), see the documentation for the corresponding server. The path to the WSGI application is cloud_manager.api.cloud_manager_api.wsgi.
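For illustration only, a hypothetical Gunicorn launch might look as follows, assuming Gunicorn is installed, the wsgi module exposes the standard application callable, and the default API port 9362 is used; consult the Gunicorn documentation for production settings:
gunicorn --bind 0.0.0.0:9362 --workers 4 cloud_manager.api.cloud_manager_api.wsgi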
Perform initial configuration of the module:
openstack aos configure -n cloud_manager
Create a directory for logs with the required permissions:
mkdir -p /var/log/aos/cloud-manager
chown -R aos:aos /var/log/aos/cloud-manager
Copy the sample configuration files; if using non-standard parameters, edit them (for details, see Configuration file):
cp /etc/aos/aos.conf.example /etc/aos/aos.conf
cp /etc/aos/cloud_manager.conf.example /etc/aos/cloud_manager.conf
Create the database (using MySQL as an example), and set rights, database type, and other parameters:
# Log in to the database using the root password
mysql -uroot -p
# Create the cloud_manager database
CREATE DATABASE cloud_manager;
# Grant permission to read, edit, and perform any actions on all tables in the cloud_manager database
GRANT ALL PRIVILEGES ON cloud_manager.* TO 'aos'@'localhost' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON cloud_manager.* TO 'aos'@'%' IDENTIFIED BY 'password';
# Exit the database
exit
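Optionally, the granted privileges can be verified from the same MySQL session before exiting (standard MySQL statements):
SHOW GRANTS FOR 'aos'@'localhost';
SHOW GRANTS FOR 'aos'@'%';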
Edit the [database] section of the configuration file /etc/aos/cloud_manager.conf, for example:
[database]
url = mysql+pymysql://aos:password@tst.stand.loc:3306/cloud_manager?charset=utf8
Migrate the database:
openstack aos db migrate -n cloud_manager
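After the migration, the current versions can be listed again to confirm it was applied:
openstack aos db list -n cloud_manager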
Configure the RabbitMQ Server message broker:
rabbitmqctl add_user aos password
rabbitmqctl add_vhost aos
rabbitmqctl set_permissions -p aos aos ".*" ".*" ".*"
rabbitmqctl set_permissions aos ".*" ".*" ".*"
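Optionally, verify the created virtual host and permissions (standard rabbitmqctl commands):
rabbitmqctl list_vhosts
rabbitmqctl list_permissions -p aos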
Create a user in OpenStack for the API services:
openstack user create --domain default --project service --project-domain default --password password --or-show aos
Assign the user the service role:
openstack role add --user aos --user-domain default --project service --project-domain default service
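Optionally, confirm the assignment (standard OpenStack CLI):
openstack role assignment list --user aos --project service --names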
Enable and start systemd services:
systemctl daemon-reload
systemctl enable aos-cloud-manager-agent.service
systemctl start aos-cloud-manager-agent.service
systemctl enable aos-cloud-manager-api.service
systemctl start aos-cloud-manager-api.service
systemctl enable aos-cloud-manager-listener.service
systemctl start aos-cloud-manager-listener.service
systemctl enable aos-cloud-manager-beat.service
systemctl start aos-cloud-manager-beat.service
systemctl enable aos-cloud-manager-worker.service
systemctl start aos-cloud-manager-worker.service
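A quick way to confirm all five services came up (standard systemctl usage):
for s in agent api listener beat worker; do systemctl is-active aos-cloud-manager-$s.service; done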
Create the CloudManager API service:
openstack service create --name cloud-manager --description "Cloud Manager Service" cloud-manager
Create endpoints:
openstack endpoint create --region RegionOne cloud-manager internal http://controller:9362
openstack endpoint create --region RegionOne cloud-manager admin http://controller:9362
openstack endpoint create --region RegionOne cloud-manager public http://controller:9362
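Optionally, verify the created endpoints (standard OpenStack CLI):
openstack endpoint list --service cloud-manager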
Configuration file¶
Note
The configuration file allows overriding sections and parameters of the common file aos.conf for a specific module.
Note
By default, the file cloud_manager.conf.example contains no logging level lines; the logging level is specified there only if necessary. The default logging level is set in the general configuration file. More information about the configuration files can be found in the corresponding section.
The configuration file is in ini format and consists of the following sections and parameters:
Section | Parameter | Description | Default value |
---|---|---|---|
DEFAULT | dhcp_leases | Path to the directory with .leases files of the DHCP server. | /var/lib/dhcp/dhcpd/state/dhcpd.leases, /var/lib/dhcp/dhcpd.leases |
agent | agent_confirm_timeout | Maximum waiting time for confirmation of a response from the aos-agent service. | 10 |
agent | agent_response_timeout | Maximum time to wait for a response from the aos-agent service. | 60 |
agent | enabled | Enables or disables the node management functionality provided by the Agent module. | False |
agent | logfile | Path to the log file of the aos-cloud-manager-agent service. | |
api | audit_enabled | Flag enabling auditing of API requests. | True |
api | host | IP address on which the CloudManager API service runs. | 0.0.0.0 |
api | logfile | Path to the log file of the aos-cloud-manager-api service. | |
api | port | Port on which the CloudManager API service runs. | 9362 |
balancing | approx | Balancing accuracy. | 0.1 |
balancing | enabled | Enables or disables node balancing. | False |
balancing | migration_retries | Number of attempts to migrate instances. | 3 |
balancing | periodic_time | Period between host aggregate balancing runs, in hours. | 24 |
database | url | Database connection settings. | mysql+pymysql://aos:password@localhost:3306/cloud_manager?charset=utf8 |
drs | host | Host where the DRS system runs. | |
drs | password | Password of the DRS system user. | |
drs | port | Server port of the DRS system. | 80 |
drs | username | User of the DRS system. | |
drs_trigger | enabled | Enables automatic application of audit results. | True |
drs_trigger | logfile | Path to the log file for automatic application of audit results. | |
drs_trigger | trigger_interval | Interval for starting automatic application of audit results, in seconds. | 600 |
extra_availability_check | attempts | Number of attempts to read the file. | 2 |
extra_availability_check | delay | Delay between retries to read the compute node state file, in seconds. | 60 |
extra_availability_check | enabled | Enables or disables the additional check of compute node availability through storage. | False |
extra_availability_check | instance_rate | Required percentage of running instances. | 100 |
host_tasks | allow_evacuate_host | Enables or disables host evacuation. | True |
host_tasks | deny_evacuate | Disables evacuation for the specified hosts. | |
host_tasks | evacuation_retries | Number of attempts to evacuate instances. | 2 |
host_tasks | prioritized_evacuation_timeout | Interval between evacuating instance groups with the same recovery priorities, in seconds. | 60 |
host_tasks | recovery_priority | Recovery priority of an instance during auto-evacuation in case of problems on the hypervisor. | 5 |
host_tasks | retries_wait_for_node_state | Maximum number of hypervisor state polling attempts. | 240 |
host_tasks | retries_wait_for_vm_status | Maximum number of instance state polling attempts. | 60 |
listener | logfile | Path to the log file of the aos-cloud-manager-listener service. | |
listener | nova_rabbit_vhost | RabbitMQ virtual host of the Nova service. | / |
node_sync | enabled | Enables synchronization of compute nodes. | True |
node_sync | logfile | Path to the compute node synchronization log file. | |
node_sync | reserve_interval | Waiting time before a free hypervisor starts being transferred to the reserve. | 60 |
node_sync | sync_interval | Compute node synchronization interval, in seconds. | 60 |
node_tracker | allow_host_auto_power_off | Enables or disables restarting a compute node when it transitions to the down status. | False |
node_tracker | enabled | Enables checking the status of compute nodes. | True |
node_tracker | host_restart_timeout | Compute node restart timeout, in seconds. | 600 |
node_tracker | logfile | Path to the compute node status check log file. | |
node_tracker | loop_time | Time interval between compute node status checks, in seconds. | 30 |
node_tracker | max_down_hosts | Maximum allowed number of compute nodes in the down status, excluding standby ones. If this number is exceeded, automatic evacuation is not performed for any of the nodes. Negative numbers are not allowed. | 0 |
node_tracker | mutex | Number of attempts to determine the hypervisor status when it switches to the down status before starting the handler. | 3 |
node_tracker | mutex_up | Number of attempts to determine the hypervisor status when it switches to the up status before starting the handler. | 1 |
node_tracker | timeout_reserv_node_up | Waiting time for a standby compute node to come up, in minutes. | 15 |
nova_database | url | Database connection settings. | |
power | shutdown_interval | Waiting time until the next compute node status check iteration when the host is shut down with instance evacuation via the console utility powering off the hypervisor, in seconds. | 30 |
power | shutdown_max_tick | Maximum number of compute node status check iterations when the host is shut down with instance evacuation via the console utility powering off the hypervisor. | 10 |
pxe | conf_dir | Directory for PXE configuration files. | /var/lib/tftpboot/pxelinux.cfg/ |
storage_sync | enabled | Enables storage synchronization. | True |
storage_sync | logfile | Path to the storage synchronization log file. | |
storage_sync | sync_interval | Storage synchronization start interval, in seconds. | 60 |
Note
The approx parameter in the balancing section of the configuration file determines whether a hypervisor is considered loaded. A node is considered loaded if the ratio of the occupied RAM of the hypervisor to the total RAM of that hypervisor exceeds the ratio of the occupied RAM of all nodes accepted for balancing to the total RAM of those nodes, plus approx.
Example:
Two compute nodes with a total of 8192 MB of RAM, 4096 MB each, are accepted for balancing; 3072 MB of RAM are occupied on the first compute node and 256 MB of RAM on the second.
The total load of the balanced nodes is calculated as: (3072 + 256) / 8192 ≈ 0.41
The load of the first hypervisor is calculated as: 3072 / 4096 = 0.75
The load of the second hypervisor is calculated as: 256 / 4096 ≈ 0.06
With approx = 0.1, add approx to the total load and compare each hypervisor's load to the result:
for the first node: 0.75 > 0.41 + 0.1, so this node is loaded and needs to be balanced;
for the second node: 0.06 < 0.41 + 0.1, so this node is not loaded and will not be accepted for balancing.
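The same check, expressed as a minimal Python sketch under the assumption that the rule is exactly as stated above (the function and variable names are illustrative and do not come from the module):

def overloaded_nodes(nodes, approx=0.1):
    # nodes: list of (used_mb, total_mb) pairs for hypervisors accepted for balancing
    total_used = sum(used for used, _ in nodes)
    total_ram = sum(total for _, total in nodes)
    # total load of the balanced nodes plus the approx accuracy margin
    threshold = total_used / total_ram + approx
    # a node is loaded if its own RAM ratio exceeds the threshold
    return [node for node in nodes if node[0] / node[1] > threshold]

# The worked example above: only the first node is loaded
print(overloaded_nodes([(3072, 4096), (256, 4096)]))  # -> [(3072, 4096)]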
Important
When changing the parameters of the configuration file, perform the procedure described in the section “Updating the configuration file” for the changes to take effect.
Recovery plan¶
If the installation or update of CloudManager fails, roll back as follows:
Compare the migration versions in the file /tmp/rollback/cloud_manager/migrations.txt with the current ones. If there are differences, migrate to the previous version. Migration example:
openstack aos db list -n cloud_manager
openstack aos db migrate -n cloud_manager --migration 27
Revert to the previous state of the packages:
cd /tmp/rollback/cloud_manager
diff --changed-group-format='%>' --unchanged-group-format='' pip_before.txt pip_after.txt > pip_uninstall.txt
diff --changed-group-format='%<' --unchanged-group-format='' pip_before.txt pip_after.txt > pip_install.txt
pip3 uninstall -r pip_uninstall.txt
pip3 install -r pip_install.txt
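To check that the rollback restored the original package set, the saved snapshot can be compared against the current one (no output means they match):
pip3 freeze | diff - pip_before.txt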