Ways to restore MariaDB Galera Cluster
This article considers a cluster of three servers running MariaDB Galera Cluster and walks through several scenarios for bringing it back to working order:
- GaleraServer-1
- GaleraServer-2
- GaleraServer-3
Option number 1
All cluster nodes stopped abnormally (crashed).
- GaleraServer-1 - FAILED
- GaleraServer-2 - FAILED
- GaleraServer-3 - FAILED
To recover, run the following on every node of the cluster:
mysqld_safe --wsrep-recover
The output of the command will look like this:
GaleraServer-2 mysqld_safe: WSREP: Recovered position 983269fb-f7c1-11e6-b511-43f8ac2c2e03:1741309275417
where
983269fb-f7c1-11e6-b511-43f8ac2c2e03
is the UUID of the cluster state (it is the same on all nodes that belong to one cluster);
1741309275417
is the sequence number (seqno) of the last transaction committed on this node.
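Then we find the node with the highest seqno, that is, the node that received the latest changes, and start it with the --wsrep_new_cluster option:
/etc/init.d/mysql start --wsrep_new_cluster
On recent Galera versions the node may refuse to bootstrap after a crash until safe_to_bootstrap is set to 1 in /var/lib/mysql/grastate.dat on that node.
After it has started successfully, we start the MySQL service on the remaining nodes in the usual way.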
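Choosing the node to bootstrap comes down to comparing the numbers after the colon in the recovered positions. A minimal sketch, assuming the positions have already been collected by hand into lines of the form "node-name position" (the function name pick_bootstrap_node and this input format are illustrative and not part of Galera):

```shell
# Reads lines of the form "<node-name> <uuid>:<seqno>" from stdin,
# for example:
#   GaleraServer-2 983269fb-f7c1-11e6-b511-43f8ac2c2e03:1741309275417
# and prints the name of the node with the highest seqno.
# The input format is an assumption made for this sketch.
pick_bootstrap_node() {
    awk '{
        n = split($2, parts, ":")   # parts[n] is the seqno after the last colon
        seqno = parts[n] + 0        # force numeric comparison
        if (best == "" || seqno > max) { max = seqno; best = $1 }
    } END { if (best != "") print best }'
}
```

If the UUIDs differ between nodes, the nodes do not share one cluster state and the seqno values should not be compared this way.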
Option number 2
Two cluster nodes stopped abnormally (crashed).
- GaleraServer-1 - FAILED
- GaleraServer-2 - FAILED
- GaleraServer-3 - RUNNING
Two cluster nodes have failed and one is still running. Having lost quorum, the surviving node no longer forms a Primary Component, so it has to be told to form a new one.
First make sure that the other nodes are really down, then run on the working node:
SET GLOBAL wsrep_provider_options='pc.bootstrap=true';
After that, we start the remaining cluster nodes.
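Whether the working node has actually become the Primary Component can be read from the wsrep_cluster_status status variable. A small sketch of such a check, assuming the mysql client can log in without extra options; the helper name is_primary_component is hypothetical:

```shell
# Reads the tab-separated output of
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'"
# from stdin and succeeds only when the reported value is "Primary".
is_primary_component() {
    awk -F '\t' '
        $1 == "wsrep_cluster_status" { status = $2 }
        END { if (status == "Primary") exit 0; exit 1 }'
}
```

It could be used as: mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'" | is_primary_component && echo "node is Primary"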
Option number 3
One of the cluster nodes stopped abnormally (crashed).
- GaleraServer-1 - FAILED
- GaleraServer-2 - RUNNING
- GaleraServer-3 - RUNNING
This case is simple: we start the stopped node and it rejoins the cluster automatically.
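To confirm that the restarted node is back in the cluster, the wsrep_local_state_comment and wsrep_cluster_size status variables can be inspected. The sketch below assumes the status output is piped in from the mysql client; the helper name node_rejoined and the expected size of 3 are illustrative:

```shell
# Reads "name<TAB>value" lines, as printed by
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'"
# and succeeds when the node reports Synced and the cluster
# has the expected number of members (first argument).
node_rejoined() {
    expected_size=$1
    awk -F '\t' -v want="$expected_size" '
        $1 == "wsrep_local_state_comment" { state = $2 }
        $1 == "wsrep_cluster_size"        { size = $2 }
        END { if (state == "Synced" && size + 0 == want + 0) exit 0; exit 1 }'
}
```

For the three-node cluster in this article: mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | node_rejoined 3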
Option number 4
Two cluster nodes were stopped in the normal way.
- GaleraServer-1 - STOPPED
- GaleraServer-2 - STOPPED
- GaleraServer-3 - RUNNING
If two cluster nodes have been shut down and one is still running normally, start the service on the first and second nodes as follows:
service mysql start --wsrep_sst_donor=<wsrep_node_name>
where <wsrep_node_name> is the name of the running node that will act as the donor; by default, wsrep_node_name is the server's hostname.
Option number 5
One of the nodes was stopped in the normal way.
- GaleraServer-1 - STOPPED
- GaleraServer-2 - RUNNING
- GaleraServer-3 - RUNNING
The node was shut down with the systemctl stop mariadb command, for example, to change the configuration. After that, we start the service as usual and the node rejoins the cluster automatically.
Option number 6
All cluster nodes were stopped in the normal way.
- GaleraServer-1 - STOPPED
- GaleraServer-2 - STOPPED
- GaleraServer-3 - STOPPED
All cluster nodes were shut down cleanly. To bring the cluster back up, compare the seqno value in the /var/lib/mysql/grastate.dat file on all nodes. The node with the highest seqno is started first:
service mysql start --wsrep_new_cluster
After it has started successfully, start the remaining nodes in the usual way:
service mysql start
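Reading seqno out of grastate.dat is easy to script. A minimal sketch, assuming the file contains a line of the form "seqno: N" alongside the uuid line; the helper name grastate_seqno is hypothetical:

```shell
# Prints the seqno stored in the grastate.dat file given as the first argument.
grastate_seqno() {
    awk '$1 == "seqno:" { print $2 }' "$1"
}
```

Running grastate_seqno /var/lib/mysql/grastate.dat on each node gives the values to compare. A value of -1 means the node did not record its position on shutdown, in which case the position has to be recovered as in option number 1.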
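Option number 7
A split-brain situation.
To avoid split brain, the cluster should have an odd number of nodes, so that after a network partition at most one side can hold a majority. If only an even number of database nodes is available, Galera Arbitrator (garbd) can be added as an extra voting member that stores no data.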
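The reason an odd node count helps can be shown with simple majority arithmetic. The sketch below is a simplified model that assumes every node has equal weight; it does not reproduce Galera's actual weighted quorum calculation, and the function name has_quorum is illustrative:

```shell
# has_quorum REACHABLE TOTAL
# Succeeds when REACHABLE nodes form a strict majority of TOTAL nodes
# (simplified model: all nodes carry the same weight).
has_quorum() {
    [ $(( $1 * 2 )) -gt "$2" ]
}
```

With three nodes, a partition of two keeps quorum (has_quorum 2 3 succeeds) while the isolated node does not. With four nodes split two and two, neither side has a majority (has_quorum 2 4 fails), which is the situation an extra arbitrator is meant to prevent.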