Ways to restore MariaDB Galera Cluster

This article considers a cluster of three servers with a working MariaDB Galera Cluster configuration and gives several examples of restoring it to working order:

  • GaleraServer-1
  • GaleraServer-2
  • GaleraServer-3

Option number 1

All cluster nodes crashed (were stopped abnormally).

  • GaleraServer-1 - FAILED
  • GaleraServer-2 - FAILED
  • GaleraServer-3 - FAILED

To recover, we run the following on every node of the cluster:

mysqld_safe --wsrep-recover

The command prints a line like this:

GaleraServer-2 mysqld_safe: WSREP: Recovered position 983269fb-f7c1-11e6-b511-43f8ac2c2e03:1741309275417

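where

983269fb-f7c1-11e6-b511-43f8ac2c2e03 is the cluster state UUID, which identifies the history of the cluster and is the same on all of its nodes;

1741309275417 is the seqno, the sequence number of the last transaction committed on this node.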
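When the recovery is run on several nodes, it helps to extract the two values mechanically. Below is a minimal POSIX shell sketch. It assumes the line format shown above, and `parse_recovered_position` is a hypothetical helper name. Server logs may print "Recovered position:" with a colon, so the pattern accepts both forms.

```shell
# parse_recovered_position LINE
# Hypothetical helper. Prints "<state UUID> <seqno>" for a line such as:
#   ... WSREP: Recovered position 983269fb-f7c1-11e6-b511-43f8ac2c2e03:1741309275417
# The colon after "position" is optional, and a seqno of -1 is accepted.
# Prints nothing if the line contains no recovered position.
parse_recovered_position() {
  printf '%s\n' "$1" |
    sed -n 's/.*Recovered position:\{0,1\} \([0-9a-f-]*\):\(-\{0,1\}[0-9][0-9]*\).*/\1 \2/p'
}

# Example (assumes the line appears in mysqld_safe's combined stdout/stderr):
#   parse_recovered_position "$(mysqld_safe --wsrep-recover 2>&1 | grep 'Recovered position')"
```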

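Then we determine the node with the highest seqno, i.e. the one on which the last changes were committed. Galera versions that write safe_to_bootstrap to /var/lib/mysql/grastate.dat set this flag to 0 on every node after a full crash, so it must be set to 1 on the chosen node; otherwise that node refuses to bootstrap. The node is then started with the --wsrep_new_cluster option: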

/etc/init.d/mysql start --wsrep_new_cluster

On systemd-based systems, MariaDB ships the galera_new_cluster script for the same purpose.

After a successful launch, we start the MySQL service on the remaining nodes.
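Choosing the bootstrap node comes down to taking the maximum seqno. Below is a sketch. It assumes the seqno values from each node have already been collected into lines of the form "<node> <seqno>", and `pick_bootstrap_node` is a hypothetical helper name.

```shell
# pick_bootstrap_node
# Hypothetical helper. Reads "<node> <seqno>" lines on stdin and prints the name
# of the node with the highest seqno, i.e. the one to start with --wsrep_new_cluster.
# sort -n compares the second column numerically, so a seqno of -1 sorts first.
pick_bootstrap_node() {
  sort -k2,2n | tail -n 1 | awk '{ print $1 }'
}

# Example with made-up values:
#   printf 'GaleraServer-1 1741309275410\nGaleraServer-2 1741309275417\n' | pick_bootstrap_node
```

If two nodes report the same highest seqno for the same state UUID, they hold the same state, and either of them can be chosen.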

Option number 2

Two cluster nodes crashed.

  • GaleraServer-1 - FAILED
  • GaleraServer-2 - FAILED
  • GaleraServer-3 - RUNNING

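Two nodes failed and one is still running. One node out of three is not a majority, so the survivor loses quorum, switches to the non-Primary state and refuses queries. To recover, we have to tell it that it is now the Primary Component of the cluster.

First make sure the other two nodes really are down; otherwise we risk ending up with two independent Primary Components. Then, on the working node, run: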

SET GLOBAL wsrep_provider_options='pc.bootstrap=true';

After that, we start the rest of the cluster nodes.
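Before forcing pc.bootstrap, it is worth confirming that the surviving node really is non-Primary. Below is a sketch of that check. It assumes the status is read with the mysql client in batch mode (-N -B), which prints a tab-separated name/value line, and `should_bootstrap` is a hypothetical helper name.

```shell
# should_bootstrap STATUS_LINE
# Hypothetical helper. STATUS_LINE is the output of, on the surviving node:
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'"
# e.g. "wsrep_cluster_status<TAB>non-Primary".
# Prints "yes" only for non-Primary. "Primary" needs no action, and any other
# value (e.g. Disconnected) should be investigated rather than bootstrapped.
should_bootstrap() {
  cluster_status=$(printf '%s\n' "$1" | awk '{ print $2 }')
  if [ "$cluster_status" = "non-Primary" ]; then
    echo yes
  else
    echo no
  fi
}

# Example:
#   line=$(mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'")
#   [ "$(should_bootstrap "$line")" = yes ] && mysql -e "SET GLOBAL wsrep_provider_options='pc.bootstrap=true';"
```

This check does not replace verifying by hand that the other two nodes are really down.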

Option number 3

One of the cluster nodes crashed.

  • GaleraServer-1 - FAILED
  • GaleraServer-2 - RUNNING
  • GaleraServer-3 - RUNNING

In this case everything is simple: the two remaining nodes still form a majority and keep working, so we just start the failed node and it rejoins the cluster automatically.
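To confirm that a restarted node has actually rejoined, one can check wsrep_local_state_comment (Synced once the node has caught up) and wsrep_cluster_size. Below is a sketch. It assumes the values are fetched with the mysql client in batch mode, and `node_rejoined` is a hypothetical helper name. For this cluster, the expected size is 3.

```shell
# node_rejoined STATUS_OUTPUT EXPECTED_SIZE
# Hypothetical helper. STATUS_OUTPUT is the output of:
#   mysql -N -B -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_local_state_comment', 'wsrep_cluster_size')"
# Prints "ok" when the node is Synced and the cluster has EXPECTED_SIZE members,
# otherwise "not-ready".
node_rejoined() {
  printf '%s\n' "$1" | awk -v want="$2" '
    $1 == "wsrep_local_state_comment" { state = $2 }
    $1 == "wsrep_cluster_size"        { size = $2 }
    END { print ((state == "Synced" && size == want) ? "ok" : "not-ready") }'
}
```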

Option number 4

Two cluster nodes were shut down cleanly.

  • GaleraServer-1 - STOPPED
  • GaleraServer-2 - STOPPED
  • GaleraServer-3 - RUNNING

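Two nodes are down, but the third continues to work normally: because the other two left the cluster cleanly, it keeps quorum and stays Primary. On the first and second nodes we start the service as follows, naming the running node as the state transfer donor: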

service mysql start --wsrep_sst_donor=<wsrep_node_name>

where <wsrep_node_name> is the node name (wsrep_node_name) of the running node, GaleraServer-3 in this example; by default it is the server's host name.
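If the donor's node name is not known, it can be read on the running node and turned into the start command. Below is a sketch. It assumes the mysql client in batch mode is run on GaleraServer-3. `donor_start_command` is a hypothetical helper, and it only prints the command rather than running it.

```shell
# donor_start_command VARIABLE_LINE
# Hypothetical helper. VARIABLE_LINE is the output of, on the running node:
#   mysql -N -B -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_node_name'"
# e.g. "wsrep_node_name<TAB>GaleraServer-3". Prints the command to run on each
# stopped node, or an error on stderr (exit status 1) if no name was found.
donor_start_command() {
  donor=$(printf '%s\n' "$1" | awk '{ print $2 }')
  if [ -z "$donor" ]; then
    echo "could not read wsrep_node_name" >&2
    return 1
  fi
  printf 'service mysql start --wsrep_sst_donor=%s\n' "$donor"
}
```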

Option number 5

One of the nodes was shut down cleanly.

  • GaleraServer-1 - STOPPED
  • GaleraServer-2 - RUNNING
  • GaleraServer-3 - RUNNING

The node was shut down with systemctl stop mariadb, for example to change its configuration. We simply start the service again as usual, and the node rejoins the cluster automatically.

Option number 6

All cluster nodes were shut down cleanly.

  • GaleraServer-1 - STOPPED
  • GaleraServer-2 - STOPPED
  • GaleraServer-3 - STOPPED

To bring the cluster back, compare the seqno value in /var/lib/mysql/grastate.dat on all nodes. The node with the highest seqno is started first. This is normally the node that was stopped last, and its grastate.dat also contains safe_to_bootstrap: 1:

service mysql start --wsrep_new_cluster

Once it has started successfully, we start the remaining nodes in the usual way:

service mysql start
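Reading seqno from grastate.dat on each node can be scripted. Below is a sketch. It assumes the usual "key: value" layout of that file, and `grastate_field` is a hypothetical helper that can read safe_to_bootstrap and uuid as well.

```shell
# grastate_field KEY [FILE]
# Hypothetical helper. Prints the value of KEY (e.g. seqno, uuid,
# safe_to_bootstrap) from a grastate.dat-style file; reads stdin when FILE is omitted.
grastate_field() {
  awk -F ':' -v key="$1" '$1 == key { gsub(/[ \t]/, "", $2); print $2 }' "${2:--}"
}

# Example, on each node:
#   echo "$(hostname) $(grastate_field seqno /var/lib/mysql/grastate.dat)"
```

A seqno of -1 means the node was not shut down cleanly. In that case, its position has to be recovered with mysqld_safe --wsrep-recover, as in option 1.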

Option number 7

The split brain situation.

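To avoid it, the cluster should have an odd number of nodes, so that a network partition cannot split it into two equal halves, neither of which has quorum. If you only have an even number of database servers, the best way to get an odd number of votes is to add Galera Arbitrator (garbd), which takes part in quorum but stores no data.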
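Here is the arithmetic behind this rule: a partition keeps the Primary Component only if it holds more than half of the members of the last Primary Component. Below is a sketch. It assumes every node has the default equal weight, and it ignores Galera's weighted quorum (pc.weight). `has_quorum` is a hypothetical helper name.

```shell
# has_quorum TOTAL PARTITION_SIZE
# Hypothetical helper. Prints "yes" if a partition of PARTITION_SIZE nodes out of
# TOTAL equally weighted nodes holds a strict majority, otherwise "no".
has_quorum() {
  if [ $(( $2 * 2 )) -gt "$1" ]; then
    echo yes
  else
    echo no
  fi
}

# 3 nodes split 2/1: the side with 2 nodes stays Primary.
#   has_quorum 3 2   # yes
#   has_quorum 3 1   # no
# 4 nodes split 2/2: neither side has quorum, so the whole cluster stops.
#   has_quorum 4 2   # no
```

With two data nodes plus garbd, the total is 3. Losing one data node then leaves a partition of 2, which still has quorum.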