Restoring Rabbit functionality¶
Environment
3 virtual machines and a cluster has been built. Check Cluster construction
Master node yzm1(128), slave node yzm2(129), slave node yzm3(130)
The master node is normal, but the slave node is down.
Check the cluster status, all 3 virtual machines are in normal operation:
Make yzm2 down, shut down the virtual machine corresponding to yzm2:
Remove the node of yzm2:
Restart the rabbitmq service of yzm2:
Error message: Node rabbit@yzm2 thinks it’s clustered with node rabbit@yzm3, but rabbit@yzm3 disagrees
It means that the yzm2 downtime is not a normal exit from the cluster, and the node configuration information of the cluster is still retained, but the yzm2 node has been kicked out of the cluster
Solution Clear the reserved node configuration information:
rm -rf /var/lib/rabbitmq/mnesia/*.
After the files in the mnesia directory are deleted, they will be regenerated after a while.
If you accidentally delete the mnesia directory, then recreate the mnesia directory, but you must ensure that the user and group of the directory are rabbitmq
Command:
chown Rabbitmq:rabbitmq /var/lib/rabbitmq/
yzm2 rejoins the cluster:
The master node is down, and the slave node is normal.
Close yzm3, yzm2
yzm1 is down, turn off the virtual machine
At this time, yzm1 is the last node, and the last node is also regarded as the master node.
Start yzm2, it will wait for yzm1 to start, and it will fail to start after timeout:
Operate yzm3 to eliminate yzm1:
Main command:
rabbitmqctl force_boot rabbitmqctl start_app rabbitmqctl forget_cluster_node rabbit@yzm1
I have used the –offline parameter many times, but it has not been used successfully, and I don’t know why
For example such usage:
rabbitmqctl forget_cluster_node --offline rabbit@yzm1 rabbitmqctl -n rabbit@yzm3 forget_cluster_node --offline rabbit@yzm1
Finally restart yzm1 and join the cluster
This step is the same as the above yzm2 restart to join the cluster, you need to delete the node configuration information first cluster status:
Node update before starting the node
The current cluster has nodes yzm2 and yzm3.
Shutdown node yzm2:
At the same time, start yzm1 to rejoin the cluster (the node configuration information needs to be cleared before starting the rabbitmq service):
Close the yzm3 node and start the yzm2 node:
yzm2 will wait until it times out and reports an error
It is correct to consult yzm1 that is running normally in the cluster before starting:
rabbitmqctl update_cluster_nodes rabbit@yzm1