MUC with clustering vs. HA/High Availability

Hi,

I set up a cluster of two ejabberd (16.03) nodes - so far so good.
The requirement is an HA solution with 2 (or more) systems, where one system can take over if the other fails.
With the current setup, MUC rooms only seem to live on the node where they were created, so I have the following problem:
- MUC room "A" is created on Node_1
- Node_2 can see and connect to room "A" via the cluster
- Node_1 goes offline (e.g. hardware fault)
- Node_2 has no information about room "A", so clients receive an error.

Is there any working solution for this (sharing the MUC information between nodes so that a hot-standby scenario is possible)?

Savek

Hi Savek,

This is something I'm trying to do as well. Have you found a solution yet?

It seems the built-in clustering does not provide this. I'm now considering the following construction, which feels a bit fragile, but could work:
• Have one ejabberd instance (the master) running, and the other (the standby) stopped.
• From the standby computer, perform the following actions over and over:
· make a database backup of the master using ejabberdctl backup;
· transfer the backup file to the standby;
· install the backup file on the standby using ejabberdctl install_fallback.
Whenever one of these actions fails (or you notice in some other way that the master is down), start the ejabberd instance on the standby. It will use the latest successful database dump from the master.
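
A rough sketch of that loop in Python, meant to run on the standby. The ejabberdctl subcommands (backup, install_fallback, start) are the ones named above; the ssh/scp transport, the host name, the file paths, the pause and the timeout are assumptions for illustration. As in the description, any failing step is treated as "the master is down".

```python
import subprocess
import time
from typing import Callable, List, Optional, Sequence

# Hypothetical host name and file paths: adjust to your own setup.
MASTER = "master.example.org"
REMOTE_BACKUP = "/tmp/ejabberd-master.backup"
LOCAL_BACKUP = "/tmp/ejabberd-master.backup"


def default_steps() -> List[List[str]]:
    """The three steps from the post, as commands run on the standby."""
    return [
        # 1. Dump the master's database (ssh transport is an assumption).
        ["ssh", MASTER, "ejabberdctl", "backup", REMOTE_BACKUP],
        # 2. Copy the dump to the standby (scp is an assumption).
        ["scp", f"{MASTER}:{REMOTE_BACKUP}", LOCAL_BACKUP],
        # 3. Install the dump on the standby. ejabberdctl commands normally
        #    talk to a running node, so check how your standby handles this.
        ["ejabberdctl", "install_fallback", LOCAL_BACKUP],
    ]


def run_step(cmd: Sequence[str], timeout_s: float = 60.0) -> bool:
    """Run one command; True only if it exits with status 0 in time."""
    try:
        return subprocess.run(list(cmd), timeout=timeout_s).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def replicate_until_failure(
    steps: Sequence[Sequence[str]],
    run: Callable[[Sequence[str]], bool] = run_step,
    pause_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    max_rounds: Optional[int] = None,
) -> int:
    """Repeat backup/transfer/install until any step fails.

    Returns the number of rounds that completed successfully.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for cmd in steps:
            if not run(cmd):
                return rounds  # any failure is taken to mean "master is down"
        rounds += 1
        sleep(pause_s)
    return rounds


def main() -> None:
    rounds = replicate_until_failure(default_steps())
    print(f"replication failed after {rounds} good rounds; starting standby")
    # The standby now starts from the last successfully installed fallback.
    run_step(["ejabberdctl", "start"])


if __name__ == "__main__":
    main()
```

The `run` and `sleep` parameters are only there so the loop can be tried out without real servers, e.g. with a fake `run` that fails on a chosen call.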

How often you can make, transfer and install the backup depends on the size of the database, the speed of the server and the connection between master and standby. In my setup (a very small server) it takes only a few seconds, so you could do this a few times per minute and thus miss very little data when the standby has to take over.
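
To put numbers on that trade-off, here is a back-of-the-envelope model in Python. It rests on two assumptions, not measurements: a snapshot reflects the master at the moment the backup starts, and it only reaches the standby if the master stays up until both backup and transfer have finished. Under that model the worst case is losing one full cycle plus the backup and transfer time.

```python
def cycle_seconds(backup_s: float, transfer_s: float,
                  install_s: float, pause_s: float) -> float:
    """Length of one backup/transfer/install round plus the pause."""
    return backup_s + transfer_s + install_s + pause_s


def rounds_per_minute(backup_s: float, transfer_s: float,
                      install_s: float, pause_s: float) -> float:
    """How many complete rounds fit into one minute."""
    return 60.0 / cycle_seconds(backup_s, transfer_s, install_s, pause_s)


def worst_case_loss_seconds(backup_s: float, transfer_s: float,
                            install_s: float, pause_s: float) -> float:
    """Upper bound on the window of master writes the standby can miss.

    Worst case (under the assumed model): the master dies just before the
    transfer of the newest snapshot completes, so the standby still holds
    the previous snapshot, taken one full cycle earlier.
    """
    cycle = cycle_seconds(backup_s, transfer_s, install_s, pause_s)
    return cycle + backup_s + transfer_s
```

For example, with a 2 s backup, a 1 s transfer, a 2 s install and a 10 s pause, that is 4 rounds per minute and at most about 18 s of writes lost.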

One big disadvantage of this scheme is that recovering from a failure might be tricky: once the standby takes over, new messages and other data end up in the standby's database. However, if the transfer failed because of a network partition rather than a crash, the master may still be running and receiving new messages too. You would then need to merge the two databases somehow if you don't want to lose any history.

Also, it feels silly to dump and transport the entire database every time, copying the same data over and over.

Apparently something like that is provided in the ejabberd Business Edition: automatic migration of rooms to another node when a node goes down. It isn't mentioned at https://www.process-one.net/en/ejabberd/protocols/, but maybe it's included under "Consistent hash clustering".
