Clustered Nodes lose communication if network connection is interrupted

I have two ejabberd 2.0.1 nodes running in a clustered domain. The two nodes are connected via a WAN link between two sites. Everything works fine, except that any temporary disruption on the network link makes the two nodes lose communication, i.e. when I log into the admin interface, only one node shows as "running". When this happens, the users logged into one node cannot see the users on the other node, and vice versa.

If I restart either node (ejabberdctl restart), everything returns to normal and the nodes once again "see" each other. However, this gets to be a pain after a while. Has anyone else seen this behavior? Anyone know of a way to have the nodes automatically reconnect after a network communication outage?

TIA - TF

ejabberd 2.0.1 - OpenSuSE 10.2

Just an idea with erlang shell

The connection between the ejabberd nodes should be stable. So, the situation you described is not expected.

I have an idea, maybe it works. If you try it, please tell me if it works or not.

For this to work you need to enter an interactive erlang shell in one of the nodes, so you can execute erlang calls:

$ ejabberdctl debug

When two nodes cannot connect, this call should return 'pang':

 > net_adm:ping(ejabberd@atenea).
pang

When two nodes are connected or can connect, this call tries to connect. If the connection succeeds, it will show 'pong':

 > net_adm:ping(ejabberd@atenea).
pong

Maybe if you execute this call in one of the nodes after network comes back, the nodes will reconnect.
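
If there are several peers to check, the same call can be wrapped in a small helper. This is only a minimal sketch: the module name cluster_ping and both function names are made up for illustration, and the only real OTP call used is net_adm:ping/1. It is untested against a real ejabberd cluster.

```erlang
-module(cluster_ping).               % hypothetical module name
-export([ping_all/1, unreachable/1]).

%% Ping every node in Peers.
%% Returns a list of {Node, pong | pang} tuples, one per peer.
ping_all(Peers) ->
    [{Node, net_adm:ping(Node)} || Node <- Peers].

%% Keep only the nodes that answered 'pang' in a ping_all/1 result.
unreachable(Results) ->
    [Node || {Node, pang} <- Results].
```

Once the module is compiled and loaded on the node, something like cluster_ping:unreachable(cluster_ping:ping_all([ejabberd@atenea])). in the debug shell would list the peers that still answer 'pang'.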


I had the same issue. If you try to ping it in debug mode it still won't force the databases to synchronize since it doesn't know which table to use as the master table.
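
To see whether the database layer agrees that a peer is back, one can compare the nodes Mnesia knows about with the nodes it currently considers running. A minimal sketch follows, assuming the node runs Mnesia as ejabberd does; the module name cluster_check and its function names are hypothetical, while mnesia:system_info(db_nodes) and mnesia:system_info(running_db_nodes) are real Mnesia calls. It is purely diagnostic and does not resynchronize anything.

```erlang
-module(cluster_check).              % hypothetical module name
-export([stopped_db_nodes/0, missing/2]).

%% Nodes that are part of the Mnesia schema but that Mnesia
%% does not currently consider running.
stopped_db_nodes() ->
    missing(mnesia:system_info(db_nodes),
            mnesia:system_info(running_db_nodes)).

%% Elements of All that are not present in Running.
missing(All, Running) ->
    [N || N <- All, not lists:member(N, Running)].
```

If net_adm:ping/1 answers 'pong' for a peer but that peer still shows up in cluster_check:stopped_db_nodes(), the Erlang nodes are connected again while Mnesia has not rejoined, which would match the behaviour described here.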

In order to get around this I have a monitoring script that logs into each machine and sends a message across. If the message isn't received then I know the servers are out of sync and I restart a server. Our servers aren't on a WAN so this isn't a big issue and we have only had that happen once, but over a WAN it may happen frequently.

Your monitoring script..

FindAndy wrote:

In order to get around this I have a monitoring script that logs into each machine and sends a message across. If the message isn't received then I know the servers are out of sync and I restart a server.

I finally got clustering working and am seeing the same thing.

Would you mind sharing your monitoring script?

thanks.


Hello. I would like to ask people who know about this:
- if the link between hosts in a cluster is disconnected, the nodes no longer see each other afterwards (in the web interface the local node is shown as started and the others as stopped)
- if users are connected to different machines in the cluster, then after the link breaks and one of the ejabberd servers is restarted, all offline messages disappear...
Are there any solutions to this problem? Thanks in advance...

I tried
(ejabberd@server3.test.ru)4> net_adm:ping(ejabberd@server2.test.ru).
pong

but this does not help, the nodes still don't see each other...

Started Nodes
ejabberd@server3.test.ru
Stopped Nodes
ejabberd@server2.test.ru

Are there any solutions to this problem?


Your monitoring script that logs into each machine and sends a message across. If the message isn't received then I know the servers are out of sync and I restart a server.


Hi!

But it is possible to develop a module to monitor this issue, right? Would it just be a normal mod, or would I have to change the "kernel" of ejabberd to do this?

Thks, regards

Yes, seems possible as ejabberd module.

Yes, I think you can develop it as an ejabberd module, no need to include the code in a core ejabberd file.
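
As a starting point, such a module could look roughly like the sketch below. It assumes the gen_mod behaviour with start/2 and stop/1 callbacks as used by ejabberd modules; the module name mod_cluster_watch and the helper format_event/1 are made up for illustration. It only logs when a peer node goes down or comes back, using the real OTP calls net_kernel:monitor_nodes/1 and error_logger:info_msg/2. What to do on such an event (for example restarting a node) is left open, and the code is untested against a real cluster.

```erlang
-module(mod_cluster_watch).          % hypothetical module name
-behaviour(gen_mod).                 % behaviour provided by ejabberd

-export([start/2, stop/1, format_event/1]).

%% gen_mod callback: ejabberd calls this once per virtual host,
%% so start the watcher only if it is not already running.
start(_Host, _Opts) ->
    case whereis(?MODULE) of
        undefined -> register(?MODULE, spawn(fun init/0)), ok;
        _Pid -> ok
    end.

%% gen_mod callback: stops the single shared watcher, if any.
stop(_Host) ->
    case whereis(?MODULE) of
        undefined -> ok;
        Pid -> Pid ! stop, ok
    end.

init() ->
    %% Ask the kernel for {nodeup, Node} / {nodedown, Node} messages.
    _ = net_kernel:monitor_nodes(true),
    loop().

loop() ->
    receive
        stop ->
            ok;
        {Event, _Node} = Msg when Event =:= nodeup; Event =:= nodedown ->
            error_logger:info_msg("~s~n", [format_event(Msg)]),
            loop();
        _Other ->
            loop()
    end.

%% Turn a node event into a log line (returned as an iolist).
format_event({nodeup, Node}) ->
    io_lib:format("cluster node ~s is up", [Node]);
format_event({nodedown, Node}) ->
    io_lib:format("cluster node ~s is down", [Node]).
```

The module would be enabled like any other module in the modules section of ejabberd.cfg. Keeping the watcher as one registered process avoids duplicate log lines when several virtual hosts load the module.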
