Dead connections, message loss, and outdated presence

Problem

I have two users, Alice and Bob. Both are online and chatting normally. Then Alice loses her connection, but she still appears as online. Bob keeps sending her messages, and those messages are lost. This usually happens on Wi-Fi or 3G networks.

Explanations by Holger Weiß in an ejabberd mailing list thread:

What's the issue?

An XMPP client talks to ejabberd using a TCP connection. Such a connection can get "lost" without being properly closed. This can easily happen when clients on mobile or wireless networks lose their connectivity. In such a situation, the connection will appear as open to ejabberd until a TCP timeout is reached, which might take several minutes or even hours (depending on the operating system configuration). During that time, ejabberd will happily continue sending messages to the client, as the server won't be notified of the issue. XMPP wasn't designed with such unreliable (mobile) connections in mind, so the original standard provides no mechanisms to deal with this issue. Therefore, those messages are lost.

How to detect a dead connection?

One way to detect a dead connection is to ping the client periodically and to kill the connection if the client doesn't respond. This can be done using mod_ping. However, these ping packets might wake up the client's radio, so a short ping interval might drain mobile batteries. Therefore, it's not generally recommended to use an interval of less than a few minutes. Either way, there's always some time window where messages can be lost.
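
As a rough illustration of the ping approach, a mod_ping entry in the ejabberd configuration file might look like the sketch below. The 240-second interval is only an example value chosen to match "a few minutes"; option names and units should be checked against the documentation for the ejabberd version in use.

```yaml
modules:
  mod_ping:
    # Have the server ping clients that have been idle for ping_interval.
    send_pings: true
    # Seconds between pings (example value; shorter intervals cost battery).
    ping_interval: 240
    # Close the connection if the client does not answer a ping.
    timeout_action: kill
```

With settings like these, a dead connection would go unnoticed for up to roughly one ping interval plus the time ejabberd waits for the reply, which is the window in which messages can still be lost.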

So how to deal with a dead connection?

ejabberd 14.05 and newer support an XMPP extension called "Stream Management" which is meant to deal with the short outages that are common on mobile networks. This extension lets both the client and the server request message acknowledgements. It also allows clients to resume sessions when they come back online after they lost connectivity. Any messages that weren't delivered over the previous connection are retransmitted during session resumption, so message loss no longer occurs.

What happens if session resumption fails?

By default, even if ejabberd does detect a dead connection (e.g., by means of mod_ping), it keeps the session open for a few minutes to give the client a chance to resume the session. During that time, the client will necessarily appear as online to its peers even though it's not. In most cases, the client resumes the session within that time frame. But if it doesn't, ejabberd will by default bounce error messages for any unacknowledged messages back to the sender.
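
The length of that grace period is configurable. A minimal sketch, assuming an ejabberd version where Stream Management is configured on the ejabberd_c2s listener (as in the older setup described in this text); the port number and the 300-second value are illustrative, not recommendations:

```yaml
listen:
  -
    port: 5222
    module: ejabberd_c2s
    stream_management: true
    # Seconds a disconnected session is kept open so the client can resume it.
    resume_timeout: 300
```

A lower value shortens the time a dead client keeps appearing as online, at the cost of giving the client less time to resume its session.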

Why aren't unacknowledged messages written to offline storage?

ejabberd can be configured to resend unacknowledged messages instead of generating error bounces by setting resend_on_timeout: true in the ejabberd_c2s listener configuration. If no other client of the recipient is online and mod_offline is enabled, they will end up in offline storage. But if another client is online, that client will receive those messages. The problem is that the same client might've already received copies of those messages some minutes earlier (if they were sent to the bare JID and it had the same priority as the now-offline client, or if carbon copies were enabled). Receiving another copy of a bunch of messages is not what users would expect, so this setting is only recommended to admins who know their server is not going to be used that way.

ejabberd 14.12 and newer also support setting resend_on_timeout to if_offline, which means those messages are going to be resent (to offline storage) only if no other client is online when the timeout occurs. Otherwise, error messages are bounced. In many (but not all) situations, this does what users want.
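
A sketch of how this could look in the ejabberd_c2s listener configuration, assuming the listener-based setup described above; the port number is illustrative:

```yaml
listen:
  -
    port: 5222
    module: ejabberd_c2s
    # false      = bounce errors to the sender (the default behaviour described above)
    # true       = always resend unacknowledged messages
    # if_offline = resend only if no other client of the recipient is online
    resend_on_timeout: if_offline
```

For the messages to actually reach offline storage, mod_offline also needs to be enabled.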

How to disable that Stream Management stuff?

To disable the Stream Management features described above, just disable mod_stream_mgmt in your ejabberd configuration file.
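
One way to do that is to comment the module out of the modules section, sketched below. The mod_ping entry is only a placeholder standing in for whatever other modules the server already has configured:

```yaml
modules:
  # Other modules stay as they are (placeholder example):
  mod_ping: {}
  # Commented out so that Stream Management is no longer offered:
  # mod_stream_mgmt: {}
```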

In older ejabberd versions, how to disable that Stream Management stuff?

To disable the Stream Management features described above, just set stream_management: false in the ejabberd_c2s listener configuration.
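
In such a setup, the listener entry might look roughly like this (the port number is illustrative):

```yaml
listen:
  -
    port: 5222
    module: ejabberd_c2s
    # Turn off acknowledgements and session resumption for this listener.
    stream_management: false
```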
