Messages lost: looking for the reason why it happened

Hi,

I'm currently working at the backend of a mobile IM application.

We are experiencing some messages lost and it's very frustrating for us and of course for our users.

Some data that might help us:
- We have a low number of registered users (~1000) with a very traffic.
- The messages lost seems to happen specially for a user with a high number of contacts (~100).
- The messages lost seems to stop when we restart the server, then they appear again after two or three days on.

My main goal is to determine if the problem comes from the backend or from the frontend applications, so ideally I want to be able to work on the backend to solve the problem or to demonstrate to the frontend's developers that there's a problem with the application and provide them as many information as possible.

As an example, I provide here the flow of a message lost we've experienced this morning.
The flow corresponds to one message, but we've experienced many other lost with the same behaviour.
(What I interpreted from the logs, then the logs):
1. The message is received from the transmitter.
2. The receiver is offline, so the message is stored in the pool table.
3. The receiver connect to the server.
4. The message is retrieved (and deleted) from the spool table.
5. The message is send to the receiver.

So, where/when the lost may happened?
How can I detect where is the problem?
How can I determine if it is an application problem?

The only case where I can see a chance to loss this message is if the connection with the receiver is lost again while the message is being retrieved from the database and being sent, but I'm not sure about that, neither if it may happen.

Thanks a lot! =)

Here the logs:

..........
other logs
..........
2017-05-29 08:48:18.125 [debug] <0.1254.1>@ejabberd_receiver:process_data:284 Received XML on stream = <<"<message to='receiver' id='2375' type='chat'><body>content</body><subtype xmlns=\"urn:xmpp:ourtype\"></subtype><request xmlns='urn:xmpp:receipts'/></message><r xmlns='urn:xmpp:sm:3'/>">>
2017-05-29 08:48:18.125 [debug] <0.1254.1>@shaper:update:120 State: {maxrate,50000,485.02086186042015,1496040494992071}, Size=231
M=2.3212585884115224, I=3133.117
2017-05-29 08:48:18.125 [debug] <0.1255.1>@ejabberd_router:do_route:351 route
from {jid,<<"sender">>,<<"server">>,<<"354565077274573">>,<<"sender">>,<<"server">>,<<"354565077274573">>}
to {jid,<<"receiver">>,<<"server">>,<<>>,<<"receiver">>,<<"server">>,<<>>}
packet {xmlel,<<"message">>,[{<<"xml:lang">>,<<"en">>},{<<"to">>,<<"jid-receiver">>},{<<"id">>,<<"2375">>},{<<"type">>,<<"chat">>}],[{xmlel,<<"body">>,[],[{xmlcdata,<<"content">>}]},{xmlel,<<"subtype">>,[{<<"xmlns">>,<<"urn:xmpp:ourtype">>}],[]},{xmlel,<<"request">>,[{<<"xmlns">>,<<"urn:xmpp:receipts">>}],[]}]}
2017-05-29 08:48:18.125 [debug] <0.1255.1>@ejabberd_local:do_route:265 local route
from {jid,<<"sender">>,<<"server">>,<<"354565077274573">>,<<"sender">>,<<"server">>,<<"354565077274573">>}
to {jid,<<"receiver">>,<<"server">>,<<>>,<<"receiver">>,<<"server">>,<<>>}
packet {xmlel,<<"message">>,[{<<"xml:lang">>,<<"en">>},{<<"to">>,<<...>>},{<<...>>,...},{...}],[{xmlel,<<...>>,...},{xmlel,...},{...}]}
2017-05-29 08:48:18.126 [debug] <0.1255.1>@ejabberd_sm:do_route:423 session manager
from {jid,<<"sender">>,<<"server">>,<<"354565077274573">>,<<"sender">>,<<"server">>,<<"354565077274573">>}
to {jid,<<"receiver">>,<<"server">>,<<>>,<<"receiver">>,<<"server">>,<<>>}
packet {xmlel,<<"message">>,[{<<"xml:lang">>,<<"en">>},{<<"to">>,<<...>>},{<<...>>,...},{...}],[{xmlel,<<...>>,...},{xmlel,...},{...}]}
2017-05-29 08:48:18.127 [debug] <0.476.0>@extauth:loop:121 extauth call '[<<"isuser">>,<<"receiver">>,<<"server">>]' received data response:
[0,1]
2017-05-29 08:48:19.687 [debug] <0.1255.1>@ejabberd_c2s:send_text:1814 Send XML on stream = <<"<a xmlns='urn:xmpp:sm:3' h='7'/>">>
2017-05-29 08:48:19.687 [debug] <0.374.0>@ejabberd_odbc:sql_query_internal:557 SQL: "select count(*) from spool where username='receiver'"
2017-05-29 08:48:19.687 [debug] <0.419.0>@ejabberd_odbc:sql_query_internal:557 SQL: "begin;"
2017-05-29 08:48:19.688 [debug] <0.419.0>@ejabberd_odbc:sql_query_internal:557 SQL: "insert into spool(username, xml) values ('receiver', '<message from=''sender@server/354565077274573'' to=''receiver@server'' xml:lang=''en'' id=''2375'' type=''chat''><body>content</body><subtype xmlns=''urn:xmpp:ourtype''/><request xmlns=''urn:xmpp:receipts''/><delay xmlns=''urn:xmpp:delay'' from=''server'' stamp=''2017-05-29T06:48:19.687Z''>Offline Storage</delay></message>');"
2017-05-29 08:48:19.692 [debug] <0.419.0>@ejabberd_odbc:sql_query_internal:557 SQL: "commit;"
..........
other logs
..........
2017-05-29 08:58:40.745 [debug] <0.1313.1>@ejabberd_receiver:process_data:284 Received XML on stream = <<"<enable xmlns=\"urn:xmpp:sm:3\" resume=\"true\" max=\"100\"/>">>
2017-05-29 08:58:40.745 [debug] <0.1313.1>@shaper:update:120 State: {maxrate,50000,54.93168108938995,1496041119734086}, Size=55
M=0.5503022902991348, I=1011.179
2017-05-29 08:58:40.745 [info] <0.1314.1>@ejabberd_c2s:handle_enable:2678 Stream management without resumption enabled for receiver@server/BAD5B026-46FB-4478-8B17-241BCA253EC1
2017-05-29 08:58:40.745 [debug] <0.1314.1>@ejabberd_c2s:send_text:1823 Send XML on stream = <<"<enabled xmlns='urn:xmpp:sm:3'/>">>
2017-05-29 08:58:41.451 [debug] <0.1313.1>@ejabberd_receiver:process_data:284 Received XML on stream = <<"<presence type=\"available\"><show>away</show></presence><presence to=\"a-room@conference.server/receiver\" from=\"receiver@server\"><x xmlns=\"http://jabber.org/protocol/muc\"><history maxstanzas=\"200\" since=\"2017-05-03T16:04:28Z\"/></x></presence><presence to=\"another-room@conference.server/receiver\" from=\"receiver@server\"><x xmlns=\"http://jabber.org/protocol/muc\"><history maxstanzas=\"200\" since=\"2017-04-07T18:04:14Z\"/></x></presence>.......>
2017-05-29 08:58:41.452 [debug] <0.1313.1>@shaper:update:120 State: {maxrate,50000,54.65877789612266,1496041120745378}, Size=1092
M=10.925972002769987, I=706.761
2017-05-29 08:58:41.453 [debug] <0.1314.1>@ejabberd_c2s:session_established2:1235 presence_update({jid,<<"receiver">>,<<"server">>,<<"BAD5B026-46FB-4478-8B17-241BCA253EC1">>,<<"receiver">>,<<"server">>,<<"BAD5B026-46FB-4478-8B17-241BCA253EC1">>},
{xmlel,<<"presence">>,[{<<"type">>,<<"available">>}],[{xmlel,<<"show">>,[],[{xmlcdata,<<"away">>}]}]},
{state,{socket_state,fast_tls,{tlssock,#Port<0.32951>,#Port<0.32952>},<0.1313.1>},ejabberd_socket,#Ref<0.0.786433.80581>,false,<<"2390383597950620704">>,undefined,c2s,c2s_shaper,false,true,false,true,[verify_none,compression_none,{certfile,<<"certs/path">>}],true,{jid,<<"receiver">>,<<"server">>,<<"BAD5B026-46FB-4478-8B17-241BCA253EC1">>,<<"receiver">>,<<"server">>,<<"BAD5B026-46FB-4478-8B17-241BCA253EC1">>},<<"receiver">>,<<"server">>,<<"BAD5B026-46FB-4478-8B17-241BCA253EC1">>,{{1496,41102,155305},<0.1314.1>},{100,{{<<"contact1">>,<<"server">>,<<>>},{{<<"contact2">>,<<"server">>,<<>>},...},...},...},...},...}}},...}},...})
2017-05-29 08:58:41.453 [debug] <0.1314.1>@ejabberd_c2s:presence_update:2026 from unavail = true
2017-05-29 08:58:41.453 [debug] <0.1314.1>@mod_shared_roster:user_available:789 user_available for <<"receiver">> @ <<"server">> (1 resources)
2017-05-29 08:58:41.454 [debug] <0.373.0>@ejabberd_odbc:sql_query_internal:557 SQL: "select grp from sr_user where jid='receiver@server';"
2017-05-29 08:58:41.454 [debug] <0.375.0>@ejabberd_odbc:sql_query_internal:557 SQL: "select name from sr_group;"
2017-05-29 08:58:41.454 [debug] <0.349.0>@ejabberd_odbc:sql_query_internal:557 SQL: "begin;"
2017-05-29 08:58:41.455 [debug] <0.349.0>@ejabberd_odbc:sql_query_internal:557 SQL: "select username, xml from spool where username='receiver' order by seq;"
2017-05-29 08:58:41.456 [debug] <0.349.0>@ejabberd_odbc:sql_query_internal:557 SQL: "delete from spool where username='receiver';"
2017-05-29 08:58:41.457 [debug] <0.349.0>@ejabberd_odbc:sql_query_internal:557 SQL: "commit;"
.........................
route process starts here
.........................
2017-05-29 08:58:41.459 [debug] <0.1314.1>@ejabberd_router:do_route:351 route
from {jid,<<"sender">>,<<"server">>,<<"354565077274573">>,<<"sender">>,<<"server">>,<<"354565077274573">>}
to {jid,<<"receiver">>,<<"server">>,<<>>,<<"receiver">>,<<"server">>,<<>>}
packet {xmlel,<<"message">>,[{<<"from">>,<<"sender@server/354565077274573">>},{<<"to">>,<<"receiver@server">>},{<<"xml:lang">>,<<"en">>},{<<"id">>,<<"2375">>},{<<"type">>,<<"chat">>}],[{xmlel,<<"body">>,[],[{xmlcdata,<<"content">>}]},{xmlel,<<"subtype">>,[{<<"xmlns">>,<<"urn:xmpp:ourtype">>}],[]},{xmlel,<<"request">>,[{<<"xmlns">>,<<"urn:xmpp:receipts">>}],[]},{xmlel,<<"delay">>,[{<<"xmlns">>,<<"urn:xmpp:delay">>},{<<"from">>,<<"server">>},{<<"stamp">>,<<"2017-05-29T06:48:19.687Z">>}],[{xmlcdata,<<"Offline Storage">>}]}]}
2017-05-29 08:58:41.459 [debug] <0.1314.1>@ejabberd_local:do_route:265 local route
from {jid,<<"sender">>,<<"server">>,<<"354565077274573">>,<<"sender">>,<<"server">>,<<"354565077274573">>}
to {jid,<<"receiver">>,<<"server">>,<<>>,<<"receiver">>,<<"server">>,<<>>}
packet {xmlel,<<"message">>,[{<<"from">>,<<"mlem"...>>},{<<"to">>,<<...>>},{<<...>>,...},{...}|...],[{xmlel,<<...>>,...},{xmlel,...},{...}|...]}
2017-05-29 08:58:41.460 [debug] <0.1314.1>@ejabberd_sm:do_route:423 session manager
from {jid,<<"sender">>,<<"server">>,<<"354565077274573">>,<<"sender">>,<<"server">>,<<"354565077274573">>}
to {jid,<<"receiver">>,<<"server">>,<<>>,<<"receiver">>,<<"server">>,<<>>}
packet {xmlel,<<"message">>,[{<<"from">>,<<"mlem"...>>},{<<"to">>,<<...>>},{<<...>>,...},{...}|...],[{xmlel,<<...>>,...},{xmlel,...},{...}|...]}
2017-05-29 08:58:41.460 [debug] <0.1314.1>@ejabberd_sm:route_message:593 sending to process <0.1314.1>
..........................................
end route process, same for other messages
..........................................
..........
other logs
more logs
..........
2017-05-29 08:58:41.552 [debug] <0.1314.1>@ejabberd_c2s:send_text:1814 Send XML on stream = <<"<message from='sender@server/354565077274573' to='receiver@server' xml:lang='en' id='2375' type='chat'><body>content</body><subtype xmlns='urn:xmpp:ourtype'/><request xmlns='urn:xmpp:receipts'/><delay xmlns='urn:xmpp:delay' from='server' stamp='2017-05-29T06:48:19.687Z'>Offline Storage</delay></message>">>
..........
other logs
..........
Syndicate content