ejabberd 2.1.8: high CPU and memory use, then Erlang crash

I recently upgraded to ejabberd 2.1.8, and since then it has crashed every one to two weeks. In fact, the whole Erlang VM appears to crash because it runs out of memory.

The Erlang crash dump, along with the ejabberd and Erlang logs, is here:

http://dl.dropbox.com/u/20965684/ejab-msrl-logs.tar.bz2

I should note that, for improved Active Directory (AD) support, I am using the updated mod_shared_roster_ldap described here:
https://support.process-one.net/browse/EJAB-1480

Thanks for any insight you can provide.

Memory allocated: 1,289,840,896 bytes

That is about 1,289 MB.

Cannot allocate 583,848,200 bytes of memory (of type "heap").

That is about 583 MB.

Process: <0.365.0>
State: Garbing (limited info)
Stack+heap: 145,962,050
Msg queue length: 169,927

That is 145 MB for this process alone, and it has a long queue of messages (169,927) that it has not yet handled. The process was garbage collecting ("Garbing") when the node crashed.

[<0.365.0>,"jw-rbc.org",mod_pubsub,iq_sm]

This process apparently handles Pubsub IQs.

Thank you, that is helpful. Is there any way to determine which client is generating all of these PubSub IQs, or to check the process's queue length and other statistics while the server is running?

Could this be related to mod_shared_roster_ldap caching, with the cache somehow being handled internally as PubSub traffic?

Edit: my server is in this state again. I attached a debug console and ran i().; this snippet was in its output:

    'ejabberd_mod_pubsub_   gen_server:loop/6          9
    <0.360.0>               gen_iq_handler:init/1      74732575   23044526   1060
                            dets:req/2                 75
    <0.361.0>               gen_iq_handler:init/1      4181       28374      0
                            gen_server:loop/6          9
    <0.365.0>               mod_pubsub:send_loop/1     4181       3307241    0
    'ejabberd_mod_pubsub_   mod_pubsub:send_loop/1     8
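To pick the suspicious entries out of i(). output like the snippet above, it helps to look only at the first line of each entry, which holds Pid, Initial Call, Heap, Reds and Msgs (the second line holds the registered name, current function and stack size). The Python sketch below assumes that column order, which matches the snippet; parse_i_output and suspects are hypothetical helper names. Ranked by Msgs and then Heap, the snippet's top entry is <0.360.0>, a gen_iq_handler process with 1060 queued messages whose current function is dets:req/2.

```python
import re
from typing import Iterable, List, NamedTuple

# Sketch only: assumes the first line of each i(). entry is
# "Pid  Initial Call  Heap  Reds  Msgs", as in the snippet above.
# parse_i_output and suspects are hypothetical helper names.
PID_ROW = re.compile(r"^(<\d+\.\d+\.\d+>)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


class ProcRow(NamedTuple):
    pid: str
    initial_call: str
    heap: int
    reds: int
    msgs: int


def parse_i_output(lines: Iterable[str]) -> List[ProcRow]:
    """Keep only the Pid lines of i(). output; Registered/Stack lines are skipped."""
    rows: List[ProcRow] = []
    for line in lines:
        m = PID_ROW.match(line.strip())
        if m:
            pid, call, heap, reds, msgs = m.groups()
            rows.append(ProcRow(pid, call, int(heap), int(reds), int(msgs)))
    return rows


def suspects(rows: List[ProcRow], n: int = 5) -> List[ProcRow]:
    """Rank by queued messages first, then by heap size, largest first."""
    return sorted(rows, key=lambda r: (r.msgs, r.heap), reverse=True)[:n]
```

Saving the i(). output at intervals and comparing the Msgs and Heap columns between runs would show which process keeps growing.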

What is the next step for figuring out what is putting mod_pubsub into this state? I dumped the entire Mnesia database, and the PubSub data did not seem out of control; the files on disk were fairly small. I could disable mod_pubsub entirely, but I'd rather not if I don't have to.

I ended up disabling mod_pubsub entirely, and the server has been stable ever since, with no memory growth. Could there be a bug or leak of some kind in the 2.1.8 mod_pubsub?

With it disabled, however, many clients cannot fetch avatars, so I'd like to re-enable it. How can I determine what is causing the problem?
