Unknown ejabberd stability problem

Hi,

We have the following cluster of 3 load-balanced ejabberd servers set up on Win2003 machines:

Server 1:
---------
Intel Xeon 3.06GHz 7.75GB RAM

Server 2:
---------
Intel Pentium 4 3.60GHz 3.11GB RAM

Server 3:
---------
Intel Pentium 4 3.60GHz 3.11GB RAM

ejabberd version 0.9.8 runs as a service with the following startup parameters:
-s ejabberd +P 250000 -env ERL_MAX_PORTS 32000 -env ERL_MAX_ETS_TABLES 20000 -pa ebin -env EJABBERD_LOG_PATH g:/ejabber_logs/ejabberd.log -kernel inetrc \"./inetrc\" -sasl sasl_error_logger {file,\"g:/ejabber_logs/sasl.log\"} -cookie XXXXXXXXXXXXXXXX

Mnesia tables are set up to synchronise between RAM and disk on each server.

Our servers crash at seemingly random times, usually after about two days of uptime, and we haven't been able to find the cause. It's never the cluster as a whole: one or sometimes two servers crash within a few minutes of each other. A server's CPU load goes up to 100% for a minute or two before it crashes. Our user load usually peaks at 8000 per server on any given day (24 000 in total), and the servers become unstable under that peak load. However, crashes also happen outside peak times.

I should also mention that we have changed some of the ODBC modules so that we can interface with an MSSQL DB: the built-in queries have been replaced with stored procedures (SPs) -- let me know if you would like to view the changed source.

The following are the heads of crash dumps at different times for the respective servers:

Server 1 dump:
--------------
=erl_crash_dump:0.1
Mon Dec 19 12:02:53 2005
Slogan: eheap_alloc: Cannot reallocate 153052320 bytes of memory (of type "heap").
System version: Erlang (BEAM) emulator version 5.4.9 [threads:0]
Compiled: Tue Aug 30 01:23:58 2005
Atoms: 9087
=memory
total: 688010870
processes: 635511554
processes_used: 632819026

Server 2 dump:
--------------
=erl_crash_dump:0.1
Tue Dec 13 16:53:39 2005
Slogan: eheap_alloc: Cannot allocate 373662860 bytes of memory (of type "heap").
System version: Erlang (BEAM) emulator version 5.4.9 [threads:0]
Compiled: Tue Aug 30 01:23:58 2005
Atoms: 9086
=memory
total: 1725872637
processes: 1676066076
processes_used: 1672698772

Server 3 dump:
--------------
=erl_crash_dump:0.1
Fri Dec 16 22:28:57 2005
Slogan: eheap_alloc: Cannot allocate 78362800 bytes of memory (of type "heap").
System version: Erlang (BEAM) emulator version 5.4.9 [threads:0]
Compiled: Tue Aug 30 01:23:58 2005
Atoms: 9093
=memory
total: 998708688
processes: 943052032
processes_used: 941732472
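
To compare dump heads like the three above, a small script can pull out the size of the failed allocation and the share of total memory held by processes. This is only a minimal Python sketch: parse_dump_head is a hypothetical helper name, and it assumes the dump head has exactly the layout shown above (a "Slogan:" line and a "=memory" section of "key: value" lines).

```python
import re

# Matches slogans such as:
#   Slogan: eheap_alloc: Cannot reallocate 153052320 bytes of memory (of type "heap").
SLOGAN_RE = re.compile(
    r'Slogan: \w+: Cannot (?:re)?allocate (\d+) bytes of memory \(of type "(\w+)"\)'
)

def parse_dump_head(text):
    """Return the failed allocation details and the =memory counters."""
    info = {"memory": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("="):
            # Section markers look like "=memory" or "=erl_crash_dump:0.1".
            section = line[1:].split(":", 1)[0]
            continue
        match = SLOGAN_RE.match(line)
        if match:
            info["failed_bytes"] = int(match.group(1))
            info["failed_type"] = match.group(2)
        elif section == "memory" and ":" in line:
            key, value = line.split(":", 1)
            info["memory"][key.strip()] = int(value)
    return info

# Head of the Server 1 dump, copied from above.
SERVER_1 = """\
=erl_crash_dump:0.1
Mon Dec 19 12:02:53 2005
Slogan: eheap_alloc: Cannot reallocate 153052320 bytes of memory (of type "heap").
System version: Erlang (BEAM) emulator version 5.4.9 [threads:0]
Compiled: Tue Aug 30 01:23:58 2005
Atoms: 9087
=memory
total: 688010870
processes: 635511554
processes_used: 632819026
"""

head = parse_dump_head(SERVER_1)
share = head["memory"]["processes"] / head["memory"]["total"]
print(f'{head["failed_type"]}: {head["failed_bytes"]} bytes requested, '
      f'processes hold {share:.1%} of total memory')
```

In all three dumps the "processes" figure accounts for well over 90% of "total", which suggests looking at process heaps (for example one process with a very large heap or message queue) rather than at table storage.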

My questions:
1. What might cause such large chunks of memory to be allocated? Is this normal?
2. Could this have anything to do with Mnesia and consequently memory fragmentation?
3. Could this have anything to do with the changes made to the ODBC modules?
4. Would kernel polling help (when supported under Windows)?
5. Would an upgrade help?

Please let me know if you need more information. Any help/hints would be appreciated.

Thank you
Pieter Rautenbach

It is true: there is some bug in ejabberd

Hi, I experience the same problem in a clustered environment - some of the nodes occasionally crash like this
(ejabberd svn version: 522; the bug also appears on earlier versions):

=erl_crash_dump:0.1
Mon Mar 27 12:28:22 2006
Slogan: eheap_alloc: Cannot allocate 191315400 bytes of memory (of type "old_heap").
System version: Erlang (BEAM) emulator version 5.4.13 [source] [hipe] [threads:0] [kernel-poll]
Compiled: Fri Mar 17 17:26:09 2006
Atoms: 11822
=memory
total: 1015490905
processes: 611757611
processes_used: 611606867
system: 403733294
atom: 536673
atom_used: 523526
binary: 27081542
code: 4620141
ets: 360456528

Solution?

Did you guys ever find a solution to this problem?

Most of the time, this is because Erlang R10B is used without the supervisor patch.

--
Mickaël Rémond
Process-one

We are seeing similar errors using the installer for 2.0.0 rc1 on Linux
ejabberd-2.0.0-rc1-linux-x86-installer.bin

"erl -v" shows the version as 5.5.5
Erlang (BEAM) emulator version 5.5.5 [source] [async-threads:0] [kernel-poll:false]

The installer installs a supervisor.beam file with the following information:
filesize: 8100 location: /opt/ejabberd-2.0.0/lib/stdlib-1.14.5/ebin/supervisor.beam

The latest supervisor.beam file seems to be:
filesize: 13960
However, the patch file (located at http://support.process-one.net/doc/display/CONTRIBS/Supervisor+-+Perform...) is dated May 07, 2006 (added and last edited by Mickaël Rémond on that date).
It also references R10B-7 to R10B-10, which I believe is erl 5.4.x.

Is the supervisor.beam installed with the 2.0.0-rc1 installer the most recent and correct for this version?
Also, is it being used by erl or do I need to change some settings for it to get used?
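
To see which copies of supervisor.beam exist under an installation and how they differ, the files can be listed with their sizes and checksums. The following is a minimal Python sketch, not an official tool: find_beam_copies is a hypothetical helper, and the root directory is only assumed from the path quoted above. File size alone does not identify a patched build, so a checksum is printed as well. Which copy a running node has actually loaded can be checked from an Erlang shell on that node with code:which(supervisor).

```python
import hashlib
import os

def find_beam_copies(root, name="supervisor.beam"):
    """Walk `root` and return (path, size_in_bytes, sha256_hex) for every file called `name`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                data = fh.read()
            found.append((path, len(data), hashlib.sha256(data).hexdigest()))
    return sorted(found)

if __name__ == "__main__":
    # Assumed install root, taken from the path mentioned in this post.
    for path, size, digest in find_beam_copies("/opt/ejabberd-2.0.0"):
        print(f"{size:>8}  {digest}  {path}")
```

If two copies show up with different sizes (such as the 8100 and 13960 bytes mentioned above), the one earlier in the node's code path is the one that gets loaded.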

thanks.

Yes. The supervisor patch helped, as did moving away from Windows as a platform.
