Ejabberd 2.0.1 crashes when process memory reaches close to 2GB on Linux

We are very happy using ejabberd for our business, and it has been working fine except for the last couple of weeks. We are seeing ejabberd crash when the process memory (beam.smp) reaches close to the 2GB limit. This happens exactly when an increase in chat traffic leads to around 1500 concurrent online MUC rooms with close to 2000 concurrent user sessions.
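
For reference, we can attach to the live node and sample the emulator's memory counters while traffic ramps up (a rough sketch; we are assuming the ejabberdctl debug shell is usable on this build):

    $ ejabberdctl debug
    1> erlang:memory(total).        %% total bytes allocated by the emulator
    2> erlang:memory(processes).    %% bytes held by Erlang process heaps
    3> length(erlang:processes()).  %% number of live Erlang processes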

Here is crash dump 1 (first 50 lines):

=erl_crash_dump:0.1
Mon Feb  1 01:58:24 2010
Slogan: eheap_alloc: Cannot allocate 3328160 bytes of memory (of type "old_heap").
System version: Erlang (BEAM) emulator version 5.6.5 [source] [smp:16] [async-threads:256] [hipe] [kernel-poll:true]
Compiled: Mon Jan 18 10:14:45 2010
Atoms: 13572
=memory
total: 2564168880
processes: 2270958140
processes_used: 2270922444
system: 293210740
atom: 621293
atom_used: 604146
binary: 23256960
code: 6167719
ets: 11424008
=hash_table:atom_tab
size: 9643
used: 7277
objs: 13572
depth: 7
=index_table:atom_tab
size: 14336
limit: 1048576
entries: 13572
=hash_table:module_code
size: 397
used: 179
objs: 247
...
...

Here is crash dump 2 (first 50 lines):

=erl_crash_dump:0.1
Thu Jan 28 00:39:13 2010
Slogan: eheap_alloc: Cannot allocate 5385076 bytes of memory (of type "old_heap").
System version: Erlang (BEAM) emulator version 5.6.5 [source] [smp:16] [async-threads:0] [hipe] [kernel-poll:true]
Compiled: Mon Jan 18 10:14:45 2010
Atoms: 13386
=memory
total: 2428284024
processes: 2370573312
processes_used: 2370563440
system: 57710712
atom: 616937
atom_used: 597215
binary: 26513024
code: 6020365
ets: 11399276
=hash_table:atom_tab
size: 9643
used: 7233
objs: 13386
depth: 7
=index_table:atom_tab
size: 14336
limit: 1048576
entries: 13386
=hash_table:module_code
size: 197
used: 151
objs: 240
depth: 5
=index_table:module_code
size: 1024
limit: 65536
entries: 240
=hash_table:export_list
size: 4813
used: 3261
objs: 5483
depth: 8
=index_table:export_list
size: 6144
limit: 65536
entries: 5483
=hash_table:secondary_export_table
size: 97
used: 0
objs: 0
depth: 0
=hash_table:process_reg
size: 97

Here is our setup:

- ejabberd 2.0.1 (built from source with the flash policy file serving patch)
- ODBC authentication support enabled
- Linux CentOS (kernel: 2.6.18-164.10.1.el5PAE)
- the following ejabberd modules are enabled:
   mod_muc_log, mod_http_bind, web_admin, mod_http_poll
   mod_http_hello (a custom module serving a heartbeat on monitoring port 5280 for the LVS load balancer)
- clustered setup running on two nodes
- two virtual hosts, one with auth_method odbc and the other with anonymous
- mod_muc with history_size set to 100

Here are the ejabberd startup parameter values:

# define default configuration
POLL=true
SMP=auto
ERL_MAX_PORTS=1000000
ERL_PROCESSES=50000000
ERL_MAX_ETS_TABLES=140000
ERL_ASYNC_THREAD_CNT=256
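
These variables end up on the emulator command line roughly as follows (a sketch of how a stock ejabberdctl script passes them; our patched build may differ slightly):

    # sketch: how ejabberdctl typically maps the variables above to erl flags
    erl ... -smp $SMP +K $POLL +P $ERL_PROCESSES +A $ERL_ASYNC_THREAD_CNT ...
    # ERL_MAX_PORTS and ERL_MAX_ETS_TABLES are exported as environment
    # variables that the runtime reads at startup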

Here is the Mnesia DB node status:

use fallback at restart = false
running db nodes   = ['ejabberd@first.abc.com','ejabberd@debug.abc.com']
stopped db nodes   = ['ejabberd@third.abc.com','ejabberd@second.abc.com','ejabberd@fourth.abc.com']
master node tables = []
remote             = [acl,anonymous,caps_features,config,disco_publish,
                      http_bind,iq_response,irc_custom,last_activity,
                      local_config,mod_register_ip,motd,motd_users,
                      muc_online_room,muc_registered,muc_room,offline_msg,
                      privacy,private_storage,pubsub_item,pubsub_node,
                      pubsub_state,roster,route,s2s,session,sr_group,sr_user,
                      user_caps,user_caps_resources,vcard,vcard_search]
ram_copies         = []
disc_copies        = [schema]
disc_only_copies   = []
[] = [user_caps_resources,user_caps,local_config,mod_register_ip]
[{'ejabberd@debug.abc.com',disc_copies},
 {'ejabberd@first.abc.com',disc_copies}] = [schema]
[{'ejabberd@first.abc.com',disc_copies}] = [config,privacy,irc_custom,
                                                   roster,sr_user,motd,acl,
                                                   sr_group,vcard_search,
                                                   motd_users,muc_room,
                                                   pubsub_state,
                                                   muc_registered,pubsub_node]
[{'ejabberd@first.abc.com',disc_only_copies}] = [last_activity,
                                                        offline_msg,
                                                        disco_publish,vcard,
                                                        private_storage,
                                                        pubsub_item]
[{'ejabberd@first.abc.com',ram_copies}] = [http_bind,route,s2s,
                                                  anonymous,caps_features,
                                                  session,iq_response,
                                                  muc_online_room]
4 transactions committed, 0 aborted, 0 restarted, 1 logged to disc
0 held locks, 0 in queue; 0 local transactions, 0 remote

Here are the Linux parameters:

[root@www35 ejabberd]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 212992
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 212992
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

The crash dump says that ejabberd tried to allocate only about 5MB (of type old_heap) and the memory could not be found, yet our Linux machine has 12GB of memory and about 8GB was free at the time of the crash. This indicates the system has more than enough resources to handle thousands more concurrent connections, so my strong hypothesis is that some configuration limit is preventing the available memory from being used. I would be happy if someone could shed light on what might be going wrong on our end so we can make it 100% stable for our chat traffic.

Also, I do not have hands-on experience with the Erlang programming language to fiddle around with the code, but nothing stops me from learning it in the near future. If more data is needed, I can provide it.

Re: Ejabberd 2.0.1 crashes when process memory reaches close to 2GB on Linux

  • Make sure you're using 64-bit Erlang. To check that, run:
    $ erl
    1> erlang:system_info(wordsize).
    8

    Here, 8 means a 64-bit Erlang is being used. This is important because of the memory allocation limits on 32-bit platforms.

  • In your crash dumps you should find the memory-consuming process. Typically there are only one or two such processes. To do that, grep the crash dump for the 'Message queue length' pattern and sort the result (see the sketch after this list). Copy the snippet from =proc: to the next =proc: for the most consuming process and paste it here.
  • Please tell us the version of your ejabberd and Erlang.
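
Here is a rough sketch of that grep (assuming the default erl_crash.dump file name; field names can vary slightly between OTP releases):

    $ grep -n 'Message queue length' erl_crash.dump | sort -t: -k3 -rn | head
    $ grep -n 'Stack+heap' erl_crash.dump | sort -t: -k3 -rn | head   # per-process heap sizes

The line numbers from grep -n point you at the =proc:<pid> section of the worst offender; copy everything from that =proc: line up to the next =proc: line.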

Thanks for the response

Thanks for the response, Zinid.

erlang:system_info(wordsize) returned 4.

We use a 32-bit Linux machine with multi-core processors, running ejabberd 2.0.1 and Erlang R12B-5 built from source.

Does Linux have a 2GB process memory limitation, similar to Windows? Also, the message queue length is zero throughout the crash dump.

Can we run 64-bit Erlang on a 32-bit Linux machine?

Re: Ejabberd 2.0.1 crashes when process memory reaches close to 2GB on Linux

Quote:

Does Linux have a 2GB process memory limitation, similar to Windows?

This is a limitation of Erlang: it cannot allocate more than 2-2.5GB of heap on a 32-bit machine.

Quote:

Can we run 64-bit Erlang on a 32-bit Linux machine?

I don't think so.

Compiled Erlang on 64-bit

We compiled Erlang on a 64-bit Linux machine and are not able to reproduce the issue using the jabsimul stress-test tool. The memory consumption of the ejabberd process crossed 2GB, and sometimes even 3GB, under heavy load. We need to do some more testing and will update soon.

Thanks once again, Zinid

Erlang 2GB memory limit on a 32-bit processor

I'm wondering why this limitation is maintained in Erlang.
Why not relax it depending on the OS? Since Linux supports more than 2GB of address space, shouldn't the Erlang runtime take advantage of it?

Re: Erlang 2GB memory limit on a 32-bit processor

Quote:

I'm wondering why this limitation is maintained in Erlang.
Why not relax it depending on the OS? Since Linux supports more than 2GB of address space, shouldn't the Erlang runtime take advantage of it?

No idea. You'd better ask the Erlang developers about that.
