gen_fsm in state loop terminated with reason system_limit

Hi everyone,

We are working on an ejabberd setup with external authentication and a few custom modules. Everything works fine, but our stress test crashes the node: once the server accepts more than about 16,300 connections, it crashes with the following messages:

[error] <0.22081.3> gen_fsm <0.22081.3> in state loop terminated with reason: {system_limit,[{erlang,open_port,[{spawn,"expat_erl"},[binary]],[]},{xml_stream,new,2,[{file,"src/xml_stream.erl"},{line,182}]},{ejabberd_http_ws,parse,2,[{file,"src/ejabberd_http_ws.erl"},{line,312}]},{ejabberd_http_ws,handle_info,3,[{file,"src/ejabberd_http_ws.erl"},{line,213}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,505}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
2015-04-22 06:05:34.391 [error] <0.22081.3> CRASH REPORT Process <0.22081.3> with 1 neighbours exited with reason: {system_limit,[{erlang,open_port,[{spawn,"expat_erl"},[binary]],[]},{xml_stream,new,2,[{file,"src/xml_stream.erl"},{line,182}]},{ejabberd_http_ws,parse,2,[{file,"src/ejabberd_http_ws.erl"},{line,312}]},{ejabberd_http_ws,handle_info,3,[{file,"src/ejabberd_http_ws.erl"},{line,213}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,505}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]} in gen_fsm:terminate/7 line 622
2015-04-22 06:05:34.391 [error] <0.32163.0> Supervisor ejabberd_http_sup had child undefined started with {ejabberd_http,start_link,undefined} at <0.22080.3> exit with reason {system_limit,[{erlang,open_port,[{spawn,"expat_erl"},[binary]],[]},{xml_stream,new,2,[{file,"src/xml_stream.erl"},{line,182}]},{ejabberd_http_ws,parse,2,[{file,"src/ejabberd_http_ws.erl"},{line,312}]},{ejabberd_http_ws,handle_info,3,[{file,"src/ejabberd_http_ws.erl"},{line,213}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,505}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]} in context child_terminated
2015-04-22 06:05:34.501 [error] <0.32296.0>@ejabberd_listener:accept:316 (#Port<0.37951>) Failed TCP accept: system_limit
2015-04-22 06:05:34.569 [error] <0.22095.3> gen_fsm <0.22095.3> in state loop terminated with reason: {system_limit,[{erlang,open_port,[{spawn,"expat_erl"},[binary]],[]},{xml_stream,new,2,[{file,"src/xml_stream.erl"},{line,182}]},{ejabberd_http_ws,parse,2,[{file,"src/ejabberd_http_ws.erl"},{line,312}]},{ejabberd_http_ws,handle_info,3,[{file,"src/ejabberd_http_ws.erl"},{line,213}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,505}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
2015-04-22 06:05:34.570 [error] <0.22095.3> CRASH REPORT Process <0.22095.3> with 1 neighbours exited with reason: {system_limit,[{erlang,open_port,[{spawn,"expat_erl"},[binary]],[]},{xml_stream,new,2,[{file,"src/xml_stream.erl"},{line,182}]},{ejabberd_http_ws,parse,2,[{file,"src/ejabberd_http_ws.erl"},{line,312}]},{ejabberd_http_ws,handle_info,3,[{file,"src/ejabberd_http_ws.erl"},{line,213}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,505}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]} in gen_fsm:terminate/7 line 622
2015-04-22 06:05:34.570 [error] <0.32163.0> Supervisor ejabberd_http_sup had child undefined started with {ejabberd_http,start_link,undefined} at <0.22094.3> exit with reason {system_limit,[{erlang,open_port,[{spawn,"expat_erl"},[binary]],[]},{xml_stream,new,2,[{file,"src/xml_stream.erl"},{line,182}]},{ejabberd_http_ws,parse,2,[{file,"src/ejabberd_http_ws.erl"},{line,312}]},{ejabberd_http_ws,handle_info,3,[{file,"src/ejabberd_http_ws.erl"},{line,213}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,505}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]} in context child_terminated
2015-04-22 06:05:34.661 [error] <0.32296.0>@ejabberd_listener:accept:316 (#Port<0.37951>) Failed TCP accept: system_limit
2015-04-22 06:05:34.758 [error] <0.22101.3> gen_fsm <0.22101.3> in state loop terminated with reason: {system_limit,[{erlang,open_port,[{spawn,"expat_erl"},[binary]],[]},{xml_stream,new,2,[{file,"src/xml_stream.erl"},{line,182}]},{ejabberd_http_ws,parse,2,[{file,"src/ejabberd_http_ws.erl"},{line,312}]},{ejabberd_http_ws,handle_info,3,[{file,"src/ejabberd_http_ws.erl"},{line,213}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,505}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
2015-04-22 06:05:34.758 [error] <0.22101.3> CRASH REPORT Process <0.22101.3> with 1 neighbours exited with reason: {system_limit,[{erlang,open_port,[{spawn,"expat_erl"},[binary]],[]},{xml_stream,new,2,[{file,"src/xml_stream.erl"},{line,182}]},{ejabberd_http_ws,parse,2,[{file,"src/ejabberd_http_ws.erl"},{line,312}]},{ejabberd_http_ws,handle_info,3,[{file,"src/ejabberd_http_ws.erl"},{line,213}]},{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,505}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]} in gen_fsm:terminate/7 line 622

The system limits are set to:

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             15845                15845                processes
Max open files            40960                40960                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       15845                15845                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

Judging by the references to "spawn" and "proc_lib" in the crash log, it seems to hit a process limit. That is still confusing, though, because the system "max processes" limit (15845) is lower than the number of connections actually accepted. So does this kind of crash actually relate to the system "max processes" setting, or is this pure coincidence?
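One way to separate the operating-system limits from the Erlang VM's own limits is a quick check like the sketch below. It assumes bash and an `erl` binary on the PATH. The "Max processes" row above is the OS per-user limit on processes and threads, whereas Erlang processes are lightweight VM processes capped separately by the VM (`erlang:system_info(process_limit)`). Likewise, `{spawn,"expat_erl"}` in the log is an `erlang:open_port` call, which fails with `system_limit` when all available ports in the VM are in use (`erlang:system_info(port_limit)`). The script starts a fresh VM, so it only reflects the running node if you pass the same flags that node uses.

```shell
#!/usr/bin/env bash
# Compare OS-level limits with the Erlang VM's own limits.

# OS limits for the current user. "Max processes" (ulimit -u) counts OS
# processes and threads; Erlang processes are not OS processes.
echo "OS max user processes: $(ulimit -u)"
echo "OS max open files:     $(ulimit -n)"

# Erlang VM limits, read from a fresh VM. Add the same flags your ejabberd
# node is started with (e.g. +P) to see the values that node would get.
if command -v erl >/dev/null 2>&1; then
  erl -noshell -eval '
    io:format("Erlang process_limit: ~p~n", [erlang:system_info(process_limit)]),
    io:format("Erlang port_limit:    ~p~n", [erlang:system_info(port_limit)]),
    halt().'
else
  echo "erl not found on PATH; skipping the Erlang VM limits" >&2
fi
```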

Another important question related to single-node capacity: how can we limit the number of client connections accepted by a particular node? The scenario we have in mind is to allow about 10K client connections per node; once this limit is reached, the node should refuse new client connections while still responding normally to other connection types (s2s, XML-RPC, etc.). We assume this cannot be achieved simply by setting ERL_MAX_PORTS, since that limit applies to the whole node. Is that right?

You probably hit the Erlang process limit, not the Linux system limits. You should increase the number of allowed processes, either by editing ejabberdctl.cfg or by passing the +P parameter on the erl command line (depending on how you launch ejabberd).
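For a node started with ejabberdctl, the change could look like the fragment below. ERL_PROCESSES and ERL_MAX_PORTS are variables read from ejabberdctl.cfg by the ejabberdctl start script; the values shown are illustrative assumptions, not tested recommendations. Because the failing call in the log is `erlang:open_port`, and each WebSocket session in the trace holds both a TCP socket and an expat_erl parser port, raising the port limit alongside the process limit may also be needed.

```shell
# ejabberdctl.cfg fragment (illustrative values; adjust to your load test)

# Maximum number of Erlang processes; ejabberdctl passes this to erl as +P.
ERL_PROCESSES=500000

# Maximum number of simultaneously open Erlang ports
# (TCP sockets, expat_erl parser ports, ...).
ERL_MAX_PORTS=65536
```

If the node is started with erl directly instead, the process limit is set with `+P 500000` on the command line. Note that with a higher port limit, the OS open-files limit (40960 in the table above) may become the next ceiling, since every TCP socket also needs a file descriptor.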