CPU usage very high/epoll

We have a cluster of 3 ejabberd nodes on the same network. Each node is configured with its own ram/disk copy of our mnesia tables for redundancy. ejabberd is using a full CPU on each of our boxes regardless of load. A full CPU seems like a lot considering that at peak we have about 1k sessions, but even over the weekend, when we have tens of sessions, it's using a full CPU.

It's running on Linux with kernel version 2.6. Kernel polling is enabled, although I don't see a /dev/epoll on the machine. Would the Erlang console report kernel-poll:true if it were unable to leverage this in the kernel?
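
A note on the /dev/epoll part: as far as I know, 2.6 kernels expose epoll through system calls (epoll_create, epoll_ctl, epoll_wait) rather than a device node, and /dev/epoll belonged to an older out-of-tree patch, so its absence is probably expected. A minimal sketch of how the running VM could be asked directly, assuming the expressions are typed into the shell opened by `ejabberdctl debug`:

```erlang
%% Ask the emulator whether kernel poll is actually in use.
%% Returns true or false.
erlang:system_info(kernel_poll).

%% Miscellaneous information about the emulator's I/O checking.
%% The contents of this list are not a stable interface and can
%% differ between releases, so treat it as a hint only.
erlang:system_info(check_io).
```

If the first expression returns true, the emulator is using kernel poll, whatever is or is not present under /dev.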

Erlang version is R16B.

Anyone have any insights?

I ran ejabberdctl debug and then etop:start([{output, text}]).

I see:
Pid Time Reds Memory MsgQ Current Function
<5591.511.0> ejabberd_mod_pubsub_ '-' 127139******** 62378 dets:req/2

We had a bug in our client that caused subscriptions to be sent over and over again for the same items. We are in the process of unsubscribing from these, which I think explains the MsgQ size. The question is: why is Time set to '-', and why does CPU usage seem to stay fixed at about 1 CPU?

Full output:
========================================================================================
'ejabberd@prod-xmpp-barker0.prod.djrd.dowjones.net' 13:14:25
Load: cpu 117 Memory: total 2338890 binary 24916
procs 2155 processes 1822205 code 15571
runq 1 atom 549 ets 441591

Pid Name or Initial Func Time Reds Memory MsgQ Current Function
----------------------------------------------------------------------------------------
<5591.245.0> proc_lib:init_p/5 '-' 1860068 29872 0 dets:open_file_loop2
<5591.501.0> ejabberd_mod_pubsub_ '-' 51033022445928 0 gen_server:loop/6
***************proc_lib:init_p/5 '-' 219408 460048 0 prim_inet:recv0/3
***************proc_lib:init_p/5 '-' 143353 4119712 0 prim_inet:recv0/3
***************proc_lib:init_p/5 '-' 143053 689576 0 prim_inet:recv0/3
***************proc_lib:init_p/5 '-' 133426 514232 0 prim_inet:recv0/3
***************proc_lib:init_p/5 '-' 128552 426560 0 prim_inet:recv0/3
<5591.511.0> ejabberd_mod_pubsub_ '-' 127139******** 62378 dets:req/2
***************proc_lib:init_p/5 '-' 113503 4720688 0 gen:do_call/4
***************proc_lib:init_p/5 '-' 97313 284704 0 prim_inet:recv0/3
========================================================================================

Using etop I see that the pubsub module is processing a very large MsgQ. This is due to a bug we had in our client that was causing new subscriptions to be made on each launch of the client. We are in the process of getting those clients to unsubscribe.

The question is: why is CPU fixed at around 1 full CPU? This doesn't seem to go up or down much. How can I get it to process this queue faster?
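
One possible reading, offered as a guess and not a diagnosis: a single Erlang process executes on one scheduler at a time, so one busy process draining a long message queue would be consistent with load sitting near one core. A minimal sketch for watching the queue without etop, assuming it is evaluated in a shell on the ejabberd node itself (the name `TopQ` is made up for illustration):

```erlang
%% TopQ(N) returns up to N {QueueLength, Pid} pairs for the local node,
%% longest message queue first.
TopQ = fun(N) ->
    Lens = [{Len, Pid}
            || Pid <- erlang:processes(),
               %% process_info/2 returns undefined for a process that
               %% has already exited; the pattern skips those.
               {message_queue_len, Len} <-
                   [erlang:process_info(Pid, message_queue_len)]],
    lists:sublist(lists:reverse(lists:sort(Lens)), N)
end.

%% Show the five processes with the longest queues.
TopQ(5).
```

Running `TopQ(5).` a few times, some seconds apart, would show whether the pubsub queue is shrinking or still growing.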

