CPU Usage Issue with Ejabberd 2.1.8/Erlang R13B04

Ejabberd 2.1.8
Erlang R13B04

I've compiled Ejabberd 2.1.8 with Erlang R13B04 on a CentOS 5 system using the package found at http://www.process-one.net/downloads/ejabberd/2.1.8/ejabberd-2.1.8.tar.gz and run a few tests using a python client (xmpppy-0.5.0rc1). Every single time, CPU usage climbs to nearly 100% and stays there until the beam process dies.

I was also thinking about using Tsung for benchmarking, but that seems counterproductive since I can't even get 100 users connected with xmpppy without killing the beam process.

I've read up on high CPU usage and its possible causes and tried a few different solutions, but nothing seems to bring the usage down.

Things I've tried that didn't seem to make a difference:
- compiling Ejabberd with different versions of Erlang (R12B-5, R13B04, and R14B03)
- using both internal authentication and ODBC (recompiled with the odbc option for the latter)
- using a bare-minimum configuration to cut out any unnecessary overhead

I also suspected this might be related to the error stanza issue documented at https://support.process-one.net/browse/EJAB-930, but that fix is marked for Ejabberd 2.1.4. I checked the source code to confirm and, sure enough, the fix is included.

Here's the ejabberd.cfg file I'm using (masked domain/IP information): http://www.sourcepod.com/hedlvp99-5494

Also, here's the ejabberdctl.cfg file: http://www.sourcepod.com/mnittq44-5495

Any insight on what may be happening would be greatly appreciated.

Thanks.


You wrote a script that uses xmpppy to log 100 sessions into your ejabberd. Then ejabberd consumes all the CPU and crashes. That crash is strange.

What do your users do while connected? It makes a big difference whether they are idle or each one sends a million messages per second.

Does it produce a crash dump? It may contain additional clues about why it crashes.
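A crash dump can be very large, so a quick first pass is to read its "Slogan" (the reason the VM gave up) and look for processes with huge mailboxes. The helper below is a hypothetical sketch, assuming the usual plain-text erl_crash.dump layout: a `Slogan:` line near the top and one `=proc:<pid>` section per process containing a `Message queue length: N` line. A c2s or MUC room process with an enormous queue would hint that stanzas are piling up faster than they are delivered.

```python
from typing import List, Tuple


def summarize_crash_dump(text: str, top: int = 5) -> Tuple[str, List[Tuple[str, int]]]:
    """Return the dump's slogan and the `top` processes with the longest mailboxes.

    Assumes the plain-text erl_crash.dump layout: a "Slogan:" line and
    "=proc:<pid>" sections that each contain "Message queue length: N".
    """
    slogan = "(no slogan found)"
    queues: List[Tuple[str, int]] = []
    current_pid = None
    for line in text.splitlines():
        if line.startswith("Slogan:"):
            slogan = line[len("Slogan:"):].strip()
        elif line.startswith("="):
            # Any "=" header ends the previous section; "=proc:" opens a new one.
            current_pid = line[len("=proc:"):] if line.startswith("=proc:") else None
        elif current_pid and line.startswith("Message queue length:"):
            queues.append((current_pid, int(line.split(":", 1)[1])))
    # Largest mailboxes first.
    queues.sort(key=lambda item: item[1], reverse=True)
    return slogan, queues[:top]
```

Usage: `summarize_crash_dump(open(path, errors="replace").read())`, where `path` is wherever your node wrote its dump (the location is controlled by the `ERL_CRASH_DUMP` environment variable and varies between installations).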

Can you share your stress tool? Other people may try to reproduce the problem and investigate on their own.

Do you get such strange crashes with Jabsimul? http://www.ejabberd.im/benchmark


We thought it was strange too.

Here's a version of the python client we used to run the connection test (sorry if the code is messy, we changed it back and forth a few times): http://www.sourcepod.com/lzinnd38-5502

In short, the client is supposed to connect 100 users to a predefined test room on the ejabberd server and have each user send one message, then sleep until we kill the script. It was supposed to be a simple test to verify that the python client can connect to the server, but the CPU usage blew us away and we're still scratching our heads.
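A rough count shows that even this "simple" room test grows quadratically. Assuming standard XEP-0045 behaviour (each new occupant receives the presence of everyone already in the room plus its own, every existing occupant receives the newcomer's presence, and each groupchat message is reflected to all occupants, sender included), and ignoring room history, IQs and presence outside the room, this hypothetical helper estimates the stanzas the MUC service must route. It treats every message as sent after all users have joined, which gives an upper bound.

```python
def muc_fanout(users: int, messages_per_user: int = 1) -> dict:
    """Upper-bound estimate of MUC stanzas for "n users join, each sends m messages".

    Assumes one room, all joins complete before any message is sent,
    and no room history replay.
    """
    presence = 0
    for k in range(1, users + 1):
        existing = k - 1
        # The k-th joiner receives one presence per existing occupant plus its
        # own self-presence; each existing occupant receives the newcomer's.
        presence += existing + 1 + existing
    # Every message is reflected to every occupant, including the sender.
    messages = users * messages_per_user * users
    return {"presence": presence, "messages": messages, "total": presence + messages}


print(muc_fanout(100))  # {'presence': 10000, 'messages': 10000, 'total': 20000}
```

About 20,000 stanzas for 100 users is modest, so fan-out alone would probably not explain the node dying, but the n² growth is worth keeping in mind when scaling the room size up.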

Jabsimul was actually the first thing we ran against ejabberd. We got it to connect, but no messages seemed to be exchanged. Since we weren't ready for benchmarking yet, we moved on to the python script to verify its usability, and that's where we are now.

I fear that the code in the python client isn't completely ejabberd compliant. If that's the case, we can jump back to Jabsimul or Tsung to verify some basic benchmarks, but eventually we will have to go back to the client and figure out why (if at all) it was causing problems. That said, any insight or suggestions regarding the client would also be greatly appreciated.
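One way to check the client is to compare the raw stanzas it sends (for example, captured from the XMPP traffic) against what XEP-0045 expects for joining a room and posting to it. The standard-library sketch below builds those two stanzas; the room JID and nickname are hypothetical placeholders, and real clients also add stanza ids and other attributes that are omitted here.

```python
import xml.etree.ElementTree as ET

MUC_NS = "http://jabber.org/protocol/muc"


def muc_join_presence(room_jid: str, nick: str) -> str:
    """Presence sent to room@service/nick to enter a room (XEP-0045)."""
    presence = ET.Element("presence", {"to": f"{room_jid}/{nick}"})
    # The empty <x/> in the MUC namespace tells the service the client speaks MUC.
    ET.SubElement(presence, "x", {"xmlns": MUC_NS})
    return ET.tostring(presence, encoding="unicode")


def groupchat_message(room_jid: str, body: str) -> str:
    """Message addressed to the bare room JID with type='groupchat'."""
    message = ET.Element("message", {"to": room_jid, "type": "groupchat"})
    # ElementTree escapes characters such as '<' and '&' in the body text.
    ET.SubElement(message, "body").text = body
    return ET.tostring(message, encoding="unicode")


room = "testroom@conference.example.com"  # hypothetical room JID
print(muc_join_presence(room, "user001"))
print(groupchat_message(room, "hello from user001"))
```

If the client's join presence lacks the MUC `<x/>` element, or its messages go to the occupant JID instead of the bare room JID with `type='groupchat'`, that would be a place to start looking.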

Thanks.

Edit: This thread (http://www.ejabberd.im/node/3985) talks about adding max_fsm_queue to the server configuration. I'll give that a try and post back with the results.
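As I understand the ejabberd 2.1.x guide, `max_fsm_queue` caps how many stanzas may wait in a connection process's queue. If a slow receiver lets the queue reach the limit, that connection is terminated and an error is logged, instead of the backlog growing until the whole node runs out of memory. It can reportedly be set globally in ejabberd.cfg (e.g. `{max_fsm_queue, 1000}.`) or per listener; check the guide for your exact version. The Python class below is only a toy model of that policy, with hypothetical names, not ejabberd code.

```python
from collections import deque


class BoundedSession:
    """Toy model of a connection whose outgoing queue is capped.

    Hypothetical illustration of a max_fsm_queue-style policy: once the
    queue is full, the session is closed rather than buffering forever.
    """

    def __init__(self, max_queue: int) -> None:
        self.max_queue = max_queue
        self.queue: deque = deque()
        self.closed = False

    def enqueue(self, stanza: str) -> bool:
        """Queue a stanza; return False if the session is (or just became) closed."""
        if self.closed:
            return False
        if len(self.queue) >= self.max_queue:
            # Limit reached: drop the backlog and terminate only this session.
            self.queue.clear()
            self.closed = True
            return False
        self.queue.append(stanza)
        return True

    def drain(self, n: int) -> int:
        """Simulate the receiver consuming up to n stanzas; return how many were sent."""
        sent = 0
        while self.queue and sent < n:
            self.queue.popleft()
            sent += 1
        return sent
```

The point of the cap is that one slow or stuck client costs one dropped session rather than unbounded memory growth for the entire server; a sensible limit depends on the hardware.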
