Crash dumps from munin plugins

Hello all,

does anybody run Munin and the ejabberd plugins for it?

I set this up some time ago and it works. Well, at least some features of the current plugin from the Debian packages work.

Munin produces nice graphs, see here: http://www3.hot-chilli.net/munin/hot-chilli.net/hyperion.hot-chilli.net/... , so I did not dig in further.

After migrating our Jabber server to a new machine last night, I stumbled upon the Munin log files. This is what I see in munin-node.log every five minutes:

2010/05/05-19:00:11 [20807] Error output from ejabberd_registered:
2010/05/05-19:00:11 [20807]
2010/05/05-19:00:11 [20807] Crash dump was written to: /var/log/ejabberd/erl_crash.dump
2010/05/05-19:00:11 [20807] Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
2010/05/05-19:00:12 [21024] Error output from ejabberd_uptime:
2010/05/05-19:00:12 [21024]
2010/05/05-19:00:12 [21024] Crash dump was written to: /var/log/ejabberd/erl_crash.dump
2010/05/05-19:00:12 [21024] Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
2010/05/05-19:00:12 [21024] (standard_in) 1: syntax error
2010/05/05-19:00:12 [21024] (standard_in) 1: syntax error
2010/05/05-19:00:12 [21024] (standard_in) 1: syntax error
2010/05/05-19:00:12 [21024] (standard_in) 1: syntax error
2010/05/05-19:00:12 [21024] (standard_in) 1: syntax error
2010/05/05-19:00:12 [21024] (standard_in) 1: illegal character: ^M
2010/05/05-19:00:12 [21024] (standard_in) 1: syntax error
2010/05/05-19:00:12 [21024] (standard_in) 1: syntax error
2010/05/05-19:00:12 [21024] (standard_in) 1: illegal character: '
2010/05/05-19:00:12 [21024] (standard_in) 1: illegal character: '
2010/05/05-19:05:12 [24886] Error output from ejabberd:
2010/05/05-19:05:12 [24886]
2010/05/05-19:05:12 [24886] Crash dump was written to: /var/log/ejabberd/erl_crash.dump
2010/05/05-19:05:12 [24886] Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
2010/05/05-19:05:13 [24910] Error output from ejabberd_registered:
2010/05/05-19:05:13 [24910]
2010/05/05-19:05:13 [24910] Crash dump was written to: /var/log/ejabberd/erl_crash.dump
2010/05/05-19:05:13 [24910] Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
2010/05/05-19:05:13 [24910] Crash dump was written to: /var/log/ejabberd/erl_crash.dump
2010/05/05-19:05:13 [24910] Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
2010/05/05-19:05:13 [24910] Crash dump was written to: /var/log/ejabberd/erl_crash.dump
2010/05/05-19:05:13 [24910] Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})

The interesting thing is that I see different output every five minutes. Something seems to be going wrong, but it does not produce the same outcome every time it happens.
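To compare the runs more easily, the log can be grouped by plugin. The following is only a minimal sketch, assuming every line follows the `date-time [pid] message` layout shown above; the function name `summarize` is made up for illustration and is not part of Munin.

```python
import re

# Layout taken from the excerpt above: 2010/05/05-19:00:11 [20807] <message>
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}) \[(\d+)\] ?(.*)$")
HEADER_RE = re.compile(r"^Error output from (\S+):$")


def summarize(lines):
    """Group error messages by (timestamp of the header line, plugin name)."""
    current = {}  # pid -> key of the most recent "Error output from" header
    runs = {}     # (timestamp, plugin) -> list of messages
    for line in lines:
        match = LINE_RE.match(line.rstrip("\r\n"))
        if not match:
            continue
        stamp, pid, message = match.groups()
        header = HEADER_RE.match(message)
        if header:
            current[pid] = (stamp, header.group(1))
            runs.setdefault(current[pid], [])
        elif message and pid in current:
            runs[current[pid]].append(message)
    return runs


# Usage (pass an open munin-node.log file or any iterable of lines):
#   with open(path_to_log) as handle:
#       for (stamp, plugin), messages in summarize(handle).items():
#           print(stamp, plugin, len(messages))
```

Printing only the message count per run is usually enough to see which plugin fails in which five-minute cycle.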

Plus: The plugins create graphs which show accurate data.

I cannot read the crash (which btw is ~250kb big), can somebody help?

And does this affect the ejabberd thread in any way?

I think that 700-800 megs of RAM for 100 online users is way too much, isn't it?

Kindest regards,
Martin

Roi wrote:

I cannot read the crash (which btw is ~250kb big), can somebody help?

The first 4 or 5 lines are plain text, and may be useful to determine what's the problem. Post them here.

Roi wrote:

I think that 700-800 megs of RAM for 100 online users is way too much, isn't it?

In a server that runs ejabberd 2.1.3 for typical Jabber chats, it takes 800 MB for 700 online users, and 200 s2s connections.

still crashes after upgrade to 2.1.3

Sorry it took me some time to reply; I just did not have any time over the last few days.

Roi wrote:

I cannot read the crash (which btw is ~250kb big), can somebody help?

badlop wrote:

The first 4 or 5 lines are plain text, and may be useful to determine what's the problem. Post them here.

I just upgraded to ejabberd 2.1.3 (from 2.1.2; the Debian package is finally available), and the same crashes still happen.

Here are the first lines of the erl_crash.dump file:

=erl_crash_dump:0.1
Sat May 8 14:25:12 2010
Slogan: Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})
System version: Erlang R13B04 (erts-5.7.5) [source] [64-bit] [smp:8:8] [rq:8] [async-threads:0] [kernel-poll:false]
Compiled: Thu Apr 22 19:57:18 2010
Taints:
Atoms: 3957
=memory
total: 5005776
processes: 483880
processes_used: 469400
system: 4521896
atom: 284097
atom_used: 256048
binary: 166312
code: 2089687
ets: 59944
=hash_table:atom_tab
size: 3203
used: 2275
objs: 3957
depth: 6
=index_table:atom_tab
size: 4096
limit: 1048576
entries: 3957
=hash_table:module_code
size: 47
used: 32
objs: 45
depth: 3
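For reading such an excerpt without scrolling through the whole file, here is a minimal sketch that splits a dump into its `=`-tagged sections and pulls out the slogan and the total memory. It assumes the layout shown above (a tag line starting with `=`, followed by `key: value` lines) and that the `=memory` figures are bytes; the helper names are made up for illustration.

```python
def parse_sections(lines):
    """Split crash dump text into {section tag: [body lines]}."""
    sections = {}
    tag = None
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("="):
            tag = line[1:]          # e.g. "erl_crash_dump:0.1", "memory"
            sections[tag] = []
        elif tag is not None:
            sections[tag].append(line)
    return sections


def key_values(body):
    """Turn 'key: value' lines into a dict; other lines are skipped."""
    pairs = {}
    for line in body:
        key, separator, value = line.partition(": ")
        if separator:
            pairs[key] = value
    return pairs


def memory_total_mib(sections):
    """Total from the =memory section in MiB (assumes the value is bytes)."""
    memory = key_values(sections["memory"])
    return int(memory["total"]) / (1024 * 1024)


# Usage:
#   with open("/var/log/ejabberd/erl_crash.dump") as handle:
#       sections = parse_sections(handle)
#   print(key_values(sections["erl_crash_dump:0.1"])["Slogan"])
#   print(round(memory_total_mib(sections), 2), "MiB")
```

For the excerpt above, the total of 5005776 works out to roughly 4.8 MiB under the bytes assumption, so the VM that wrote this dump was very small compared with the several hundred megabytes discussed elsewhere in this thread.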

Roi wrote:

I think that 700-800 megs of RAM for 100 online users is way too much, isn't it?

badlop wrote:

In a server that runs ejabberd 2.1.3 for typical Jabber chats, it takes 800 MB for 700 online users, and 200 s2s connections.

Hm well, then there has to be some problem or strange thing going on, see here:
http://www3.hot-chilli.net/munin/hot-chilli.net/hyperion.hot-chilli.net/...
http://www3.hot-chilli.net/munin/hot-chilli.net/hyperion.hot-chilli.net/...

I restarted ejabberd at 2pm when upgrading it to 2.1.3, and it immediately grabbed 700 megs. Now it is climbing up to 1 gig again and staying there, or at least it looks that way. We have between 50 and 120 online users and between 30 and 170 s2s connections at the moment. The machine is a freshly installed Debian amd64 SMP server, running just official Debian (squeeze and unstable) plus some backport packages.
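To put the two servers side by side, here is a rough back-of-the-envelope sketch. It assumes that client sessions and s2s connections can simply be added up (a simplification) and that "1 gig" means 1024 MB; the function name is made up for illustration.

```python
def mb_per_connection(ram_mb, users, s2s):
    """RAM divided by total connections (client sessions plus s2s links)."""
    return ram_mb / (users + s2s)


# Figures quoted in this thread; 1024 MB for "1 gig" is an assumption.
reference = mb_per_connection(800, 700, 200)   # badlop's server
busiest = mb_per_connection(1024, 120, 170)    # this server, upper bounds
quietest = mb_per_connection(1024, 50, 30)     # this server, lower bounds

print(f"reference server:      {reference:.2f} MB per connection")
print(f"this server, busiest:  {busiest:.2f} MB per connection")
print(f"this server, quietest: {quietest:.2f} MB per connection")
```

By this crude measure the reference server uses a bit under 1 MB per connection, while this one uses somewhere between about 3.5 and 12.8 MB.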

Regards,
Martin

How to get error messages about erlang starting problem

Oh, those first lines aren't indicative enough.

If you still have this problem, there's a method to get more information.

Try this:

  1. Stop ejabberd and kill any remaining beam, beam.smp, epmd, ... processes.
  2. Start ejabberd using the "live" method. In my case it's "ejabberdctl live".
  3. This shows all the log messages on the console.
  4. Most importantly, it may provide an initial error message in cases where Erlang doesn't fully start (as in your case).
  5. The error you will most probably get is this one:
    {error_logger,{{2010,5,10},{23,8,50}},"Protocol: ~p: register error: ~p~n",["inet_tcp",{{badmatch,{error,duplicate_name}

So, does that show anything new?
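Before starting in live mode, it can help to confirm that nothing from step 1 is still running. A minimal sketch, assuming a `ps` that accepts the POSIX `-e` and `-o` options; the process names come from the list above, and the function names are made up for illustration.

```python
import subprocess

# Process names mentioned in step 1 above.
ERLANG_PROCESS_NAMES = {"beam", "beam.smp", "epmd"}


def leftover_erlang_processes(ps_output):
    """Return [(pid, command)] for Erlang-related processes in ps output.

    Expects one process per line in the form "<pid> <command>".
    """
    found = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        pid, command = parts[0], parts[1].strip()
        if command in ERLANG_PROCESS_NAMES:
            found.append((int(pid), command))
    return found


def running_processes():
    """Ask ps for pid and command name of every process, without headers."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Usage:
#   for pid, command in leftover_erlang_processes(running_processes()):
#       print(f"still running: {command} (pid {pid})")
```

If the list is empty, the next "ejabberdctl live" start should not collide with an old node.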

Regarding the RAM usage: I forgot to mention that my server is 32-bit, not 64-bit, and that it doesn't run any ICQ/MSN/... transport, so there are fewer roster items on average. Also, I configured some tables (using ejabberd WebAdmin, for example) to be stored on disk only, not in RAM.

Yes, the problem is still there. I will do what you suggested, but as this is a live server, I have to do it sometime during the night. So it could take a few days until I return with the results.

What you say about RAM allocation makes sense to me. We have a lot of transports running, and as many or even more users on the transports than logged into the Jabber server itself.

Does it make sense to store some of the tables on disk? It's not that the machine cannot handle more (it has 8 gigs of RAM), but I'm afraid the process takes more and more the longer it runs, although it looks like it is satisfied with about 1.1 gigs of RAM. To be sure, I really have to run it for some days and not restart it all the time because of configuration changes.
