Scalable XMPP bots representing large numbers of virtual users

I'm looking at using XMPP/ejabberd as a notification routing system for a large-scale service. We have a particular data service P on the back end that acts as an event-generating bot for each user bound to it. I'm new to XMPP/ejabberd, so I'm looking for any thoughts or suggestions about the suitability of XMPP/ejabberd for this type of scenario, and on the viability of various approaches.

Here's the basic scenario:

  • The user has some data set stored in the service
  • Another service S wants to be notified when certain changes happen in a user's data set
  • S sends a message to P describing what it wants to be notified of
  • P periodically sends S notifications
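
To make the exchange between S and P concrete, here is a minimal sketch of what the two messages might look like as XMPP stanzas. The payload namespace `urn:example:notify`, the `subscribe` and `event` element names, and the JIDs are all hypothetical; only the outer `<message/>` element and its `from`, `to`, and `type` attributes come from XMPP itself. The `headline` type is used for notifications on the assumption that such messages are generally not queued for offline delivery.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload namespace; not part of any XMPP extension.
NS = "urn:example:notify"


def subscribe_request(s_jid, p_jid, kind):
    """S asks P to be notified about changes of a given kind."""
    msg = ET.Element("message", {"from": s_jid, "to": p_jid, "type": "normal"})
    ET.SubElement(msg, "{%s}subscribe" % NS, {"kind": kind})
    return msg


def notification(p_jid, s_jid, kind, item_id):
    """P tells S that an item of the requested kind changed."""
    msg = ET.Element("message", {"from": p_jid, "to": s_jid, "type": "headline"})
    ET.SubElement(msg, "{%s}event" % NS, {"kind": kind, "item": item_id})
    return msg


# Hypothetical JIDs for one consumer service and one user's bot.
req = subscribe_request("s1@services.example.com", "alice@example.com/bot", "picture")
note = notification("alice@example.com/bot", "s1@services.example.com", "picture", "img-17")
print(ET.tostring(req, encoding="unicode"))
print(ET.tostring(note, encoding="unicode"))
```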

The scenario is somewhat like pubsub except:

  • I have no requirement that messages be routed if either S or P happens to be offline
  • Messages don't need to be persisted after they've been received
  • There are ACLs on the user's data set, so not all notifications regarding a given type of data should be sent to all subscribers to the same resource. For example, S1 and S2 want to know when a new picture is added to a user's data set. S1 is allowed to see private pictures, S2 is not. Whether S2 is allowed to see a given piece of data is user controlled, and independent of S2's notion of what the data looks like (i.e., S2 doesn't ask for public pictures, it simply asks for pictures). Is there any notion of filtering in pubsub?
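
The filtering I have in mind can be sketched as follows. This is only an illustration of the S1/S2 picture example, done inside P before anything is handed to the XMPP layer; the `Publisher` class, its method names, and the "public"/"private" visibility labels are assumptions made for the example, not an existing pubsub feature.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    owner: str       # user whose data set changed
    kind: str        # e.g. "picture"
    visibility: str  # "public" or "private" (illustrative labels)


@dataclass
class Publisher:
    # (owner, kind) -> subscribers that asked for that kind of data
    subscriptions: dict = field(default_factory=dict)
    # (owner, subscriber) -> visibilities the owner lets that subscriber see
    acl: dict = field(default_factory=dict)

    def subscribe(self, subscriber, owner, kind):
        self.subscriptions.setdefault((owner, kind), set()).add(subscriber)

    def grant(self, owner, subscriber, visibility):
        self.acl.setdefault((owner, subscriber), set()).add(visibility)

    def recipients(self, event):
        """Subscribers to this kind of event that the owner's ACL permits."""
        subs = self.subscriptions.get((event.owner, event.kind), set())
        return sorted(
            s for s in subs
            if event.visibility in self.acl.get((event.owner, s), set())
        )


p = Publisher()
# Both services simply ask for pictures; neither mentions visibility.
p.subscribe("S1", "alice", "picture")
p.subscribe("S2", "alice", "picture")
# The user decides who may see what.
p.grant("alice", "S1", "public")
p.grant("alice", "S1", "private")
p.grant("alice", "S2", "public")

print(p.recipients(Event("alice", "picture", "private")))
print(p.recipients(Event("alice", "picture", "public")))
```

The point of the sketch is that the subscription and the ACL are kept separate: the subscriber states what it wants, the user states what it may see, and the intersection is computed per event.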

I'm looking for any thoughts or suggestions about the suitability of XMPP/ejabberd to this type of scenario.

I've thought of the following approaches:

  • let each P be a separate client instance. On a given node, 10000+ users may be represented, so this sounds undesirable, particularly as the user base grows into the millions. Each P will exist and be online all the time (i.e., they're services, not actual users), and the consumers will also be online and connected all the time. Couple this with the actual volume of online users at any given moment, and the number of concurrent connections exceeds any of the upper-bound numbers people have thrown out for ejabberd installations.
  • use s2s connections, and let each node act as the server for the identities bound to that node. I'm a bit fuzzy on how the routing works. It sounds like I would have to represent each user as a separate domain (which I'm perfectly happy to do, since that's how they're represented in the rest of the system). But can I dynamically add/remove domains at run-time from a server node/cluster? This approach seems promising. I would have a bot session representing each user's data set living in my server instance and wouldn't accept "real" client connections on these nodes. Clients and these s2s nodes would all connect to a main ejabberd cluster for routing.
  • as a variation on the previous idea, integrate the bot nodes directly into the main cluster and represent each bot as a user/resource within the domain (instead of a separate domain). This would seem to have all the benefits of the previous approach, but would broaden the distribution of my Erlang cluster cookies to more server nodes.
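
To illustrate how the last two approaches differ in addressing, here is a small sketch of the bot JID each would produce for the same user. The base domain `example.com`, the `bot` localpart and resource names, and both function names are hypothetical and chosen only for the example.

```python
# Hypothetical base domain for the deployment.
BASE_DOMAIN = "example.com"


def bot_jid_per_domain(user_id):
    """s2s approach: every user is its own domain, served by the node that owns it."""
    return f"bot@{user_id}.{BASE_DOMAIN}"


def bot_jid_in_cluster(user_id):
    """In-cluster approach: one shared domain, each bot is a user/resource in it."""
    return f"{user_id}@{BASE_DOMAIN}/bot"


print(bot_jid_per_domain("alice"))
print(bot_jid_in_cluster("alice"))
```

In the first scheme the main cluster routes on the domain part and hands the stanza to whichever s2s node serves that domain; in the second, routing is by user within a single domain that the whole cluster shares.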

Thanks for taking the time to read this long post. I look forward to your comments and suggestions.

Roger
