Events System Configuration
Consult events-template.xml in GIT to get some examples and hints.
Use this configuration to define the type of Event System you want.
James relies on an event system. Each operation performed on a mailbox triggers related events. Some software
components (MailboxListeners) can register themselves on this event system to be called when an event is fired.
Here are typical use cases for Mailbox Listeners (non-exhaustive list):
- Message search indexing, for instance in Lucene or ElasticSearch
- Local cache invalidation (caching mailbox project)
- Quota calculation
- IMAP IDLE feature: live notification of actions performed on a mailbox, allowing publish-subscribe on mailbox events
- Message Sequence Number consistency
Mailbox Listeners can be classified in two categories:
- Mailbox registered: the listener is only notified of events affecting the mailbox it is registered on. IDLE is a good example of this.
- Global Listeners: the listener is triggered on every event.
Global Listeners can in turn be classified in two categories:
- Those which need to be triggered only once in your cluster. ElasticSearch indexing is an example of this.
- Those which need to be triggered on each server. For instance, each Lucene indexer needs to be triggered on each server
for the search feature to stay consistent.
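The two listener categories described above can be sketched with a small synchronous, in-memory dispatcher. This is only an illustration: SketchEvent, SketchListener and InMemoryEventSystem are hypothetical names invented for this example, not the James API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only; these are not the James API.
final class SketchEvent {
    final String mailbox; // mailbox the operation was performed on
    final String type;    // for instance "ADDED" or "EXPUNGED"

    SketchEvent(String mailbox, String type) {
        this.mailbox = mailbox;
        this.type = type;
    }
}

interface SketchListener {
    void onEvent(SketchEvent event);
}

final class InMemoryEventSystem {
    private final List<SketchListener> globalListeners = new ArrayList<>();
    private final Map<String, List<SketchListener>> mailboxListeners = new HashMap<>();

    // Global listeners are called for every event.
    void registerGlobal(SketchListener listener) {
        globalListeners.add(listener);
    }

    // Mailbox-registered listeners are called only for events on that mailbox.
    void registerForMailbox(String mailbox, SketchListener listener) {
        mailboxListeners.computeIfAbsent(mailbox, k -> new ArrayList<>()).add(listener);
    }

    // Synchronous dispatch: returns once every interested listener has run.
    void fire(SketchEvent event) {
        for (SketchListener listener : globalListeners) {
            listener.onEvent(event);
        }
        List<SketchListener> registered =
                mailboxListeners.getOrDefault(event.mailbox, Collections.<SketchListener>emptyList());
        for (SketchListener listener : registered) {
            listener.onEvent(event);
        }
    }
}
```

In this sketch an indexer would call registerGlobal, while an IDLE session would call registerForMailbox for the mailbox it has selected. Nothing is serialized and nothing leaves the JVM, so this corresponds to the single-server case.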
The default implementation is a synchronous in-memory event system. Performance is very good, as there is no need to serialize
events and there is no network overhead. However, this event system is limited to a single machine, and you might want a distributed system.
Other implementations, suited to distributed environments, are available.
The simplest one is broadcast based. All James servers listen to the same message queue, and every James server is notified of each event.
Here are the pros and cons of this implementation:
Pros:
- It supports every type of listener described above
- It allows you to scale your James infrastructure without changing your middleware. You just need a message queue.
Cons:
- Scalability is limited, as each server is notified of all events
- Network overhead on event transmissions
- Event serialization and deserialization
To use this implementation, you need two other components (discussed below): a publishing system and an event serializer.
The other mode is based on registrations.
Each server reads messages from a dedicated message queue, and other servers send messages addressed to this server on that queue.
Registrations are stored in an external data store that supports document deletion after a fixed amount of time.
These registrations are periodically refreshed. When an event is generated, this data store is queried to find the interested servers; the event is then serialized if needed and
sent to the corresponding queues.
The pros and cons of this implementation are:
Pros:
- Linear scalability
- A server only receives the events that concern it
Cons:
- Possible event serialization costs
- Registration and registration refresh costs
- Need to find interested servers on event delivery
- Network overhead on event transmissions
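The registration mechanism (entries that expire after a fixed time unless refreshed, and a lookup of interested servers on each event) can be sketched as follows. This is an in-memory stand-in under stated assumptions: RegistrationStore and its methods are hypothetical names, time is passed in explicitly to keep the example deterministic, and the actual sending to each server's queue is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the external registration data store.
final class RegistrationStore {
    private final long ttlMillis;
    // mailbox -> (server id -> expiry timestamp in milliseconds)
    private final Map<String, Map<String, Long>> expiryByMailbox = new HashMap<>();

    RegistrationStore(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Registering again before expiry acts as the periodic refresh.
    void register(String mailbox, String serverId, long nowMillis) {
        expiryByMailbox.computeIfAbsent(mailbox, k -> new TreeMap<>())
                .put(serverId, nowMillis + ttlMillis);
    }

    // Returns the servers whose registration is still valid, sorted by id,
    // and drops the registrations that have expired.
    List<String> interestedServers(String mailbox, long nowMillis) {
        List<String> result = new ArrayList<>();
        Map<String, Long> servers = expiryByMailbox.get(mailbox);
        if (servers == null) {
            return result;
        }
        Iterator<Map.Entry<String, Long>> it = servers.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= nowMillis) {
                it.remove(); // registration expired without being refreshed
            } else {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

On event generation, a server would call interestedServers for the affected mailbox and publish the event only to the queues of the returned servers. A server that stops refreshing simply disappears from the result once its entry expires.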
Failure modes
Default implementation:
- The default implementation might not deliver some events when a server stops.
Broadcast implementation:
- The broadcast implementation might not deliver some events when a server stops.
- The broadcast implementation is tied to the limitations of the underlying publisher.
Registered implementation:
- The registered implementation might not deliver some events when a server stops.
- The registered implementation is tied to the limitations of the underlying publisher and of the underlying registration system.
Publisher
The available implementation is based on Kafka. Kafka ensures at-least-once delivery, which means some messages might be
delivered twice. You need to compile and run James with Java 8 in order to use the Kafka messaging system.
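Because a message may be delivered twice, a listener whose work is not naturally idempotent may want to skip redeliveries. The sketch below shows one possible approach. It assumes each event carries a unique identifier, which this document does not state, and DeduplicatingHandler is a hypothetical name, not part of James or Kafka.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical wrapper; assumes every event has a unique identifier.
final class DeduplicatingHandler {
    private final Set<String> processedIds = new HashSet<>();
    private int handledCount = 0;

    // Returns true when the event was processed, false when it was a redelivery.
    boolean handle(String eventId) {
        if (!processedIds.add(eventId)) {
            return false; // identifier already seen: skip the duplicate
        }
        handledCount++; // stands in for the real work of the listener
        return true;
    }

    int handledCount() {
        return handledCount;
    }
}
```

The set of seen identifiers grows without bound in this sketch. A real deployment would need to bound it, for instance by expiring old identifiers.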
Event serializer
There are two types of event serialization systems:
- JSON: events are converted to JSON
- MessagePack: a binary representation of JSON. It is on average two times smaller but takes two times longer to compute. It lets you
save bandwidth at the cost of CPU time and data readability.
Registration systems
The available implementation is based on Cassandra. It is used in an AP fashion, favoring availability over consistency. Some
messages might get delivered to servers that are no longer registered. This is just extra work. Worse, messages might not be delivered to
recently registered servers. However, we make sure to use the most up-to-date version of the registrations available, and we neither time out in
the face of network partitions nor enforce some default behaviour.