This page last changed on Oct 03, 2008 by aidan.

Broker memory usage

It is possible for the broker to receive frames faster than it can process them. When this occurs, a large number of Jobs and Events are produced. This further slows the system by increasing memory usage, causing the GC to run frequently and generally compounding the issue. This is undesirable.

High level solution

Ultimately, the broker needs to cease creating new jobs until those that already exist have been processed. The broker will stop reading frames from the network layer. The server's network buffer will fill, and the OS will cease to read from the socket as TCP flow control kicks in. The corresponding client-side buffer will then fill, and writes to it will block.

When memory usage falls as the events are processed, the broker will start to process frames again, and normal operation will resume. However, if the broker does not recover sufficiently quickly it is possible that the socket will time out and the connection will be closed.
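The suspend/resume decision above can be sketched as a small gate with two thresholds, so the broker does not flap between states when memory hovers near a single limit. This is an illustrative sketch only: the class name, thresholds, and callbacks are assumptions, not existing Qpid broker code.

```java
// Hypothetical sketch of the suspend/resume decision with hysteresis.
// Above suspendThreshold the broker stops reading frames; it resumes
// only once usage falls back below the lower resumeThreshold.
public class MemoryGate {
    private final long suspendThreshold; // stop reading frames above this
    private final long resumeThreshold;  // restart once usage falls below this
    private boolean suspended = false;

    public MemoryGate(long suspendThreshold, long resumeThreshold) {
        if (resumeThreshold >= suspendThreshold) {
            throw new IllegalArgumentException("resume threshold must be below suspend threshold");
        }
        this.suspendThreshold = suspendThreshold;
        this.resumeThreshold = resumeThreshold;
    }

    /** Returns true while frame reading should stay suspended. */
    public synchronized boolean onMemoryUsage(long usedBytes) {
        if (!suspended && usedBytes >= suspendThreshold) {
            suspended = true;   // here the broker would stop reading from the socket
        } else if (suspended && usedBytes <= resumeThreshold) {
            suspended = false;  // here the broker would resume reading
        }
        return suspended;
    }
}
```

The gap between the two thresholds is the point: with a single threshold, usage oscillating around it would toggle the socket on every sample.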

Required changes

To completely implement the above solution, a number of changes are required.

  1. The broker needs to be able to determine currently used / available memory. This can be obtained via JMX.
  2. The threads which process Jobs and Events need to be signalled to pause, and to resume.
  3. Protectio in the MINA layer of both the client and the broker needs to be enabled by default.
  4. When a memory threshold is reached, the broker should fire an event which signals the job processing threads to pause. In future this event should be listened for by other mechanisms designed to mitigate the issue - such as flow to disk.
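Reading used/available memory via JMX, as the list above suggests, can be done with the platform `MemoryMXBean`. A minimal sketch (the class name is illustrative, not a Qpid class):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Sketch: read heap usage from the platform MBean server, as a fraction
// of the maximum (falling back to committed if no -Xmx-style max is set).
public class HeapCheck {
    /** Fraction of the heap currently in use, in (0, 1]. */
    public static double heapUsedFraction() {
        MemoryMXBean mbean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mbean.getHeapMemoryUsage();
        long max = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return (double) heap.getUsed() / max;
    }

    public static void main(String[] args) {
        System.out.printf("heap in use: %.1f%%%n", heapUsedFraction() * 100);
    }
}
```

A threshold check against this fraction is what would fire the event described in the last list item.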

A few thoughts/questions ...

So, if a client publishing to one queue triggers the 'stop' threshold, will all clients publishing to all queues get 'stopped'? All virtual hosts too, I guess.

What about clients mid-transaction - do we roll back the whole thing, or allow it to time out (with the same result)?

If all publishing clients get stopped, we need to be mindful of the heartbeating solutions some of our users have put in to detect app problems. I'm not sure what the solution might be, but certainly logging & alerting so that they can detect what's going on when the thresholds are reached.

We should follow the same model for queue threshold triggers too, i.e. block publication until the threshold clears. Same problem, only a microcosm ... but simpler.

Need to analyse/document configuration for queue thresholds on a broker for our users, i.e. give them useful advice about what can helpfully be done to avoid the big-block (all publishers blocked) scenario.

A Low Level Solution section would be good here, outlining the work implied on the broker's classes/layers, and also discussing the protectio work done as it applies. Diagrams of the flow from a logical perspective would be ideal.

I think some of my feedback on client-side flow control might apply here too (see previous thread on qpid-dev with links). Client-side behaviour would be? The client blocks until buffers free, then sends as normal when .... Would the transaction time out?

Bit disordered, apologies.

Posted by mmccorma at Oct 06, 2008 01:40

All clients on all virtual hosts will have their sockets suspended, yeah. Clients in the middle of a transaction would not be explicitly rolled back. If the broker did not recover in time the socket would be closed and the transaction would be aborted.

We need to log a WARN when this kicks in, I think. Probably a CRITICAL if we don't unsuspend within 30 seconds (when sockets will start timing out).

From the client's perspective, calls to send() or commit() would block until the broker rights itself or the connection times out, and an exception would be thrown if it couldn't fail over.

Posted by aidan at Oct 06, 2008 07:15

Have you seen what the C++ broker does? Some of it is here: http://cwiki.apache.org/qpid/cheat-sheet-for-configuring-queue-options.html

In addition to this, it has a process memory size checker.

I have also been hatching a plan to use broker issued credit, and add some policies that allow you to grant credit based on rate for example. Such policies could also be expanded to be applied to the constraints already existing in the link above.
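Broker-issued credit, as described above, amounts to the broker granting a sender a number of units it may consume before it must stop. The sketch below is hypothetical (the class name and API are assumptions, not the 0-10 implementation), but shows the basic mechanism that a rate-based policy could then drive.

```java
// Hypothetical credit window: the broker grants credit; each delivery
// consumes one unit; when credit reaches zero the sender must stop
// until the broker grants more.
public class CreditWindow {
    private long credit;

    /** Broker grants the sender more units (driven by some policy). */
    public synchronized void grant(long units) {
        credit += units;
    }

    /** Try to consume one unit; false means the sender must wait. */
    public synchronized boolean tryConsume() {
        if (credit == 0) {
            return false;
        }
        credit--;
        return true;
    }
}
```

Revoking credit (as mentioned for 0-10 below) is then just the broker declining to grant more, rather than blocking a thread on the receiving side.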

On the TXN question, if a REJECT is issued, then what to do with the txn is up to the client.

Posted by cctrieloff at Oct 06, 2008 07:50

One other thing: if reject is configured then it will be the same on 0-8/9 and 0-10, but if reject is not issued, then on 0-10 we would revoke the credit and NOT block the thread.

How would you do this in 0-8/9? Is it even possible without hogging threads?

Carl.

Posted by cctrieloff at Oct 06, 2008 08:26

OK - some comments from me:

"It is possible for the broker to receive frames faster than it can process them. When this occurs, a large number of Jobs and Events are produced. This further slows the system by increasing memory usage, causing the GC to run frequently and generally compounding the issue. This is undesirable."

The maximum number of jobs is equal to the number of currently active connections; it's therefore pretty useless to try to trigger anything off the number of jobs that are active (I know, I've tried, before slapping myself on the forehead and realising why it's a pointless thing to do).

1. To completely implement the above solution, a number of changes are required.

2. The broker needs to be able to determine currently used / available memory. This can be obtained via JMX.

I think that it may be easier to instead just work off approximations/upper bounds on the amount of memory consumed. We can account for message sizes as they come in and should be able to use approximations for memory use per connection / subscription / delivery to a queue / etc... In the first stage we can just use an arbitrary hard limit per queue based on underlying message size.
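The per-queue accounting suggested above can be sketched as a running byte counter checked against an arbitrary hard limit. Names here are illustrative, not Qpid classes; the point is only that upper-bound accounting needs nothing more than an atomic counter updated on enqueue/dequeue.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: track approximate queue depth in bytes as messages arrive
// and leave, against a fixed hard limit per queue.
public class QueueDepth {
    private final long hardLimitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    public QueueDepth(long hardLimitBytes) {
        this.hardLimitBytes = hardLimitBytes;
    }

    /** Account an enqueued message; returns true if the limit is now exceeded. */
    public boolean enqueued(long messageBytes) {
        return usedBytes.addAndGet(messageBytes) > hardLimitBytes;
    }

    /** Release accounting for a delivered/removed message. */
    public void dequeued(long messageBytes) {
        usedBytes.addAndGet(-messageBytes);
    }

    public boolean overLimit() {
        return usedBytes.get() > hardLimitBytes;
    }
}
```

Per-connection or per-subscription overheads could be folded in as fixed approximations added to `messageBytes`, as the comment suggests.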

3. The threads which process Jobs and Events need to be signalled to pause, and to resume.

Pausing the threads isn't what you want to do, since this will make the broker lock up (these are threads from a threadpool... if they just pause then they can't be used to perform actions which will reduce the number of messages in the broker). What you really need to do is "suspend" the inbound socket connection. Possibly linking this to some sort of message level flow control so you can unsuspend to receive acks.

4. Protectio in the MINA layer of both the client and the broker needs to be enabled by default.

First it needs to work. When it was enabled previously we saw errors, I believe.

5. When a memory threshold is reached, the broker should fire an event which signals the job processing threads to pause. In future this event should be listened for by other mechanisms designed to mitigate the issue - such as flow to disk.

As above - thread "pausing" is not the answer. What I suggest is that queues have threshold limits; when these are triggered you would expect the queue to move into a "flow-to-disk" mode.

Fundamentally what we are trying to achieve is to bound all our unbounded buffers.

Theoretically we can have buffers at the following points in our code (i.e. discounting TCP stack buffers):

i) Undecoded bytes read from the wire
ii) Decoded but unprocessed frames
iii) Messages on queues
iv) Unencoded Frames to be sent to clients
v) Encoded bytes to be sent on the wire
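"Bounding an unbounded buffer" at any of these points means replacing an ever-growing collection with a fixed-capacity one that pushes back on the producer when full. A minimal, illustrative sketch using a standard `ArrayBlockingQueue` (not Qpid code):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: a fixed-capacity frame buffer. offer() fails when full
// (and put() would block), so the producer is pushed back on
// instead of the buffer growing without limit.
public class BoundedBuffer {
    public static void main(String[] args) {
        BlockingQueue<byte[]> frames = new ArrayBlockingQueue<>(2);
        System.out.println(frames.offer(new byte[64])); // true
        System.out.println(frames.offer(new byte[64])); // true
        System.out.println(frames.offer(new byte[64])); // false: buffer full
    }
}
```

Points i) and v) were handled this way with fixed-size byte buffers, as described below; ii) and iv) were removed outright since, as noted, they duplicate the TCP kernel buffers.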

From the work that Rafi and I did previously we looked to replace i) and v) with fixed size byte buffers, and we removed all buffers at points ii) and iv).

For fixing the queue size problem you need to take action at a higher level than the i/o layer.

Having said all that, if we can get protectio reliably working then that is a good first step.

Posted by godfrer at Oct 07, 2008 06:29

I completely agree with everything you write, Rob. For points ii and iv, how hard was that to do? At first glance it seems nice to provide some buffering for events, but it just duplicates what the TCP kernel buffer does, so I think it is in fact pointless.

Posted by rgreig at Nov 11, 2008 04:22

Occurred to me that unsuspending in some timely fashion would definitely be impacted by flow to/from disk and cache reload. Need to consider how this should work with flow-to-disk.

Posted by mmccorma at Dec 02, 2008 13:12
Document generated by Confluence on May 26, 2010 10:33