Consider prefetching messages for receivers (see Section 2.6, “Receiver Capacity (Prefetch)”). This helps eliminate roundtrips and increases throughput. Prefetch is disabled by default, and enabling it is the most effective means of improving throughput of received messages.
Send messages asynchronously. Again, this helps eliminate roundtrips and increases throughput. The C++ and .NET clients send asynchronously by default, however the python client defaults to synchronous sends.
Acknowledge messages in batches (see Section 2.7, “Acknowledging Received Messages”). Rather than acknowledging each message individually, consider issuing acknowledgements after n messages and/or after a particular duration has elapsed.
Tune the sender capacity (see Section 2.5, “Sender Capacity and Replay”). If the capacity is too low the sender may block waiting for the broker to confirm receipt of messages, before it can free up more capacity.
If you are setting a reply-to address on messages being sent by the c++ client, make sure the address type is set to either queue or topic as appropriate. This avoids the client having to determine which type of node is being refered to, which is required when hanling reply-to in AMQP 0-10.
For latency sensitive applications, setting tcp-nodelay on qpidd and on client connections can help reduce the latency.