- Don't believe all of the so called "performance-tips" on the net. If you think you have found an optimization try it out in the real implementation and not in a simulation. Let the implementation run at minimum 10 times, calculate the average time, do the same with the not optimized implementation and compare. - Keep in mind that there are different VM vendors and that they can behave differently. - The java runtime optimizer seems to do an excellent job. So coding style is more important than micro-level optimizations. - Prevent as far as possible to create an iterator to iterate over an ArrayList. - Prevent unnecessary instantiation of List's, Map's and such. Collections.emptyList() should be used when we now beforehand that the List will be empty. - switch instead of if..else (e.g. switch(XMLEvent.getEventType()) { case XMLStreamConstants.START_ELEMENT: //do something with the start element break; case XMLStreamConstants.END_ELEMENT: //do something with the end element break; } - Dynamic CPU frequency-scaling (ondemand) has a big impact (it doubles the processing time in the worst case) to the streaming wss performance. An exact explanation for this is outstanding. I guess it has something to-do with the pipe in the decryptionProcessor and its decryptionThread. - Newer Intel-CPU's (Core i) are going to overclock itself (turbo mode) when just one core is in use. This is an advantage for WSS4J-DOM but swssf can't take profit of it since we have two threads for decryption.