Apache™ FOP Design: Optimisations
Introduction
Apache™ FOP should be able to handle very large documents. A document can be supplied using SAX, and the information should pass through the whole system, from fo elements to rendered output, as soon as possible.
A top level block area, immediately below the flow, can be added to the page layout as soon as the element is complete.
The fo elements used to construct a page can be discarded as soon as the layout for the page is complete. Some information may be stored in the area tree of the page in order to handle unresolved page references and links.
Once the layout of a page has been completed and all of its elements are fully resolved, the page can be rendered. Some renderers may support out of order rendering of pages.
The main problem that will remain is that any page with forward references will need to be stored until the reference is resolved. This means that the information held in the page should be kept as small as possible.
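The holding of pages with forward references can be illustrated with a minimal sketch. It assumes a renderer that needs pages in order, so a page waiting on a reference also holds back the pages behind it. The names PageQueue, PendingPage and pageLaidOut are hypothetical and are not taken from the FOP code base.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of holding pages until their references resolve. */
final class PageQueue {

    /** A laid-out page together with the ids it still waits for. */
    static final class PendingPage {
        final int number;
        final Set<String> unresolved;

        PendingPage(int number, Set<String> unresolved) {
            this.number = number;
            this.unresolved = unresolved;
        }
    }

    private final Deque<PendingPage> waiting = new ArrayDeque<>();
    private final Set<String> knownIds = new HashSet<>();

    /** Page numbers in the order they were handed to the (imaginary) renderer. */
    final List<Integer> rendered = new ArrayList<>();

    /**
     * Called when the layout of a page is complete.
     * references: ids the page points at; idsOnPage: ids the page defines.
     */
    void pageLaidOut(int number, Collection<String> references, Collection<String> idsOnPage) {
        knownIds.addAll(idsOnPage);
        // Ids defined on this page may resolve references on earlier pages.
        for (PendingPage p : waiting) {
            p.unresolved.removeAll(idsOnPage);
        }
        // Only references to ids not seen so far remain open for the new page.
        Set<String> open = new HashSet<>(references);
        open.removeAll(knownIds);
        waiting.addLast(new PendingPage(number, open));
        flush();
    }

    /** Renders pages from the head of the queue while they are fully resolved. */
    private void flush() {
        while (!waiting.isEmpty() && waiting.peekFirst().unresolved.isEmpty()) {
            rendered.add(waiting.removeFirst().number);
        }
    }
}
```

A renderer that supports out of order rendering could instead release every resolved page straight away, whatever its position in the queue.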
Line areas can be optimised once the layout for the line has been finalised. Consecutive characters with the same properties can be combined into a "word" to hold the information with limited overhead.
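As a rough illustration of this optimisation, the sketch below merges consecutive characters that share the same properties into one word. The class names and the use of a plain string as a stand-in for the resolved properties are assumptions made for the example, not part of the FOP area tree.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: none of these names come from the FOP code base. */
final class LineOptimiser {

    /** One laid-out character plus a key standing in for its resolved properties. */
    static final class CharArea {
        final char ch;
        final String props; // for example "Helvetica/12pt"; an assumption for illustration

        CharArea(char ch, String props) {
            this.ch = ch;
            this.props = props;
        }
    }

    /** A run of characters that share one set of properties. */
    static final class Word {
        final String text;
        final String props;

        Word(String text, String props) {
            this.text = text;
            this.props = props;
        }
    }

    /** Merges consecutive characters with equal properties into words. */
    static List<Word> combine(List<CharArea> line) {
        List<Word> words = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        String runProps = null;
        for (CharArea c : line) {
            // A change of properties closes the current run.
            if (run.length() > 0 && !c.props.equals(runProps)) {
                words.add(new Word(run.toString(), runProps));
                run.setLength(0);
            }
            runProps = c.props;
            run.append(c.ch);
        }
        if (run.length() > 0) {
            words.add(new Word(run.toString(), runProps));
        }
        return words;
    }
}
```

The properties are then stored once per word rather than once per character, which is where the saving comes from.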
If there are a large number of pages where forward references cannot be resolved, then a method of writing a page to disk could be used to save memory. The easiest way to achieve this is to make the page and all of its children serializable.
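A minimal sketch of this idea, using standard Java object serialization, might look as follows. The Page and Area classes and the PageStore helper are hypothetical stand-ins for the real area tree, and the use of a temporary file is an assumption.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of moving an unresolved page out of memory. */
final class PageStore {

    /** Stand-in for an area; its children are serialized along with it. */
    static final class Area implements Serializable {
        private static final long serialVersionUID = 1L;
        final String content;
        final List<Area> children = new ArrayList<>();

        Area(String content) {
            this.content = content;
        }
    }

    /** Stand-in for a page holding its top level areas. */
    static final class Page implements Serializable {
        private static final long serialVersionUID = 1L;
        final int number;
        final List<Area> areas = new ArrayList<>();

        Page(int number) {
            this.number = number;
        }
    }

    /** Writes the page and all of its children to a temporary file. */
    static Path save(Page page) throws IOException {
        Path file = Files.createTempFile("page-" + page.number + "-", ".ser");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(page);
        }
        return file;
    }

    /** Reads the page back and removes the temporary file. */
    static Page load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Page) in.readObject();
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

After save returns, the caller would drop its reference to the page and keep only the returned path until the forward reference is resolved.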
by Keiron Liddle
version 1298724