Output Rewriting Pipelines (org.apache.sling.rewriter)

The Apache Sling Rewriter is a module for rewriting the output generated by the usual Sling rendering process. Possible use cases include rewriting or checking all links in an HTML page, manipulating the HTML page, or using the generated output as the base for further transformation. An example of further transformation is using XSLT to transform rendered XML into an output format such as HTML, or into XSL-FO for generating PDF.

To support these use cases, the rewriter uses the concept of a processor. The processor is a component that is injected into the response through a servlet filter. By implementing the Processor interface, one can rewrite the whole response in one go. A more convenient way of processing the output is a so-called pipeline; the Apache Sling rewriter uses the same concept as Apache Cocoon: an XML-based pipeline for post-processing the output. The pipeline is based on SAX events.

SAX Pipelines

The rewriter allows a pipeline to be configured for post-processing the generated response. Depending on how the pipeline is assembled, the rewriting process might buffer the whole output in order to do proper post-processing; for example, this is required if an HTML response is "transformed" to XHTML or if XSLT is used to process the response.

As the pipeline is based on SAX events, there needs to be a component that generates these events and sends them through the pipeline. By default the Sling rendering scripts write to an output stream, so there is a need to parse this output and generate the SAX events.

The first component in the pipeline, which generates the initial SAX events, is called a generator. The generator gets the output from Sling, generates SAX events (XML), and streams these events into the pipeline. The counterpart of the generator is the serializer, which forms the end of the pipeline. The serializer collects all incoming SAX events and transforms them into the required response by writing into the output stream of the response.

Between the generator and the serializer, so-called transformers can be placed in a chain. A transformer receives SAX events from the previous component in the pipeline and sends SAX events to the next component in the pipeline. A transformer can remove events, change events, add events, or just pass the events on.
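The generator, transformer, and serializer roles described above can be illustrated with plain JDK SAX classes. This is only a minimal sketch of the pattern, not the Sling Rewriter API: the class names SaxPipelineSketch, LinkPrefixer, and StringSerializer are hypothetical, the generator is a standard XML parser rather than Sling's HTML generator, and the serializer does no escaping.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class SaxPipelineSketch {

    /** Transformer stand-in: changes href attributes, passes all other events on unchanged. */
    static class LinkPrefixer extends XMLFilterImpl {
        private final String prefix;

        LinkPrefixer(XMLReader parent, String prefix) {
            super(parent);
            this.prefix = prefix;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            AttributesImpl copy = new AttributesImpl(atts);
            int index = copy.getIndex("href");
            if (index >= 0) {
                copy.setValue(index, prefix + copy.getValue(index));
            }
            // Forward the (possibly changed) event to the next component.
            super.startElement(uri, localName, qName, copy);
        }
    }

    /** Serializer stand-in: collects events into a string (no escaping, for brevity). */
    static class StringSerializer extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                out.append(' ').append(atts.getQName(i))
                   .append("=\"").append(atts.getValue(i)).append('"');
            }
            out.append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }
    }

    /** Wires generator -> transformer -> serializer and runs the input through the chain. */
    static String run(String xml, String prefix) {
        try {
            // Generator stand-in: parses the rendered output and emits SAX events.
            XMLReader generator = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            LinkPrefixer transformer = new LinkPrefixer(generator, prefix);
            StringSerializer serializer = new StringSerializer();
            transformer.setContentHandler(serializer);
            transformer.parse(new InputSource(new StringReader(xml)));
            return serializer.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("pipeline failed", e);
        }
    }
}
```

Calling SaxPipelineSketch.run with a small fragment and the prefix /ctx rewrites every href in the fragment and leaves all other events untouched.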

Sling contains a default pipeline which is executed for all HTML responses: it starts with an HTML generator, parsing the HTML output and sending events into the pipeline. An HTML serializer collects all events and serializes the output.

The pipelines can be configured in the repository as a child node of /apps/APPNAME/config/rewriter (or /libs/APPNAME/config/rewriter). (In fact the configured search paths of the resource resolver are observed.) Each node can have the following properties:

As the configuration properties show, there are several ways to define when a pipeline should be used for a response: paths, extensions, content types, or resource types. Several of them can be specified at once; in this case all conditions must be met.

If a component needs a configuration, the configuration is stored in a child node whose name is {componentType}-{name}. For example, to configure the HTML generator (named html-generator), the node should have the name generator-html-generator. If the pipeline contains the same transformer several times, the configuration child node should have the format {componentType}-{index}, where index is the position of the transformer in the pipeline, starting with 1. For example, if you have a pipeline with the transformers xslt, html-cleaner, xslt, link-checker, then the configuration nodes should be named transformer-1 (for the first xslt), transformer-html-cleaner, transformer-3 (for the second xslt), and transformer-link-checker.
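The naming rule can be worked through in a few lines of Java. This is only an illustration of the rule as described in the text, not code taken from the rewriter; the class ConfigNodeNames and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConfigNodeNames {

    /**
     * Derives the configuration node name for each transformer in a pipeline:
     * transformer-{name} if the type occurs once, transformer-{index} (1-based
     * position in the pipeline) if the same type occurs several times.
     */
    static List<String> transformerConfigNames(List<String> transformerTypes) {
        Map<String, Integer> occurrences = new HashMap<>();
        for (String type : transformerTypes) {
            occurrences.merge(type, 1, Integer::sum);
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < transformerTypes.size(); i++) {
            String type = transformerTypes.get(i);
            boolean repeated = occurrences.get(type) > 1;
            names.add("transformer-" + (repeated ? String.valueOf(i + 1) : type));
        }
        return names;
    }
}
```

For the pipeline xslt, html-cleaner, xslt, link-checker this yields transformer-1, transformer-html-cleaner, transformer-3, and transformer-link-checker, matching the example above.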

Default Pipeline

The default pipeline is configured for the text/html MIME type and the html extension. It consists of the html-generator as the generator and the html-serializer for generating the final response. As the HTML generated by Sling is not required to be valid XHTML, the generator uses an HTML parser to create valid SAX events. To do this, the generator needs to buffer the whole response first.

Implementing Pipeline Components

Each pipeline component type has a corresponding Java interface (Generator, Transformer, and Serializer) together with a factory interface (GeneratorFactory, TransformerFactory, and SerializerFactory). When implementing such a component, both interfaces need to be implemented. The factory has only one method which creates a new instance of that type for the current request. The factory has to be registered as a service. For example if you're using the Maven SCR plugin, it looks like this:

@scr.component metatype="no"
@scr.service interface="TransformerFactory"
@scr.property name="pipeline.type" value="validator"

The factory needs to implement the corresponding interface and should be registered as a service for this factory interface (this is a plain service and not a factory service in the OSGi sense). Each factory gets a unique name through the pipeline.type property. The pipeline configuration in the repository just references this unique name (like validator).

Extending the Pipeline

With the options described above, it is possible to define new pipelines and add custom components to a pipeline. However, in some cases it is only necessary to add a custom transformer to the existing pipeline. For this purpose, the rewriter can be configured with pre and post transformers that are simply added to each configured pipeline. This allows the pipeline to be customized more flexibly, without changing or adding a configuration in the repository.

The approach here is nearly the same. A transformer factory needs to be implemented, but instead of giving this factory a unique name, this factory is marked as a global factory:

@scr.component metatype="no"
@scr.service interface="TransformerFactory"
@scr.property name="pipeline.mode" value="global"
@scr.property name="service.ranking" value="RANKING" type="Integer"

RANKING is an integer value (don't forget the type attribute, otherwise the ranking is interpreted as zero!) specifying where to add the transformer in the pipeline. If the value is less than zero, the transformer is added at the beginning of the pipeline, right after the generator. If the ranking is equal to or greater than zero, the transformer is added at the end of the pipeline, before the serializer.
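The placement rule can be sketched as follows. This is a simplified model, not the rewriter's implementation: the class GlobalTransformerPlacement and its members are hypothetical, components are represented by their names only, and the ascending order by ranking within each group is an assumption that the text above does not state.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class GlobalTransformerPlacement {

    /** A global transformer, reduced to a name and its service.ranking value. */
    static final class Global {
        final String name;
        final int ranking;

        Global(String name, int ranking) {
            this.name = name;
            this.ranking = ranking;
        }
    }

    /** Builds the effective component order for one configured pipeline. */
    static List<String> assemble(String generator, List<String> configured,
                                 String serializer, List<Global> globals) {
        List<Global> sorted = new ArrayList<>(globals);
        // Assumption: within each group, lower rankings come first.
        sorted.sort(Comparator.comparingInt(g -> g.ranking));

        List<String> pipeline = new ArrayList<>();
        pipeline.add(generator);
        for (Global g : sorted) {
            if (g.ranking < 0) {
                pipeline.add(g.name);   // negative ranking: right after the generator
            }
        }
        pipeline.addAll(configured);
        for (Global g : sorted) {
            if (g.ranking >= 0) {
                pipeline.add(g.name);   // zero or positive ranking: before the serializer
            }
        }
        pipeline.add(serializer);
        return pipeline;
    }
}
```

With a configured link-checker transformer, a global transformer ranked -5, and another ranked 10, the resulting order is generator, the -5 transformer, link-checker, the 10 transformer, serializer.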

The TransformerFactory interface has just one method, which returns a new transformer instance. If you plan to use other services in your transformer, you can declare the references on the factory and pass the service instances into the newly created transformer.

Since the transformer carries information about the current response it is not advisable to reuse the same transformer instance among multiple calls of TransformerFactory.createTransformer.
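A minimal sketch of this pattern, assuming simplified stand-in interfaces rather than the real org.apache.sling.rewriter types: the factory holds a shared service and hands it to a fresh transformer on every call, so per-response state never leaks between responses. LinkService, LinkTransformer, and LinkTransformerFactory are hypothetical names.

```java
class TransformerFactorySketch {

    /** Simplified stand-ins, not the real Sling Rewriter interfaces. */
    interface Transformer {
        String rewrite(String path);
    }

    interface TransformerFactory {
        Transformer createTransformer();
    }

    /** Hypothetical shared service that the factory would hold a reference to. */
    interface LinkService {
        String externalize(String path);
    }

    static final class LinkTransformer implements Transformer {
        private final LinkService links;
        private int rewritten;   // state that belongs to a single response

        LinkTransformer(LinkService links) {
            this.links = links;
        }

        @Override
        public String rewrite(String path) {
            rewritten++;
            return links.externalize(path);
        }

        int rewrittenCount() {
            return rewritten;
        }
    }

    static final class LinkTransformerFactory implements TransformerFactory {
        private final LinkService links;   // shared and safe to reuse

        LinkTransformerFactory(LinkService links) {
            this.links = links;
        }

        @Override
        public Transformer createTransformer() {
            // Always a new instance: the transformer carries per-response state.
            return new LinkTransformer(links);
        }
    }
}
```

The shared service lives on the factory, while the counter lives on the transformer; two calls to createTransformer therefore return independent instances with independent state.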

Implementing a Processor

A processor must conform to the Java interface org.apache.sling.rewriter.Processor. It gets initialized (method init) with the ProcessingContext. This context contains all necessary information for the current request (especially the output writer to write the rewritten content to). The getWriter method should return a writer that the output is written to. When the output has been written or an error has occurred, finished is called.
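The lifecycle described above (init, getWriter, finished) can be sketched with a buffering processor that rewrites the whole response in one go. This is a simplified stand-in, not an implementation of org.apache.sling.rewriter.Processor: the class names are hypothetical, init takes the response writer directly instead of a ProcessingContext, and discarding the buffer on error is an assumption.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class ProcessorSketch {

    /** Buffers everything written by the rendering, then rewrites it when finished. */
    static final class HttpsLinkProcessor {
        private final StringWriter buffer = new StringWriter();
        private final PrintWriter writer = new PrintWriter(buffer);
        private PrintWriter responseWriter;

        /** Stand-in for init: remembers where the rewritten content has to go. */
        void init(PrintWriter responseWriter) {
            this.responseWriter = responseWriter;
        }

        /** The rendering writes its output to this writer. */
        PrintWriter getWriter() {
            return writer;
        }

        /** Called once the output is complete or an error occurred. */
        void finished(boolean errorOccurred) {
            writer.flush();
            if (errorOccurred) {
                return;   // assumption: nothing is written for a failed response
            }
            String rewritten = buffer.toString().replace("http://", "https://");
            responseWriter.write(rewritten);
            responseWriter.flush();
        }
    }
}
```

Because the complete output is buffered before it is rewritten, this approach trades memory for simplicity, which is the same trade-off noted for pipelines that need the whole response.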

Like the pipeline components, a processor is created by a factory, which has to be registered as a service, like this:

@scr.component metatype="no"
@scr.service interface="ProcessorFactory"
@scr.property name="pipeline.type" value="uniqueName"

Configuring a Processor

The processors can be configured in the repository as a child node of /apps/APPNAME/config/rewriter (or libs or any configured search path). Each node can have the following properties:

Rev. 1731032 by olli on Thu, 18 Feb 2016 09:47:50 +0000