Apache Camel: Splitter

Splitter

The Splitter from the EIP patterns allows you split a message into a number of pieces and process them individually

You need to specify a Splitter as split(). In earlier versions of Camel, you need to use splitter().

Options

confluenceTableSmall

Name	Default Value	Description
`strategyRef`		Refers to an AggregationStrategy to be used to assemble the replies from the sub-messages, into a single outgoing message from the Splitter. See the defaults described below in What the Splitter returns. From Camel 2.12 onwards you can also use a POJO as the `AggregationStrategy`, see the Aggregate page for more details. If an exception is thrown from the aggregate method in the AggregationStrategy, then by default, that exception is not handled by the error handler. The error handler can be enabled to react if enabling the shareUnitOfWork option.
`strategyMethodName`		Camel 2.12: This option can be used to explicit declare the method name to use, when using POJOs as the `AggregationStrategy`. See the Aggregate page for more details.
`strategyMethodAllowNull`	`false`	Camel 2.12: If this option is `false` then the aggregate method is not used for the very first splitted message. If this option is `true` then `null` values is used as the `oldExchange` (for the very first message splitted), when using POJOs as the `AggregationStrategy`. See the Aggregate page for more details.
`parallelProcessing`	`false`	If enabled then processing the sub-messages occurs concurrently. Note the caller thread will still wait until all sub-messages has been fully processed, before it continues.
`parallelAggregate`	`false`	Camel 2.14: If enabled then the `aggregate` method on `AggregationStrategy` can be called concurrently. Notice that this would require the implementation of `AggregationStrategy` to be implemented as thread-safe. By default this is `false` meaning that Camel synchronizes the call to the `aggregate` method. Though in some use-cases this can be used to achieve higher performance when the `AggregationStrategy` is implemented as thread-safe.
`executorServiceRef`		Refers to a custom Thread Pool to be used for parallel processing. Notice if you set this option, then parallel processing is automatically implied, and you do not have to enable that option as well.
`stopOnException`	`false`	Camel 2.2: Whether or not to stop continue processing immediately when an exception occurred. If disable, then Camel continue splitting and process the sub-messages regardless if one of them failed. You can deal with exceptions in the AggregationStrategy class where you have full control how to handle that.
`streaming`	`false`	If enabled then Camel will split in a streaming fashion, which means it will split the input message in chunks. This reduces the memory overhead. For example if you split big messages its recommended to enable streaming. If streaming is enabled then the sub-message replies will be aggregated out-of-order, eg in the order they come back. If disabled, Camel will process sub-message replies in the same order as they where splitted.
`timeout`		Camel 2.5: Sets a total timeout specified in millis. If the Recipient List hasn't been able to split and process all replies within the given timeframe, then the timeout triggers and the Splitter breaks out and continues. Notice if you provide a TimeoutAwareAggregationStrategy then the `timeout` method is invoked before breaking out. If the timeout is reached with running tasks still remaining, certain tasks for which it is difficult for Camel to shut down in a graceful manner may continue to run. So use this option with a bit of care. We may be able to improve this functionality in future Camel releases.
`onPrepareRef`		Camel 2.8: Refers to a custom Processor to prepare the sub-message of the Exchange, before its processed. This allows you to do any custom logic, such as deep-cloning the message payload if that's needed etc.
`shareUnitOfWork`	`false`	Camel 2.8: Whether the unit of work should be shared. See further below for more details.

Exchange properties

The following properties are set on each Exchange that are split:

property	type	description
`CamelSplitIndex`	int	A split counter that increases for each Exchange being split. The counter starts from 0.
`CamelSplitSize`	int	The total number of Exchanges that was splitted. This header is not applied for stream based splitting. From Camel 2.9 onwards this header is also set in stream based splitting, but only on the completed Exchange.
`CamelSplitComplete`	boolean	Camel 2.4: Whether or not this Exchange is the last.

Examples

The following example shows how to take a request from the direct:a endpoint the split it into pieces using an Expression, then forward each piece to direct:b

Using the Fluent Builders{snippet:id=splitter|lang=java|url=camel/trunk/camel-core/src/test/java/org/apache/camel/builder/RouteBuilderTest.java}The splitter can use any Expression language so you could use any of the Languages Supported such as XPath, XQuery, SQL or one of the Scripting Languages to perform the split. e.g.

from("activemq:my.queue").split(xpath("//foo/bar")).convertBodyTo(String.class).to("file://some/directory")

Using the Spring XML Extensions{snippet:id=example|lang=xml|url=camel/trunk/components/camel-spring/src/test/resources/org/apache/camel/spring/xml/buildSplitter.xml}For further examples of this pattern in use you could look at one of the junit test case

Splitting a Collection, Iterator or Array

A common use case is to split a Collection, Iterator or Array from the message. In the sample below we simply use an Expression to identify the value to split.

java

from("direct:splitUsingBody").split(body()).to("mock:result"); from("direct:splitUsingHeader").split(header("foo")).to("mock:result");

In Spring XML you can use the Simple language to identify the value to split.

xml

<split> <simple>${body}</simple> <to uri="mock:result"/> </split> <split> <simple>${header.foo}</simple> <to uri="mock:result"/> </split>

Using Tokenizer from Spring XML Extensions*

You can use the tokenizer expression in the Spring DSL to split bodies or headers using a token. This is a common use-case, so we provided a special tokenizer tag for this.
In the sample below we split the body using a @ as separator. You can of course use comma or space or even a regex pattern, also set regex=true.{snippet:id=e1|lang=xml|url=camel/trunk/components/camel-spring/src/test/resources/org/apache/camel/spring/processor/splitterTokenizerTest.xml}

What the Splitter returns

Camel 2.2 or older:
The Splitter will by default return the last splitted message.

Camel 2.3 and newer
The Splitter will by default return the original input message.

For all versions
You can override this by suppling your own strategy as an AggregationStrategy. There is a sample on this page (Split aggregate request/reply sample). Notice its the same strategy as the Aggregator supports. This Splitter can be viewed as having a build in light weight Aggregator.

Parallel execution of distinct 'parts'

If you want to execute all parts in parallel you can use special notation of split() with two arguments, where the second one is a boolean flag if processing should be parallel. e.g.

XPathBuilder xPathBuilder = new XPathBuilder("//foo/bar"); from("activemq:my.queue").split(xPathBuilder, true).to("activemq:my.parts");

The boolean option has been refactored into a builder method parallelProcessing so its easier to understand what the route does when we use a method instead of true|false.

XPathBuilder xPathBuilder = new XPathBuilder("//foo/bar"); from("activemq:my.queue").split(xPathBuilder).parallelProcessing().to("activemq:my.parts");

Stream based

Splitting big XML payloads

The XPath engine in Java and saxon will load the entire XML content into memory. And thus they are not well suited for very big XML payloads.
Instead you can use a custom Expression which will iterate the XML payload in a streamed fashion. From Camel 2.9 onwards you can use the Tokenizer language
which supports this when you supply the start and end tokens. From Camel 2.14, you can use the XMLTokenizer language which is specifically provided for tokenizing XML documents.

You can split streams by enabling the streaming mode using the streaming builder method.

from("direct:streaming").split(body().tokenize(",")).streaming().to("activemq:my.parts");

You can also supply your custom splitter to use with streaming like this:

import static org.apache.camel.builder.ExpressionBuilder.beanExpression; from("direct:streaming") .split(beanExpression(new MyCustomIteratorFactory(), "iterator")) .streaming().to("activemq:my.parts")

Streaming big XML payloads using Tokenizer language

There are two tokenizers that can be used to tokenize an XML payload. The first tokenizer uses the same principle as in the text tokenizer to scan the XML payload and extract a sequence of tokens.

Available as of Camel 2.9
If you have a big XML payload, from a file source, and want to split it in streaming mode, then you can use the Tokenizer language with start/end tokens to do this with low memory footprint.

StAX component

The Camel StAX component can also be used to split big XML files in a streaming mode. See more details at StAX.

For example you may have a XML payload structured as follows

xml

Now to split this big file using XPath would cause the entire content to be loaded into memory. So instead we can use the Tokenizer language to do this as follows:

from("file:inbox") .split().tokenizeXML("order").streaming() .to("activemq:queue:order");

In XML DSL the route would be as follows:

xml

Notice the tokenizeXML method which will split the file using the tag name of the child node (more precisely speaking, the local name of the element without its namespace prefix if any), which mean it will grab the content between the <order> and </order> tags (incl. the tokens). So for example a splitted message would be as follows:

xml

If you want to inherit namespaces from a root/parent tag, then you can do this as well by providing the name of the root/parent tag:

xml

And in Java DSL its as follows:

from("file:inbox") .split().tokenizeXML("order", "orders").streaming() .to("activemq:queue:order");

Available as of Camel 2.13.1, you can set the above inheritNamsepaceTagName property to "*" to include the preceding context in each token (i.e., generating each token enclosed in its ancestor elements). It is noted that each token must share the same ancestor elements in this case.

The above tokenizer works well on simple structures but has some inherent limitations in handling more complex XML structures.

Available as of Camel 2.14

The second tokenizer uses a StAX parser to overcome these limitations. This tokenizer recognizes XML namespaces and also handles simple and complex XML structures more naturally and efficiently.

To split using this tokenizer at {urn:shop}order, we can write

Namespaces ns = new Namespaces("ns1", "urn:shop"); ... from("file:inbox") .split().xtokenize("//ns1:order", 'i', ns).streaming() .to("activemq:queue:order)

Two arguments control the behavior of the tokenizer. The first argument specifies the element using a path notation. This path notation uses a subset of xpath with wildcard support. The second argument represents the extraction mode. The available extraction modes are:

mode	description
i	injecting the contextual namespace bindings into the extracted token (default)
w	wrapping the extracted token in its ancestor context
u	unwrapping the extracted token to its child content
t	extracting the text content of the specified element

Having an input XML

xml

<m:orders xmlns:m="urn:shop" xmlns:cat="urn:shop:catalog"> <m:order><id>123</id><date>2014-02-25</date>...</m:order> ...

Each mode will result in the following tokens,

i	<m:order xmlns:m="urn:shop" xmlns:cat="urn:shop:catalog"><id>123</id><date>2014-02-25</date>...</m:order>
w	<m:orders xmlns:m="urn:shop" xmlns:cat="urn:shop:catalog"> <m:order><id>123</id><date>2014-02-25</date>...</m:order></m:orders>
u	<id>123</id><date>2014-02-25</date>...
t	1232014-02-25...

In XML DSL, the equivalent route would be written as follows:

xml

<camelContext xmlns:ns1="urn:shop"> <route> <from uri="file:inbox"/> <split streaming="true"> <xtokenize>//ns1:order</xtokenize> <to uri="activemq:queue:order"/> </split> </route> </camelContext>

or setting the extraction mode explicitly as

xml

... <xtokenize mode="i">//ns1:order</xtokenize> ...

Note that this StAX based tokenizer's uses StAX Location API and requires a StAX Reader implementation (e.g., woodstox) that correctly returns the offset position pointing to the beginning of each event triggering segment (e.g., the offset position of '<' at each start and end element event). If you use a StAX Reader which does not implement that API correctly it results in invalid xml snippets after the split. For example the snippet could be wrong terminated:

<Start>...<</Start> .... <Start>...</</Start>

Splitting files by grouping N lines together

Available as of Camel 2.10

The Tokenizer language has a new option group that allows you to group N parts together, for example to split big files into chunks of 1000 lines.

from("file:inbox") .split().tokenize("\n", 1000).streaming() .to("activemq:queue:order");

And in XML DSL

xml

The group option is a number that must be a positive number that dictates how many groups to combine together. Each part will be combined using the token.
So in the example above the message being sent to the activemq order queue, will contain 1000 lines, and each line separated by the token (which is a new line token).
The output when using the group option is always a java.lang.String type.

Specifying a custom aggregation strategy

This is specified similar to the Aggregator.

Specifying a custom ThreadPoolExecutor

You can customize the underlying ThreadPoolExecutor used in the parallel splitter. In the Java DSL try something like this:

Java

XPathBuilder xPathBuilder = new XPathBuilder("//foo/bar"); ExecutorService pool = ... from("activemq:my.queue") .split(xPathBuilder).executorService(pool) .to("activemq:my.parts");

Using a Pojo to do the splitting

As the Splitter can use any Expression to do the actual splitting we leverage this fact and use a method expression to invoke a Bean to get the splitted parts.
The Bean should return a value that is iterable such as: java.util.Collection, java.util.Iterator or an array.
So the returned value, will then be used by Camel at runtime, to split the message.

Streaming mode and using pojo

When you have enabled the streaming mode, then you should return a Iterator to ensure streamish fashion. For example if the message is a big file, then by using an iterator, that returns a piece of the file in chunks, in the next method of the Iterator ensures low memory footprint. This avoids the need for reading the entire content into memory. For an example see the source code for the TokenizePair implementation.

In the route we define the Expression as a method call to invoke our Bean that we have registered with the id mySplitterBean in the Registry.{snippet:id=e1|lang=java|url=camel/trunk/camel-core/src/test/java/org/apache/camel/processor/SplitterPojoTest.java}And the logic for our Bean is as simple as. Notice we use Camel Bean Binding to pass in the message body as a String object.{snippet:id=e2|lang=java|url=camel/trunk/camel-core/src/test/java/org/apache/camel/processor/SplitterPojoTest.java}

Split aggregate request/reply sample

This sample shows how you can split an Exchange, process each splitted message, aggregate and return a combined response to the original caller using request/reply.

The route below illustrates this and how the split supports a aggregationStrategy to hold the in progress processed messages:{snippet:id=e1|lang=java|url=camel/trunk/camel-core/src/test/java/org/apache/camel/processor/SplitAggregateInOutTest.java}And the OrderService bean is as follows:{snippet:id=e2|lang=java|url=camel/trunk/camel-core/src/test/java/org/apache/camel/processor/SplitAggregateInOutTest.java}And our custom aggregationStrategy that is responsible for holding the in progress aggregated message that after the splitter is ended will be sent to the buildCombinedResponse method for final processing before the combined response can be returned to the waiting caller.{snippet:id=e3|lang=java|url=camel/trunk/camel-core/src/test/java/org/apache/camel/processor/SplitAggregateInOutTest.java}So lets run the sample and see how it works.
We send an Exchange to the direct:start endpoint containing a IN body with the String value: A@B@C. The flow is:

HandleOrder: A HandleOrder: B Aggregate old orders: (id=1,item=A) Aggregate new order: (id=2,item=B) HandleOrder: C Aggregate old orders: (id=1,item=A);(id=2,item=B) Aggregate new order: (id=3,item=C) BuildCombinedResponse: (id=1,item=A);(id=2,item=B);(id=3,item=C) Response to caller: Response[(id=1,item=A);(id=2,item=B);(id=3,item=C)]

Stop processing in case of exception

Available as of Camel 2.1

The Splitter will by default continue to process the entire Exchange even in case of one of the splitted message will thrown an exception during routing.
For example if you have an Exchange with 1000 rows that you split and route each sub message. During processing of these sub messages an exception is thrown at the 17th. What Camel does by default is to process the remainder 983 messages. You have the chance to remedy or handle this in the AggregationStrategy.

But sometimes you just want Camel to stop and let the exception be propagated back, and let the Camel error handler handle it. You can do this in Camel 2.1 by specifying that it should stop in case of an exception occurred. This is done by the stopOnException option as shown below:

from("direct:start") .split(body().tokenize(",")).stopOnException() .process(new MyProcessor()) .to("mock:split");

And using XML DSL you specify it as follows:

xml

Using onPrepare to execute custom logic when preparing messages

Available as of Camel 2.8

See details at Multicast

Sharing unit of work

Available as of Camel 2.8

The Splitter will by default not share unit of work between the parent exchange and each splitted exchange. This means each sub exchange has its own individual unit of work.

For example you may have an use case, where you want to split a big message. And you want to regard that process as an atomic isolated operation that either is a success or failure. In case of a failure you want that big message to be moved into a dead letter queue. To support this use case, you would have to share the unit of work on the Splitter.

Here is an example in Java DSL{snippet:id=e1|lang=java|url=camel/trunk/camel-core/src/test/java/org/apache/camel/processor/SplitSubUnitOfWorkTest.java}Now in this example what would happen is that in case there is a problem processing each sub message, the error handler will kick in (yes error handling still applies for the sub messages). But what doesn't happen is that if a sub message fails all redelivery attempts (its exhausted), then its not moved into that dead letter queue. The reason is that we have shared the unit of work, so the sub message will report the error on the shared unit of work. When the Splitter is done, it checks the state of the shared unit of work and checks if any errors occurred. And if an error occurred it will set the exception on the Exchange and mark it for rollback. The error handler will yet again kick in, as the Exchange has been marked as rollback and it had an exception as well. No redelivery attempts is performed (as it was marked for rollback) and the Exchange will be moved into the dead letter queue.

Using this from XML DSL is just as easy as you just have to set the shareUnitOfWork attribute to true:{snippet:id=e1|lang=xml|url=camel/trunk/components/camel-spring/src/test/resources/org/apache/camel/spring/processor/SpringSplitSubUnitOfWorkTest.xml}

Implementation of shared unit of work

So in reality the unit of work is not shared as a single object instance. Instead SubUnitOfWork is attached to their parent, and issues callback to the parent about their status (commit or rollback). This may be refactored in Camel 3.0 where larger API changes can be done.

Using This Pattern