Sling Pipes

A tool set for extract-transform-load (ETL) operations, built by chaining proven bits of code.

One-shot data transformations often require sample code to be written and executed. This small tool set aims to make such transformations possible with proven, reusable blocks called pipes, which stream resources from one to the next.

What is a pipe

         getOutputBinding

               ^
               |
 getInput  +---+---+   getOutput
           |       |
      +----> Pipe  +---->
           |       |
           +-------+

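A Sling pipe is essentially a stream of Sling resources: as the diagram shows, it takes an input, produces an output, and can expose an output binding.

At this moment, there are 3 types of pipes to consider: readers, containers and writers, each described under Registered Pipes.

A Plumber OSGi service is provided to help get, build and execute pipes.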
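The streaming idea can be illustrated with a small toy model. This is only a sketch and not the Sling Pipes API: the ToyPipe interface, the helper names, and the use of plain path strings instead of Sling resources are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Toy model only: real pipes stream Sling resources, not path strings.
class ToyPipes {
    // Hypothetical "pipe": a function from a stream of input paths to a stream of output paths.
    interface ToyPipe extends Function<Stream<String>, Stream<String>> {}

    // Reader-like pipe: outputs the children of each input path, looked up in an in-memory tree.
    static ToyPipe children(Map<String, List<String>> tree) {
        return in -> in.flatMap(p -> tree.getOrDefault(p, List.of()).stream());
    }

    // Filter-like pipe: only passes paths whose last segment starts with the given prefix.
    static ToyPipe nameStartsWith(String prefix) {
        return in -> in.filter(p -> p.substring(p.lastIndexOf('/') + 1).startsWith(prefix));
    }

    // Container-like behaviour: the output of each pipe becomes the input of the next one.
    static List<String> run(String inputPath, List<ToyPipe> pipes) {
        Stream<String> current = Stream.of(inputPath);
        for (ToyPipe pipe : pipes) {
            current = pipe.apply(current);
        }
        return current.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = Map.of(
            "/content", List.of("/content/foo", "/content/bar", "/content/fizz"));
        // prints [/content/foo, /content/fizz]
        System.out.println(run("/content", List.of(children(tree), nameStartsWith("f"))));
    }
}
```

Each toy pipe only sees the stream handed to it by the previous one, which is the property that lets small blocks be recombined freely.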

How to configure & execute a pipe

A pipe configuration is ultimately a JCR node, with properties that vary a lot depending on the pipe type.

This configuration can be generated quickly through the Pipe Builder API.

Once the configuration is done, the pipe can be executed.

Pipe Builder API

Plumber can provide a PipeBuilder through its newPipe(ResourceResolver resolver) API, which gives a fluent API to quickly configure and run pipes, e.g.

plumber.newPipe(resolver).xpath("//element(*,nt:unstructured)[@sling:resourceType='to/delete']").rm().run();

will search for resources of type to/delete and remove them.

PipeBuilder automatically configures a container pipe, chaining the pipes you configure through the fluent API.

Note that this configuration part has shortcuts for some pipes. Typically, the sample above is a shorter equivalent of

plumber.newPipe(resolver).pipe('slingPipes/xpath').expr("//element(*,nt:unstructured)[@sling:resourceType='to/delete']").pipe('slingPipes/rm').run();

When available, shortcuts are specified next to each pipe type's documentation.
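To make the shortcut idea concrete, here is a minimal sketch of how a fluent builder can expand shortcuts into the generic calls. ToyPipeBuilder is a hypothetical stand-in, not the real PipeBuilder: it only records configuration, and the "type" key is an assumed name chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a fluent builder: it records configuration and runs nothing.
class ToyPipeBuilder {
    private final List<Map<String, String>> pipes = new ArrayList<>();

    // Generic form: start a new sub-pipe of the given type.
    ToyPipeBuilder pipe(String type) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("type", type); // "type" is an assumed key name for this sketch
        pipes.add(conf);
        return this;
    }

    // Sets the expr property on the sub-pipe configured last.
    ToyPipeBuilder expr(String expression) {
        pipes.get(pipes.size() - 1).put("expr", expression);
        return this;
    }

    // Shortcuts: each one simply expands to the generic calls above.
    ToyPipeBuilder xpath(String expression) {
        return pipe("slingPipes/xpath").expr(expression);
    }

    ToyPipeBuilder rm() {
        return pipe("slingPipes/rm");
    }

    // Returns the recorded sub-pipe configurations, in chaining order.
    List<Map<String, String>> configuration() {
        return pipes;
    }
}
```

With this sketch, new ToyPipeBuilder().xpath(q).rm() and new ToyPipeBuilder().pipe("slingPipes/xpath").expr(q).pipe("slingPipes/rm") record the same configuration.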

Once you are happy with the pipe you have created, you can terminate the builder, for example with run() or runAsync(), both of which are used in the samples in this document.

HTTP API

Request Path

The response contains the paths of the resources that have gone through the output of the configured pipe.

For a long execution (synchronous or asynchronous), you can retrieve the status of a pipe by executing

GET /etc/pipes/mySamplePipe.status.json

Request Parameter bindings

You can pass, as the bindings parameter, a JSON object of global bindings to add for the execution of the pipe.

e.g.

curl -u admin:admin -F "path=/etc/pipes/test" -F "bindings={testBinding:'foo'}" http://localhost:4502/etc/pipes.json

returns something like

{"size":2, "items":["/one/output/resource", "another/one"]}

Request Parameter writer

You can configure the output of the servlet with the writer parameter, a JSON object used as a pattern for the result you want. The values of the JSON object are expressions and can reuse the binding of each subpipe.

e.g.

curl -u admin:admin -G --data-urlencode 'writer={"user":"${user.fullName}"}' http://localhost:4502/etc/pipes/users.json

returns something similar to

{"size":2, "items":[{"user":"John Smith","path":"/home/users/q/q123jk1UAZS"},{"user":"John Doe","path":"/home/users/q/q153jk1UAZS"}]}
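Because the writer value contains braces, quotes and a dollar sign, it needs to be percent-encoded when it travels in a query string. A minimal Java sketch of that encoding step follows. The WriterUrl class and its withWriter method are hypothetical helpers, and URLEncoder.encode(String, Charset) assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: only builds the URL, it does not send any request.
class WriterUrl {
    // Appends the writer JSON as a percent-encoded query parameter.
    static String withWriter(String pipeUrl, String writerJson) {
        return pipeUrl + "?writer=" + URLEncoder.encode(writerJson, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints http://localhost:4502/etc/pipes/users.json?writer=%7B%22user%22%3A%22%24%7Buser.fullName%7D%22%7D
        System.out.println(withWriter(
            "http://localhost:4502/etc/pipes/users.json",
            "{\"user\":\"${user.fullName}\"}"));
    }
}
```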

Request Parameter dryRun

If the dryRun parameter is set to true and the executed pipe is supposed to modify content, it will log, as best it can, the changes it would have made, without modifying anything.

Request Parameter size

The default response is truncated to 10 items; if you need more (or fewer), you can change that setting with the size parameter.

Request Parameter async

Allows asynchronous execution of the given pipe. This is advised when you expect the pipe execution to last longer than the session of your HTTP client. If used, the returned value is the id of the created Sling job. In that case you can monitor the pipe's path with the status selector, as described above, until it has the value finished.
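Monitoring an asynchronous run amounts to a polling loop. The sketch below keeps the HTTP call out of the picture: fetchStatus is a hypothetical supplier standing in for whatever reads the status value from the status selector, and the exact shape of the status response is not specified here, so the comparison against the plain string "finished" is an assumption.

```java
import java.util.function.Supplier;

// Hypothetical helper: polls a status source until it reports "finished".
class StatusPoller {
    // Returns true once the status equals "finished", false if attempts run out or the thread is interrupted.
    static boolean waitUntilFinished(Supplier<String> fetchStatus, int maxAttempts, long pauseMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if ("finished".equals(fetchStatus.get())) {
                return true;
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
                return false;
            }
        }
        return false;
    }
}
```

Bounding the number of attempts keeps a client from waiting forever on a job that never completes.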

Registered Pipes

readers

These are pipes that output resources without modifying them.

Base pipe echo(path)

outputs what is in input (so what is configured in path)

SlingQuery Pipe ($(expr))

executes $(getInput()).children(expression)

MultiPropertyPipe

iterates through the values of the input multi-value property and writes them to bindings

XPathPipe (xpath(expr))

retrieves the resources resulting from an XPath query

TraversePipe (traverse())

traverses the current input resource's tree, outputting, as resources, either the nodes of the tree or their properties

AuthorizablePipe (auth(conf))

retrieves the authorizable resource corresponding to the id passed in the expression or, if none is found (or the expression is empty), from the input path, and outputs the found authorizable's resource. Caution: this pipe can modify content when additional configuration is added (see below).

ParentPipe (parent())

outputs the parent resource of input resource

FilterPipe (grep(conf))

outputs the input resource if it matches the pipe's configuration

as an example,

echo('/content/foo').grep('foo','bar','slingPipesFilter_not',true).run()

will return either /content/foo or nothing, depending on whether it does not contain @foo=bar

echo('/content/foo').name('FOO').grep('slingPipesFilter_test','${FOO.foo == "bar"}').run()

is an equivalent

InputStream reader pipes

These are specific reader pipes that read information from an input stream defined in the expr configuration. The samples in this document pass either inline data or a remote URL.

JsonPipe (json(expr))

feeds bindings with json stream

In case the JSON value is an array, the pipe loops over the array elements and outputs each one in the binding. The output resource is, each time, the input resource.

json('{items:[{val:1},{val:2},{val:3}]}').with('valuePath','$.items').name('demo')
.mkdir('/content/${demo.val}').run()

should create a tree of 3 resources /content/1, /content/2 and /content/3
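The loop-and-bind behaviour can be mimicked in a few lines of plain Java. This is a toy sketch under stated assumptions: ToyBindings, expand and expandForEach are hypothetical names, and the expander only handles simple ${name.property} lookups, whereas real pipe expressions are JavaScript.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of looping over array elements and exposing each one as a binding.
class ToyBindings {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\.(\\w+)\\}");

    // Replaces each ${name.property} with the bound value; unknown placeholders are left untouched.
    static String expand(String template, Map<String, Map<String, Object>> bindings) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Map<String, Object> binding = bindings.get(m.group(1));
            Object value = binding == null ? null : binding.get(m.group(2));
            String replacement = value == null ? m.group() : String.valueOf(value);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // One output per array element: the element is bound under pipeName, then the template is expanded.
    static List<String> expandForEach(String pipeName, List<Map<String, Object>> items, String template) {
        List<String> paths = new ArrayList<>();
        for (Map<String, Object> item : items) {
            paths.add(expand(template, Map.of(pipeName, item)));
        }
        return paths;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> items = List.of(
            Map.<String, Object>of("val", 1),
            Map.<String, Object>of("val", 2),
            Map.<String, Object>of("val", 3));
        // prints [/content/1, /content/2, /content/3]
        System.out.println(expandForEach("demo", items, "/content/${demo.val}"));
    }
}
```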

CsvPipe (csv(expr))

feeds bindings with csv stream

should create a tree of 3 resources /content/1, /content/2 and /content/3

containers

Container Pipe

assembles a sequence of pipes

Note that the Pipe Builder API automatically creates one for you to chain the subpipes you are configuring.

ReferencePipe

executes the pipe referenced in path property

NotPipe

executes the pipe referenced in path property, passes input only if referenced pipe doesn't return any resource

writers

Write Pipe (write(conf))

writes given nodes & properties to current input

e.g. echo('/content/foo').write('foo','bar').run() will write @foo=bar in /content/foo

MovePipe (mv(expr))

JCR move of current input to target path (can be a node or a property)

RemovePipe (rm())

removes the input resource, returns the parent, regardless of the resource being a node, or a property

PathPipe (mkdir(expr))

gets or creates the path given in the expression

Making configuration dynamic with pipe bindings

To make things interesting, most of the configurations are JavaScript template strings, hence valid JS expressions that can reuse bindings (from the configuration, or from other pipes).

Following configurations are evaluated:

You can use the names of previous pipes in the pipe container, or the special binding path, where path.previousPipe is the path of the current resource of the previous pipe named previousPipe.

Global bindings can be set at pipe execution, and external scripts can be added to the execution as well (see pipe configurations).

sample configurations

slingQuery | write

writes the repository user's full name with an Ms/Mr prefix depending on gender

  plumber.newPipe(resolver).xpath('/jcr:root/home/users//element(*,rep:User)')
  .$('nt:unstructured#profile')
  .write("fullName","${(profile.gender === 'female' ? 'Ms ' + profile.fullName : 'Mr ' + profile.fullName)}")
  .run()

slingQuery | multiProperty | authorizable | write

moves the badge<->user relationship from a badge->users MV property to a user->badges MV property

 plumber.newPipe(resolver).echo('/etc/badges/jcr:content/par')
 .$('[sling:resourceType=myApp/components/badge]').name('badge')
 .pipe('slingPipes/multiProperty').path('${path.badge}/profiles').name('profile')
 .auth('${profile}').name('user')
 .echo('${path.user}/profile')
 .write('badges','+[${path.badge}]')
 .run()

echo | $ | $ | echo | json | write

This use case completes a repository website with data from an external system that has a JSON API.

This pipe is run asynchronously in case the execution takes long.

plumber.newPipe(resolver)
 .echo("/content/mySite")
 .$('my:Page')
 .$('my:Page').name("localePage")
 .echo('${path.localePage}/jcr:content').name("content")
 .json('https://www.external.com/api/${content.country.toUpperCase()}.json').name('api')
 .write('cachedValue','${api.remoteJsonValueWeWant}')
 .runAsync(null)

xpath | parent | rm

More samples are available at https://github.com/npeltier/sling-pipes/tree/master/src/test/

Compatibility

For running this tool on a sling instance you need:

Rev. 1809827 by npeltier on Wed, 27 Sep 2017 10:40:34 +0000