SCUFL2 is the new (since Taverna 3) mechanism for specifying Taverna workflows.
SCUFL2 defines a model, a workflow bundle file format (.wfbundle), and a Java API for working with workflow structures.
SCUFL2 is the workflow language for Taverna 3, and replaces Taverna 2's t2flow format.
Summary
SCUFL2 is the Taverna 3 mechanism for specifying Taverna workflows.
SCUFL2 adopts Linked Data technology and preservation methodologies to create a platform-independent workflow language
that can be inspected, modified, created and executed.
SCUFL2 comes with a Java API that can be used for programmatic access to read and write SCUFL2
workflow bundles.
A workflow bundle is a structured ZIP file with the workflow definitions included as XML documents.
Those workflow documents are described by an XML Schema and are also valid RDF/XML.
The XML Schema allows tools to read and write SCUFL2 workflow definitions as regular structured XML.
The RDF allows RDF-enabled tools to link workflow definitions with external resources.
The workflow structure is defined using an OWL ontology
and annotated with URIs so that third parties can form semantic statements about any component of a Scufl2 workflow,
for example to state that a particular service produces outputs of a certain type,
or that a data link was added by a specific researcher.
Semantic annotations and a manifest for the bundle declare the purpose of,
and links between the different components forming a workflow.
This allows third parties to extract and append annotations about data and services used by the workflow.
Motivation
The t2flow serialization format suffers from being very close to the Java object model,
and contains various items that are simply Java beans serialized using XMLBeans.
As the t2flow format is very verbose, it can be difficult to deal with for third party software to do
inspection ("Which services does this workflow use?"), modification
("Change all calls to broken.com to fixed.com") and generation
("Build a custom workflow from a button").
Developers have informed us that the old SCUFL format of Taverna 1 was significantly easier to work with.
However, this format also has its caveats, like no schema,
unidentified ways to extend service definitions for Taverna plugins and
not supporting various new features in the Taverna 2 engine.
We have therefore decided to form a new serialisation format for workflows, called SCUFL2.
Overview
SCUFL2 consists of:
Scufl2-WorkflowBundle
The entry point of the Taverna Workflow Bundle.
Defines the workflows and profiles of a
Taverna Workflow Bundle.
The main workflow is also normally defined, which would be the top-level workflow to execute.
The profiles defines how these workflows can be realised and executed on different environments,
one of which can be suggested as the main profile.
Bundle path and root files
The workflow bundle document in RDF/XML format should be in in /workflowBundle.rdf
within the bundle archive.
If the archive is a workflow bundle,
i.e. /mimetype
is application/vnd.taverna.scufl2.workflow-bundle
,
then the META-INF/container.xml
can define root files at alternative paths and media types.
This specification requires that one of those formats is application/rdf+xml
according to this specification.
Example META-INF/container.xml
: (may be outdated)
<?xml version="1.0"?>
<container version="1.0"
xmlns=";urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<rootfile full-path="workflowBundle.ttl"
media-type="text/turtle" />
<rootfile full-path="workflowBundle.rdf"
media-type="application/rdf+xml" />
</rootfiles>
</container>
This defines the RDF/XML root file to be /workflowBundle.rdf
-
with workflowBundle.ttl
being an alternate representation the resource in Turtle format.
SCUFL2-compliant workflow bundle writers:
- Must set the bundle mimetype to
application/vnd.taverna.scufl2.workflow-bundle
- Must add a workflow bundle document in
application/rdf+xml
format
- Should store the workflow bundle document in
/workflowBundle.rdf
- Must not contain a resource
/workflowBundle.rdf
which is not the workflow bundle document
- If the
application/rdf+xml
representation is not in /workflowBundle.rdf
,
the writer must include META-INF/container.xml
with the required <rootfile>
entries.
META-INF/container.xml
, if present, must contain one and only one rootfile
with the media-type
application/rdf+xml
.
rootfiles of other media-types may be included, but their formats and restrictions are not defined by this specification.
- May Add additional representation of the workflow bundle document (and other documents).
Alternates of the workflow bundle document should be included in the
META-INF/container.xml
,
but only if they can be considered to fully specify the workflow bundle as in the RDF/XML format.
(So for instance a text/html
or image/png
representation would not normally be considered a
rootfile if it does not include all the structural information from the RDF/XML representation as specified here)
It is possible to include a workflow bundle document within a different kind of archive or bundle,
for instance in a data bundle.
In this case the bundle is not considered an application/vnd.taverna.scufl2.workflow-bundle
-
but producers of such archives:
- Should store the workflow bundle document in
/workflowBundle.rdf
,
unless the workflow bundle is not to be considered to have a 'main' or 'prominent' role within the archive.
(For instance if the archive is a collection of workflow bundles).
- Should have a
mimetype
and META-INF/container.xml
resource which declares the archive's main entry point,
like the data bundle document.
The mime type must not be application/vnd.taverna.scufl2.workflow-bundle
and
the root files should not be the workflow bundle document.
- Should link to the workflow bundle document from a resource within the archive which (ultimately)
is linked to from one of the
rootfile
documents.
Such documents should be in RDF/XML format.
- Should declare the media type of the RDF/XML workflow bundle document as
application/rdf+xml
in its META-INF/manifest.xml
SCUFL2 compliant workflow bundle readers:
- Should assume that
/workflowBundle.rdf
- if present -
is the root workflow bundle in the application/rdf+xml
format specified here.
- Should assume that if the archive's
mimetype
is application/vnd.taverna.scufl2.workflow-bundle
,
then the rootfile
in META-INF/container.xml
with the media type {{application/rdf+xml})
is the root workflow bundle document.
- May assume that any alternate formats listed as a
rootfile
in a application/vnd.taverna.scufl2.workflow-bundle
archive would fully cover the specification of the RDF/XML representation, and read such formats instead.
- May assume that any
application/rdf+xml
document with a xsi:type="WorkflowBundleDocument"
can be parsed according to the Scufl2 XML schema
Identifiers
Workflow bundles and their resources must be declared with relative identifiers within the archive.
In a application/vnd.taverna.scufl2.workflow-bundle
archive,
the workflow bundle must be identified as the root of the archive.
If the Workflow Bundle document is in workflowBundle.rdf
within the archive, the workflow identifier is ./
.
This should be achieved by setting xml:base="./"
and rdf:about=""
.
This means that one can mint a URI to refer to resources within the bundle archive, including the workflow bundle,
workflows and representations.
If http://example.com/myWfBundle.scufl2
returns a Scufl2 workflow bundle
archive of the content type application/vnd.taverna.scufl2.workflow-bundle
,
then (assuming default structure of the archive):
http://example.com/myWfBundle.scufl2
identifies for the Workflow Bundle representation (the archive)
http://example.com/myWfBundle.scufl2/
identifies the Workflow Bundle (as described here)
http://example.com/myWfBundle.scufl2/workflowBundle.rdf
identifies the Workflow Bundle representation in RDF/XML
http://example.com/myWfBundle.scufl2/workflow/HelloWorld/
identifies the "HelloWorld"
workflow within the bundle
http://example.com/myWfBundle.scufl2/workflow/HelloWorld.rdf
identifies the "HelloWorld" workflow representation in RDF/XML
http://example.com/myWfBundle.scufl2/workflow/HelloWorld/processor/Hello/
identifies the "Hello"
processor within the "HelloWorld" workflow.
Embedded workflow bundles
If the archive is another type of bundle which includes the workflow bundle
(but is not primarily playing the role as the format for this workflow bundle),
the local workflow identifier should be unique within the archive.
This is easiest achieved by using the same folder technique as for the workflow representations:
workflowBundle1.rdf
can define workflowBundle1/
exampleWorkflowBundles/hello.rdf
defines exampleWorkflowBundles/hello/
Such embedded workflow bundles should include their constituent representations
(such as workflow/HelloWorld.rdf
) within that folder,
for instance exampleWorkflowBundles/hello/workflow/HelloWorld.rdf
to define
exampleWorkflowBundles/hello/workflow/HelloWorld/
- but could also be shared among bundles,
for instance both workflowBundle1.rdf
and workflowBundle2.rdf
might refer to workflow/Shared.rdf
.
Global workflow bundle identifiers
Workflow bundles should declare a sameBaseAs reference to a globally unique non-informational URI.
Anyone can generate such a URI using the form http://ns.taverna.org.uk/2010/workflowBundle/UUID/
-
for instance http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/
-
including the trailing slash.
The semantics of sameBaseAs is a kind of recursive version of owl:sameAs -
so also resources which URI start with the same will be included.
So if:
@prefix : <;http://ns.taverna.org.uk/2010/scufl2> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema> .
<./> a :WorkflowBundle;
:name "HelloWorld";
:sameBaseAs <http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/>;
:mainWorkflow <workflow/HelloWorld/>;
:workflow <workflow/HelloWorld/>;
<workflow/HelloWorld/> a :Workflow;
rdfs:seeAlso <workflow/HelloWorld.ttl> .
then also:
<./>
= <http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/>
<workflow/HelloWorld/>
= <http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/>
<workflow/HelloWorld.rdf>
= <http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld.rdf>
This allows anyone to make a statement about any resource within the workflow bundle,
even if the URL of the workflow bundle representation itself might change,
be it on a local USB disk, DropBox folder, myExperiment, etc.
Updating the UUID
It is up to the software editing or creating the workflow to assign a new UUID as soon as
any change has been done to any workflow, profile or workflow bundle,
as this is the globally unique identifier for this workflow archive,
and also the base URI for all the other resources in the archive.
Not implemented by API Scufl2 API do not yet
automatically update the workflow bundle identifier.
SCUFL2-41
To update the URI, use workflowBundle.setSameBaseAs(WorkflowBundle.generateIdentifier())
.
Example representation in RDF/XML
(may be outdated)
<?xml version="1.0"?>
<rdf:RDF xmlns="http://ns.taverna.org.uk/2010/scufl2#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://ns.taverna.org.uk/2010/scufl2# http://ns.taverna.org.uk/2010/scufl2/scufl2.xsd http://www.w3.org/1999/02/22-rdf-syntax-ns# http://ns.taverna.org.uk/2010/scufl2/rdf.xsd"
xsi:type="WorkflowBundleDocument" xml:base="./">
<WorkflowBundle rdf:about="">
<name>HelloWorld</name>
<sameBaseAs
rdf:resource="http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/" />
<mainWorkflow rdf:resource="workflow/HelloWorld/" />
<workflow>
<Workflow rdf:about="workflow/HelloWorld/">
<rdfs:seeAlso rdf:resource="workflow/HelloWorld.rdf" />
</Workflow>
</workflow>
<!--
<workflow>
<Workflow rdf:about="workflow/SomeNestedWorkflow/">
<rdfs:seeAlso rdf:resource="workflow/SomeNestedWorkflow.rdf" />
</Workflow>
</workflow>
-->
<mainProfile rdf:resource="profile/tavernaWorkbench/" />
<profile>
<Profile rdf:about="profile/tavernaServer/">
<rdfs:seeAlso rdf:resource="profile/tavernaServer.rdf" />
</Profile>
</profile>
<profile>
<Profile rdf:about="profile/tavernaWorkbench/">
<rdfs:seeAlso rdf:resource="profile/tavernaWorkbench.rdf" />
</Profile>
</profile>
<rdfs:seeAlso rdf:resource="annotation/workflowBundle.rdf" />
</WorkflowBundle>
</rdf:RDF>
This example defines the workflow bundle "HelloWorld".
It contains one workflow workflow/HelloWorld
, which is also the main workflow.
Any additional workflows are typically nested (and nested-nested, etc) workflows bound as activities in processors).
Two execution profiles are provided, and profile/tavernaWorkbench
is dedicated as the main profile.
Properties
- name (required) gives the human readable title for this workflow archive.
This is a subproperty of
dc:title
.
- sameBaseAs (optional) gives a unique URI which is owl:sameAs with this workflow bundle and its children.
- workflow (required) All workflows included in this bundle.
Each workflow must have an rdfs:seeAlso link to the bundle resource that defines the workflow,
typically
workflow/workflowName.rdf
corresponding to the non-information resource workflow/workflowName/
.
- mainWorkflow (optional) The reference to the top-level workflow of this bundle.
It is valid to have a workflow bundle without a main workflow,
for instance if the bundled workflows are unconnected "workflow fragments".
If there is a mainProfile the workflow bundle must also have a mainWorkflow.
The main workflow must always be listed under workflow.
- profile (optional) profiles specifying how to execute the bundled workflows. In particular the profile provides a set of configured activities bound to the processors for a particular run environment. If no profiles are specified this is an abstract workflow bundle.
- mainProfile (optional) the suggested main profile. Execution platforms unable to choose between the provided profiles can select this profile as a default. It is valid to have a workflow bundle without a main profile (even if it has other profiles), but any main profile must be listed under profile.
- rdfs:seeAlso (optional) link to annotations about the workflow bundle and its content. Traditionally found in
annotation/workflowBundle.rdf
, which should contain further links to annotations from different sources, for instance annotation/myExperiment.rdf
for annotations included from myExperiment.
Bundle links
The workflow bundle document is the starting point for finding all workflow bundle resources within the archive.
Each of the workflows and profiles must therefore have a rdfs:seeAlso link to the bundle resource that defines it.
If alternate formats other than the required RDF/XML format is included in the bundle,
these formats can therefore link to resources in other formats, for instance in an additional workflowBundle.ttl
(Turtle format):
@prefix : <http://ns.taverna.org.uk/2010/scufl2#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<./> a :WorkflowBundle;
:mainProfile <profile/tavernaWorkbench/>;
:mainWorkflow <workflow/HelloWorld/>;
:name "HelloWorld";
:profile <profile/tavernaServer/>,
<profile/tavernaWorkbench/>;
:sameBaseAs <http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/>;
:workflow <workflow/HelloWorld/>;
rdfs:seeAlso <annotation/workflowBundle.ttl> .
<profile/tavernaServer/> a :Profile;
rdfs:seeAlso <profile/tavernaServer.ttl> .
<profile/tavernaWorkbench/> a :Profile;
rdfs:seeAlso <profile/tavernaWorkbench.ttl> .
<workflow/HelloWorld/> a :Workflow;
rdfs:seeAlso <workflow/HelloWorld.ttl> .
Parsing/writing
SCUFL2 compliant writers, when producing the workflow bundle document:
- Should write the workflow bundle RDF/XML document according to the SCUFL2 XML schema,
use the default namespace
xmlns="http://ns.taverna.org.uk/2010/scufl2#"
and declare the
xsi:type="WorkflowBundleDocument"
- Must ensure the workflow bundle RDF/XML document is valid RDF/XML
and includes the properties deemed required by this specification.
Conforming to the XML schema should ensure this.
- Should set the
xml:base
property to"./
- Should set rdf:about to
""
(or "./"' if
xml:base` is not set))
- Should declare a mainWorkflow and mainProfile
- Mustensure that any workflow/profile has a relative rdfs:seeAlso link to a bundle resource in
application/rdf+xml
which defines
that workflow / profile.
SCUFL2 compliant readers, when parsing a workflow bundle document:
May assume that a declared workflow/profile is defined in the referenced representation.
For instance, if:
<workflow>
<Workflow rdf:about="workflow/SomeNestedWorkflow/">
<rdfs:seeAlso rdf:resource="workflow/SomeNestedWorkflow.rdf" />
</Workflow>
</workflow>
then workflow/SomeNestedWorkflow.rdf
> must contain a workflow definition
for workflow/SomeNestedWorkflow/
.
May parse the /workflowBundle.rdf
as RDF/XML
May parse the {/workflowBundle.rdf}} according to the XML schema if the xsi:type="WorkflowBundleDocument"
is set on the rdf:RDF
element.
Scufl2-Workflow
The definition of a workflow, its processors, inputs/outputs and links.
- Bundle path:
/workflow/\$name.n3
Each nested workflow (and nested nested workflows etc.) exists in a separate file within the /workflow/
folder in the bundle.
The bundle's research object defines what is the top level workflow.
Identifier
Each workflow must have a unique name within the bundle's workflow files.
The base part of the file name (excluding extension) must match the scufl2:name of the workflow.
Workflows used in a particular research object are globally identified as
<http://ns.taverna.org.uk/2010/researchObject/$uuid/workflow/$workflowName/>
-
for instance <http://ns.taverna.org.uk/2010/researchObject/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/Helloworld/>
.
As it can be useful to identify nested workflows included in several workflow bundles,
each workflow must also have a scufl2:workflowIdentifier property, which URI must be on the form
<http://ns.taverna.org.uk/2010/workflow/$uuid>
, for example
<http://ns.taverna.org.uk/2010/workflow/efb1cdcd-1e19-408a-885a-303c6553a672/>
.
It is responsibility of the software creating or modifying the workflow to generate a new workflowcIdentifier
as soon as the workflow has changed.
(The randomly generated UUID of the workflow must not match the UUID of a research object or any other workflow.)
The owning research object should then also be assigned a new UUID.
(Note that editing metadata and bindings in other files don't update the workflow, but still update the research object.)
The workflow file should set the @base to the form <workflow/$name>
}
so that nested resources can be referenced relatively, like {{<processor/$processorName>.
Example
(Outdated)
@base <workflow/Helloworld/>
@prefix scufl2: <http://ns.taverna.org.uk/2010/scufl2/ontology/> .
<>
a scufl2:Workflow ;
scufl2:name "Helloworld" ;
scufl2:inputWorkflowPort <in/yourName> ;
scufl2:outputWorkflowPort <out/results> ;
scufl2:datalink <datalink/1>, <datalink/2> ;
scufl2:processor <processor/Hello> .
]]>
This example defines the workflow "HelloWorld".
It contains one workflow input port and one workflow output port, in addition to a single processor and two datalinks.
The nested resources for this workflow, such as InputWorkflowPort , OutputWorkflowPort,
DataLink, Processor and their children should be described
in the same file as the workflow itself.
Additional metadata should be added to an /annotations/
file.
Properties
- scufl2:name (required) gives the programmatic short-name for this workflow within this bundle.
This must be unique among the other workflows.
- scufl2:workflowIdentifier (required) gives the globally unique URI defining this workflow.
See Identifier above.
- scufl2:inputWorkflowPort (optional) All workflow input ports defined for this workflow.
- scufl2:outputWorkflowPort (optional) All workflow output ports defined for this workflow.
- scufl2:datalink (optional) All datalinks
defined between workflow and processor ports in this workflow.
(Note that if this is a nested workflow, its outside links are defined in the parent workflow)
- scufl2:processor (optional) All processors in this workflow.
Bundle links
All nested workflow resources should also be defined in the same archive file as this workflow.
Their URIs must be relative to this workflow, their type and scufl2:name.
So for instance <workflow/Helloworld/processor/Hello> is a scufl2:Processor in <workflow/Helloworld>,
and has a scufl2:name "Hello".
Scufl2 Profile
Details of the Scufl2 profile still to be added.
Taverna Workflow Bundle
The primary SCUFL2 file format is the Taverna workflow bundle.
Media type |
application/vnd.taverna.scufl2.workflow-bundle |
File extension |
.wfbundle |
File type |
Zip archive |
This file is a structured ZIP archive, based on the
Adobe UCF
format.
This is similar to the structured ZIPs used by the OpenOffice format ODF.
For a file to be a Taverna Workflow Bundle if it must:
- Is a valid ZIP container
- Contains the file
mimetype
with the ASCII content application/vnd.taverna.scufl2.workflow-bundle
(without LF/CR)
- Contains the file
workflowBundle.rdf
as a valid RDF/XML
document describing a workflow bundle
To be fully compliant, the bundle should also:
- Contain a valid
META-INF/manifest.xml
file listing all files in the archive
- Contain a valid
META-INF/container.xml
file including an entry for workflowBundle.rdf
The workflow bundle document is the top level entry point ("root file")
for the archive (think: index.html
), and describes:
- Which workflows are included in the bundle under
workflow/
- Which profiles are included in the bundle under
profile/
- Which of the workflows is the suggested main workflow
- Which of the profiles is the suggested main profile
- What is the global workflow bundle identifier.
A Workflow Bundle document can also be included as part of any other bundle,
archive or resource according to these specifications.
In that case the resource name might or might not be workflowBundle.rdf
,
this depends on the specification of that other format.
Archive directory structure
Path |
Type |
Description |
mimetype |
Text |
Mime type of bundle, ie. application/vnd.taverna.scufl2.workflow-bundle |
META-INF/ |
Folder |
Reserved folder for manifest |
META-INF/manifest.xml |
XML |
ODF 1.3-like manifest, listing each file, mime-type and file size |
META-INF/container.xml |
XML |
Adobe UCF/OEBPS list of root files (ie. workflowBundle.rdf )) |
workflowBundle.rdf |
RDF/XML |
Workflow Bundle Document |
vworkflow/ |
Folder |
Workflow definitions |
workflow/HelloWorld.rdf |
RDF/XML |
Workflow definition for "HelloWorld" |
workflow/otherWorkflow.rdf |
RDF/XML |
Workflow definition for "otherWorkflow" |
profile/ |
Folder |
Execution Profile definitions |
profile/someProfile.rdf |
RDF/XML |
Profile definition "someProfile" |
profile/other.rdf |
RDF/XML |
Profile definition "other" |
The archive must be a ZIP file, and should have the file extension .wfbundle
.
Some situations might require treating the workflow bundle as an unpacked set of folders.
In this case the top folder should still have the file extension .wfbundle
.
According to the Adobe UCF specifications, the mimetype
file must be the first file in the folder,
and must be stored without compression, encryption or permission attributes,
to support detection by mimemagic and similar.
The file META-INF/manifest.xml
- if present - must list every non-META-INF
file and folder in the archive,
including the root folder.
It should provide the mime-type - if known - for individual files.
The root folder should have the same mime type as in the mimetype
file - application/vnd.taverna.scufl2.workflow-bundle
.
The file META-INF/container.xml
- if present - should point to the 'root' workflow bundle document.
One and only one entry which must be of the mime type application/rdf+xml
,
and this entry must be called workflowBundle.rdf
.
Alternative representation of the workflow bundle root document can be included in other formats,
there's no similar restriction on their filenames, although it is recommended they match the RDF/XML filename,
for instance workflowBundle.html
, workflowBundle.json
, etc.
The folder workflow
contain each of the workflow definitions as
Workflow Documents.
One of these is typically the main workflow while the others are nested workflows,
but there is no requirement that the workflows included are to be included as a nested workflow or a main workflow.
Such 'dangling workflows' can be considered to be only declared workflows -
they might be there for historical reasons or because the workflow bundle is at an early stage of development
when there is no main workflow yet.
The execution details of workflows (such as activity choice, configuration) are described in the profile
folder,
one Profile Document per possible execution binding.
(For instance, one profile for the graphical Workbench, one for the Taverna Server and one for the Taverna Portal.).
One profile document can include execution details for several workflows,
but there could also be workflows which don't have any execution details in any profile -
these can be considered abstract workflows.
workflowBundle.rdf
The workflow bundle document workflowBundle.rdf
should list each of these workflows and profiles,
and should suggest the main workflow and main profile.
mimetype
This file is required, as a guide for mime magic and similar tools that guess the type of the archive.
Therefore it must be added as the first file to the archive, uncompressed,
so that its content is available in cleartext in the first bytes of the ZIP archive.
The file must be in ASCII and not contain any line feeds.
If the archive is a Taverna Workflow Bundle, the mime type should be application/vnd.taverna.scufl2.workflow-bundle
.
If META-INF/manifest.xml
is present, this mime type must match the mime type of "/"
in the manifest.
To add the file mimetype
as the first uncompressed file, followed by the rest of the bundle (excluding the mimetype file),
try using InfoZip:
$ zip -0 -X ../example.wfbundle mimetype
adding: mimetype (stored 0%)
$ zip -X -r ../example.wfbundle . -x mimetype
adding: workflowBundle.rdf (deflated 74%)
adding: workflow/ (stored 0%)
adding: workflow/HelloWorld.rdf (stored 0%)
..
adding: META-INF/ (stored 0%)
adding: META-INF/manifest.xml (deflated 78%)
adding: META-INF/container.xml (deflated 50%)
To verify:
$ unzip -lv ../example.wfbundle
Archive: ../example.wfbundle
Length Method Size Cmpr Date Time CRC-32 Name
-------- ------ ------- ---- ---------- ----- -------- ----
35 Stored 35 0% 2010-10-11 16:44 8373c7d8 mimetype
3047 Defl:N 786 74% 2010-10-13 09:40 743ecfe4 workflowBundle.rdf
0 Stored 0 0% 2010-10-06 14:57 00000000 workflow/
...
$ python -c "print open ('../example.wfbundle').read(128)[38:84]"
print("code sample");`application/vnd.taverna.scufl2.workflow-bundle
This file, if exists, should follow the OpenDocument container format,
and list every file in the bundle (except for the META-INF files).
The main functionality provided by the manifest is to give the mime-type of additional resources.
As a minimum the mime-type should distinguish between text/plain
(UTF-8 text) and application/octet-stream
(binary).
If a mime-magick like tool has guessed a more detailed mime type, it can also be provided here.
Additionally the manifest may specify the file sizes,
cccin general this can be useful when inspecting a larger workflow bundle remotely (exposed as a RESTful folder or similar).
The folder /
represents the bundle itself, and must have the same mime type as in the file mimetype
,
ie. application/vnd.taverna.scufl2.workflow-bundle
.
A different mime type might be used if the primary purpose of the archive is different from being a workflow bundle,
for instance being a Taverna Data Bundle.
The workflowBundle.rdf
file must be listed in the manifest, and it must be listed with the application/rdf+xml
mime type.
Any alternative representations must also be listed, and their mime type must match those in META-INF/container.xml
(see below).
The other folders are not required to have a mimetype.
If there is no manifest in the workflow bundle,
all data value files should be treated to be binary application/octet-stream
,
unless they have one of these file extensions:
*.txt
is text/plain
in UTF-8 character set
*.rdf
is application/rdf+xml
Example manifest:
<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
<manifest:file-entry manifest:media-type="application/vnd.taverna.scufl2.workflow-bundle" manifest:full-path="/"/>
<manifest:file-entry manifest:media-type="application/rdf+xml" manifest:full-path="workflowBundle.rdf"/>
<manifest:file-entry manifest:media-type="application/rdf+xml" manifest:full-path="workflow/HelloWorld.rdf"/>
<manifest:file-entry manifest:media-type="application/rdf+xml" manifest:full-path="annotation/workflow/HelloWorld.rdf"/>
<manifest:file-entry manifest:media-type="application/rdf+xml" manifest:full-path="annotation/workflowBundle.rdf"/>
<manifest:file-entry manifest:media-type="application/rdf+xml" manifest:full-path="profile/tavernaWorkbench.rdf"/>
<manifest:file-entry manifest:media-type="application/rdf+xml" manifest:full-path="profile/tavernaServer.rdf"/>
<manifest:file-entry manifest:media-type="text/turtle" manifest:full-path="workflowBundle.ttl"/>
<manifest:file-entry manifest:media-type="text/turtle" manifest:full-path="workflow/HelloWorld.ttl"/>
<manifest:file-entry manifest:media-type="text/turtle" manifest:full-path="annotation/workflow/HelloWorld.ttl"/>
<manifest:file-entry manifest:media-type="text/turtle" manifest:full-path="annotation/workflowBundle.ttl"/>
<manifest:file-entry manifest:media-type="text/turtle" manifest:full-path="profile/tavernaWorkbench.ttl"/>
<manifest:file-entry manifest:media-type="text/turtle" manifest:full-path="profile/tavernaServer.ttl"/>
<manifest:file-entry manifest:media-type="image/svg+xml" manifest:full-path="Thumbnails/thumbnail.svg"/>
<manifest:file-entry manifest:media-type="image/png" manifest:full-path="Thumbnails/thumbnail.png"/>
<manifest:file-entry manifest:media-type="image/svg+xml" manifest:full-path="diagram/workflow/HelloWorld.svg"/>
<manifest:file-entry manifest:media-type="image/png" manifest:full-path="diagram/workflow/HelloWorld.png"/>
</manifest:manifest>
This file, if present, should point to the root workflow bundle document,
which in an application/vnd.taverna.scufl2.workflow-bundle
must be workflowBundle.rdf
.
Alternative representation of the same file are permitted,
but SCUFL2 compliant tools are only required to understand the application/rdf+xml
representations described here.
The Adobe UCF specification defines the
format of this container file.
XML namespace in container.xml
Adobe UCF have used the XML namespace `urn:oasis:names:tc:opendocument:xmlns:container` although this format
is not defined by OASIS or the Open Document specification.
SCUFL2 compliant tools should therefore parse `container.xml` ignoring any default namespaces, and write using the default name
space and <container
xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
as the root
element.
If the archive is of the mime type application/vnd.taverna.scufl2.workflow-bundle
and contains other representations of the workflow bundle (for instance: JSON,
Turtle, t2flow) then the bundle must have a container file and list these representations in addition to
workflowBundle.rdf
.
Derived representations such as SVG diagrams and HTML reports should generally not be listed as 'root files'
unless they can be considered to 'fully represent the workflow bundle', for instance by using RDFa.
A SCUFL2 compliant parser can assume that an archive which is not of the mime type
application/vnd.taverna.scufl2.workflow-bundle
,
but does contain a META-INF/container.xml
-listed root file named workflowBundle.rdf
,
that file can be read as an RDF/XML representation of a workflow bundle document,
even if it is not declared as having the application/rdf+xml
mime type.
This enables any future extensions superseeding this application/vnd.taverna.scufl2.workflow-bundle
format.
All rootfiles must be equivalent and describe the same workflow structure,
although additional formats can include more or less information than the required format.
There should be only one rootfile per media-type, and there must be a rootfile for the media type application/rdf+xml
.
Example:
<?xml version="1.0"?>
<container version="1.0"
xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<rootfile full-path="workflowBundle.ttl"
media-type="text/turtle" />
<rootfile full-path="workflowBundle.rdf"
media-type="application/rdf+xml" />
</rootfiles>
<relationships>
<relationship type="metadata" target="/annotation/$dir/$filename.$ext" />
</relationships>
</container>
Unknown files and file types
Any other files in workflow
and profile
should be ignored by SCUFL2 compliant parsers,
regardless of if they have the application/rdf+xml
mime type or not.
When a SCUFL2 compliant tool has modified an existing Workflow Bundle,
it should remove such unknown files from workflow
and profile
when saving,
unless it has the capabilities to also update these.
These files would typically be representations in other formats which would be out of date after the editing.
On the other hand, if the tool has not structurally modified a workflow or profile,
the tool should not remove unknown files from workflow
and profile
.
On removal of files, the tool should also remove them from META-INF/manifest.xml
and if necessary from
META-INF/container.xml
.
Additional resources
The workflow bundle format is an open-ended specification, so the archive can include additional resources not described here.
For instance the bundle can include:
- Thumbnail of bundle (mini-diagram) (Recommendation:
META-INF/Thumbnails/thumbnail.png
and Thumbnails/thumbnail.svg
)
- Ontologies referenced from RDF/XML files, in particular from configurations
(Recommendation:
ontology/taverna2.2/beanshell.rdf
)
- Diagrams of workflows (Recommendation:
diagram/workflow/HelloWorld.svg
and .png
)
- Alternative representations (RDF, JSON) (Recommendation: Same naming conventions with different extensions)
- Annotations (Recommendation: under
annotations/
in RDF/XML format) -
one file per annotation source, like `myExperiment.rdf)
- Resources/binaries/data needed by workflow (Recommendation: under
resources/
- Example input and output data (Recommendation: as in data bundle)
- Provenance and data of one or more workflow runs (Recommendation: under
run/
A workflow bundle can also play 'double roles' by being other bundles, like a data bundle.
It is the mimetype
and root file that determines what is the "main function" of the bundle,
suggesting which tool should primarily open the bundle.
One can for instance imagine an UCF archive which primarily is an Adobe PDFXML file for a published paper
(see: Mars project) and should be opened in Adobe Acrobat Reader.
However, it can also contain workflowBundle.rdf
, workflow/importantResearch.rdf
,
and could therefore also be opened using SCUFL2 tools.
Scufl2-DataLink
The definition of a data link.
Bundle path: /workflow/\$workflowName.n3
Datalink should be described in the same file as their containing workflow.
Identifier
Each datalink is uniquely identified by their source and destination ports, in addition to the optional merge position.
Datalinks are globally identified as http://ns.taverna.org.uk/2010/researchObject/$researchObjectUUID/workflow/$workflow/datalink?from=$fromPort&to=$toPort&mergePosition=$mergePosition - for instance http://ns.taverna.org.uk/2010/researchObject/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/datalink?from=processor/A/out/result&to=processor/B/in/db defines the link in workflow "HelloWorld" from the output port "result" in the processor "A" going to the input port "db" for the processor "B". As there is no mergePosition there can't be any other links going to the "db" port.
As these URIs can be tricky to construct or maintain, feel free to use anonymous nodes, or construct alterative URIs as , the number here would not have any semantic meaning except it must be unique per datalink in that workflow.
Example
workflow/Helloworld.n3: (out of Date)
@base <workflow/Helloworld> .
@prefix scufl2: <http://ns.taverna.org.uk/2010/scufl2/ontology/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<>
a scufl2:Workflow ;
scufl2:name "Helloworld" ;
scufl2:workflowIdentifier <http://ns.taverna.org.uk/2010/workflow/00626652-55ae-4a9e-80d4-c8e9ac84e2ca/> ;
scufl2:inputWorkflowPort <in/yourName> ;
scufl2:outputWorkflowPort <out/results> ;
scufl2:datalink _:datalink1, <datalink/5>, <datalink?from=processor/Hello/out/greeting&to=out/results&mergePosition=0> ;
scufl2:processor <processor/Hello> .
<in/yourName> a scufl2:InputWorkflowPort ;
scufl2:name "yourName" ;
scufl2:portDepth 0 .
<out/results> a scufl2:OutputWorkflowPort ;
scufl2:name "results" .
_:datalink1 a scufl2:DataLink ;
scufl2:receivesFrom <in/yourName> ;
scufl2:sendsTo <processor/Hello/in/name> .
<datalink/5> a scufl2:DataLink ;
scufl2:receivesFrom <in/yourName> ;
scufl2:sendsTo <out/results> ;
scufl2:mergePosition 1 .
<datalink?from=processor/Hello/out/greeting&to=out/results&mergePosition=0> a scufl2:DataLink ;
scufl2:receivesFrom <processor/Hello/out/greeting> ;
scufl2:sendsTo <out/results> ;
scufl2:mergePosition 0 .
This example defines three datalinks.
The first link _:datalink1
is just an anonymous node without an identifier. It defines a data link from the input port "yourName" to the processor input port "name". This link could also have been written embedded with the workflow:
<> a scufl2:Workflow ;
...
scufl2:datalink [
scufl2:receivesFrom <in/yourName> ;
scufl2:sendsTo <processor/Hello/in/name>
] .
The second datalink <datalink/5>>
defines a link directly from the workflow input port "yourName" to the output port "results".
Links must go from a scufl2:SendingPort sending to a scufl2:ReceivingPort,
meaning from a workflow input port or processor output port, going to either a workflow output port or processor input port.
Several links can receive from the same sending port.
Merges
Merges is a way Taverna allows you to connect several links to the
same scufl2:ReceivingPort, that is to a processor input port or
workflow output port. When executing, values from each link will be
inserted into the specified scufl2:mergePosition in a new list.
You only need to specify scufl2:mergePosition if more than one link is
connected to the same processor input port or workflow output port. If
there is more than one link connected to the same receiving port, all
of them need to specify a unique mergePosition, starting from 0, with
no gaps. If you specify a single link to the port with a mergePosition
of 0, that port input would still be wrapped in a list.
The third datalink, specified using the 'full' URI <datalink?from=processor/Hello/out/greeting&to=out/results&mergePosition=0>
,
defines the link from the output port "out" in processor "Hello", linking to the workflow output port "results".
As now two links go to that receiving port, they both need to specify a unique mergePosition.
The second link specifies position 1, and the third position 0.
That means that the output port will receive a list Scufl2-DataLink.
The second element 'yourName' would arrive first (as it is sent before "Hello" produces "greeting"),
but it would be arriving in position 1 rather than 0.
URI templates not enough
The full URIs such as
http://ns.taverna.org.uk/2010/researchObject/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/datalink?from=processor/A/out/result&to=processor/B/in/db
are meant to be helpful, not defining. The workflow definitions should
be complete without having to be parse these URIs. Such URIs are
however useful to be able to annotate or talk about workflow elements
outside of the workflow definition.
Although a resource is specified using a full URI which uniquely
identifies it, such as in the datalink above, the resource must still
be defined, such as the datalink must define the properties
scufl2:receivesFrom, scufl2:sendsTo (and optionally)
scufl2:mergePosition. Similarly the input port must
still be defined with a scufl2:name "yourName".
The nested resources for this workflow, such as InputProcessorPort ,
OutputProcessorPort, DispatchStack, IterationStrategyStack and their
children should be described in the same file as owning workflow
itself. Additional metadata should be added to an /annotations/ file.
Properties
- scufl2:receivesFrom (required) The scufl2:SendingPort this link is receiving data from.
The port must be in the same workflow as the link.
- scufl2:sendsTo (required) The scufl2:ReceivingPort this link is sending data to.
The port must be in the same workflow as the link.
- scufl2:mergePosition (optional) An integer, starting from 0.
Must be set where more than one datalinks sendsTo the same ReceivingPort.
The positions for a port must be sequentially assigned from 0 without gaps.
SCUFL2 API
Currently the most up to date information can be found in the github
readme
Previous Versions
For information on previous None Apache versions see the
Mygrid pages
SCUFL2 language
The SCUFL2 language is the abstract set of constructs that define a Taverna workflow.
This has been formalized in an OWL ontology,
which is used by the scufl2-rdfxml
module of the SCUFL2 API
to serialise the Taverna workflow bundle.
At the core of this is a workflow bundle,
which combines a set of workflows and
profiles.
You can browse a serialisation of using the SCUFL2 ontology within a Taverna workflow bundle in this
example (Links to follow):
Scufl2 URIs
URI tree of example workflow bundle
<http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/>
workflow/HelloWorld/
in/yourName
out/results
processor/wait4me/
iterationstrategy/
dispatchstack/
processor/Hello/
in/name
out/greeting
iterationstrategy/
dispatchstack/
datalink?from=processor/Hello/out/greeting&to=out/results&mergePosition=0
datalink?from=in/yourName&to=out/results&mergePosition=1
datalink?from=in/yourName&to=processor/Hello/in/name
control?block=processor/Hello/&untilFinished=processor/wait4me/
profile/tavernaWorkbench/
activity/HelloScript/
processorbinding/Hello/
configuration/Hello/
profile/tavernaServer/
activity/HelloScript/
processorbinding/Hello/
configuration/Hello/
Complete URI tree for example workflow bundle
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/in/yourName
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/out/results
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/wait4me/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/wait4me/iterationstrategy/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/wait4me/iterationstrategy/0/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/wait4me/dispatchstack/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/wait4me/dispatchstack/0/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/wait4me/dispatchstack/1/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/wait4me/dispatchstack/2/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/wait4me/dispatchstack/3/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/wait4me/dispatchstack/4/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/wait4me/dispatchstack/5/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/Hello/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/Hello/in/name
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/Hello/out/greeting
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/Hello/iterationstrategy/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/Hello/iterationstrategy/0/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/Hello/dispatchstack/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/Hello/dispatchstack/0/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/Hello/dispatchstack/1/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/Hello/dispatchstack/2/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/Hello/dispatchstack/3/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/Hello/dispatchstack/4/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/processor/Hello/dispatchstack/5/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/datalink?from=processor/Hello/out/greeting&to=out/results&mergePosition=0
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/datalink?from=in/yourName&to=out/results&mergePosition=1
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/datalink?from=in/yourName&to=processor/Hello/in/name
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/workflow/HelloWorld/control?block=processor/Hello/&untilFinished=processor/wait4me/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/profile/tavernaWorkbench/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/profile/tavernaWorkbench/activity/HelloScript/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/profile/tavernaWorkbench/activity/HelloScript/in/personName
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/profile/tavernaWorkbench/activity/HelloScript/out/hello
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/profile/tavernaWorkbench/processorbinding/Hello/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/profile/tavernaWorkbench/processorbinding/Hello/in/name
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/profile/tavernaWorkbench/processorbinding/Hello/out/greeting
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/profile/tavernaWorkbench/configuration/Hello/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/profile/tavernaServer/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/profile/tavernaServer/activity/HelloScript/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/profile/tavernaServer/activity/HelloScript/in/personName
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/profile/tavernaServer/activity/HelloScript/out/hello
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/profile/tavernaServer/processorbinding/Hello/
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/profile/tavernaServer/processorbinding/Hello/in/name
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/profile/tavernaServer/processorbinding/Hello/out/greeting
http://ns.taverna.org.uk/2010/workflowBundle/28f7c554-4f35-401f-b34b-516e9a0ef731/profile/tavernaServer/configuration/Hello/
]]>
Scufl2 Ontology
Details of the Scufl2 Ontology will be available here later.