The file transport, also known as the VFS (Virtual File System) transport, can be used to read, mediate and write file content using Synapse. This transport allows Synapse to interface with the local file system and remote file systems via file transfer protocols such as FTP.
The file transport is based on Apache Commons VFS, and supports all the file transfer protocols supported by Commons VFS. This includes interactions with the local file system, HTTP, HTTPS, FTP and SFTP (i.e. file transfer over SSH).
There is a fundamental difference between the file transport and transports such as HTTP, and it is important to understand this difference to be able to use the file transport correctly. The HTTP transport binds to a single protocol endpoint, i.e. a TCP port on which it accepts incoming HTTP requests. These requests are then dispatched to the appropriate service based on the request URI. On the other hand, the file transport only receives the payload of a message (i.e. the file), but no additional information that could be used to dispatch the message to a service. This means that file system locations must be explicitly mapped to services. This is done using a set of service parameters. For Synapse this means that the VFS transport listener can only be used in conjunction with proxy services. The relevant service parameters are specified in the proxy service configuration as follows:
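A minimal sketch of such a proxy service configuration is given here. The service name and file system location are the ones discussed in this section; the content type value and the logging sequence are placeholders chosen for illustration, not values taken from a shipped sample:

```xml
<proxy xmlns="http://ws.apache.org/ns/synapse" name="MyVFSService" transports="vfs">
    <!-- Location polled by the VFS listener; files dropped here go to this service -->
    <parameter name="transport.vfs.FileURI">file:///var/spool/synapse/in</parameter>
    <!-- Assumed content type; use the one that matches the incoming files -->
    <parameter name="transport.vfs.ContentType">application/xml</parameter>
    <target>
        <inSequence>
            <!-- Placeholder mediation logic -->
            <log level="full"/>
        </inSequence>
    </target>
</proxy>
```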
In the above example the file system location file:///var/spool/synapse/in is explicitly bound to MyVFSService. Any file dropped in that location will be pre-dispatched to MyVFSService, bypassing any other configured dispatch mechanisms that would normally apply to messages received via HTTP.
The file transport consists of a transport listener component and a transport sender component. Proxy services can read files using the file transport listener, and they can write file content using the file transport sender. The following sections describe how to configure these two components of the transport.
Before a proxy service can read files, the VFS listener must be enabled in the SYNAPSE_HOME/repository/conf/axis2.xml file of Synapse. Look for the following XML configuration in the axis2.xml file, and uncomment it if it's commented out.
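The listener entry typically looks like the sketch below. The class name shown is the one used by the Synapse VFS transport module as far as we know; treat it as an assumption and keep whatever class name your own axis2.xml already contains:

```xml
<!-- VFS transport listener; uncomment this element in axis2.xml to enable it -->
<transportReceiver name="vfs" class="org.apache.synapse.transport.vfs.VFSTransportListener"/>
```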
To configure a proxy service to receive messages via the VFS listener (i.e. read files from some local or remote location), set the "transports" attribute on the proxy service element to "vfs":
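A minimal sketch, reusing the MyVFSService name from the earlier discussion and leaving out the service parameters and target:

```xml
<proxy xmlns="http://ws.apache.org/ns/synapse" name="MyVFSService" transports="vfs">
    <!-- VFS service parameters and the target definition go here -->
</proxy>
```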
It's also possible to expose a proxy service on the VFS transport and several other transports at the same time. Simply specify the required transports as a space-separated list in the "transports" attribute:
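For instance, the following sketch exposes the same (illustrative) service over VFS, HTTP and HTTPS, assuming the HTTP and HTTPS listeners are also enabled in axis2.xml:

```xml
<proxy xmlns="http://ws.apache.org/ns/synapse" name="MyVFSService" transports="vfs http https">
    <!-- VFS service parameters and the target definition go here -->
</proxy>
```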
A proxy service configured with the VFS listener can be further customized by setting a number of parameters (some of which are required). The following table lists all the supported service parameters. Please refer to sample 254 for an example that demonstrates how to use some of these settings.
Parameter Name | Description/Example | Required | Default |
---|---|---|---|
transport.vfs.FileURI | The primary location to read file content from. This must be a valid URI and may point to a file or a directory. If a directory is specified, the transport attempts to read any file dropped into it. Examples: `<parameter name="transport.vfs.FileURI">file:///home/user/test/in</parameter>` and `<parameter name="transport.vfs.FileURI">sftp://bob:password@example.com/logs</parameter>` | Yes | N/A |
transport.vfs.ContentType | The expected content type of the files retrieved for this service. The VFS transport uses this information to select the appropriate message builder. Example: `<parameter name="transport.vfs.ContentType">text/xml</parameter>` | Yes | N/A |
transport.vfs.FileNamePattern | A regular expression that file names must match when fetching files from the directory specified by transport.vfs.FileURI. Example: `<parameter name="transport.vfs.FileNamePattern">.*\.xml</parameter>` | No | N/A |
transport.PollInterval | The polling interval in seconds. Example: `<parameter name="transport.PollInterval">10</parameter>` | No | 300 |
transport.vfs.ActionAfterProcess | Once a file has been read and successfully processed by Synapse (i.e. without any errors or runtime exceptions), it should be either moved or deleted to prevent Synapse from processing it a second time. This parameter specifies which of these actions to take. Allowed values are MOVE and DELETE. Example: `<parameter name="transport.vfs.ActionAfterProcess">MOVE</parameter>` | No | DELETE |
transport.vfs.MoveAfterProcess | The location to which files are moved after they have been processed successfully. Required if transport.vfs.ActionAfterProcess is set to MOVE, ignored otherwise. The value must be a valid URI (local or remote). Example: `<parameter name="transport.vfs.MoveAfterProcess">file:///home/test/original</parameter>` | No | N/A |
transport.vfs.ActionAfterFailure | If Synapse encounters an error while processing a file, the file should be either moved or deleted to prevent Synapse from processing it a second time. This parameter specifies which of these actions to take. Allowed values are MOVE and DELETE. Example: `<parameter name="transport.vfs.ActionAfterFailure">MOVE</parameter>` | No | DELETE |
transport.vfs.MoveAfterFailure | The location to which files are moved after a failure. Required if transport.vfs.ActionAfterFailure is set to MOVE, ignored otherwise. The value must be a valid URI (local or remote). Example: `<parameter name="transport.vfs.MoveAfterFailure">file:///home/user/test/error</parameter>` | No | N/A |
transport.vfs.ReplyFileURI | The reply file location as a URI, in case the proxy service should generate a response message (file) after processing an input file. Example: `<parameter name="transport.vfs.ReplyFileURI">file:///home/user/test/out</parameter>` | No | N/A |
transport.vfs.ReplyFileName | The name of the response file generated by the proxy service. Example: `<parameter name="transport.vfs.ReplyFileName">response.xml</parameter>` | No | response.xml or response.dat depending on the content type of the response |
transport.vfs.MoveTimestampFormat | A timestamp format string compatible with java.text.SimpleDateFormat. If specified, Synapse appends a timestamp in this format to the file name whenever a file is moved to a new location (i.e. when moving a file after processing it or after a failure). Example: `<parameter name="transport.vfs.MoveTimestampFormat">yy-MM-dd:HHmmss</parameter>` | No | N/A |
transport.vfs.Locking | File locking ensures that each file is accessed by only one proxy service at any given instant. This is important when multiple proxy services read files from the same location, or when one proxy service reads the files written by another. File locking is enabled globally by default, and this parameter overrides the locking behavior for an individual service. Allowed values are enable and disable. Both are useful, because locking can be disabled globally by setting this parameter in the VFS transport receiver configuration in axis2.xml and then enabled selectively for individual services. Example: `<parameter name="transport.vfs.Locking">disable</parameter>` | No | enable |
transport.vfs.Streaming | If set to true, the transport attempts to pass the file content to the message builder as a javax.activation.DataSource object instead of a java.io.InputStream. This is only supported by some message builders, e.g. those for plain text and binary content. It allows the message to be processed without storing the entire content in memory. Streaming also has two other side effects. Example: `<parameter name="transport.vfs.Streaming">true</parameter>` | No | false |
transport.vfs.MaxRetryCount | If the file transport listener encounters an error while trying to read a file, it tries to read the file again after some time. This parameter sets the maximum number of retries before the listener gives up. Use the transport.vfs.ReconnectTimeout parameter to set the delay between retries. Example: `<parameter name="transport.vfs.MaxRetryCount">3</parameter>` | No | 3 |
transport.vfs.ReconnectTimeout | The amount of time (in seconds) for which the current polling task is suspended after a failed attempt to resolve a file. Example: `<parameter name="transport.vfs.ReconnectTimeout">30</parameter>` | No | 30 |
transport.vfs.FailedRecordsFileName | Once a file has been fully processed, it is moved to a new location or deleted. If this operation fails, a log entry with the failure details can be written to a separate log file. This parameter sets the name of that failure log file. Example: `<parameter name="transport.vfs.FailedRecordsFileName">move-errors.txt</parameter>` | No | vfs-move-failed-records.properties |
transport.vfs.FailedRecordsFileDestination | The location (directory path) of the failure log file described under transport.vfs.FailedRecordsFileName. Use that parameter to set the name of the log file. Example: `<parameter name="transport.vfs.FailedRecordsFileDestination">logs/</parameter>` | No | repository/conf |
transport.vfs.FailedRecordNextRetryDuration | When a move operation has failed, it is retried after this amount of time (in milliseconds). Example: `<parameter name="transport.vfs.FailedRecordNextRetryDuration">5000</parameter>` | No | 3000 |
transport.vfs.MoveAfterFailedMove | The destination to which the file is moved after a failed move attempt. Example: `<parameter name="transport.vfs.MoveAfterFailedMove">repository/move-errors</parameter>` | No | N/A |
transport.vfs.MoveFailedRecordTimestampFormat | The timestamp format to use when reporting failed move operations in the log. Example: `<parameter name="transport.vfs.MoveFailedRecordTimestampFormat">HH:mm:ss</parameter>` | No | dd/MM/yyyy/ HH:mm:ss |
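As a rough illustration of how several of these parameters combine, the sketch below polls a directory for XML files every 15 seconds and moves each file to a different directory depending on the outcome. The proxy name, directory paths, poll interval and the logging sequence are illustrative assumptions, not settings from a shipped sample:

```xml
<proxy xmlns="http://ws.apache.org/ns/synapse" name="FileProcessor" transports="vfs">
    <!-- Where to look for files and how to interpret them (required) -->
    <parameter name="transport.vfs.FileURI">file:///home/user/test/in</parameter>
    <parameter name="transport.vfs.ContentType">text/xml</parameter>
    <!-- Only pick up files ending in .xml, checking every 15 seconds -->
    <parameter name="transport.vfs.FileNamePattern">.*\.xml</parameter>
    <parameter name="transport.PollInterval">15</parameter>
    <!-- Keep processed files instead of deleting them -->
    <parameter name="transport.vfs.ActionAfterProcess">MOVE</parameter>
    <parameter name="transport.vfs.MoveAfterProcess">file:///home/user/test/original</parameter>
    <!-- Keep failed files in a separate directory for inspection -->
    <parameter name="transport.vfs.ActionAfterFailure">MOVE</parameter>
    <parameter name="transport.vfs.MoveAfterFailure">file:///home/user/test/error</parameter>
    <target>
        <inSequence>
            <!-- Placeholder mediation logic -->
            <log level="full"/>
        </inSequence>
    </target>
</proxy>
```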
The file transport sender allows writing outgoing messages to local or remote files. To activate the file transport sender, simply uncomment the following transport sender configuration in the SYNAPSE_HOME/repository/conf/axis2.xml file.
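The sender entry typically looks like the sketch below. As with the listener, the class name is the one we believe the Synapse VFS transport module uses; treat it as an assumption and keep whatever class name your own axis2.xml already contains:

```xml
<!-- VFS transport sender; uncomment this element in axis2.xml to enable it -->
<transportSender name="vfs" class="org.apache.synapse.transport.vfs.VFSTransportSender"/>
```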
To send a message using the file transport, define a Synapse endpoint with an address that starts with the prefix 'vfs:'. The rest of the address should be a valid local or remote file URI. An example is shown below:
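A minimal sketch of such an endpoint, assuming a hypothetical local output directory:

```xml
<endpoint xmlns="http://ws.apache.org/ns/synapse">
    <!-- 'vfs:' prefix followed by an ordinary file URI (illustrative path) -->
    <address uri="vfs:file:///var/spool/synapse/out"/>
</endpoint>
```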
Some more example file URIs are listed below. Remember to prefix each URI with the string 'vfs:' when using these to define Synapse endpoints. Refer to http://commons.apache.org/vfs/filesystems.html for a complete list of the protocols supported by Commons VFS and their corresponding URI formats.
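The sketch below shows a few such URIs already prefixed with 'vfs:' and wrapped in address endpoints. The host name, credentials and paths are placeholders; check the Commons VFS documentation for the exact URI syntax of each protocol:

```xml
<endpoints xmlns="http://ws.apache.org/ns/synapse">
    <!-- Wrapper element used only to group the examples; not a Synapse element -->
    <endpoint>
        <!-- Local file system -->
        <address uri="vfs:file:///home/user/test/out"/>
    </endpoint>
    <endpoint>
        <!-- Remote directory over FTP -->
        <address uri="vfs:ftp://bob:password@example.com/out"/>
    </endpoint>
    <endpoint>
        <!-- Remote directory over SFTP -->
        <address uri="vfs:sftp://bob:password@example.com/out"/>
    </endpoint>
</endpoints>
```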
By default file locking is globally enabled for the file transport sender. This behavior can be overridden at the endpoint level by specifying transport.vfs.Locking as a URL query parameter with the appropriate value (enable/disable) on a given endpoint:
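For example, the following sketch disables locking for a single endpoint (the path is illustrative):

```xml
<endpoint xmlns="http://ws.apache.org/ns/synapse">
    <!-- Locking is switched off for this endpoint only -->
    <address uri="vfs:file:///var/spool/synapse/out?transport.vfs.Locking=disable"/>
</endpoint>
```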
You may also change the global locking behavior by setting the transport.vfs.Locking parameter in the file transport sender configuration in the axis2.xml file.
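A sketch of the corresponding axis2.xml entry, assuming the sender class name used by the Synapse VFS transport module:

```xml
<transportSender name="vfs" class="org.apache.synapse.transport.vfs.VFSTransportSender">
    <!-- Disable file locking for every vfs endpoint unless overridden -->
    <parameter name="transport.vfs.Locking">disable</parameter>
</transportSender>
```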
When writing to remote file locations using a protocol such as FTP, you might want Synapse to communicate with the FTP server in passive mode. To configure this behavior, simply add the query parameter vfs.passive to the endpoint address:
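A sketch of an FTP endpoint in passive mode. The host, credentials and path are placeholders, and the value true is our assumption about how the flag is switched on:

```xml
<endpoint xmlns="http://ws.apache.org/ns/synapse">
    <!-- vfs.passive asks the FTP client to use passive mode -->
    <address uri="vfs:ftp://bob:password@example.com/out?vfs.passive=true"/>
</endpoint>
```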
When the file transport sender encounters an error while trying to write a file, it can retry after some time. This is useful to recover from certain types of transient I/O errors and network connectivity issues. The following parameters can be configured as URL query parameters on file (vfs) endpoints to make use of this feature.
Parameter Name | Description/Example | Required | Default |
---|---|---|---|
transport.vfs.MaxRetryCount | Maximum number of retries to perform before giving up. | No | 3 |
transport.vfs.ReconnectTimeout | Time duration (in seconds) between retry attempts. | No | 30 |
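As an illustration, the endpoint sketched below retries up to five times with ten seconds between attempts. The host, credentials, path and values are placeholders; note that the ampersand separating the query parameters must be written as `&amp;` inside an XML attribute:

```xml
<endpoint xmlns="http://ws.apache.org/ns/synapse">
    <!-- Up to 5 retries, waiting 10 seconds between attempts -->
    <address uri="vfs:ftp://bob:password@example.com/out?transport.vfs.MaxRetryCount=5&amp;transport.vfs.ReconnectTimeout=10"/>
</endpoint>
```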
The file transport sender does not write file content atomically. Therefore a process reading a file updated by Synapse may read partial content. To get around this limitation, temporary file support can be activated on the target file (vfs) endpoint:
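A sketch using the transport.vfs.UseTempFile endpoint parameter, assuming it is passed as a URL query parameter like the other endpoint-level settings (the path is illustrative):

```xml
<endpoint xmlns="http://ws.apache.org/ns/synapse">
    <!-- Write to a temporary file first, then move it into place -->
    <address uri="vfs:file:///var/spool/synapse/out?transport.vfs.UseTempFile=true"/>
</endpoint>
```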
This forces the file transport sender to write the data to a temporary file and then move the temporary file to the actual destination configured in the file endpoint. On most operating systems (e.g. Unix/Linux, Windows), this delivers the desired atomic file update behavior. When the file endpoint points to a remote file system, the temporary files will be created on the remote file system, thus preserving the atomic update behavior.
When updating an existing file, the file transport sender usually overwrites the old content. To get append behavior instead, set the transport.vfs.Append parameter on the target endpoint:
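A sketch of an endpoint that appends to a hypothetical log file, assuming the parameter is passed as a URL query parameter with the value true:

```xml
<endpoint xmlns="http://ws.apache.org/ns/synapse">
    <!-- New content is appended to audit.log instead of replacing it -->
    <address uri="vfs:file:///var/spool/synapse/out/audit.log?transport.vfs.Append=true"/>
</endpoint>
```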
It should be noted that by its nature, the file transport sender doesn't support synchronous responses and should only be invoked using the out-only message exchange pattern. In a Synapse mediation (sequence/proxy/API), this can be forced using the following mediator:
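The usual way to do this in Synapse is the OUT_ONLY property, set with a property mediator before the message is sent:

```xml
<!-- Mark the exchange as one-way so Synapse does not wait for a response -->
<property xmlns="http://ws.apache.org/ns/synapse" name="OUT_ONLY" value="true"/>
```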
To avoid man-in-the-middle attacks, SSH clients will only connect to hosts with a known host key. When connecting for the first time to an SSH server, a typical command line SSH client would request confirmation from the user to add the server and its fingerprint to the list of known hosts.
The VFS transport supports SFTP through the JSch library, and this library also requires a list of known hosts. Since Synapse is not an interactive process, it can't request confirmation from the user and is therefore unable to automatically add a host to the list. This implies that the list of known hosts must be set up manually before the transport can connect.
JSch loads the list of known hosts from a file called known_hosts in the .ssh sub-directory of the user's home directory, i.e. $HOME/.ssh in Unix and %HOMEPATH%\.ssh in Windows. The location and format of this file are compatible with the OpenSSH client.
Since the file contains not only a list of host names but also the fingerprints of their host keys, the easiest way to add a new host to that file is to use the OpenSSH client to open an SSH session on the target host. The client will then ask for confirmation to add the host key to the known_hosts file. Note that if the SSH server is configured to allow only SFTP sessions and no interactive sessions, the connection will actually fail. Since this doesn't roll back the change to the known_hosts file, this error can be ignored.
The VFS listener will start reading a file as soon as it appears in the configured location. To avoid processing half-written files, the creation of these files should be made atomic. On most platforms this can be achieved by writing the data to a temporary file and then moving the file to the target location. Note, however, that a move operation is only atomic if the source and destination are on the same physical file system. The location for the temporary file should be chosen with that constraint in mind.
It should also be noted that the VFS transport sender doesn't create files atomically. Use the transport.vfs.UseTempFile endpoint parameter to get around this issue.