Be Careful with file URLs
There are (at least) five ways to name files:
The platform-specific notation, called pathnames here (e.g., /abc/def/ghi.txt on Unix, a:\bcd\efg\hij.txt on DOS and Windows, and abc:def:ghi.txt on Macintosh).
A UNC-like notation, called UNC names here (e.g., //./abc/def/ghi.txt or //./a:/bcd/efg/hij.txt). The osl layer used to make heavy use of these as a platform-independent notation, but since osl has shifted to file URLs as the platform-independent notation (see below), UNC names have been deprecated and have become largely useless (they are mentioned here only for completeness).
The file URLs used by the osl layer as a platform-independent notation, called osl URLs here (e.g., file:///abc/def/ghi.txt or file:///a:/bcd/efg/hij.txt). Read on to learn why it is important to explicitly label these file URLs as osl URLs.
The file URLs used by the File Content Provider (FCP) within the Universal Content Broker (UCB), called FCP URLs (e.g., file:///home/usr123/work/abc.txt or file:///user/work/abc.txt). Normally, osl URLs and FCP URLs are the same (after all, the FCP uses osl to access the files). But the FCP has a feature called mount points that allows it to restrict access to only certain files (those that lie below a given set of mount points in the file system hierarchy), and to give names to these files that hide their real locations.
For example, if you have a mount point named user at the osl URL file:///home/usr123, the osl URL file:///home/usr123/work/abc.txt corresponds to the FCP URL file:///user/work/abc.txt. If you only have that single mount point, the osl URL file:///home/usr567/work/def.txt has no corresponding FCP URL (and cannot be accessed via the FCP).
The URLs used by the UCB, called UCB URLs (e.g., file:///a:/bcd/efg/hij.txt or vnd.sun.star.wfs:///user/work/abc.txt). Normally, FCP URLs and UCB URLs are the same, because the UCB hands file URLs directly to the FCP. But there is a special content provider, the Remote Access Content Provider (RAP), that can rewrite URLs before passing them on to other content providers. This is used, for example, in the Sun ONE Webtop (S1W), where there are typically two file systems: a client file system accessed via normal (FCP) file URLs (i.e., there is no rewriting RAP between the UCB and the client FCP), and a server file system accessed via (FCP) URLs where the file scheme has been replaced with vnd.sun.star.wfs (i.e., there is a rewriting RAP between the UCB and the server FCP).
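The mount point mapping and the RAP rewriting described in the last two items can be illustrated with a small self-contained sketch. It is only a model under stated assumptions, not the FCP or RAP implementation: the type MountPoints and the functions oslUrlToFcpUrl() and rewriteScheme() are hypothetical names introduced here, a mount point is assumed to map a name to the osl URL of a directory, and the RAP rewriting is assumed to amount to swapping the URL scheme.

```cpp
#include <map>
#include <string>

// Hypothetical model of FCP mount points: mount name -> osl URL of the
// directory that the name stands for.
typedef std::map<std::string, std::string> MountPoints;

// Maps an osl URL to the FCP URL that hides its real location. Returns
// false if the file lies below none of the mount points, in which case
// it has no FCP URL and cannot be reached through the FCP.
bool oslUrlToFcpUrl(const MountPoints& mounts, const std::string& oslUrl,
                    std::string& fcpUrl)
{
    for (MountPoints::const_iterator it = mounts.begin();
         it != mounts.end(); ++it)
    {
        const std::string& root = it->second;
        // The osl URL must be the mount root itself or lie below it,
        // i.e. continue with a '/' right after the root.
        const bool same = (oslUrl == root);
        const bool below = oslUrl.size() > root.size()
            && oslUrl.compare(0, root.size(), root) == 0
            && oslUrl[root.size()] == '/';
        if (same || below)
        {
            fcpUrl = "file:///" + it->first + oslUrl.substr(root.size());
            return true;
        }
    }
    return false;
}

// Assumed RAP behavior: replace the scheme `from` with the scheme `to`
// and leave URLs with any other scheme untouched.
std::string rewriteScheme(const std::string& url, const std::string& from,
                          const std::string& to)
{
    const std::string prefix = from + ":";
    if (url.compare(0, prefix.size(), prefix) != 0)
        return url; // different scheme, nothing to rewrite
    return to + url.substr(from.size());
}
```

With the single mount point user at file:///home/usr123, oslUrlToFcpUrl() maps file:///home/usr123/work/abc.txt to file:///user/work/abc.txt and reports failure for file:///home/usr567/work/def.txt. Calling rewriteScheme() with the schemes vnd.sun.star.wfs and file turns vnd.sun.star.wfs:///user/work/abc.txt into file:///user/work/abc.txt.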
The last two notations (FCP URLs and UCB URLs) are relatively unknown, because a plain OpenOffice installation uses neither mount points nor the RAP, so that osl URLs, FCP URLs, and UCB URLs are all identical. But if you want to write correct code that also works in unusual deployments (or in the S1W, which should not be regarded as particularly unusual), you have to be well aware of these different notations, all of which are labeled "URLs."
As mentioned before, use of UNC names is deprecated. Also, since most code accesses the FCP via the UCB rather than directly, FCP URLs are only of interest to hard-core UCB users (who should know what they are doing anyway). So, in the following we can concentrate on three notations: pathnames, osl URLs, and UCB URLs.
Pathnames are used in only a few places, because osl (the lowest level of concern to us) already uses osl URLs as its default notation (and osl URLs are a level above pathnames). It can be argued that interfaces that use pathnames should use osl URLs instead, and that pathnames are only of interest when communicating with the external world (other processes, or the human user).
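To make the relation between pathnames and osl URLs concrete, here is a deliberately simplified model that reproduces the Unix and DOS/Windows example pairs from the list of notations. The function name pathnameToFileUrl() is hypothetical, and the sketch assumes absolute pathnames, does no percent-encoding, and ignores the Macintosh notation; real code should rely on the osl conversion routines instead.

```cpp
#include <algorithm>
#include <string>

// Simplified model, NOT the osl implementation: it assumes an absolute
// pathname, does no percent-encoding, and does not handle the
// Macintosh colon notation.
std::string pathnameToFileUrl(std::string path)
{
    // DOS and Windows use backslashes as separators; URLs use slashes.
    std::replace(path.begin(), path.end(), '\\', '/');
    // Drive-letter paths such as "a:/bcd" lack the leading slash that
    // follows the (empty) host part of a file URL.
    if (path.empty() || path[0] != '/')
        path.insert(0, "/");
    return "file://" + path;
}
```

In this model, /abc/def/ghi.txt becomes file:///abc/def/ghi.txt and a:\bcd\efg\hij.txt becomes file:///a:/bcd/efg/hij.txt, matching the example osl URLs given earlier.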
One place where pathnames are used is class utl::TempFile.
The osl file system functions (in osl/file.h and osl/file.hxx) now generally use osl URLs in their interfaces.
There should be few places above osl where osl URLs instead of UCB URLs are used (because generally all file access should be done through the UCB, and not directly via osl). One notable exception is the handling of temporary files (see above).
Generally, all interfaces that are designed to communicate resource names within the OpenOffice framework should use UCB URLs, and all implementations that access resources by these names should do so via the UCB. Another advantage of this is that, without any extra effort, not only file resources can be accessed but also other resources such as HTTP and FTP (by using appropriate URLs; these URLs can be opaque to the code and are interpreted only by the UCB).
Sometimes it may be necessary to convert between different notations, and routines to do so are readily available:
The methods osl::FileBase::getFileURLFromSystemPath() and osl::FileBase::getSystemPathFromFileURL() (and their plain C counterparts in osl/file.h) convert between pathnames (called "system paths" here) and osl URLs.
The methods utl::LocalFileHelper::ConvertSystemPathToURL() and utl::LocalFileHelper::ConvertURLToSystemPath() convert between pathnames (again called "system paths" here) and UCB URLs. Because there can be scenarios where you have multiple FCPs on different file systems, it can be ambiguous how to convert from a pathname (which does not contain any information identifying a specific file system) to a UCB URL. Therefore, ConvertSystemPathToURL() requires an additional parameter BaseURL that identifies the FCP to be used.
There are convenience methods utl::LocalFileHelper::ConvertPhysicalNameToURL() and utl::LocalFileHelper::ConvertURLToPhysicalName() that choose the local FCP as BaseURL and then forward to the above LocalFileHelper methods.
For this to work, the UCB maintains a notion of locality of content providers. This is a heuristic based on how the UCB accesses individual content providers (within the same process, via a pipe on the same machine, via a socket over a network). The net effect is that the UCB should always choose as most local the FCP running on the same machine as the UCB, and using these LocalFileHelper methods will then always convert between UCB URLs and pathnames that are valid on this machine.
ConvertURLToPhysicalName() also makes sure to do the conversion only if the given UCB URL corresponds to a local pathname (and not to a pathname on a non-local file system).
There is no direct way to convert between osl URLs and UCB URLs. To convert from an osl URL to a UCB URL, use osl::FileBase::getSystemPathFromFileURL() followed by utl::LocalFileHelper::ConvertPhysicalNameToURL(). To convert from a UCB URL to an osl URL, use utl::LocalFileHelper::ConvertURLToPhysicalName() followed by osl::FileBase::getFileURLFromSystemPath(). But be aware that this only works if the osl URL and the UCB URL denote files within the same file system.
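The two-step conversion through an intermediate pathname can be sketched as follows. This is a self-contained illustration of the chaining pattern only: the Converter type, convertViaPathname(), and the two toy converters are hypothetical stand-ins, not the osl or LocalFileHelper routines, and the toy converters assume a plain installation in which osl URLs and UCB URLs are identical.

```cpp
#include <string>

// Signature shared by both hypothetical conversion steps; a step
// returns false when it cannot convert its input.
typedef bool (*Converter)(const std::string& in, std::string& out);

// Chains two conversions through an intermediate pathname. If the
// first step fails (for example because the URL does not denote a
// local file), the second step is never attempted.
bool convertViaPathname(Converter toPathname, Converter fromPathname,
                        const std::string& url, std::string& result)
{
    std::string pathname;
    if (!toPathname(url, pathname))
        return false;
    return fromPathname(pathname, result);
}

// Toy stand-in for the first step: strips "file://" from a Unix-style
// osl URL and rejects every other scheme.
bool toyOslUrlToPathname(const std::string& url, std::string& path)
{
    const std::string prefix = "file://";
    if (url.compare(0, prefix.size(), prefix) != 0)
        return false;
    path = url.substr(prefix.size());
    return true;
}

// Toy stand-in for the second step: accepts only absolute Unix
// pathnames and assumes the UCB URL equals the plain file URL.
bool toyPathnameToUcbUrl(const std::string& path, std::string& url)
{
    if (path.empty() || path[0] != '/')
        return false;
    url = "file://" + path;
    return true;
}
```

The point of the sketch is the control flow: the result is valid only if both steps succeed, which mirrors the caveat that the conversion works only when both URLs denote files within the same file system.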
Author: Stephan Bergmann (Last modification $Date: 2003/12/06 22:37:31 $). Copyright 2001 OpenOffice.org Foundation. All Rights Reserved.