Use of WebDAV in Subversion

This document details how WebDAV is used within the Subversion product. Specifically, it describes how the client side interfaces with Neon to generate WebDAV requests over the wire, and what the server must do to map incoming WebDAV requests into operations against the Subversion repository. Note that the server side is implemented as an Apache 2.0 module, operating as a back-end for its mod_dav functionality.

This document heavily refers to the Subversion design document and the latest Delta-V protocol draft. Details of those documents will not be replicated here.

NOTE: Subversion uses DeltaV for its communication, but the Subversion client is not a general-purpose DeltaV client. In fact, it expects some custom features from the server. Further, the Subversion server is not a general-purpose DeltaV server. It implements a strict subset of the DeltaV specification. A WebDAV or DeltaV client may very well be able to interoperate with it, but only if that client operates within the narrow confines of those features the server has implemented.
Version 2.0 of Subversion will address WebDAV interoperability (Class 1, Class 2, and DeltaV features). Each of the custom features expected by the client actually has an alternate mechanism available in DeltaV, but in a much less efficient form.
It is expected that Version 1.0 will support read-only, Class 1 WebDAV clients. Any "low-hanging fruit" to increase DeltaV interoperability will be considered.

Basic Concepts

Subversion uses a tree-based format to describe a change set against the repository. This tree is constructed on the client side (by "walking" over the working copy) to describe the change. The tree is marshalled to the server as a linear sequence of changes to be applied to the repository. The repository can accept changes in a random-access manner, so the mapping from a tree to a set of changes works very well for the repository.

Subversion provides properties on files, directories, and even the abstract concept of a revision. Each of the operations involving properties is mapped directly to WebDAV properties, which are manipulated with the PROPFIND and PROPPATCH HTTP methods. Revisions are modeled as DeltaV baselines, so revision properties are available through a PROPFIND on the baseline.
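As an illustration of the property mapping, here is a minimal Python sketch that builds a standard WebDAV PROPFIND request body naming the properties to fetch. The function name propfind_body is hypothetical, and the sketch says nothing about how the Subversion client (via Neon) actually constructs its requests; it only shows the general shape of a PROPFIND body for properties in the DAV: namespace.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"  # the standard WebDAV XML namespace


def propfind_body(prop_names):
    """Build a PROPFIND body asking for the named DAV: properties.

    Hypothetical helper for illustration only.
    """
    # Serialize the DAV: namespace with the conventional "D" prefix.
    ET.register_namespace("D", DAV_NS)
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    for name in prop_names:
        # Each requested property is an empty element inside <D:prop>.
        ET.SubElement(prop, f"{{{DAV_NS}}}{name}")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(propfind_body(["creationdate", "creator-displayname"]))
```

The resulting string would be sent as the body of a PROPFIND request against a file, a directory, or (for revision properties) a baseline.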

The Subversion server can efficiently compute deltas between two revisions (these deltas are complete tree deltas, not simple text deltas). DeltaV does not have a direct analogue for the tree delta concept. A client could discover changes by issuing a sequence of PROPFIND requests on the various WebDAV resources, but this would be a time-consuming operation, involving many requests. Instead, Subversion marshals this concept as a custom WebDAV report. Using this report, the client learns which items in the working copy are out of date and can issue GET and PROPFIND methods to fetch the new data.

Tags and branches are simple copies in Subversion, which are handled with the WebDAV COPY.

Need to talk about copies somewhere. Need to discuss how copy history is retained (svn does it automatically, but interop with other servers may require us to set a custom property on those servers).

DeltaV Concepts Used by Subversion

Subversion uses many of the DeltaV concepts, as listed below. Note that many of these concepts are not fully implemented by Subversion; we have implemented enough to meet our needs, but little more.

Baseline
further info to come...

Activity
further info to come...

Version Resource
further info to come...

Version-Controlled Configuration
further info to come...

Baseline Collection
further info to come...

Version-Controlled Resource
further info to come...

Working Resource (Feature)
further info to come...

Merge Feature
further info to come...

Label Feature
further info to come...

Version-Controlled-Collection Feature
further info to come...

Subversion Projects as URLs

The very first concept to define is how a project is exposed to the client. Subversion will expose all projects as URLs on a server. The files and subdirectories under this project will be exposed through the URL namespace.

For example, let us assume that we have a project named "example". And let us say that this project will be exposed at the URL: http://subversion.tigris.org/repos/example/.

This mapping is set up through a set of configuration parameters for the Apache HTTP Server (which is hosting the Subversion code and the particular project in question). The configuration could look like:

<Location /repos/example>
    DAV svn
    SVNPath /home/svn-projects/example
</Location>

Files and directories within the project will be directly mapped into the URL namespace. For example, if the project contains a file "file.c" in a subdirectory "sub", then the URL for that file will be http://subversion.tigris.org/repos/example/sub/file.c.
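The path-to-URL mapping can be sketched in a few lines of Python. The helper name project_url is hypothetical, and percent-encoding of path segments is an assumption added here for safety rather than something this document specifies.

```python
from urllib.parse import quote


def project_url(base_url, repo_path):
    """Map a path inside a project onto the project's URL namespace.

    Hypothetical helper for illustration only.
    """
    # Strip slashes so that joining never produces "//" in the result.
    base = base_url.rstrip("/")
    relative = repo_path.strip("/")
    # Percent-encode each segment but keep "/" as the path separator
    # (the encoding step is an assumption, not taken from this document).
    return base + "/" + quote(relative, safe="/")


if __name__ == "__main__":
    print(project_url("http://subversion.tigris.org/repos/example/", "sub/file.c"))
```

For the project above, the file "file.c" in subdirectory "sub" maps to the URL given in the preceding paragraph.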

Initial Checkout

When the user performs the initial checkout of a Subversion project, the client will issue a series of PROPFIND and GET requests. These requests will traverse the repository, pick up some necessary metadata, and then fetch the latest revision.

describe the OPTIONS request for fetching the activity collection set. describe the sequence of PROPFINDs to reach the baseline collection.

(moved here from below; need to rewrite)
When the initial checkout was performed, Subversion fetched the DAV:activity-collection-set value and stored it as a property on each directory in the working copy. This property lists all of the locations on the server where an activity may be created. The first of these locations will be stored on the client for use during the commit process.

Should probably describe the metadata we fetch, and how a checkout of "not the latest" (e.g. by date or revision) will work.

Committing a Change

Subversion commits are modeled using the "activity" concept from DeltaV. An activity can be viewed as a transaction for a set of resources.

Creating the activity

At commit time, the Subversion client will retrieve the stored DAV:activity-collection-set value to know where it should create the activity. Next, the client will generate a UUID (a unique value) to use for the activity's location. Finally, the client will issue a MKACTIVITY method request, where the Request-URL is composed from the activity location and the UUID. This request will construct an activity to hold all of the changes for the commit.

Abbreviated summary:

At checkout time:
Request: OPTIONS for DAV:activity-collection-set
Response: http://www.example.com/repos/foo/$svn/act/
At commit time:
Request: MKACTIVITY http://www.example.com/repos/foo/$svn/act/01234567-89ab-cdef-0123-456789abcdef
Response: 201 (Created)

The CHECKOUT method can specify an activity to use upon checkout. This feature is used to associate all items with the newly-created activity.
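The activity-creation step described in this section can be sketched in Python as follows. The function names are hypothetical, the activity collection URL is the one from the summary above, and the sketch only composes the request target; it does not send anything over the wire.

```python
import uuid


def new_activity_url(activity_collection):
    """Compose an activity URL from the stored collection location and a UUID.

    Hypothetical helper for illustration only.
    """
    # Ensure exactly one slash between the collection and the UUID.
    return activity_collection.rstrip("/") + "/" + str(uuid.uuid4())


def mkactivity_request(activity_url):
    """Return the (method, Request-URL) pair for creating the activity."""
    return ("MKACTIVITY", activity_url)


if __name__ == "__main__":
    url = new_activity_url("http://www.example.com/repos/foo/$svn/act/")
    print(*mkactivity_request(url))
```

Because the UUID is generated on the client, no round trip is needed to pick the activity's name.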

Storing the commit message

talk about checking out the baseline and applying a PROPPATCH to the working baseline.

Mapping changes to WebDAV

A change set in Subversion is specified with a "tree delta" (see the SVN design for more details on the changes that can be placed into a tree delta). The tree delta will be unravelled into a set of requests. These requests will be one of the following forms:

Delete file or directory
These changes are mapped onto a DELETE operation. The version resource of the target's parent collection is checked out using the CHECKOUT method (into the current activity). The target (name) is then deleted from the resulting working collection using the DELETE method.

Add file
This is modeled by performing a CHECKOUT of the version resource of the target's parent collection. The new file is created within the resulting working collection using a PUT request. Properties are applied using PROPPATCH.

Add directory
This is modeled by performing a CHECKOUT on the version resource of the target's parent collection. The new directory is created within the resulting working collection with a MKCOL request. Properties are applied using PROPPATCH.

Add file or directory, with previous ancestry (a copy)
need to fix this section

A tree delta can specify that a file/directory originates as a copy of another file/dir. This copy may be further modified by additional elements in the tree delta.

This change will be modeled by performing a CHECKOUT on the version resource of the parent collection which will contain the new resource. The VERSION-CONTROL method will create a new version-controlled resource (VCR) within the working collection, with the VCR's DAV:checked-in property referring to the ancestor's version resource.

Note: it appears that we will use COPY to copy the appropriate resource into the working collection. This will create a new version history which is then placed into the working collection. The version history will use the DAV:precursor-set property to specify the version resource of the ancestor.

Because a version resource does not specify the revision, it will not be possible to COPY a version resource into the working collection -- it will not tell us what revision was copied. Instead, we will most likely copy a version resource out of the appropriate baseline. This implies the client must be able to map from a URL/revision pair to a baselined version resource URL.

The second issue is whether/how we set the DAV:precursor-set property of the version history. Or, more precisely, how we synthesize the value from information stored in the repository. This is still under investigation.

Replace file/dir by another file/dir
This change does not have a WebDAV modeling because tree deltas model it as two, sequential operations: a delete, followed by an add.

Moving a file or directory
This change does not have a WebDAV modeling because tree deltas model it as two, distinct operations: a delete, and an add with previous ancestry.

Replace file
This is modeled with a CHECKOUT on the target's version resource, followed by a PUT to the resulting working resource.

Replace directory
In Subversion terms, "replace directory" means that additions, deletions, and other changes will occur within the directory. Each of these changes is modeled individually, and the change to the directory is performed implicitly. Therefore, this "change" has no particular mapping into WebDAV.

Property delta
A property delta (against a file or directory) maps directly to a PROPPATCH in WebDAV terms. The target's version resource will be checked out using CHECKOUT and the PROPPATCH will be applied to the resulting working resource.
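The mappings above can be summarized as a small lookup table. The following Python sketch uses hypothetical change-kind names and records, for each change, the ordered WebDAV methods and whether each one applies to the target's parent or to the target itself. The copy case is left out because that part of the mapping is still marked as needing work.

```python
# Ordered (method, applies_to) steps for each kind of change.
# The change-kind names are hypothetical labels for illustration only.
CHANGE_TO_METHODS = {
    "delete": [("CHECKOUT", "parent"), ("DELETE", "target")],
    "add-file": [("CHECKOUT", "parent"), ("PUT", "target"), ("PROPPATCH", "target")],
    "add-directory": [("CHECKOUT", "parent"), ("MKCOL", "target"), ("PROPPATCH", "target")],
    "replace-file": [("CHECKOUT", "target"), ("PUT", "target")],
    "property-delta": [("CHECKOUT", "target"), ("PROPPATCH", "target")],
    # "Replace directory" has no mapping of its own; its children are
    # handled individually.
    "replace-directory": [],
}


def methods_for(change_kind):
    """Return the ordered WebDAV steps for one tree-delta change."""
    try:
        return CHANGE_TO_METHODS[change_kind]
    except KeyError:
        raise ValueError(f"no WebDAV mapping sketched for {change_kind!r}")


if __name__ == "__main__":
    for kind in CHANGE_TO_METHODS:
        print(kind, "->", [method for method, _ in methods_for(kind)])
```

Every CHECKOUT in the table is performed into the activity created at the start of the commit.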

Final Commit

The final action of the commit process is to issue a MERGE request to the Subversion server, specifying that the activity (created earlier) be checked in and the corresponding version-controlled resources be updated to refer to the new version resources.
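As a rough illustration, a MERGE request names the activity as its source. The Python sketch below builds a body in the shape used by the published DeltaV specification (RFC 3253); the draft this document refers to may differ in detail, the function name merge_body is hypothetical, and the properties that Subversion asks to have returned in the response are not shown.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"  # the standard WebDAV XML namespace


def merge_body(activity_url):
    """Build a MERGE body whose source is the given activity.

    Hypothetical helper; follows the RFC 3253 body shape as an assumption.
    """
    ET.register_namespace("D", DAV_NS)
    merge = ET.Element(f"{{{DAV_NS}}}merge")
    source = ET.SubElement(merge, f"{{{DAV_NS}}}source")
    href = ET.SubElement(source, f"{{{DAV_NS}}}href")
    # The activity created earlier with MKACTIVITY is the merge source.
    href.text = activity_url
    return ET.tostring(merge, encoding="unicode")


if __name__ == "__main__":
    print(merge_body(
        "http://www.example.com/repos/foo/$svn/act/"
        "01234567-89ab-cdef-0123-456789abcdef"
    ))
```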

the comment below is not quite right. talk about the working baseline, and how that is used to create a new baseline (with the commit message on it)

The version-controlled resources are also baseline-controlled, which means that updates to them will automatically create a new baseline. In essence, the commit will create a new baseline corresponding to the new Subversion revision.

Example

Warning: this section has not been updated to reflect some recent changes to the SVN-to-DAV mapping. Consider it out of date until this warning is removed.

Consider the following set of operations and its corresponding tree delta (taken from the SVN design document):

  1. rename /dir1/dir2 to /dir1/dir4,
  2. rename /dir1/dir3 to /dir1/dir2, and
  3. move file3 from /dir1/dir4 to /dir1/dir2.
<tree-delta>
  <replace name='dir1'>
    <directory>
      <tree-delta>
        <replace name='dir2'>
          <directory ancestor='/dir1/dir3'>         (1)
            <tree-delta>
              <new name='file3'>                    (2)
                <file ancestor='/dir1/dir2/file3'/>
              </new>
            </tree-delta>
          </directory>
        </replace>
        <delete name='dir3'/>                       (3)
        <new name='dir4'>                           (4)
          <directory ancestor='/dir1/dir2'>
            <tree-delta>
              <delete name='file3'/>                (5)
            </tree-delta>
          </directory>
        </new>
      </tree-delta>
    </directory>
  </replace>
</tree-delta>

Walking through this delta, we map out the WebDAV requests listed below. The numbers in the above delta roughly correspond to the numbered entries below. The correspondence is not exact because a specific, resulting behavior is typically based on a combination of a few elements in the delta.

  1. The <directory ancestor="/dir1/dir3"> specifies that we are overwriting /dir1/dir2 with /dir1/dir3.

    CHECKOUT /dir1/dir2/
    (returns a working resource URL for the directory)

    COPY /dir1/dir3/
    Destination: http://www.example.com/$svn/wrk/.../
    Overwrite: T

  2. /dir1/dir2/file3 is new (since we just overwrote the original dir2 directory), and originates from /dir1/dir2/file3. Thus, we simply COPY the file into the target directory's working resource:

    COPY /dir1/dir2/file3
    Destination: http://www.example.com/$svn/wrk/.../file3

  3. CHECKOUT /dir1/dir3/
    (returns a working resource URL for the directory)

    DELETE /$svn/wrk/.../

  4. We are going to create a new subdirectory (dir4) in the /dir1 directory. Since we don't have /dir1 checked out yet, we do so:

    CHECKOUT /dir1/
    (returns a working resource URL for the directory)

    And now we copy the right directory into the new working resource:

    COPY /dir1/dir2/
    Destination: http://www.example.com/$svn/wrk/.../dir4/

  5. The COPY created a complete set of working resources on the server, so we simply delete the part that we don't want:

    DELETE /$svn/wrk/.../dir4/file3

URL Layout

The Subversion server exposes repositories at user-defined URLs. For example, the "foo" repository might be located at http://www.example.com/repos/foo/. However, the server also requires a number of other resources to be exposed for proper operation. These additional resources will be associated with each repository in a location under the main repository URL. By default, this location is "$svn". It may be changed by using the SVNSpecialURI directive:

<Location /repos/foo>
    DAV svn
    SVNPath /home/svn-projects/foo
    SVNSpecialURI .special
</Location>

Underneath the location specified by SVNSpecialURI, we will expose several collections. Assuming we use the default of "$svn", the collections are:

$svn/act/
This area is where activity resources are created. The client will pick a unique name within this collection and issue a MKACTIVITY for that URL. The client will then use the activity in further interactions.

No methods are allowed on the $svn/act/ resource.

Note: actually, we may want to allow a PROPFIND with a Depth: 1 header to allow clients to enumerate the current activities.

Only a subset of methods are allowed on the activities within the collection. They are: PROPFIND, MERGE (commit the activity), and DELETE (abort the activity).

Per the Delta-V specification, all activity resources will have a DAV:resourcetype of DAV:activity.

$svn/his/
do something with this section; we actually don't use version history resources. in the future, they might be modeled like this

This collection contains the version history resources for files and directories in a project. Its internal layout is completely server-defined. Clients will receive URLs into this collection (or a subcollection) from various responses.

No methods are allowed on the $svn/his/ resource.

Internally, the URL namespace is laid out with URLs of the following form:

$svn/his/node-id

The node-id is an internal value that Subversion uses to reference individual files and directories. This node-id is a single integer defined by the Subversion repository. Note that this is an undotted node id, which is the base for the entire history of a given node in the repository.

The DAV:resourcetype of the node-id collection is DAV:version-history.

Note: the above information is probably not quite correct. The issue of linking one version history to another is still open. Further, I think that node 73 and node 73.4.1 are each version histories (where the latter is linked to the former). 73.x and 73.4.1.x are the versions within the version history.

$svn/ver/
This collection contains the version resources for the project.

No methods are allowed on the $svn/ver/ resource.

The layout of this collection is internal to the server. For reference purposes here (and to describe the implementation), it is laid out as:

$svn/ver/node-id/path

Only read-only methods are allowed against these resources (e.g. GET, PROPFIND, REPORT); all other methods are illegal.

The DAV:resourcetype of a version resource is simply the value of the resource at checkin time (e.g. <D:resourcetype/> or <D:resourcetype><D:collection/></D:resourcetype>).

$svn/wrk/
This collection contains working resources for the resources that have been checked out with the CHECKOUT method. The form and construction of this collection is server-defined, but is also well-defined so that clients may interact properly with collection versions that have been checked out.

No methods are allowed on the $svn/wrk/ resource.

For reference purposes, the working resource URLs are constructed as:

$svn/wrk/activity/path

Any method is allowed on the working resources, but no methods are allowed on any of their parents.

The DAV:resourcetype of the working resources follows normal resource typing: <D:resourcetype/> for regular working resources, and <D:resourcetype><D:collection/></D:resourcetype> for working collections.

$svn/vcc/
This section is not yet complete.

version-controlled configuration...

$svn/vcc/root as a singleton.

$svn/bln/
This section is not yet complete.

baselines...

$svn/bln/rev/

$svn/wbl/
This section is not yet complete.

working baseline...

$svn/bc/
This section is not yet complete.

baseline collection...
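For the collections whose layout is spelled out above ($svn/act/, $svn/ver/, and $svn/wrk/), URL construction can be sketched as below. The helper names are hypothetical. The real layouts are server-defined, so a client should normally use the URLs the server hands back rather than building its own, apart from the activity URL, whose name the client picks.

```python
def special_url(repo_url, collection, *segments, special_uri="$svn"):
    """Join a repository URL, the special URI, a collection, and segments.

    Hypothetical helper for illustration only.
    """
    parts = [repo_url.rstrip("/"), special_uri, collection]
    # Drop empty segments and stray slashes so the result has no "//".
    parts.extend(s.strip("/") for s in segments if s)
    return "/".join(parts)


def activity_resource_url(repo_url, activity, **kw):
    # $svn/act/<activity>
    return special_url(repo_url, "act", activity, **kw)


def version_resource_url(repo_url, node_id, path, **kw):
    # $svn/ver/node-id/path
    return special_url(repo_url, "ver", str(node_id), path, **kw)


def working_resource_url(repo_url, activity, path, **kw):
    # $svn/wrk/activity/path
    return special_url(repo_url, "wrk", activity, path, **kw)


if __name__ == "__main__":
    repo = "http://www.example.com/repos/foo/"
    print(working_resource_url(repo, "01234567-89ab-cdef-0123-456789abcdef", "dir1/file.c"))
    # With SVNSpecialURI set to ".special" instead of the default:
    print(activity_resource_url(repo, "01234567-89ab-cdef-0123-456789abcdef", special_uri=".special"))
```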

Property Management (and History/Log Reporting)

this section needs to be reworked. the properties occur on the FS revisions (and exposed via baselines).

As mentioned before, Subversion properties map onto WebDAV properties. For history/log reporting, the following WebDAV properties will be applied to each baseline (a Subversion revision) and to each version resource created by the revision. Since these resources are all version resources, the properties below are read-only.

DAV:comment
This is the standard (dead) property for specifying a checkin comment.

DAV:creator-displayname
This is a (dead) property that is generated from Subversion's concept of the "user" who made a particular change.

DAV:creationdate
This is a read-only live property created by the server at commit time.

The history for a specified file will be generated using the REPORT method and a DAV:property-report report. A typical history will fetch the three properties mentioned above for each version of the file/directory.

Based on the client design, it may be important to specify other read-only live properties for information about versions. For example, how many lines were added/removed in a particular checkin for a file? Creating these live properties will be quite straight-forward, and driven by the client design over time.

Note: if we do this, however, then we'd end up tying the client to the server. Of course, if the client were run against another DeltaV server which didn't report these properties, then we'd simply not display them in the UI. (e.g. graceful degradation of functionality)

Fetching Status and Updates

After the initial checkout, the client can request a status report (what has been changed on the client, pending a commit; what has been changed on the server, pending an update). The update process is similar, except that we also fetch the changes from the server.

The local changes can be handled entirely on the client side. The Working Copy library can easily handle the detection and reporting of these changes. We're concerned with efficiently detecting what has changed on the server.

While it would be possible to traverse the repository, fetching the current state, and comparing that to the client state, it would not be efficient. The Subversion design enables the server to easily compute what has changed (relative to the client), if it is given a description of the client state.

The core of the status and update commands is based on a custom Subversion-specific WebDAV report. This custom report will transmit the state of the working copy to the server, and the server response will specify which resources will need to be updated (fetched).

The request is a standard REPORT request, with a custom XML body. The body will use the standard Subversion technique of reporting a top-level revision number, and then only reporting children that have different revisions. The result of the report will use the same technique of reporting only the resources where a change is found. If a change is found, the server will provide a URL to the version resource to fetch for the changed resource. The server will also report the current revision number.
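The "top-level revision plus differing children" technique can be sketched without committing to any XML format. In the Python sketch below, the working copy is modeled as a hypothetical dictionary mapping paths (relative to the working copy root, with "" as the root itself) to revision numbers; a child is reported only when its revision differs from that of its nearest recorded ancestor, which is one reasonable reading of the technique.

```python
def entries_to_report(wc_revisions):
    """Return the (path, revision) pairs a client would need to report.

    wc_revisions is a hypothetical model of working copy state: a dict
    mapping relative paths to revisions, where "" is the top level.
    """
    top_revision = wc_revisions[""]  # the top level is always reported
    report = [("", top_revision)]
    for path in sorted(p for p in wc_revisions if p):
        # Find the nearest ancestor that has a recorded revision.
        parent = path.rpartition("/")[0]
        while parent not in wc_revisions:
            parent = parent.rpartition("/")[0]
        # Only children that differ from their ancestor are reported.
        if wc_revisions[path] != wc_revisions[parent]:
            report.append((path, wc_revisions[path]))
    return report


if __name__ == "__main__":
    state = {"": 5, "README": 5, "sub": 5, "sub/file.c": 7}
    print(entries_to_report(state))
```

In this example only the root and sub/file.c are reported, since every other entry shares its parent's revision.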

The XML DTDs for the request and response are TBD.

The custom report will tie the client to only those servers which support the report, but a future version of the software will contain a fallback codepath, a graceful degradation, to support other DeltaV servers.

When an update is performed, the client will fetch each of the URLs (using GET requests) provided in the server response.

GET (and PUT) operations will transfer content in a "diff" format when possible. The mechanics of this will follow the Internet Draft, titled Delta Encoding in HTTP.

Entity Tags (etags)

Etags are required to be unique across all versions of a resource. Luckily, this is very easy for a version control system. Each etag will simply be the repository's node-id for the resource.

Etags are used to generate diffs, following the guidelines in the aforementioned draft: Delta Encoding in HTTP. The problem then becomes how to get the etag for each file stored on the client (we don't need etags for directories since we never fetch them). During a checkout or update process, this is easy: the etag is provided in the HTTP response headers for each file retrieved.

The other part of the problem is getting the etag after a commit has occurred. The MERGE response provides a way to request properties from the version resources which are created as part of the checkin of the activity. The etag (and other properties) can be fetched using that mechanism.

Tags and Branches

Tags and branches within Subversion are performed by copying from one area to another. For example:

[.../src/my-project]$ svn cp trunk tags/1.0.3-rc4
[.../src/my-project]$ svn commit

In the above example, tags/1.0.3-rc4 should now be considered read-only; it will always reflect the state of trunk at the time of the copy.

These copies are handled just like a regular commit. An activity is created with MKACTIVITY, a working resource is created via CHECKOUT (for the target directory; tags/ in our example above), and then a COPY is performed. The activity is then merged back into the repository with a MERGE request.

Server Requirements

Warning: this section is out of date. The DeltaV draft has gone through a number of revisions, and our use of DeltaV has changed some.

DAV Methods

The server will need to implement the following WebDAV methods for proper operation:

The following methods are not required by Subversion at this time:

DAV Properties

The following DeltaV properties will be implemented:

Contrary to the DeltaV specification, the following required properties will not be implemented:

OPTIONS

The OPTIONS request will signal that it supports the following DAV features:

Reports

The DAV:supported-report-set property will signal support for the following reports:

These reports are available only on the "public" resources (the VCRs). They are not available on the resources within the $svn/ area.

Notes, reminders

Discuss timeouts and auto-purge of activities (and the related working resources).
Discuss the activity database maintained by mod_dav_svn.
Discuss other implementation details of ra_dav and mod_dav_svn.

Appendix A: Rationale

Several times, people have asked, "Why choose HTTP/WebDAV/DeltaV? That seems awfully bloated and ill-suited. Why didn't you design a custom, well-tuned protocol? Or maybe use the CVS protocol?" Listed below are a number of reasons for our choice of WebDAV as our network protocol.

While this list could certainly be expanded with more reasons (and to be fair, with a list of reasons why WebDAV was a poor choice), it certainly demonstrates the basic reasons for our choice.

Note: this list came from an email note, so the tone and point of view might be a bit off. Further word-smithing is welcome...

Builtin web browsing of the repository

For example, take a look at: http://svn.apache.org/repos/asf/subversion/trunk/README (that's the HEAD right there; we also have URLs for every previous revision of every file)

DAV-based browsing

Use Web Folders or WebDrive or somesuch on your Windows box (or Windows XP's native DAV mounts) to browse the SVN repository with Windows Explorer. Mac OS X has builtin DAV server mounting. Nautilus has DAV capabilities. Then you have your Open Source tools such as cadaver, Goliath, etc.

People can use existing libraries

I couldn't even begin to count the number of HTTP tools and libraries available. If we had designed our own protocol, then we would have /none/ of those benefits. Heck, two HTTP library implementors (Joe Orton of Neon, and Daniel Stenberg of CURL) are regulars here. With a custom protocol, we wouldn't get that benefit. I've used Python's httplib (and a davlib of my own) to do a lot of testing of our server. No need to go and roll new protocol libraries.

Existing tools

One word: Ethereal :-) When we capture network traces, Ethereal already knows about HTTP. It's quite nice, but I know there are even better ones out there. But we also have other tools like squid and other (caching) proxies (see the next item).

Caching proxies

Subversion will work great with caching proxies. There is no longer a need for specialized tools like "cvsup". Just drop in a caching proxy, and you've already got your distributed read-only repository. That European dev team can just drop in the cache between them and the US server and their checkouts/updates will get cached for the benefit of the other team members. Commits will flow through, back to the US-based server.

Sophisticated and broad-choice authentication

We don't have to reimplement an authentication scheme for a new protocol. We can use all of the various schemes that have been defined for HTTP. Ever look at the CVS protocol? Ever see the "I Love You" or "I Hate You" lines? :-) That is all part of creating a new authentication scheme. But we get to use SSL and certificate-based auth if we want. Kerberos. NTLM. Or even just simple Basic or Digest. And our users can come from text files, databases, LDAP, or PAM. We don't have to reinvent the wheel cuz it is all available for Apache already.

Awesome network server

We don't have to worry about how to portably set TCP_CORK for optimal network packets. We don't have to worry about when sendfile() makes sense, or if it is available. We don't have to worry about dropped client connections, how to best use threads and processes to scale, request management, monitoring, logging, etc. Apache gives us all of that and a ton more. I *really* would not want to do that through xinetd. I mean... setting TCP_CORK on stdout? freaky :-)

Well-defined on-wire compression

We already have on-wire compression, similar to CVS's "-z#" switch. And we didn't do anything. The client library and server that we use just support it automatically for us, according to RFC 2616.

Future interoperability

In the future, we'll be able to interoperate with a multitude of IDEs and other WebDAV/DeltaV clients. As DeltaV becomes more prevalent, IDEs could very well use it for source code management, and we'll be right there without needing to write some MS/SCC library to interface to the tool.


Greg Stein
Last modified: Fri Jan 25 12:54:20 PST 2002