eZ component: Webdav, RFC overview, 1.1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:Author:   Tobias Schlitt
:Revision: $Rev$
:Date:     $Date$
:Status:   Draft

.. contents::

Scope
~~~~~

This document summarizes the major points of `RFC 2518`_ (WebDAV) and the
associated `RFC 2616`_ (HTTP/1.1) with respect to distributed editing. The main
points are the support of entity tags and locking.

This document only summarizes the RFCs and contains small design-related hints
here and there. The main design for the functionality analyzed here is
contained in the design-1.1.txt file.

.. _`RFC 2518`: http://www.ietf.org/rfc/rfc2518.txt
.. _`RFC 2616`: http://www.ietf.org/rfc/rfc2616.txt

Entity Tags
~~~~~~~~~~~

Entity tags are used in the HTTP/1.1 protocol as a mechanism to validate that
a resource is still in the same state. Whenever the state of a resource
changes, its entity tag needs to change.

The following sections describe the HTTP/1.1 validation mechanisms in general,
the definition of an entity tag and the definition of the ETag header. In
addition, the usage of entity tags in the WebDAV RFC is described.

Section 3.11 of the HTTP/1.1 RFC describes entity tags. These strings identify
(tag) a state of a resource; the RFC calls such a state an "entity". The entity
tag consists of a quoted string and an optional weakness modifier. The quoted
string must be unique for each state.

::

    entity-tag = [ weak ] opaque-tag
    weak       = "W/"
    opaque-tag = quoted-string

A non-weak (strong) entity tag must identify a certain state uniquely. With the
added W/ prefix, one and the same tag may identify different states of a
resource that are semantically equivalent.

Entity tags are used in HTTP/1.1 in combination with the following headers:

- Request

  - If-Match
  - If-None-Match
  - If-Range

- Response

  - ETag

Since the Webdav component will generate the entity tags, we should ensure
that it only generates strong entity tags.

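The entity tag grammar and the "strong tags only" decision can be illustrated
with a minimal sketch. The function names ``buildStrongEtag()`` and
``parseEntityTag()`` are hypothetical and not part of the Webdav component, and
the sketch assumes that the back-end can provide a revision marker that changes
on every modification of a resource.

::

    <?php
    // Hypothetical helper, not part of the Webdav component: derives a
    // strong entity tag from a path and a revision marker. The back-end is
    // assumed to change $revision on every modification of the resource.
    function buildStrongEtag( $path, $revision )
    {
        return '"' . md5( $path . '|' . $revision ) . '"';
    }

    // Splits an entity tag into its weakness modifier and opaque tag,
    // following the grammar of section 3.11. Returns false if the input
    // is not a well-formed entity tag.
    function parseEntityTag( $tag )
    {
        if ( !preg_match( '/^(W\/)?"((?:[^"\\\\]|\\\\.)*)"$/', $tag, $matches ) )
        {
            return false;
        }
        return array(
            'weak'   => $matches[1] !== '',
            'opaque' => $matches[2],
        );
    }

A tag created by ``buildStrongEtag()`` never carries the W/ prefix, so a
comparison of two such tags is always a strong comparison.
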

Headers
=======

The following section describes the headers that are affected by entity tags
and how the server should respect them.

If-Match
--------

The If-Match header is used to make the method it is sent with conditional.
The server should perform the action associated with the method only if the
conditions defined by the header are met.

The header format is defined as follows (14.24)::

    If-Match = "If-Match" ":" ( "*" | 1#entity-tag )

The If-Match header assumes that the affected resource exists. If it does not,
the request must fail, since there is no entity to compare the given criteria
to.

The header either specifies "*", to indicate that any entity must exist,
whichever that is. Alternatively, any number of entity tags can be given,
separated by ",". If one of the given tags matches the current state of the
resource, the method is performed as if no If-Match header was given.
Otherwise the method must fail with 412 (Precondition Failed).

If the request would have failed anyway (would not result in a 2xx or 412
status), the If-Match condition is not even checked and the error response
generated by the request is returned.

The reaction of a server to a combination of multiple If-* headers is
undefined.

.. Note::
   We should just throw all If-* headers away if a combination of them
   occurs, so the back-end does not need to deal with it.

Examples::

    If-Match: "xyzzy"
    If-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"
    If-Match: *

If-None-Match
-------------

This header behaves similarly to the If-Match header, except that the
operation is only performed if none of the submitted entity tags matches. If
the operation is a GET or a HEAD operation and one of the tags matches, the
server should return a 304 (Not Modified) code. In all other cases a 412
(Precondition Failed) must be returned.
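
The rules for If-Match and If-None-Match described so far can be sketched as
follows. This is only an illustration under stated assumptions: all function
names are hypothetical, ``$currentEtag`` is the strong entity tag of the
resource or ``null`` if the resource does not exist, and only strong comparison
is used, which is a simplification of the HTTP/1.1 rules.

::

    <?php
    // Hypothetical helper: extracts all entity tags from a header value.
    function extractEntityTags( $value )
    {
        preg_match_all( '/(?:W\/)?"(?:[^"\\\\]|\\\\.)*"/', $value, $matches );
        return $matches[0];
    }

    // Returns null if the method may proceed, otherwise the status code.
    function checkIfMatch( $value, $currentEtag )
    {
        if ( $currentEtag === null )
        {
            return 412; // No entity exists, so nothing can match.
        }
        if ( trim( $value ) === '*' )
        {
            return null; // Any existing entity satisfies "*".
        }
        // Strong comparison: a W/ prefixed tag never equals a strong tag.
        $tags = extractEntityTags( $value );
        return in_array( $currentEtag, $tags, true ) ? null : 412;
    }

    // Returns null if the method may proceed, otherwise the status code.
    function checkIfNoneMatch( $value, $currentEtag, $method )
    {
        if ( $currentEtag === null )
        {
            return null; // No entity exists, so none of the tags matches.
        }
        $failure = in_array( $method, array( 'GET', 'HEAD' ), true ) ? 304 : 412;
        if ( trim( $value ) === '*' )
        {
            return $failure; // "*" matches any existing entity.
        }
        $tags = extractEntityTags( $value );
        return in_array( $currentEtag, $tags, true ) ? $failure : null;
    }

A caller would run these checks only if the request would otherwise succeed,
and would send the returned status code instead of performing the method.
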

The If-None-Match header is defined like this::

    If-None-Match = "If-None-Match" ":" ( "*" | 1#entity-tag )

The "*" again matches any existing entity (as for If-Match). For the
If-None-Match header this means that the operation will be executed only if no
entity of the resource exists. In fact, this means that the resource does not
exist.

Examples::

    If-None-Match: "xyzzy"
    If-None-Match: W/"xyzzy"
    If-None-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"
    If-None-Match: W/"xyzzy", W/"r2d2xxxx", W/"c3piozzzz"
    If-None-Match: *

If-Range
--------

Respecting the If-Range header only makes sense if the server supports partial
GET requests and resuming them. Since the Webdav component does not support
this yet, the header will be ignored.

.. Note::
   If we support partial GET at some point, support for this header must be
   considered, too.

ETag
----

The ETag header is sent by a server to give the client an entity tag that
identifies the current state of a certain resource. This tag can later be used
by the client with any of the headers described above. The ETag header is
therefore a response header, while the headers described above are request
headers.

The ETag header is built up like this::

    ETag = "ETag" ":" entity-tag

Examples::

    ETag: "xyzzy"
    ETag: W/"xyzzy"
    ETag: ""

.. Attention::
   The following is an assumption that should be verified somehow.

It seems that the ETag header is only defined for a single resource, which
indicates that responses that affect multiple resources should not contain it.
For WebDAV this includes several methods like COPY and MOVE, but also PROPFIND
with a Depth header other than 0.

Validation
==========

The purpose of entity tags is to ensure that a certain operation is only
performed if a resource resides in a certain state. This state is defined by an
entity tag. Most likely, this applies to the GET operation in combination with
caching for pure HTTP/1.1.
However, in combination with the WebDAV extension, validation of entity tags
might also be necessary for operations like PUT and others.

The HTTP/1.1 RFC defines 2 different validator schemes: strong and weak
validators. Entity tags are generally considered strong validators, since they
should change as soon as the affected resource changes its state. However, the
protocol provides a way to declare entity tags to be weak validators, as
described for the ETag header above.

.. Note::
   We should not make use of this way of "weakening" entity tags, but always
   provide the strong variant.

Caches (like proxy servers and browser caches) use additional methods to
validate their content. The most common way is to additionally use the
Last-Modified header, which indicates the last modification time of the
resource.

.. Note::
   We need to decide if we should support this validator in addition. This
   would also involve more headers to react on, like If-Modified-Since. This is
   not mandatory.

Locking
~~~~~~~

This section summarizes the important facts about locking that are mentioned
in various places of RFC 2518, enhanced by first considerations of the design
and implementation issues associated with them.

Lock types
==========

The WebDAV RFC distinguishes between read and write locks, but only write
locks are defined in detail. The RFC explains that "the syntax is extensible,
and permits the eventual specification of locking for other access types".

A write lock determines that the locking principal may exclusively write to
the affected resources. Reading is possible for every other principal, too. If
a principal that does not hold a specific lock tries to perform a writing
operation on a resource which is locked by another principal, this operation
must fail.


Lock scopes
===========

The WebDAV RFC specifies 2 different scopes for locking:

- Exclusive
- Shared

For an exclusive lock, exactly 1 principal may hold a lock on a specific
resource and only this principal is allowed to perform the affected operations
on the locked resource.

For a shared lock it is possible that multiple principals take part in the
lock (group editing). Every principal that takes part in a shared lock may
perform the affected operations on the locked resource. WebDAV does not provide
any channel for communication between the principals involved in a shared
lock. The communication of these principals must be handled externally.

The following table shows lock compatibility:

+----------------------+-----------------+--------------+
| Current lock state/  | Shared Lock     | Exclusive    |
| Lock request         |                 | Lock         |
+----------------------+-----------------+--------------+
| None                 | True            | True         |
+----------------------+-----------------+--------------+
| Shared Lock          | True            | False        |
+----------------------+-----------------+--------------+
| Exclusive Lock       | False           | False*       |
+----------------------+-----------------+--------------+

*Legend*: True = lock may be granted. False = lock MUST NOT be granted. \*=It
is illegal for a principal to request the same lock twice.

Lock tokens
===========

A lock token identifies a specific lock uniquely across all resources for all
times. Whenever a LOCK request was processed successfully, it returns the lock
token for this lock. The lock token associates the locking principal with the
locked resource. Therefore multiple lock tokens might be assigned to a single
resource, if the lock is a shared lock.

To guarantee this uniqueness, the WebDAV RFC defines a lock token scheme, which
can optionally be used by the server. The so called "opaquelocktoken" scheme
makes use of UUID_, as defined in `ISO-11578`_. A `PECL package for UUIDs`_ is
available.

.. _`UUID`: http://en.wikipedia.org/wiki/UUID
.. _`ISO-11578`: http://www.iso.ch/cate/d2229.html
.. _`PECL package for UUIDs`: http://pecl.php.net/package/uuid

Since the opaquelocktoken scheme is not mandatory, the code snippet ::

    $token = md5( uniqid( rand(), true ) );

could be used as an alternative to provide the necessary amount of uniqueness.
To build a lock token from it, an ID generated this way could be appended to
the URI of the affected resource, which makes the origin of a lock token
transparent. For example: ``http://webdav/foo/bar.txt#`` followed by the
generated ID.

Every principal that can access the WebDAV server has access to lock tokens
through the LOCKDISCOVERY property, so knowing a token must not be sufficient
to use a lock. The lock must be bound to a separate authentication mechanism.
An owner string is submitted with the LOCK request, which might help here.

Affected requests
=================

Locks affect several requests besides the explicitly lock related ones. The
following sections summarize the affected request methods and give a short
overview of how they are affected.

LOCK
----

The LOCK method is a new method, which needs to be supported.

The request body of the LOCK method contains a dedicated XML element. Both the
request abstraction object and the objects for the content already exist in the
Webdav component.

The method supports the Depth header, but only with the values 0 and INFINITY.
The value 1 is not supported. 0 means that only the resource itself is
affected and INFINITY includes all descendant resources. For non-collection
resources both mean the same. For collection resources 0 means that only the
collection should be locked, while INFINITY recursively locks all descendants
of the collection in addition to the collection itself. If no Depth header is
given, INFINITY is assumed.

A LOCK method must only return a single lock token for all resources locked
with this request. If an UNLOCK method is successfully executed with this lock
token, all affected resources must be unlocked.

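
The Depth handling of LOCK described above can be sketched as follows. Both
function names are hypothetical, and the flat list of paths stands in for
whatever the back-end uses to enumerate resources; this is an assumption made
only to keep the sketch self-contained.

::

    <?php
    // Hypothetical helper: normalizes the Depth header of a LOCK request.
    // Returns 0 or 'infinity', or false for unsupported values such as 1.
    function parseLockDepth( $headerValue = null )
    {
        if ( $headerValue === null )
        {
            return 'infinity'; // No Depth header: infinity is assumed.
        }
        $value = strtolower( trim( $headerValue ) );
        if ( $value === '0' )
        {
            return 0;
        }
        return $value === 'infinity' ? 'infinity' : false;
    }

    // Hypothetical helper: returns all paths a new lock would cover.
    // $allPaths is an assumed flat list of all resource paths.
    function resourcesCoveredByLock( $path, $depth, array $allPaths )
    {
        $covered = array( $path );
        if ( $depth === 'infinity' )
        {
            $prefix = rtrim( $path, '/' ) . '/';
            foreach ( $allPaths as $candidate )
            {
                if ( $candidate !== $path && strpos( $candidate, $prefix ) === 0 )
                {
                    $covered[] = $candidate; // Direct or indirect descendant.
                }
            }
        }
        return $covered;
    }

All paths returned by ``resourcesCoveredByLock()`` would share the single lock
token created for the request, so one UNLOCK releases them together.
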
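If a LOCK operation fails because there is a conflict with one of the
resources to lock, the complete operation needs to fail (no partial success).
The response code 409 (Conflict) must be returned and the body must be a
multistatus XML element that contains the resource that is responsible for the
conflict.

Status codes returned by the LOCK method are:

200 (OK)
    The lock request succeeded and the value of the lockdiscovery property is
    included in the body.

412 (Precondition Failed)
    The included lock token was not enforceable on this resource or the server
    could not satisfy the request in the lockinfo XML element.

423 (Locked)
    The resource is locked, so the method has been rejected.

Example - Simple LOCK request::

    >>Request

    LOCK /workspace/webdav/proposal.doc HTTP/1.1
    Host: webdav.sb.aol.com
    Timeout: Infinite, Second-4100000000
    Content-Type: text/xml; charset="utf-8"
    Content-Length: xxxx
    Authorization: Digest username="ejw",
       realm="ejw@webdav.sb.aol.com", nonce="...",
       uri="/workspace/webdav/proposal.doc",
       response="...", opaque="..."

    <?xml version="1.0" encoding="utf-8" ?>
    <D:lockinfo xmlns:D='DAV:'>
      <D:lockscope><D:exclusive/></D:lockscope>
      <D:locktype><D:write/></D:locktype>
      <D:owner>
        <D:href>http://www.ics.uci.edu/~ejw/contact.html</D:href>
      </D:owner>
    </D:lockinfo>

    >>Response

    HTTP/1.1 200 OK
    Content-Type: text/xml; charset="utf-8"
    Content-Length: xxxx

    <?xml version="1.0" encoding="utf-8" ?>
    <D:prop xmlns:D="DAV:">
      <D:lockdiscovery>
        <D:activelock>
          <D:locktype><D:write/></D:locktype>
          <D:lockscope><D:exclusive/></D:lockscope>
          <D:depth>Infinity</D:depth>
          <D:owner>
            <D:href>http://www.ics.uci.edu/~ejw/contact.html</D:href>
          </D:owner>
          <D:timeout>Second-604800</D:timeout>
          <D:locktoken>
            <D:href>opaquelocktoken:e71d4fae-5dec-22d6-fea5-00a0c91e6be4</D:href>
          </D:locktoken>
        </D:activelock>
      </D:lockdiscovery>
    </D:prop>

Example - Refreshing LOCK request::

    >>Request

    LOCK /workspace/webdav/proposal.doc HTTP/1.1
    Host: webdav.sb.aol.com
    Timeout: Infinite, Second-4100000000
    If: (<opaquelocktoken:e71d4fae-5dec-22d6-fea5-00a0c91e6be4>)
    Authorization: Digest username="ejw",
       realm="ejw@webdav.sb.aol.com", nonce="...",
       uri="/workspace/webdav/proposal.doc",
       response="...", opaque="..."

    >>Response

    HTTP/1.1 200 OK
    Content-Type: text/xml; charset="utf-8"
    Content-Length: xxxx

    <?xml version="1.0" encoding="utf-8" ?>
    <D:prop xmlns:D="DAV:">
      <D:lockdiscovery>
        <D:activelock>
          <D:locktype><D:write/></D:locktype>
          <D:lockscope><D:exclusive/></D:lockscope>
          <D:depth>Infinity</D:depth>
          <D:owner>
            <D:href>http://www.ics.uci.edu/~ejw/contact.html</D:href>
          </D:owner>
          <D:timeout>Second-604800</D:timeout>
          <D:locktoken>
            <D:href>opaquelocktoken:e71d4fae-5dec-22d6-fea5-00a0c91e6be4</D:href>
          </D:locktoken>
        </D:activelock>
      </D:lockdiscovery>
    </D:prop>

.. Note::
   The server does not have to honor the principal's Timeout header!
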
Example - Multi resource LOCK request::

    >>Request

    LOCK /webdav/ HTTP/1.1
    Host: webdav.sb.aol.com
    Timeout: Infinite, Second-4100000000
    Depth: infinity
    Content-Type: text/xml; charset="utf-8"
    Content-Length: xxxx
    Authorization: Digest username="ejw",
       realm="ejw@webdav.sb.aol.com", nonce="...",
       uri="/workspace/webdav/proposal.doc",
       response="...", opaque="..."

    <?xml version="1.0" encoding="utf-8" ?>
    <D:lockinfo xmlns:D="DAV:">
      <D:locktype><D:write/></D:locktype>
      <D:lockscope><D:exclusive/></D:lockscope>
      <D:owner>
        <D:href>http://www.ics.uci.edu/~ejw/contact.html</D:href>
      </D:owner>
    </D:lockinfo>

    >>Response

    HTTP/1.1 207 Multi-Status
    Content-Type: text/xml; charset="utf-8"
    Content-Length: xxxx

    <?xml version="1.0" encoding="utf-8" ?>
    <D:multistatus xmlns:D="DAV:">
      <D:response>
        <D:href>http://webdav.sb.aol.com/webdav/secret</D:href>
        <D:status>HTTP/1.1 403 Forbidden</D:status>
      </D:response>
      <D:response>
        <D:href>http://webdav.sb.aol.com/webdav/</D:href>
        <D:propstat>
          <D:prop><D:lockdiscovery/></D:prop>
          <D:status>HTTP/1.1 424 Failed Dependency</D:status>
        </D:propstat>
      </D:response>
    </D:multistatus>

UNLOCK
------

The UNLOCK method handles the removal of a lock established via the LOCK
method. A lock may also disappear by itself, for example when a timeout is
reached. When the principal holding a lock has finished its operation on the
locked resources, it should use the UNLOCK method to release the lock.

The UNLOCK method only receives a lock token via the Lock-Token header. The
lock identified by this token is to be released.

.. Note::
   Not only the resource identified by the request URI must be unlocked, but
   all other resources that are locked by the given lock token!

Example - UNLOCK request::

    >>Request

    UNLOCK /workspace/webdav/info.doc HTTP/1.1
    Host: webdav.sb.aol.com
    Lock-Token: <opaquelocktoken:a515cfa4-5da4-22e1-f5b5-00a0451e6bf7>
    Authorization: Digest username="ejw",
       realm="ejw@webdav.sb.aol.com", nonce="...",
       uri="/workspace/webdav/proposal.doc",
       response="...", opaque="..."

    >>Response

    HTTP/1.1 204 No Content

Affected base methods
---------------------

The following methods may only be performed on a locked resource if the
performing principal owns the specific lock.

- PUT
- POST
- PROPPATCH
- MOVE
- COPY
- DELETE
- MKCOL

In addition, the PROPFIND request is affected by lock support, since lock
information is visible to every principal through the LOCKDISCOVERY and
SUPPORTEDLOCK properties.

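The ownership rule for the methods listed above can be sketched as a simple
check. The function name, the ``$activeLocks`` map (lock token to owning
principal) and the requirement that the principal also submits the token are
assumptions made for this illustration, not the design of the component.

::

    <?php
    // Hypothetical check: may $principal perform $method on a resource?
    // $activeLocks maps lock tokens to owners for that resource (assumed
    // structure), $submittedTokens are the tokens sent with the request.
    function isMethodAllowed(
        $method, array $activeLocks, $principal, array $submittedTokens
    )
    {
        $lockAware = array(
            'PUT', 'POST', 'PROPPATCH', 'MOVE', 'COPY', 'DELETE', 'MKCOL',
        );
        if ( !in_array( $method, $lockAware, true ) || count( $activeLocks ) === 0 )
        {
            return true; // Method is not affected or resource is unlocked.
        }
        foreach ( $activeLocks as $token => $owner )
        {
            if ( $owner === $principal
                 && in_array( $token, $submittedTokens, true ) )
            {
                return true; // The principal owns one of the active locks.
            }
        }
        return false; // The caller would answer with 423 (Locked).
    }

Binding the check to the authenticated principal in addition to the token
reflects the point that lock tokens are visible to everyone.
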

Locking resources
=================

Both types of resources (non-collection and collection resources) can be
locked. This section describes the differences between both types and other
points directly related to locking resources.

Non-collection resources
------------------------

A non-collection resource can be affected directly or indirectly by a lock. In
the first case a principal has issued a LOCK request explicitly for this
resource, only locking this single resource. The second case occurs if a
principal locked a collection resource and the non-collection resource is a
direct or indirect descendant of it. For detailed information on this topic
see the next section about locking of collection resources.

Collections
-----------

The LOCK request allows the Depth header to be set to specify the depth of the
created lock. A depth value of ZERO means that only the affected collection
itself is locked. This might be sensible in order to add new resources to this
collection. The depth value INFINITY means that the created lock recursively
affects all descendants of the collection. This way it is possible to lock a
complete sub-tree of the WebDAV repository.

Any lock (no matter which depth) on a collection prevents the addition and
removal of direct members of this collection by non-lock-owners. This affects
the following methods:

- PUT
- POST
- MKCOL
- DELETE

If a collection should be locked and any of its members is already locked,
this conflicts with the lock to be set and must result in a 423 (Locked)
error.

Members that are newly created inside a locked collection or copied/moved to
it are automatically included in the lock. This affects infinity-depth locks
as well as zero-depth ones for direct children of the locked collection.

Lock null resources
-------------------

A write lock might be acquired on a resource that does not (yet) exist. This
is called a "lock null resource".
A lock null resource only supports the methods:

- PUT
- MKCOL
- OPTIONS
- PROPFIND
- LOCK
- UNLOCK

All other methods must return 404 (Not Found) or 405 (Method Not Allowed). The
properties of a null resource are mostly empty, except that the LOCKDISCOVERY
and SUPPORTEDLOCK properties must be present.

If PUT or MKCOL is issued, a null resource becomes a normal one. The RFC does
not state if the lock stays on this newly created (real) resource or if it is
removed.

COPY/MOVE
---------

Neither method carries locks over to the destination: a lock on the source is
not duplicated by COPY and not moved by MOVE. If a resource is moved to a
locked collection, it is automatically added to the lock (same principal
assumed). Both methods will fail if the destination is locked and the
principal does not own the lock.

Refresh
=======

A principal must not request the same lock twice. To refresh a lock,
principals send a LOCK request with an empty body and an If header that
specifies the lock tokens of the locks to refresh. If this occurs, the timers
of the lock must be reset. A Timeout header might be sent by the principal,
but the server may safely ignore it and simply perform the refresh as it
desires.

If header
~~~~~~~~~

In addition to the headers If-Match and If-None-Match, which are described in
the HTTP/1.1 RFC, RFC 2518 (WebDAV) describes the If header. The If header is
used by the client to define conditional actions, similar to the 2 headers
named before. However, it is constructed in a much more complex way.

The If header is described with the following pseudo EBNF::

    If = "If" ":" ( 1*No-tag-list | 1*Tagged-list)
    No-tag-list = List
    Tagged-list = Resource 1*List
    Resource = Coded-URL
    List = "(" 1*(["Not"](State-token | "[" entity-tag "]")) ")"
    State-token = Coded-URL
    Coded-URL = "<" absoluteURI ">"

It may contain entity tags (see `Entity Tags`_), lock tokens (see `Lock
tokens`_) and combinations of both. In addition, the header may contain
resource URIs, so that it can affect not only the main request URI.

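To make the grammar above more tangible, the following minimal sketch parses
an If header value into a list structure. The function ``parseIfHeader()`` and
the shape of its return value are hypothetical. As a simplification, the sketch
assumes that entity tags contain neither parentheses nor square brackets, and
it does not evaluate the conditions.

::

    <?php
    // Hypothetical parser sketch for the value of an If header. Returns
    // one entry per List; 'resource' is null for no-tag lists.
    function parseIfHeader( $value )
    {
        $lists    = array();
        $resource = null;

        // Each match is one List, optionally preceded by a Resource.
        preg_match_all(
            '/(?:<([^>]+)>\s*)?\(([^)]*)\)/', $value, $found, PREG_SET_ORDER
        );
        foreach ( $found as $match )
        {
            if ( $match[1] !== '' )
            {
                $resource = $match[1]; // Covers this and following lists.
            }
            // Items are state tokens (<...>) or entity tags ([...]).
            preg_match_all(
                '/(Not\s+)?(?:<([^>]+)>|\[([^\]]+)\])/',
                $match[2], $items, PREG_SET_ORDER
            );
            $conditions = array();
            foreach ( $items as $item )
            {
                $isEtag = isset( $item[3] ) && $item[3] !== '';
                $conditions[] = array(
                    'not'   => $item[1] !== '',
                    'type'  => $isEtag ? 'etag' : 'token',
                    'value' => $isEtag ? $item[3] : $item[2],
                );
            }
            $lists[] = array(
                'resource'   => $resource,
                'conditions' => $conditions,
            );
        }
        return $lists;
    }

For a value such as ``(<locktoken:a-write-lock-token> ["I am an ETag"])`` the
result contains a single list with a lock token condition and an entity tag
condition, both with ``'not'`` set to ``false``.
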
Luckily, the If header either contains tagged lists (including affected
resource URIs) or no-tag lists (without resource URIs). It cannot contain a
combination of those.

Both kinds of lists (tagged and no-tag) contain an unlimited number of lock
tokens and/or entity tags, each of which may be prefixed by the keyword "Not".
A negated item is satisfied only if the given lock token or entity tag does
not match. This works similarly to the If-None-Match header, specified by the
HTTP/1.1 RFC.

To illustrate this complex definition some more, some examples are presented
and explained in the following.

No-tag list
-----------

::

    If: (<locktoken:a-write-lock-token> ["I am an ETag"]) (["I am another ETag"])

This If header consists of 2 no-tag lists (it does not contain any resource
URIs). The first list consists of a lock token and an entity tag, the second
only contains an entity tag. The semantics of this example are that the method
containing this If header may only be executed if

- either the first combination of lock token and entity tag is matched
- or the second entity tag is matched.

Note that the items within a list are combined by logical AND, while the lists
themselves are combined by logical OR.

Tagged list
-----------

::

    COPY /resource1 HTTP/1.1
    Host: www.foo.bar
    Destination: http://www.foo.bar/resource2
    If: <http://www.foo.bar/resource1> (<locktoken:a-write-lock-token>
        [W/"A weak ETag"]) (["strong ETag"])
        <http://www.bar.bar/random>(["another strong ETag"])

This example does not only show an If header with a tagged list, but also a
context where this could make some sense: the COPY method affects several
resources at once. It works at least on a source (the request URI) and a
destination (see Destination header). Additionally, it can affect whole
sub-trees, using the Depth header.

The If header in this case names 2 resources, but only the conditions for one
of them will be checked. The first tagged list affects the request URI and
contains 2 lists combined with logical OR.
The first of these consists of an AND-combination of a lock token and an
entity tag. The second only contains an entity tag. This is similar to the
example shown above for the no-tag list, except that it contains the "tagging"
URI.

For the second tag only an entity tag is listed. However, this condition would
not be checked in the request but simply be ignored, since the resource in the
tag is not affected by the request.

Not
---

::

    If: (Not <locktoken:write1> <locktoken:write2>)

This simple If header shows the use of the "Not" keyword. The keyword must
occur directly in front of the item it negates. The list contains 2 lock
tokens combined with logical AND, of which only the first one is negated. In
clear words: the request affected by this If header will be executed only if
the resource is not locked with the first lock token and is locked with the
second one.

..
   Local Variables:
   mode: rst
   fill-column: 79
   End:
   vim: et syn=rst tw=79