Network Working Group L. Masinter 
Request for Comments: 2718 Xerox Corporation 
Category: Informational H. Alvestrand 
 Maxware, Pirsenteret 
 D. Zigmond 
 WebTV Networks, Inc. 
 R. Petke 
 UUNET Technologies 
 November 1999 


Guidelines for new URL Schemes

Status of this Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

Copyright © The Internet Society (1999). All Rights Reserved.

Abstract

A Uniform Resource Locator (URL) is a compact string representation of the location for a resource that is available via the Internet. This document provides guidelines for the definition of new URL schemes.

1 Introduction

A Uniform Resource Locator (URL) is a compact string representation of the location for a resource that is available via the Internet. RFC 2396 [1] defines the general syntax and semantics of URIs, and, by inclusion, URLs. URLs are designated by including a "<scheme>:" and then a "<scheme-specific-part>". Many URL schemes are already defined.

This document provides guidelines for the definition of new URL schemes, for consideration by those who are defining and registering or evaluating those definitions.

The process by which new URL schemes are registered is defined in RFC 2717 [2].

2 Guidelines for new URL schemes

Because new URL schemes potentially complicate client software, new schemes must have demonstrable utility and operability, as well as compatibility with existing URL schemes. This section elaborates these criteria.

2.1 Syntactic compatibility

New URL schemes should follow the same syntactic conventions of existing schemes when appropriate. If a URI scheme that has embedded links in content accessed by that scheme does not share syntax with a different scheme, the same content cannot be served up under different schemes without rewriting the content. This can already be a problem, and with future digital signature schemes, rewriting may not even be possible. Deployment of other schemes in the future could therefore become extremely difficult.

2.1.1 Motivations for syntactic compatibility

Why should new URL schemes share as much of the generic URI syntax (that makes sense to share) as possible? Consider the following:

2.1.2 Improper use of "//" following "<scheme>:"

Contrary to some examples set in past years, the use of double slashes as the first component of the <scheme-specific-part> of a URL is not simply an artistic indicator that what follows is a URL: Double slashes are used ONLY when the syntax of the URL's <scheme-specific-part> contains a hierarchical structure as described in RFC 2396. In URLs from such schemes, the use of double slashes indicates that what follows is the top hierarchical element for a naming authority. (See section 3 of RFC 2396 for more details.) URL schemes which do not contain a conformant hierarchical structure in their <scheme-specific-part> should not use double slashes following the "<scheme>:" string.

2.1.3 Compatibility with relative URLs

URL schemes should use the generic URL syntax if they are intended to be used with relative URLs. A description of the allowed relative forms should be included in the scheme's definition. Many applications use relative URLs extensively. Specifically,

2.2 Is the scheme well defined?

It is important that the semantics of the "resource" that a URL "locates" be well defined. This might mean different things depending on the nature of the URL scheme.

2.2.1 Clear mapping from other name spaces

In many cases, new URL schemes are defined as ways to translate other protocols and name spaces into the general framework of URLs. The "ftp" URL scheme translates from the FTP protocol, while the "mid" URL scheme translates from the Message-ID field of messages.

In either case, the description of the mapping must be complete, must describe how characters get encoded or not in URLs, must describe exactly how all legal values of the base standard can be represented using the URL scheme, and exactly which modifiers, alternate forms and other artifacts from the base standards are included or not included. These requirements are elaborated below.

2.2.2 URL schemes associated with network protocols

Most new URL schemes are associated with network resources that have one or several network protocols that can access them. The 'ftp', 'news', and 'http' schemes are of this nature. For such schemes, the specification should completely describe how URLs are translated into protocol actions in sufficient detail to make the access of the network resource unambiguous. If an implementation of the URL scheme requires some configuration, the configuration elements must be clearly identified. (For example, the 'news' scheme, if implemented using NTTP, requires configuration of the NTTP server.)

2.2.3 Definition of non-protocol URL schemes

In some cases, URL schemes do not have particular network protocols associated with them, because their use is limited to contexts where the access method is understood. This is the case, for example, with the "cid" and "mid" URL schemes. For these URL schemes, the specification should describe the notation of the scheme and a complete mapping of the locator from its source.

2.2.4 Definition of URL schemes not associated with data resources

Most URL schemes locate Internet resources that correspond to data objects that can be retrieved or modified. This is the case with "ftp" and "http", for example. However, some URL schemes do not; for example, the "mailto" URL scheme corresponds to an Internet mail address.

If a new URL scheme does not locate resources that are data objects, the properties of names in the new space must be clearly defined.

2.2.5 Character encoding

When describing URL schemes in which (some of) the elements of the URL are actually representations of sequences of characters, care should be taken not to introduce unnecessary variety in the ways in which characters are encoded into octets and then into URL characters. Unless there is some compelling reason for a particular scheme to do otherwise, translating character sequences into UTF-8 (RFC 2279) [3] and then subsequently using the %HH encoding for unsafe octets is recommended.

2.2.6 Definition of operations

In some contexts (for example, HTML forms) it is possible to specify any one of a list of operations to be performed on a specific URL. (Outside forms, it is generally assumed to be something you GET.)

The URL scheme definition should describe all well-defined operations on the URL identifier, and what they are supposed to do.

Some URL schemes (for example, "telnet") provide location information for hooking onto bi-directional data streams, and don't fit the "infoaccess" paradigm of most URLs very well; this should be documented.

NOTE: It is perfectly valid to say that "no operation apart from GET is defined for this URL". It is also valid to say that "there's only one operation defined for this URL, and it's not very GET-like". The important point is that what is defined on this type is described.

2.3 Demonstrated utility

URL schemes should have demonstrated utility. New URL schemes are expensive things to support. Often they require special code in browsers, proxies, and/or servers. Having a lot of ways to say the same thing needless complicates these programs without adding value to the Internet.

The kinds of things that are useful include:

2.3.1 Proxy into HTTP/HTML

One way to provide a demonstration of utility is via a gateway which provides objects in the new scheme for clients using an existing protocol. It is much easier to deploy gateways to a new service than it is to deploy browsers that understand the new URL object.

Things to look for when thinking about a proxy are:

2.4 Are there security considerations?

Above and beyond the security considerations of the base mechanism a scheme builds upon, one must think of things that can happen in the normal course of URL usage.

In particular:

2.5 Does it start with UR?

Any scheme starting with the letters "U" and "R", in particular if it attaches any of the meanings "uniform", "universal" or "unifying" to the first letter, is going to cause intense debate, and generate much heat (but maybe little light).

Any such proposal should either make sure that there is a large consensus behind it that it will be the only scheme of its type, or pick another name.

2.6 Non-considerations

Some issues that are often raised but are not relevant to new URL schemes include the following.

2.6.1 Are all objects accessible?

Can all objects in the world that are validly identified by a scheme be accessed by any UA implementing it?

Sometimes the answer will be yes and sometimes no; often it will depend on factors (like firewalls or client configuration) not directly related to the scheme itself.

3 Security Considerations

New URL schemes are required to address all security considerations in their definitions.

4 References

[1]Berners-Lee, T., Fielding, R.T., and L. Masinter, “Uniform Resource Identifiers (URI): Generic Syntax”, RFC 2396, August 1998.
[2]Petke, R. and I. King, “Registration Procedures for URL Scheme Names”, BCP 35, RFC 2717, November 1999.
[3]Yergeau, F., “UTF-8, a transformation format of ISO 10646”, RFC 2279, January 1998.

Authors' Addresses

 Larry Masinter
 Xerox Corporation
 Palo Alto Research Center
3333 Coyote Hill Road
 Palo Alto, CA 94304
EMail: masinter@parc.xerox.com
URI: http://purl.org/NET/masinter
 
 Harald Tveit Alvestrand
 Maxware, Pirsenteret
 N-7005 Trondheim
 NORWAY
Phone: +47 73 54 57 00
EMail: harald.alvestrand@maxware.no
 
 Dan Zigmond
 WebTV Networks, Inc.
 305 Lytton Avenue
 Palo Alto, CA 94301
 USA
Phone: +1-650-614-6071
EMail: djz@corp.webtv.net
 
 Rich Petke
 UUNET Technologies
 5000 Britton Road
P. O. Box 5000
 Hilliard, OH 43026-5000
Phone: +1-614-723-4157
Fax: +1-614-723-8407
EMail: rpetke@wcom.net
 

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at <http://www.ietf.org/ipr>.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an “AS IS” basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright © The Internet Society (1999). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgement

Funding for the RFC Editor function is currently provided by the Internet Society.