A proposal for an svn filesystem dump/restore format.

Two problems we want to solve
=============================

 1. When we change our node-id schema, we need to migrate all of our
    data (by dumping and restoring).

 2. Serves as a backup format. Could be read by other software tools
    someday.

Design Goals
============

 A. Written as two new public functions in svn_fs.h. To be invoked
    by new 'svnadmin' subcommands.

 B. Format uses only timeless fs concepts.

    The dump format needs to reference concepts that we *know* are
    general enough to never change. These concepts must exist
    independently of any internal node-id schema, or any DB storage
    backend. In other words, we're talking about the basic ideas in
    our original "design spec" from May 2000.

Format Semantics
================

Here are the timeless semantics of our fs design -- the things that
would be stored in our dump format.

 - A filesystem is an array of trees. Each tree is called a
   "revision" and has unversioned properties attached.

 - A revision has a tree of "nodes" hanging off of it. Actually, the
   nodes in the filesystem form a DAG. A revision always points to
   an initial node that represents the 'root' of some tree.

 - The majority of a tree's nodes are hard-links (references) to
   nodes that were created in earlier trees.

 - A node contains

     - versioned text
     - versioned properties
     - predecessor history: "which node am I a variant of?"
     - copy history: "which node am I a copy of?"

   The history values can be non-existent (meaning the node is
   completely new), or can have a value of {revision, path}.

------------------------------------------------------------------------

Refinement of proposal #2: (after discussion with gstein)
=========================

Each node starts with RFC822-style headers at the top. The final
header is a 'Content-length:', followed by the content, so record
boundaries can be inferred.

The content section has two implicit parts: a property hash, and the
fulltext.
The division between these two sections is implied by the
"PROPS-END\n" tag at the end of the prophash. In the case of a
directory node or a revision, only the prophash is present.

-----------------------------------------------------------------

SVN DUMPFILE VERSION 1 FORMAT

The format starts with the version number of the dump format
("SVN-fs-dump-format-version: 1\n"), followed by a series of revision
records. Each revision record starts with information about the
revision, followed by a variable number of node changes for that
revision. Fields in [brackets] are optional, and unknown headers are
always ignored, for backwards compatibility.

Revision-number: N
Prop-content-length: P
Content-length: L

   ...P bytes of property data. Properties are stored in the same
   human-readable hashdump format used by working copy property
   files, except that they end with "PROPS-END\n" for better
   readability.

Node-path: /absolute/path/to/node/in/filesystem
Node-kind: file | dir  (1)
Node-action: change | add | delete | replace
[Node-copyfrom-rev: X]
[Node-copyfrom-path: /path ]
[Text-copy-source-md5: blob]  (2)
[Text-content-md5: blob]
[Text-content-length: T]
[Prop-content-length: P]
Content-length: Y  (3)

   ... Y bytes of content data, divided into P bytes of "property"
   data and T bytes of "text" data. The properties come first; their
   total length (including formatting) is Prop-content-length, and is
   included in Content-length. The "PROPS-END\n" line always
   terminates the property section if there are props. The remainder
   of the Y bytes (expected to be equivalent to Text-content-length)
   represent the contents of the node.

Notes:

   (1) if the node represents a deletion, this field is optional.

   (2) this is a checksum of the source of the copy. A loader process
       can use this checksum to determine that the copyfrom path/rev
       already present in a filesystem is really the *correct* one to
       use.
   (3) the Content-length header is technically unnecessary, since
       the information it holds (and more) can be found in the
       Prop-content-length and Text-content-length fields. Though
       Subversion itself does not make use of the header when reading
       a dumpfile, we include it for compatibility with generic
       RFC822 parsers.

-----------------------------------------------------------------

EXAMPLE

Here's an example of revision 1422, whereby I added a new directory
"baz", added a new file "bop" inside it, and modified the file "foo.c":

Revision-number: 1422
Prop-content-length: 80
Content-length: 80

K 6
author
V 7
sussman
K 3
log
V 33
Added two files, changed a third.
PROPS-END

Node-path: bar/baz
Node-kind: dir
Node-action: add
Prop-content-length: 35
Content-length: 35

K 10
svn:ignore
V 4
TAGS
PROPS-END

Node-path: bar/baz/bop
Node-kind: file
Node-action: add
Prop-content-length: 76
Text-content-length: 54
Content-length: 130

K 14
svn:executable
V 2
on
K 12
svn:keywords
V 15
LastChangedDate
PROPS-END
Here is the text of the newly added 'bop' file.
Whee.

Node-path: bar/foo.c
Node-kind: file
Node-action: change
Text-content-length: 102
Content-length: 102

Here is the fulltext of my change to an
existing /bar/foo.c. Notice that this
file has no properties.
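
To make the record layout concrete, here is a rough sketch of a reader
for the version 1 format, written in Python purely for illustration.
It is not Subversion's loader: the names parse_props, read_dump and
SAMPLE are made up for this note, and it assumes property values are
text (real values may be binary). As described in note (3), it sizes
each body from Prop-content-length and Text-content-length rather than
from Content-length.

```python
import io


def parse_props(data):
    """Parse a hashdump block ("K n / key / V n / value") ending in PROPS-END."""
    props = {}
    stream = io.BytesIO(data)
    while True:
        line = stream.readline().rstrip(b"\n")
        if line == b"PROPS-END":
            return props
        if not line.startswith(b"K "):
            raise ValueError(f"expected 'K <len>' or PROPS-END, got {line!r}")
        key = stream.read(int(line[2:]))
        stream.readline()                       # newline after the key
        vline = stream.readline().rstrip(b"\n")
        if not vline.startswith(b"V "):
            raise ValueError(f"expected 'V <len>', got {vline!r}")
        value = stream.read(int(vline[2:]))
        stream.readline()                       # newline after the value
        # Assumption for this sketch: keys and values decode as text.
        props[key.decode()] = value.decode()


def read_dump(stream):
    """Yield (headers, props, text) for each revision or node record."""
    version = stream.readline().decode().strip()
    if version != "SVN-fs-dump-format-version: 1":
        raise ValueError(f"unsupported dump header: {version!r}")
    while True:
        line = stream.readline()
        while line == b"\n":                    # blank lines between records
            line = stream.readline()
        if not line:                            # end of stream
            return
        headers = {}
        while line not in (b"\n", b""):         # headers end at a blank line
            name, _, value = line.decode().partition(":")
            headers[name.strip()] = value.strip()
            line = stream.readline()
        prop_len = int(headers.get("Prop-content-length", 0))
        text_len = int(headers.get("Text-content-length", 0))
        body = stream.read(prop_len + text_len)
        props = parse_props(body[:prop_len]) if prop_len else {}
        yield headers, props, body[prop_len:]


# Two records taken from the example above: the revision and the 'bop' file.
SAMPLE = (
    b"SVN-fs-dump-format-version: 1\n"
    b"\n"
    b"Revision-number: 1422\n"
    b"Prop-content-length: 80\n"
    b"Content-length: 80\n"
    b"\n"
    b"K 6\nauthor\nV 7\nsussman\n"
    b"K 3\nlog\nV 33\nAdded two files, changed a third.\n"
    b"PROPS-END\n"
    b"\n"
    b"Node-path: bar/baz/bop\n"
    b"Node-kind: file\n"
    b"Node-action: add\n"
    b"Prop-content-length: 76\n"
    b"Text-content-length: 54\n"
    b"Content-length: 130\n"
    b"\n"
    b"K 14\nsvn:executable\nV 2\non\n"
    b"K 12\nsvn:keywords\nV 15\nLastChangedDate\n"
    b"PROPS-END\n"
    b"Here is the text of the newly added 'bop' file.\nWhee.\n"
    b"\n"
)

records = list(read_dump(io.BytesIO(SAMPLE)))
for headers, props, text in records:
    label = headers.get("Node-path") or "revision " + headers["Revision-number"]
    print(label, sorted(props), len(text))
```

Because every length is stated up front, the reader never has to scan
the text for a terminator; the "PROPS-END\n" line is only used to stop
the property parser inside the P bytes it was handed. Unknown headers
simply land in the headers dictionary and are otherwise ignored, which
matches the backwards-compatibility rule stated in the format section.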