A proposal for an svn filesystem dump/restore format.

Two problems we want to solve
=============================

 1.  When we change our node-id schema, we need to migrate all of our
     data (by dumping and restoring).

 2.  Serves as a backup format.  Could be read by other software tools
     someday.

Design Goals
============

 A.  Written as two new public functions in svn_fs.h.  To be invoked
     by new 'svnadmin' subcommands.

 B.  Format uses only timeless fs concepts.

     The dump format needs to reference concepts that we *know* are
     general enough to never change.  These concepts must exist
     independently of any internal node-id schema, or any DB storage
     backend.  In other words, we're talking about the basic ideas in
     our original "design spec" from May 2000.

Format Semantics
================

Here are the timeless semantics of our fs design -- the things that
would be stored in our dump format.

  - A filesystem is an array of trees.
    Each tree is called a "revision" and has unversioned properties
    attached.

  - A revision has a tree of "nodes" hanging off of it.
    Actually, the nodes in the filesystem form a DAG.  A revision
    always points to an initial node that represents the 'root' of
    some tree.

  - The majority of a tree's nodes are hard-links (references) to
    nodes that were created in earlier trees.

  - A node contains
     - versioned text
     - versioned properties
     - predecessor history:  "which node am I a variant of?"
     - copy history:  "which node am I a copy of?"

    The history values can be non-existent (meaning the node is
    completely new), or can have a value of {revision, path}.

------------------------------------------------------------------------
Refinement of proposal #2:  (after discussion with gstein)
=========================

Each node starts with RFC822-style headers at the top.  The final
header is a 'Content-length:', followed by the content, so record
boundaries can be inferred.

The content section has two implicit parts: a property hash, and the fulltext.
The division between these two sections is implied by the
"PROPS-END\n" tag at the end of the prophash.  In the case of a
directory node or a revision, only the prophash is present.

-----------------------------------------------------------------

SVN DUMPFILE VERSION 1 FORMAT

The format starts with the version number of the dump format
("SVN-fs-dump-format-version: 1\n"), followed by a series of revision
records.  Each revision record starts with information about the
revision, followed by a variable number of node changes for that
revision.  Fields in [brackets] are optional, and unknown headers are
always ignored, for backwards compatibility.

Revision-number: N
[Revision-content-md5: blob]
Content-length: L

...L bytes of property data.  Properties are stored in the same
human-readable hashdump format used by working copy property files,
except that they end with "PROPS-END\n" for better readability.

Node-path: /absolute/path/to/node/in/filesystem
Node-kind: file | dir  (1)
Node-action: change | add | delete | replace
[Node-copyfrom-rev: X]
[Node-copyfrom-path: /path]
[Node-copy-source-md5: blob]  (2)
[Node-content-md5: blob]
Content-length: Y

... Y bytes of content data, divided into 'props' and 'text'
sections.  The properties come first; their total length (including
formatting) is included in Content-length.  The "PROPS-END\n" line
always terminates the property section; if there are no props,
"PROPS-END\n" still signifies the beginning of the node's text
content.

Notes:

   (1) if the node represents a deletion, this field is optional.

   (2) this is a checksum of the source of the copy.  A loader process
       can use this checksum to determine that the copyfrom path/rev
       already present in a filesystem is really the *correct* one to
       use.
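
The record layout above can be sketched from the writer's side.  This
is only an illustration in Python, not the svn_fs.h implementation:
the function names dump_props and format_record are hypothetical, and
the sketch assumes UTF-8 property names and values, a blank line
between the headers and the content (as RFC822-style headers suggest),
and one newline after the content as a record separator.

```python
def dump_props(props):
    """Serialize a {name: value} mapping in the K/V hashdump layout.

    Lengths are byte counts.  The block always ends with PROPS-END,
    even when there are no properties.
    """
    out = bytearray()
    for name, value in props.items():
        k = name.encode("utf-8")    # assumption: names are UTF-8 text
        v = value.encode("utf-8")   # assumption: values are UTF-8 text
        out += b"K %d\n%s\nV %d\n%s\n" % (len(k), k, len(v), v)
    out += b"PROPS-END\n"
    return bytes(out)


def format_record(headers, props, text=None):
    """Build one record: headers, Content-length, blank line, content.

    `headers` is a list of (name, value) pairs; `text` is bytes, or
    None for directories and revisions, which carry properties only.
    """
    content = dump_props(props)
    if text is not None:
        content += text
    lines = ["%s: %s\n" % (name, value) for name, value in headers]
    # Content-length is written last and covers props plus text.
    lines.append("Content-length: %d\n" % len(content))
    # Assumption: blank line before the content, newline after it.
    return "".join(lines).encode("utf-8") + b"\n" + content + b"\n"
```

Because Content-length is computed from the serialized content, a
reader can find the next record without understanding the content.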
-----------------------------------------------------------------

EXAMPLE

Here's an example of revision 1422, whereby I added a new directory
"baz", added a new file "bop" inside it, and modified the file "foo.c":

Revision-number: 1422
Content-length: 80

K 6
author
V 7
sussman
K 3
log
V 33
Added two files, changed a third.
PROPS-END

Node-path: /bar/baz
Node-rev: 1422
Node-kind: dir
Node-action: add
Content-checksum: oj3eu729
Content-length: 35

K 10
svn:ignore
V 4
TAGS
PROPS-END

Node-path: /bar/baz/bop
Node-rev: 1422
Node-kind: file
Node-action: add
Content-checksum: bzz35te7
Content-length: 130

K 12
svn:keywords
V 15
LastChangedDate
K 14
svn:executable
V 2
on
PROPS-END
Here is the text of the newly added 'bop' file.
Whee.

Node-path: /bar/foo.c
Node-rev: 1422
Node-kind: file
Node-action: change
Content-checksum: Ae73te7et
Content-length: 111

PROPS-END
Here is the fulltext of my change to an existing /bar/foo.c.
Notice that this file has no properties.
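
A loader can walk a dump like this one using nothing but the headers
and Content-length.  The following Python sketch shows one way to do
it; it is not the actual loader.  The names parse_props and
read_records are hypothetical, and the sketch assumes a blank line
ends each header block and that blank lines between records can be
skipped.

```python
import io


def parse_props(block):
    """Parse K/V hashdump bytes up to PROPS-END; return (props, rest).

    `rest` is whatever follows PROPS-END: the node's text, or b"".
    Property values are kept as raw bytes.
    """
    props = {}
    pos = 0
    while True:
        eol = block.index(b"\n", pos)
        line = block[pos:eol]
        pos = eol + 1
        if line == b"PROPS-END":
            return props, block[pos:]
        klen = int(line[2:])                 # line is b"K <len>"
        key = block[pos:pos + klen]
        pos += klen + 1                      # skip the key and its newline
        eol = block.index(b"\n", pos)
        vlen = int(block[pos + 2:eol])       # line is b"V <len>"
        pos = eol + 1
        props[key.decode("utf-8")] = block[pos:pos + vlen]
        pos += vlen + 1                      # skip the value and its newline


def read_records(stream):
    """Yield (headers, props, text) for each record in a binary stream.

    The leading format-version line comes back as a record with no
    Content-length, empty props and empty text.
    """
    while True:
        line = stream.readline()
        while line == b"\n":                 # assumption: blank separators
            line = stream.readline()
        if not line:
            return                           # end of the dump
        headers = {}
        while line not in (b"\n", b""):
            name, _, value = line.decode("utf-8").rstrip("\n").partition(": ")
            headers[name] = value            # unknown headers are just kept
            line = stream.readline()
        length = int(headers.get("Content-length", "0"))
        content = stream.read(length)
        props, text = parse_props(content) if content else ({}, b"")
        yield headers, props, text
```

A caller would open the dump in binary mode and tell revision records
from node records by checking for a "Revision-number" or "Node-path"
key in the returned headers.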