This file describes the format produced by 'svnadmin dump' and
consumed by 'svnadmin load'.

The format has been revised over time.  The versions are presented in
reverse chronological order here.  You may wish to start with the
VERSION 1 description in order to get a baseline understanding first.

===== SVN DUMPFILE VERSION 3 FORMAT =====

(generated by SVN versions 1.1.0-present, if requested by the user)

This format is equivalent to the VERSION 2 format except for the
following:

1.) The format starts with the new version number of the dump format
    ("SVN-fs-dump-format-version: 3\n").

2.) There are two new optional headers for node changes:

[Text-delta: true|false]
[Prop-delta: true|false]

    The default value for these headers is "false".  If the value is
    set to "true", then the text and property contents will be treated
    as deltas against the previous contents of the node (as determined
    by copy history for adds with history, or by the value in the
    previous revision for changes--just as with commits).

    Property deltas have the same format as regular property lists
    except that (1) properties with the same value as in the previous
    contents of the node are not printed, and (2) deleted properties
    will be written out as

D <name length>
<name>

    just as a regular property is printed, but with the "K " changed
    to a "D " and with no value part.

    Text deltas are written out as a series of svndiff0 windows.

===== SVN DUMPFILE VERSION 2 FORMAT =====

(generated by SVN versions 0.18.0-present, by default)

This format is equivalent to the VERSION 1 format in every respect,
except for the following:

1.) The format starts with the new version number of the dump format
    ("SVN-fs-dump-format-version: 2\n").

2.) In addition to "Revision Records", another sort of record is
    supported: the "UUID" record, which should be of the form:

UUID: 7bf7a5ef-cabf-0310-b7d4-93df341afa7e

    This should be used to indicate the UUID of the originating
    repository.
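As an illustration of the VERSION 3 property deltas described above,
the following Python sketch applies a property delta to the previous
property list of a node.  It is not taken from Subversion itself.  The
function names (iter_prop_entries, apply_prop_delta) are hypothetical,
and the sketch assumes the "K <length>" / "V <length>" layout shown in
the VERSION 1 example later in this file, with a "D " entry carrying a
name but no value part.

```python
import io


def iter_prop_entries(block: bytes):
    """Yield (name, value) pairs; value is None for a "D" (deleted) entry."""
    stream = io.BytesIO(block)
    while True:
        line = stream.readline()
        if line == b"PROPS-END\n":
            return
        if not line:
            raise ValueError("property block is missing PROPS-END")
        kind, length = line.split()
        name = stream.read(int(length))
        stream.read(1)  # newline that follows the name
        if kind == b"D":
            # Deleted property: there is no value part to read.
            yield name, None
            continue
        if kind != b"K":
            raise ValueError("unexpected entry type: %r" % kind)
        value_kind, value_length = stream.readline().split()
        if value_kind != b"V":
            raise ValueError("expected a V line after the property name")
        value = stream.read(int(value_length))
        stream.read(1)  # newline that follows the value
        yield name, value


def apply_prop_delta(previous: dict, delta_block: bytes) -> dict:
    """Return a new property dict: previous contents plus the delta."""
    props = dict(previous)
    for name, value in iter_prop_entries(delta_block):
        if value is None:
            props.pop(name, None)   # "D" entry removes the property
        else:
            props[name] = value     # "K"/"V" entry adds or overwrites
    return props
```

Properties that do not appear in the delta are carried over unchanged,
which matches rule (1) above: unchanged properties are simply not
printed in a delta.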
===== SVN DUMPFILE VERSION 1 FORMAT =====

(generated by SVN versions prior to 0.18.0)

The binary format starts with the version number of the dump format
("SVN-fs-dump-format-version: 1\n"), followed by a series of revision
records.  Each revision record starts with information about the
revision, followed by a variable number of node changes for that
revision.  Fields in [brackets] are optional, and unknown headers are
always ignored, for backwards compatibility.

Revision-number: N
Prop-content-length: P
Content-length: L

   ...P bytes of property data.  Properties are stored in the same
   human-readable hashdump format used by working copy property files,
   except that they end with "PROPS-END\n" for better readability.

Node-path: /absolute/path/to/node/in/filesystem
Node-kind: file | dir  (1)
Node-action: change | add | delete | replace
[Node-copyfrom-rev: X]
[Node-copyfrom-path: /path ]
[Text-copy-source-md5: blob] (2)
[Text-content-md5: blob]
[Text-content-length: T]
[Prop-content-length: P]
Content-length: Y (3)

   ... Y bytes of content data, divided into P bytes of "property"
   data and T bytes of "text" data.  The properties come first; their
   total length (including formatting) is Prop-content-length, and is
   included in Content-length.  The "PROPS-END\n" line always
   terminates the property section if there are props.  The remainder
   of the Y bytes (expected to be equivalent to Text-content-length)
   represents the contents of the node.

Notes:

   (1) If the node represents a deletion, this field is optional.

   (2) This is a checksum of the source of the copy.  A loader process
       can use this checksum to determine that the copyfrom path/rev
       already present in a filesystem is really the *correct* one to
       use.

   (3) The Content-length header is technically unnecessary, since the
       information it holds (and more) can be found in the
       Prop-content-length and Text-content-length fields.
       Though Subversion itself does not make use of the header when
       reading a dumpfile, we include it for compatibility with
       generic RFC822 parsers.

   (4) There are actually 2 types of version 1 dump streams.  The
       regular ones have been generated since r2634 (svn 0.14.0).
       Older ones also claim to be version 1, but lack the
       Prop-content-length and Text-content-length fields in the block
       header.  In those days there *always* was a properties block.

EXAMPLE:

Here's an example of revision 1422, whereby I added a new directory
"baz", added a new file "bop" inside it, and modified the file "foo.c":

Revision-number: 1422
Prop-content-length: 80
Content-length: 80

K 6
author
V 7
sussman
K 3
log
V 33
Added two files, changed a third.
PROPS-END

Node-path: bar/baz
Node-kind: dir
Node-action: add
Prop-content-length: 35
Content-length: 35

K 10
svn:ignore
V 4
TAGS
PROPS-END


Node-path: bar/baz/bop
Node-kind: file
Node-action: add
Prop-content-length: 76
Text-content-length: 54
Content-length: 130

K 14
svn:executable
V 2
on
K 12
svn:keywords
V 15
LastChangedDate
PROPS-END
Here is the text of the newly added 'bop' file.
Whee.


Node-path: bar/foo.c
Node-kind: file
Node-action: change
Text-content-length: 102
Content-length: 102

Here is the fulltext of my change to an
existing /bar/foo.c.  Notice that this file has
no properties.


-*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*-

Old discussion:

(This file started as a proposal, preserved here for posterity.)

A proposal for an svn filesystem dump/restore format.

Two problems we want to solve
=============================

 1. When we change our node-id schema, we need to migrate all of our
    data (by dumping and restoring).

 2. We want a backup format, one that could be read by other software
    tools someday.

Design Goals
============

 A. Written as two new public functions in svn_fs.h.  To be invoked
    by new 'svnadmin' subcommands.

 B. Format uses only timeless fs concepts.
    The dump format needs to reference concepts that we *know* are
    general enough to never change.  These concepts must exist
    independently of any internal node-id schema, or any DB storage
    backend.  In other words, we're talking about the basic ideas in
    our original "design spec" from May 2000.

Format Semantics
================

Here are the timeless semantics of our fs design -- the things that
would be stored in our dump format.

  - A filesystem is an array of trees.  Each tree is called a
    "revision" and has unversioned properties attached.

  - A revision has a tree of "nodes" hanging off of it.  Actually, the
    nodes in the filesystem form a DAG.  A revision always points to
    an initial node that represents the 'root' of some tree.

  - The majority of a tree's nodes are hard-links (references) to
    nodes that were created in earlier trees.

  - A node contains

       - versioned text
       - versioned properties
       - predecessor history:  "which node am I a variant of?"
       - copy history:  "which node am I a copy of?"

    The history values can be non-existent (meaning the node is
    completely new), or can have a value of {revision, path}.

------------------------------------------------------------------------
Refinement of proposal #2:  (after discussion with gstein)
=========================

Each node starts with RFC822-style headers at the top.  The final
header is a 'Content-length:', followed by the content, so record
boundaries can be inferred.

The content section has two implicit parts: a property hash, and the
fulltext.  The division between these two sections is implied by the
"PROPS-END\n" tag at the end of the prophash.  In the case of a
directory node or a revision, only the prophash is present.
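
The record layout described above (headers, a blank line, then
Content-length bytes of content) can be sketched in Python as follows.
This is only an illustration under stated assumptions, not the parser
Subversion uses.  The names read_record and split_content are
hypothetical.  The sketch assumes headers are UTF-8 text, and it splits
the content with the Prop-content-length header from the VERSION 1
format rather than by searching for "PROPS-END\n", since a property
value could itself contain that string.

```python
import io


def read_record(stream):
    """Read one record; return (headers, content) or None at end of input."""
    line = stream.readline()
    while line == b"\n":            # skip blank lines between records
        line = stream.readline()
    if not line:
        return None
    headers = {}
    while line not in (b"\n", b""):
        # Split on the first colon only, so values may contain colons.
        name, _, value = line.decode("utf-8").partition(":")
        headers[name.strip()] = value.strip()
        line = stream.readline()
    # A record without Content-length (for example a UUID record)
    # is treated as having no content.
    length = int(headers.get("Content-length", 0))
    content = stream.read(length)
    if len(content) != length:
        raise ValueError("record is shorter than its Content-length")
    return headers, content


def split_content(headers, content):
    """Split content into (property bytes, text bytes)."""
    prop_length = int(headers.get("Prop-content-length", 0))
    props, text = content[:prop_length], content[prop_length:]
    if props and not props.endswith(b"PROPS-END\n"):
        raise ValueError("property section does not end with PROPS-END")
    return props, text
```

Calling read_record repeatedly on a binary stream until it returns
None walks the records in order; the version line at the top of the
file is returned as a record with a single header and empty content.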