Element Model

Julian Foad

Table of Contents

Part I. Overview

1 Introduction

Imagine a user who:
The design aims to satisfy such a user.
Move tracking is intimately related to merging. The original goal was to make merging work well in the presence of moves. A move tracking model is required in order to support merging.
The purpose for this design is to explore how Subversion could support move tracking, so that we can evaluate its advantages and disadvantages and decide whether we want to implement something based on this idea or to go in a different direction.
The presentation here pretends that we have free rein to redesign Subversion from the ground up. As such it is best to read this with the idea in mind of implementing a new prototype and seeing how it works, and not to think in terms of adapting the existing Subversion.
The ’svnmover’ prototype on the ’move-tracking-2’ branch already implements much of this model.

1.1 Key Features

1.2 Other Potential Benefits

1.2.1 Repository Entropy Reduction

Often a Subversion repository contains metadata that conveys no intentional information but only accidental information. Examples:
In the new model, many such cases can be reduced to a true no-op.

1.2.2 Orthogonal Branching

While it is designed for path-addressed branches, this model itself does not require that each branch is attached to a unique path. With little or no change to the basic model, we could support branching at the root path level within a repository, where branches are addressed in a name-space orthogonal to the path namespace, as used in popular DVCSs.
More or less the only addition that would be required is a method for addressing the branches: an extension to the branch id scheme, and extensions to the APIs for converting to and from path-space.
This could be useful both on its own and for greater compatibility with DVCSs.

1.2.3 Cross-Repository Branching

The model supports, in principle, cross-repository branching and merging. Element identifiers would need to be extended to include a repository identifier.

1.3 Potential Cons

1.4 Unfinished Design

Branching — how to track the ’from’ branch and revision.
3-way merge — how to deal with a subtree root node.
Merge history.

1.5 TODO (in the document)

2 Glossary

branch a set of elements arranged in a tree, with a branch-root element at the root of the tree; a leaf node of the tree can be a file node element or a subbranch-root element; a subbranch-root element is part of this branch but the subbranch it links to is not
branch-root the root element of a branch; carries its own payload, but its location is determined solely by the corresponding (linked) subbranch-root element of the outer branch; see also subbranch-root
eid element identifier, uniquely identifying an element within a branch and matching the equivalent element in another branch; an arbitrary token that supports comparison for equality and being used as an index key
element a file or directory or subbranch-root; the smallest unit of merging; versioned content is (depending on context) its parent element id and name, and/or its payload;
location the tree-structure part of the versioned content of an element: that is, the element’s parent-eid and name
payload the non-tree-structure part of the versioned content of an element: that is, the element’s properties (for a file or directory) and text (for a file)
subbranch-root the kind of element that links a subbranch to an outer branch; has a versioned parent-eid and name
target may refer to the target of a merge; or to the target path of a symlink, if symlinks are supported

Part II. The Model

3 Layers

Table 1 Layers of modelling
(client/UI)
Branch Management possible future addition
Merge History TBD; to be described here
Element Model described here
Repository Storage TBD; to be described elsewhere
(server/fs)

4 Element Model

4.1 Introduction

Subversion fundamentally tracks versions of a tree of the user’s primary data, or of multiple trees if the repository is used to version multiple projects. A versioned tree consists of:
Subversion also tracks metadata describing the versions of and changes to the versioned tree, including:
Historically, Subversion tracked the branches by using part of the pathname within the repository. In the element model, we must think of the user’s versioned content as being clearly distinct from the branch identification prefix of the in-repository path.
The structure and content of a versioned tree can be precisely defined as comprising the following attributes on each node:
We no longer think of a directory as holding a list of its children; instead, each child holds a single reference that identifies its parent. In this way, the high level conceptual idea of a “move” is modeled by a change to exactly one element.
The root node of a versioned tree must be treated specially. When merging changes from one branch of this versioned tree to another, only the payload of this root node must be merged; its parent and name are part of the external structure of the repository, the branching metadata, not part of this versioned tree.
Each branch root is modeled by a linked pair of elements: a subbranch-root element which belongs to the outer branch (and has a location but no payload), linked to a node element which is the root element of the inner branch (and has only a payload).
The root path of the repository is defined as being a root node of a branch. In typical Subversion work flows this would not be thought of as a branch. Any normal branch in a typical Subversion work flow would be modeled as a subbranch of the root branch. For example, we might create a subbranch-root element at the path ’trunk’, linked to a branch root node of the kind ’directory’ and having a (possibly empty) set of properties.
A subbranch-root element acts like a container that contains a nested subbranch. When merging such an element, the whole nested subbranch is treated as a unit and handled by a recursive application of the merge algorithm.
The term ’elements’ refers collectively to nodes and to subbranch-root elements.

4.2 Elements, Nodes, Paths, Payloads

An element is either a node element or a subbranch-root element.
A node element represents a file or a directory, or maybe in future a symlink. Each node element has versioned content called its payload. The payload for a file is its text and properties; the payload for a symlink is its target and properties. The payload for a directory is its properties only.
The children of a directory do not belong to the directory in this model. Instead, the child-parent relationships are modeled by each element having a “parent element” attribute and a “name” attribute.
The (parent, name, payload) of an element is its versioned content. The basic unit of change tracking (including merge tracking) is the change to the content of one element in one revision.
A subbranch-root element carries a location but no explicit payload. Instead, it represents the connection of a subbranch with its outer branch. The subbranch-root element’s fixed attribute “subbranch-root-eid” specifies the element id acting as root of the subbranch.

4.2.1 attributes of an element

fixed attributes:
versioned attributes:

4.2.2 Repository Root

An new repository in Subversion is given a single directory node as its root element.
As a branch root element, the repository root element has no parent-eid or name and no outer branch, and so it cannot be moved away from the repository root path. However, it can be branched to another path, although in typical work flows that would not be done.

4.2.3 Paths

At each path there is exactly one node element (if there is anything at all). This carries the payload at that path, and in most cases carries the location information (parent and name) of that path as well.
At a branch boundary path there are two elements involved. The root element of the inner branch is a node and carries the payload only. A subbranch-root element belonging to the outer branch carries the location information (parent and name).
### “Within a revision, elements map to paths as follows:” [table showing an example] ?

4.3 Branch

A branch is addressed by the sequence of subbranch-root elements that must be followed, starting from the repository root, to reach it. The repository root branch is addressed by a zero-length sequence.
For example, with integer EIDs, the repository root branch address is ’’, while each first-level subbranch within it is addressed by a length-one sequence of EIDs such as ’53’ or ’12001’, and a second-level subbranch is ’53.14141’ or ’12001.12345’.
### Example as a table of paths and their branch ids?
The branching model supports whole-repository branching, as used in many DVCSs. While it has been assumed in the descriptions above that each branch root is mapped to a different path in a single repository-wide path-space, we could also provide a way to map branches to a new branching name-space orthogonal to paths.

5 3-way Merging

Here we cover the rules for calculating the result of a 3-way merge.
We assume the base state (usually a youngest common ancestor) and the other two states (variously called ’source’ and ’target’, or ’side 1’ and ’side 2’) are already known. The section on Merge History covers the higher layer of merging: deciding what changes need to be merged and what three-way merges (with which bases) will be used to accomplish it.
It is characteristic of this model that elements are uniquely paired across branches by their element id; there is no ambiguity about which element in one branch or state should be merged with which element in another.
The aspect of 3-way merging most relevant to the element branching model is how the tree shapes are merged. The payload of each element (such as its properties and text) can be merged with a traditional 3-way text merge. The merging of payload is not described here because it need not differ from traditional practice.

5.1 Stages of Tree Shape Merge

Calculation of a 3-way merge is largely per element in this model. For each element, we specify how to merge any two changes to that element. In the second stage we specify how to combine the per-element merges into a whole tree merge. In both stages we specify which combinations should be regarded as conflicting.
The two main stages:
It may be useful to (partially) order the merging of elements, so as to build up the result tree in parent-to-child order. This would ensure we don’t cause the user to spend effort in resolving conflicts in a subtree when a tree conflict occurring higher up the hierarchy might make them decide to throw away the whole subtree. However, it is impossible to determine a complete top-down ordering of the result tree in advance, in general, when the result will depend on conflict resolution choices.

5.2 Per element, 3-way merge

For each element that appears in any of the three sides of the merge, we merge the two changes to the element’s (parent-eid, name) pair according to Table 2↓.
In this table, each row represents an element, and a letter (O, R, X, Y) represents a particular value of the element’s (parent-eid, name) pair: different letters mean the parent-eid differs and/or the name differs. The semantics are symmetric with respect to Side 1 and Side 2; only one row of a symmetric pair is shown.
Base Side 1 Side 2 Permissive Result Strict Result Notes
- - R R R only add [1]
O O R R R only mv [1]
O O - - - only del [2]
- R R R conflict duplicate add [1]
O R R R conflict duplicate mv [1]
O - - - conflict duplicate del
- X Y conflict conflict add vs. add
O X Y conflict conflict mv vs. mv
O X - conflict conflict mv vs. del
Table 2 3-way merge of a single element
Notes:

5.3 Merging Parent and Name Separately

Table 2 suggests that an element’s location (parent-eid and name) should be treated as a unit. If a conflict occurs, the conflict relates to the parent-eid and name together. An alternative is to merge the two parts separately. This implies allowing a conflict to be raised and resolved for either or both parts independently.
Each part independently:
Both parts together:
The choice of whether a conflict should apply to both parts together or each part separately is one of user preference and the best choice would depend on the scenario. In essence it is a user interface issue. If the merge logic is to be provided in a library, this should be designed to handle the two parts separately so that the UI layer has the choice to treat them either together or separately.
The idea is exactly the same as allowing text and property conflicts on the same element to be handled separately.
It is logical to generalize the idea to treating any or all of the component parts of an element’s content (parent-eid, name, properties, text, ...) together or separately for the purposes of conflict handling.

5.4 Conflict Policies

Conflict detection is fundamentally subjective: any two changes could conflict semantically. The job of a good merge tool is to help the user by flagging combinations of changes that are likely to need human intervention, while quietly resolving the rest in a way that is likely to be satisfactory. Because the needs of users and the degree of semantic coupling depend on the use case, the tool therefore should not be configurable in what cases are reported as a conflict.
Configurability is provided by a merge policy. The policy is a statement of what combinations should be flagged as a conflict and what should be resolved automatically. As a starting point, one ’strict’ policy and one ’permissive’ policy are suggested. The policies suggested here differ only in their treatment of duplicate changes — cases where both sides of the merge are making the same change. In some merge scenarios it is known and intentional that many of the same changes are being made on both sides, and in those scenarios the permissive policy may be more helpful. In scenarios where a duplicate change is not expected and may indicate an error has been made, then the strict policy may be more helpful.
It is also helpful to have a different policy for user-initiated merges than for rebase-on-commit merges. The latter has historically been implemented in Subversion with a fairly strict policy. (Note also that due to the architecture of the commit process there is no option to store conflicts and have the user resolve them. Any conflict results in immediate failure of the commit.)
A merge policy can also specify which parts of an element’s content are merged together, as discussed in the section ’Merging Parent and Name Separately’ above.

5.5 Per-Element Conflicts

As seen in Table 2 above.

5.6 Multi-Element Conflicts

After or during the per-element merge, it is necessary to check for the following kinds of conflict that affect more than one element.

5.6.1 Clashes

There is a clash if two or more elements have the same parent-eid and the same name.
This is always a conflict.

5.6.2 Orphans

An element is an orphan if its parent is not present (is deleted or is not added) in the merge result.
When this applies to an element that is modified as a result of the merge, that is a conflict.
When this applies to an element that is not modified as a result of the merge, the element is silently deleted. This means that deleting a parent directory causes all its children in the target to be deleted, not only the children that it had in the source.
A merge policy option could enable a stricter option here.

5.6.3 Cycles

There is a cycle if following the parent-eid chain from an element leads, sooner or later, back to itself rather than to the branch root.
This is always a conflict.

5.7 Subtree Merge

When merging a ’subtree’ it is necessary to decide what exactly is meant by a ’subtree’. Historically a subtree was defined by specifying a subtree root element. Now that moves are supported, a given element can be inside the given subtree root element on one ’side’ of the merge and outside it on another side of the merge.
Suggestions:
The ’location’ attributes of the subtree root element must be ignored when merging.
Note that a given element may be the root of one branch and a subtree in another branch.

6 Merge History

Merge history is required to track which changes are present in which branches.

6.1 Requirements

Merge history should:
The system should enable merge history to be treated as a DAG in cases where ’complete’ merging is being performed. In other words, after a complete merge, it should be clear from the merge history that there is a point that can be treated as a youngest common ancestor.
Merge history also provides the metadata necessary to answer “from which branch and revision was this branch branched?” — see the section on Copying.

6.2 Subtree Merge History

Merge history shall indicate that a change is logically present in the branch. This means the effect of the change is in the branch, even if no longer in the same form as the original change. Therefore mergeinfo does not inherently belong at the element level, but at the whole branch level.
### Is mergeinfo attached to a “subtree” or rather a subset of elements, in order to track subtree merges?
The smallest unit of tracked changes shall be the change to the versioned content of one element (parent-eid, name, payload) in one revision.
### ...

6.3 Design

### TBD

7 Copying

In old svn, copying inserted metadata that said “node-rev foo@100 was copied from /bar@95”. Apart from being visible in “log -v” and “info”, the only way this affected behaviour was that commands such as “log” or “merge” with this node as the target would follow copies backwards. This could be useful for simulating moves, to a degree. But a copy was not recognized as an operation that could be merged to another branch. Instead, merge treated a copy the same as a simple add; in both cases, the result was a copy from the source branch.
Copying as a separate concept is no longer much needed, as the model natively supports the following cases where a ’copy’ was previously used:
The only remaining case for copying is:
There is no need for this kind of ’plain’ copying to be modeled explicitly at this level.
Recommendation:

7.1 How New Features Replace Copying

The copy-from metadata is replaced in the following way in the new model, for each case that was previously represented by a copy:
### DIAGRAMS

7.2 Merge History as Copy-From Metadata

Merge history can replace much of the semantics of copy-from metadata.
### basically:

7.3 Translating from new model back to ’copy-from’ metadata

###

8 Branch Management

The ’elements’ model deliberately does not address branch management. By branch management I mean things such as
Branch management should be modelled and implemented at a higher layer.

9 The Working Copy

The fundamental purpose of the working copy (WC) is to hold a base state and local changes against that state, in order to build (offline) a new commit. A single commit, in Subversion so far.
As such, the design of the WC must match the design of the repository. A move-tracking WC will work with the element model. The WC represents a base state and a working state, each comprising a set of elements. It may be stored as a complete base state and a difference.
Special considerations:

9.1 Checkout, Update, Switch, Commit

Basic WC state operations including Checkout, Update, Switch, Commit, Diff and Merge shall all make use of basic element-mode diff or edit APIs. For most of these operations a contextual diff is required in order to provide a good merge.
(Commit requires less context because its rebase merge is more restricted: for example, it does not merge text or properties so no context is required for them. Update and Switch perhaps do not need context in areas where there are no local changes. Checkout probably does not require any context.)

9.2 Mixed-Rev WC Operation

Each element in the WC has its own base revision. A commit will update only the elements for which it commits a change, resulting in a mixed-rev WC. Any element of set of elements can be updated separately, resulting in a mixed-rev WC.
An update shall convey a set of per-element changes. These can include moving elements. When an update attempts to transform the base state into a non-tree configuration (such as with a cycle or a name clash or a deleted parent) the update shall be rejected and the WC unchanged. An update shall only succeed if it results in a tree configuration for the new base state.
A successful update to the base state shall be merged with the working state to produce the new working state. Any resulting tree conflicts shall be recorded in the WC to be resolved by the user, like in old Subversion except the kinds of conflicts are different.
Consequences: