Element Model

Julian Foad

Table of Contents

Part I: Overview

Section 1: Introduction

Section 2: Glossary

Part II: The Model

Section 3: Layers

Section 4: Element Model

Section 5: 3-way Merging

Section 6: Merge History

Section 7: Copying

Section 8: Branch Management

Section 9: The Working Copy

Part I. Overview

1 Introduction

Imagine a user who:

is familiar with version control concepts
is not familiar with Subversion specifics
expects directories to be versioned (not just files)
expects move tracking (as opposed to move ’detection’)

The design aims to satisfy such a user.

Move tracking is intimately related to merging. The original goal was to make merging work well in the presence of moves. A move tracking model is required in order to support merging.

The purpose for this design is to explore how Subversion could support move tracking, so that we can evaluate its advantages and disadvantages and decide whether we want to implement something based on this idea or to go in a different direction.

The presentation here pretends that we have free rein to redesign Subversion from the ground up. As such it is best to read this with the idea in mind of implementing a new prototype and seeing how it works, and not to think in terms of adapting the existing Subversion.

The ’svnmover’ prototype on the ’move-tracking-2’ branch already implements much of this model.

1.1 Key Features

Simple and powerful model of moves
- easy to understand (for users and developers)
- moves are atomic and never ’broken’ into two halves
- deterministic result for arbitrary combinations of moving and other operations
Symmetry between the ’revisions’ dimension and the ’branches’ dimension
- a diff between two branches has the same form and capabilities as a diff between two revisions of a branch
- unlike historical Subversion
- enables uniform definition of 3-way merge given any 3 points in a merge DAG
Flexible nested branching
- branching at arbitrary paths
- flexible subbranching, inside and/or outside the parent branch
  - e.g. /trunk/foo-branch1, /trunk/foo-branch2, /branches/foo-branch3
Simpler tree conflicts
- the possible kinds of tree conflicts are easily enumerated

1.2 Other Potential Benefits

Repository entropy reduction
Orthogonal branching namespace at the root path level
Cross-repository branching and merging

1.2.1 Repository Entropy Reduction

Often a Subversion repository contains metadata that conveys no intentional information but only accidental information. Examples:

move with a copy-from revision older than (new rev - 1)
move with the delete half and the copy half committed in separate revisions
copy/branch from revision (new rev - N) where there is no change between that and (new rev - 1)
copy/branch with a subtree replaced with a different revision of itself, e.g. r122645 in ASF repo
no-op roll-back: replace with self copied from (new rev - 1)

In the new model, many such cases can be reduced to a true no-op.

move is atomic
branching from R < (new rev - 1)
- ### TBD whether and how it will store copy-from revision
copy from R < (new rev - 1)
- ### TBD how it will store copy-from revision
- if wanted, the new design could encourage updating the copy-from on update and on commit
no-op roll-back can be a true no-op if the UI encourages ’resurrect’ or ’roll-back’ command semantics rather than plain ’copy’

1.2.2 Orthogonal Branching

While it is designed for path-addressed branches, this model itself does not require that each branch is attached to a unique path. With little or no change to the basic model, we could support branching at the root path level within a repository, where branches are addressed in a name-space orthogonal to the path namespace, as used in popular DVCSs.

More or less the only addition that would be required is a method for addressing the branches: an extension to the branch id scheme, and extensions to the APIs for converting to and from path-space.

This could be useful both on its own and for greater compatibility with DVCSs.

1.2.3 Cross-Repository Branching

The model supports, in principle, cross-repository branching and merging. Element identifiers would need to be extended to include a repository identifier.

1.3 Potential Cons

No built-in way to retrospectively convert separately created subtrees to related branches.
The mental model will be unfamiliar to users, at first.

1.4 Unfinished Design

Branching — how to track the ’from’ branch and revision.

3-way merge — how to deal with a subtree root node.

Merge history.

1.5 TODO (in the document)

rename ’subbranch-root’ to something less like ’branch root’: perhaps ’branch-point’ or ’subbranch-element’?

2 Glossary

branch a set of elements arranged in a tree, with a branch-root element at the root of the tree; a leaf node of the tree can be a file node element or a subbranch-root element; a subbranch-root element is part of this branch but the subbranch it links to is not

branch-root the root element of a branch; carries its own payload, but its location is determined solely by the corresponding (linked) subbranch-root element of the outer branch; see also subbranch-root

eid element identifier, uniquely identifying an element within a branch and matching the equivalent element in another branch; an arbitrary token that supports comparison for equality and being used as an index key

element a file or directory or subbranch-root; the smallest unit of merging; versioned content is (depending on context) its parent element id and name, and/or its payload;

location the tree-structure part of the versioned content of an element: that is, the element’s parent-eid and name

payload the non-tree-structure part of the versioned content of an element: that is, the element’s properties (for a file or directory) and text (for a file)

subbranch-root the kind of element that links a subbranch to an outer branch; has a versioned parent-eid and name

target may refer to the target of a merge; or to the target path of a symlink, if symlinks are supported

Part II. The Model

3 Layers

Table 1 Layers of modelling

(client/UI)
Branch Management	possible future addition
Merge History	TBD; to be described here
Element Model	described here
Repository Storage	TBD; to be described elsewhere
(server/fs)

4 Element Model

4.1 Introduction

Subversion fundamentally tracks versions of a tree of the user’s primary data, or of multiple trees if the repository is used to version multiple projects. A versioned tree consists of:

nodes (files, directories) with their user properties and other user content
the tree structure

Subversion also tracks metadata describing the versions of and changes to the versioned tree, including:

branches
merge history

Historically, Subversion tracked the branches by using part of the pathname within the repository. In the element model, we must think of the user’s versioned content as being clearly distinct from the branch identification prefix of the in-repository path.

The structure and content of a versioned tree can be precisely defined as comprising the following attributes on each node:

payload:
- kind (file or directory), properties, text
location within versioned tree (except for the root of the versioned tree):
- parent node identifier
- name

We no longer think of a directory as holding a list of its children; instead, each child holds a single reference that identifies its parent. In this way, the high level conceptual idea of a “move” is modeled by a change to exactly one element.

The root node of a versioned tree must be treated specially. When merging changes from one branch of this versioned tree to another, only the payload of this root node must be merged; its parent and name are part of the external structure of the repository, the branching metadata, not part of this versioned tree.

Each branch root is modeled by a linked pair of elements: a subbranch-root element which belongs to the outer branch (and has a location but no payload), linked to a node element which is the root element of the inner branch (and has only a payload).

The root path of the repository is defined as being a root node of a branch. In typical Subversion work flows this would not be thought of as a branch. Any normal branch in a typical Subversion work flow would be modeled as a subbranch of the root branch. For example, we might create a subbranch-root element at the path ’trunk’, linked to a branch root node of the kind ’directory’ and having a (possibly empty) set of properties.

A subbranch-root element acts like a container that contains a nested subbranch. When merging such an element, the whole nested subbranch is treated as a unit and handled by a recursive application of the merge algorithm.

The term ’elements’ refers collectively to nodes and to subbranch-root elements.

4.2 Elements, Nodes, Paths, Payloads

An element is either a node element or a subbranch-root element.

A node element represents a file or a directory, or maybe in future a symlink. Each node element has versioned content called its payload. The payload for a file is its text and properties; the payload for a symlink is its target and properties. The payload for a directory is its properties only.

The children of a directory do not belong to the directory in this model. Instead, the child-parent relationships are modeled by each element having a “parent element” attribute and a “name” attribute.

The (parent, name, payload) of an element is its versioned content. The basic unit of change tracking (including merge tracking) is the change to the content of one element in one revision.

A subbranch-root element carries a location but no explicit payload. Instead, it represents the connection of a subbranch with its outer branch. The subbranch-root element’s fixed attribute “subbranch-root-eid” specifies the element id acting as root of the subbranch.

4.2.1 attributes of an element

fixed attributes:

eid
kind: ’node’ | ’subbranch-root’
node-kind: ’file’ | ’dir’ # iff kind=node
subbranch-root-eid # iff kind=subbranch-root

versioned attributes:

location: (parent-eid, name) # or <nil> when acting as a branch root
payload: (props, [text | target]) # iff kind=node

4.2.2 Repository Root

An new repository in Subversion is given a single directory node as its root element.

(eid=0, kind=’node’, node-kind=’dir’)

As a branch root element, the repository root element has no parent-eid or name and no outer branch, and so it cannot be moved away from the repository root path. However, it can be branched to another path, although in typical work flows that would not be done.

4.2.3 Paths

At each path there is exactly one node element (if there is anything at all). This carries the payload at that path, and in most cases carries the location information (parent and name) of that path as well.

At a branch boundary path there are two elements involved. The root element of the inner branch is a node and carries the payload only. A subbranch-root element belonging to the outer branch carries the location information (parent and name).

### “Within a revision, elements map to paths as follows:” [table showing an example] ?

4.3 Branch

A branch is addressed by the sequence of subbranch-root elements that must be followed, starting from the repository root, to reach it. The repository root branch is addressed by a zero-length sequence.

For example, with integer EIDs, the repository root branch address is ’’, while each first-level subbranch within it is addressed by a length-one sequence of EIDs such as ’53’ or ’12001’, and a second-level subbranch is ’53.14141’ or ’12001.12345’.

### Example as a table of paths and their branch ids?

The branching model supports whole-repository branching, as used in many DVCSs. While it has been assumed in the descriptions above that each branch root is mapped to a different path in a single repository-wide path-space, we could also provide a way to map branches to a new branching name-space orthogonal to paths.

5 3-way Merging

Here we cover the rules for calculating the result of a 3-way merge.

We assume the base state (usually a youngest common ancestor) and the other two states (variously called ’source’ and ’target’, or ’side 1’ and ’side 2’) are already known. The section on Merge History covers the higher layer of merging: deciding what changes need to be merged and what three-way merges (with which bases) will be used to accomplish it.

It is characteristic of this model that elements are uniquely paired across branches by their element id; there is no ambiguity about which element in one branch or state should be merged with which element in another.

The aspect of 3-way merging most relevant to the element branching model is how the tree shapes are merged. The payload of each element (such as its properties and text) can be merged with a traditional 3-way text merge. The merging of payload is not described here because it need not differ from traditional practice.

5.1 Stages of Tree Shape Merge

Calculation of a 3-way merge is largely per element in this model. For each element, we specify how to merge any two changes to that element. In the second stage we specify how to combine the per-element merges into a whole tree merge. In both stages we specify which combinations should be regarded as conflicting.

The two main stages:

Per element, 3-way merge the element’s content (parent-eid, name, payload). Identify in-element conflicts.
Check the result for multi-element conflicts (clashes, cycles and orphans).

It may be useful to (partially) order the merging of elements, so as to build up the result tree in parent-to-child order. This would ensure we don’t cause the user to spend effort in resolving conflicts in a subtree when a tree conflict occurring higher up the hierarchy might make them decide to throw away the whole subtree. However, it is impossible to determine a complete top-down ordering of the result tree in advance, in general, when the result will depend on conflict resolution choices.

5.2 Per element, 3-way merge

For each element that appears in any of the three sides of the merge, we merge the two changes to the element’s (parent-eid, name) pair according to Table 2↓.

In this table, each row represents an element, and a letter (O, R, X, Y) represents a particular value of the element’s (parent-eid, name) pair: different letters mean the parent-eid differs and/or the name differs. The semantics are symmetric with respect to Side 1 and Side 2; only one row of a symmetric pair is shown.

Base	Side 1	Side 2	Permissive Result	Strict Result	Notes
-	-	R	R	R	only add [1]
O	O	R	R	R	only mv [1]
O	O	-	-	-	only del [2]
-	R	R	R	conflict	duplicate add [1]
O	R	R	R	conflict	duplicate mv [1]
O	-	-	-	conflict	duplicate del
-	X	Y	conflict	conflict	add vs. add
O	X	Y	conflict	conflict	mv vs. mv
O	X	-	conflict	conflict	mv vs. del

Table 2 3-way merge of a single element

Notes:

“add” — includes a copy, for these purposes.
“conflict” — consider this a conflict, unless the user preconfigured what to do with it.
[1] — conflict if we detect a clash
[2] — conflict if we detect an orphaned subtree

5.3 Merging Parent and Name Separately

Table 2 suggests that an element’s location (parent-eid and name) should be treated as a unit. If a conflict occurs, the conflict relates to the parent-eid and name together. An alternative is to merge the two parts separately. This implies allowing a conflict to be raised and resolved for either or both parts independently.

Each part independently:

merge (’foo’ renamed to ’bar’) with (’foo’ moved to ’D/foo’): result is ’D/bar’.

Both parts together:

merge (’foo’ renamed to ’bar’) with (’foo’ moved to ’D/foo’): conflict.

The choice of whether a conflict should apply to both parts together or each part separately is one of user preference and the best choice would depend on the scenario. In essence it is a user interface issue. If the merge logic is to be provided in a library, this should be designed to handle the two parts separately so that the UI layer has the choice to treat them either together or separately.

The idea is exactly the same as allowing text and property conflicts on the same element to be handled separately.

It is logical to generalize the idea to treating any or all of the component parts of an element’s content (parent-eid, name, properties, text, ...) together or separately for the purposes of conflict handling.

5.4 Conflict Policies

Conflict detection is fundamentally subjective: any two changes could conflict semantically. The job of a good merge tool is to help the user by flagging combinations of changes that are likely to need human intervention, while quietly resolving the rest in a way that is likely to be satisfactory. Because the needs of users and the degree of semantic coupling depend on the use case, the tool therefore should not be configurable in what cases are reported as a conflict.

Configurability is provided by a merge policy. The policy is a statement of what combinations should be flagged as a conflict and what should be resolved automatically. As a starting point, one ’strict’ policy and one ’permissive’ policy are suggested. The policies suggested here differ only in their treatment of duplicate changes — cases where both sides of the merge are making the same change. In some merge scenarios it is known and intentional that many of the same changes are being made on both sides, and in those scenarios the permissive policy may be more helpful. In scenarios where a duplicate change is not expected and may indicate an error has been made, then the strict policy may be more helpful.

It is also helpful to have a different policy for user-initiated merges than for rebase-on-commit merges. The latter has historically been implemented in Subversion with a fairly strict policy. (Note also that due to the architecture of the commit process there is no option to store conflicts and have the user resolve them. Any conflict results in immediate failure of the commit.)

A merge policy can also specify which parts of an element’s content are merged together, as discussed in the section ’Merging Parent and Name Separately’ above.

5.5 Per-Element Conflicts

As seen in Table 2 above.

Duplicate add: the same element is added on both sides, at the same location/parent-eid/name.
Duplicate mv: the same element is moved on both sides, to the same location/parent-eid/name.
Duplicate del: the same element is deleted on both sides.
Add vs. add: the same element is added on both sides, at different locations/parent-eids/names.
Move vs. move: the same element is moved on both sides, to different locations/parent-eids/names.
Move vs. delete: the same element is moved on one side and deleted on the other side.

5.6 Multi-Element Conflicts

After or during the per-element merge, it is necessary to check for the following kinds of conflict that affect more than one element.

5.6.1 Clashes

There is a clash if two or more elements have the same parent-eid and the same name.

This is always a conflict.

5.6.2 Orphans

An element is an orphan if its parent is not present (is deleted or is not added) in the merge result.

When this applies to an element that is modified as a result of the merge, that is a conflict.

When this applies to an element that is not modified as a result of the merge, the element is silently deleted. This means that deleting a parent directory causes all its children in the target to be deleted, not only the children that it had in the source.

A merge policy option could enable a stricter option here.

5.6.3 Cycles

There is a cycle if following the parent-eid chain from an element leads, sooner or later, back to itself rather than to the branch root.

This is always a conflict.

5.7 Subtree Merge

When merging a ’subtree’ it is necessary to decide what exactly is meant by a ’subtree’. Historically a subtree was defined by specifying a subtree root element. Now that moves are supported, a given element can be inside the given subtree root element on one ’side’ of the merge and outside it on another side of the merge.

Suggestions:

merge changes to whole elements: never merge ’half’ of a move
take the union of the elements in the subtrees at each side of the merge
if the merged result for any element ’moves’ the element outside the target, in other words to a parent that does not exist in the target, that is a conflict

The ’location’ attributes of the subtree root element must be ignored when merging.

### That’s inconsistent.

Note that a given element may be the root of one branch and a subtree in another branch.

6 Merge History

Merge history is required to track which changes are present in which branches.

6.1 Requirements

Merge history should:

track original changes separately from merges
track changes that originated in this branch as well as in other branches
track changes at the granularity of branch-element-revs

The system should enable merge history to be treated as a DAG in cases where ’complete’ merging is being performed. In other words, after a complete merge, it should be clear from the merge history that there is a point that can be treated as a youngest common ancestor.

Merge history also provides the metadata necessary to answer “from which branch and revision was this branch branched?” — see the section on Copying.

6.2 Subtree Merge History

Merge history shall indicate that a change is logically present in the branch. This means the effect of the change is in the branch, even if no longer in the same form as the original change. Therefore mergeinfo does not inherently belong at the element level, but at the whole branch level.

### Is mergeinfo attached to a “subtree” or rather a subset of elements, in order to track subtree merges?

The smallest unit of tracked changes shall be the change to the versioned content of one element (parent-eid, name, payload) in one revision.

### How to deal with merging only the payload of the subtree root element in a subtree merge, separately from merging the location change of the same element in an outer merge? Perhaps complain if both the payload and the location of that node changed, as it’s an unlikely scenario in many use cases. But that’s not rigorous.

### ...

6.3 Design

### TBD

7 Copying

In old svn, copying inserted metadata that said “node-rev foo@100 was copied from /bar@95”. Apart from being visible in “log -v” and “info”, the only way this affected behaviour was that commands such as “log” or “merge” with this node as the target would follow copies backwards. This could be useful for simulating moves, to a degree. But a copy was not recognized as an operation that could be merged to another branch. Instead, merge treated a copy the same as a simple add; in both cases, the result was a copy from the source branch.

Copying as a separate concept is no longer much needed, as the model natively supports the following cases where a ’copy’ was previously used:

moving
branching
merging a node creation into another branch
resurrecting (the latest version or an older version, including roll-back by resurrecting an older version to replace the current version)

The only remaining case for copying is:

making a new node or subtree
- that starts off identical to an existing (past or present) one
- and that records a historical link to it

There is no need for this kind of ’plain’ copying to be modeled explicitly at this level.

Recommendation:

the copy-from for a plain copy is recorded as revision-property metadata, or at a higher layer of model

7.1 How New Features Replace Copying

The copy-from metadata is replaced in the following way in the new model, for each case that was previously represented by a copy:

### DIAGRAMS

moving
- normal (in-branch) — history is perfectly represented
- cross-branch — use merge history to represent from-branch
- copy from older version — treat as move plus roll-back (use merge history to represent from-revision)
branching — use merge history to represent from-branch; copying from non-head: treat as roll-back?
merging a creation — use merge history to represent from-branch; from-revision is meaningless
ressurecting
- normal (in branch) — use merge history to represent any roll-back
- cross-branch — similar to merging a creation
making a new node or subtree — use merge history?

7.2 Merge History as Copy-From Metadata

Merge history can replace much of the semantics of copy-from metadata.

### basically:

on copying, set the history to ’COPYFROMPATH : all revs up to COPYFROMREV’
on reading, if the history looks like this (ignoring further merges) then report the history as ’copied from ...’, else don’t

7.3 Translating from new model back to ’copy-from’ metadata

###

8 Branch Management

The ’elements’ model deliberately does not address branch management. By branch management I mean things such as

whether each project contains its own branches, or each branch contains multiple projects;
branch promotion models, governing which kinds of merge are allowed between which branches;
designating branches as ’dev’ or ’release’ or ’mainline’;
where/when/whether subbranching is allowed.

Branch management should be modelled and implemented at a higher layer.

9 The Working Copy

The fundamental purpose of the working copy (WC) is to hold a base state and local changes against that state, in order to build (offline) a new commit. A single commit, in Subversion so far.

As such, the design of the WC must match the design of the repository. A move-tracking WC will work with the element model. The WC represents a base state and a working state, each comprising a set of elements. It may be stored as a complete base state and a difference.

Special considerations:

mixed-rev: each element at its own rev; each element appears only once (no ’appears at two paths’ confusion)
...

9.1 Checkout, Update, Switch, Commit

Basic WC state operations including Checkout, Update, Switch, Commit, Diff and Merge shall all make use of basic element-mode diff or edit APIs. For most of these operations a contextual diff is required in order to provide a good merge.

(Commit requires less context because its rebase merge is more restricted: for example, it does not merge text or properties so no context is required for them. Update and Switch perhaps do not need context in areas where there are no local changes. Checkout probably does not require any context.)

9.2 Mixed-Rev WC Operation

Each element in the WC has its own base revision. A commit will update only the elements for which it commits a change, resulting in a mixed-rev WC. Any element of set of elements can be updated separately, resulting in a mixed-rev WC.

An update shall convey a set of per-element changes. These can include moving elements. When an update attempts to transform the base state into a non-tree configuration (such as with a cycle or a name clash or a deleted parent) the update shall be rejected and the WC unchanged. An update shall only succeed if it results in a tree configuration for the new base state.

A successful update to the base state shall be merged with the working state to produce the new working state. Any resulting tree conflicts shall be recorded in the WC to be resolved by the user, like in old Subversion except the kinds of conflicts are different.

Consequences:

elements may be at different base revisions without ambiguity in the semantics
there is no such thing as a broken/split/half move