Merge Tracking Functional Specification

*** UNDER CONSTRUCTION ***

Merge tracking functional specification. Describes Subversion 1.5.0, except where noted as unimplemented.

TODO: Describe how each requirement will actually function for Subversion. Remove redundancies.

Diff/Status operations

Output is the same as before Merge Tracking, except that:

Diffs pretty-print changes to merge info in an easily human-readable form.
Diffs sometimes report spurious property changes from merge info (bug?).
Status represents changes to the merge info for the root of a tree as a property change.

Copy/Move operations

Copy and move operations handle two types of merge info:

Explicit: The pre-existing value of the svn:mergeinfo property on the source path.
Implicit: All revisions represented by the object at the source path (from its "appeared in" revision to its current revision).
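A minimal Python sketch of how a copy destination's merge info could combine the two kinds. It assumes the svn:mergeinfo value holds one "path:ranges" line per merge source (for example "/branches/foo:3-5,7"). The helper names are hypothetical, and non-inheritable range markers are not handled.

```python
from typing import Dict, Set

MergeInfo = Dict[str, Set[int]]  # source path -> set of merged revisions


def parse_mergeinfo(value: str) -> MergeInfo:
    """Parse "path:ranges" lines (ranges like "3-5,7") into a mapping."""
    info: MergeInfo = {}
    for line in value.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        path, _, ranges = line.rpartition(":")
        revs = info.setdefault(path, set())
        for item in ranges.split(","):
            lo, _, hi = item.partition("-")
            revs.update(range(int(lo), int(hi or lo) + 1))
    return info


def copy_mergeinfo(explicit_value: str, source_path: str,
                   appeared_rev: int, current_rev: int) -> MergeInfo:
    """Destination merge info: explicit merge info plus implicit merge info."""
    info = parse_mergeinfo(explicit_value) if explicit_value else {}
    # Implicit: every revision represented by the object at the source path.
    info.setdefault(source_path, set()).update(range(appeared_rev, current_rev + 1))
    return info


def format_mergeinfo(info: MergeInfo) -> str:
    """Render the mapping back into compact "path:ranges" lines."""
    lines = []
    for path in sorted(info):
        revs = sorted(info[path])
        if not revs:
            continue
        ranges, start = [], revs[0]
        for prev, cur in zip(revs, revs[1:] + [None]):
            if cur != prev + 1:  # end of a contiguous run
                ranges.append(str(start) if start == prev else f"{start}-{prev}")
                start = cur
        lines.append(f"{path}:{','.join(ranges)}")
    return "\n".join(lines)


print(format_mergeinfo(copy_mergeinfo("/branches/foo:3-5,7", "/trunk", 1, 10)))
```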

Repository Access operation

Copy/move operations which contact the repository include:

WC to URL (code in progress, tests complete, copy test #11 still failing over ra_dav)
URL to WC
URL to URL

These operations always propagate both explicit and implicit merge info. Apart from the inclusion of merge info, they behave the same as before Merge Tracking.

Working Copy to Working Copy operation

Before Merge Tracking, WC to WC operations occurred offline (i.e. without repository access). This is typical behavior for refactoring tools (e.g. IDEs like Eclipse), and is very useful when offline (e.g. on an airplane or subway, or at a cafe).

However, to propagate merge info during copy/move operations, access to both a path's comprehensive merge info and its history is necessary. To preserve offline operation, the Merge Tracking implementation supports two modes:

A compatibility mode, which neither contacts the repository nor propagates any merge info (unless a copy source's merge info has been locally modified, in which case its value is propagated the same way as any other versioned property).
A mode which requires repository access (i.e. cannot be used offline), but which propagates all merge info from source path to destination (unimplemented, start with copy test #31).

This behavior is comparable to the difference between svn status and svn status -u.
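The choice between the two WC to WC modes can be sketched as a small decision function. This is a hypothetical Python illustration, not Subversion code: the function name, its parameters, and the `fetch_comprehensive` callback standing in for repository access are all assumptions, and the repository-access mode itself is unimplemented.

```python
from typing import Callable, Dict, Optional, Set

MergeInfo = Dict[str, Set[int]]  # source path -> set of merged revisions


def wc_to_wc_copy_mergeinfo(
    local_mergeinfo: Optional[MergeInfo],
    locally_modified: bool,
    use_repository: bool,
    fetch_comprehensive: Callable[[], MergeInfo],
) -> Optional[MergeInfo]:
    """Decide what merge info a WC to WC copy propagates to its destination."""
    if use_repository:
        # Repository-access mode: fetch explicit plus implicit merge info.
        return fetch_comprehensive()
    if locally_modified and local_mergeinfo is not None:
        # Compatibility mode with locally modified svn:mergeinfo: copy it as-is.
        return {path: set(revs) for path, revs in local_mergeinfo.items()}
    # Compatibility mode otherwise: no merge info is propagated.
    return None
```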

While some state indicating delayed merge info retrieval and handling could instead be stored in the WC to preserve offline operation, this is complicated by subsequent uncommitted revert operations that should change the merge info (we'd have to store negative merge info in the WC).

Merge-related Meta Data

Merge Tracking meta data is stored in housekeeping properties (e.g. svn:mergeinfo).

Meta Data Manipulation

While direct manipulation of housekeeping properties can be used to change merge info, commands to manipulate this information have also been provided. Either style of operation supports adjusting merge info when manual merges occur, and can also be used to block changes that should not be merged (later, this might be better addressed by a separate housekeeping property).

merge --record-only adds (or subtracts, if a reversed revision range is supplied) merge info for a path without performing the actual merge.
propedit/propset changes merge info for a path.
propdel removes merge info for a path.
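The effect of merge --record-only on a path's merge info can be sketched as set arithmetic on revisions. This Python sketch assumes the usual svn merge range convention, where -r N:M covers revisions N+1 through M and a reversed range removes them; the helper name `record_only` is hypothetical.

```python
from typing import Dict, Set

MergeInfo = Dict[str, Set[int]]  # source path -> set of merged revisions


def record_only(mergeinfo: MergeInfo, source: str, start: int, end: int) -> MergeInfo:
    """Add (start < end) or subtract (start > end) merge info without merging."""
    result = {path: set(revs) for path, revs in mergeinfo.items()}
    if start < end:
        # Forward range: record revisions start+1..end as merged.
        result.setdefault(source, set()).update(range(start + 1, end + 1))
    else:
        # Reversed range: forget revisions end+1..start.
        revs = result.get(source, set())
        revs.difference_update(range(end + 1, start + 1))
        if not revs:
            result.pop(source, None)  # drop sources with no revisions left
    return result


info = record_only({}, "/branches/foo", 3, 7)
print(info)
print(record_only(info, "/branches/foo", 5, 3))
```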

Meta Data Audit and Query

These features may or may not be completed for 1.5.0.

Change Set Merge Availability (TODO)
Find Change Set (TODO)
Commutative Author and Revision Reporting

Commutative Author and Revision Auditing

Scope

Most commands which show username and revision information should also respect merge information and support Commutative Auditing. These commands, collectively referred to as the auditing commands, are:

svn log
svn blame
svn status --show-updates

svn info is purposely not included in this list, on the grounds that one would typically need more information than it can reasonably provide.

A new switch, --merge-sensitive, along with a corresponding single-character shortcut, will be introduced for the auditing commands. Using it will enable these commands to show the additional information gleaned from parsing and processing the merge info on the targets in question. This switch will also work with --xml to include additional merge information. The new functionality added by --merge-sensitive is as follows.

svn log

The original log message, in the current format, with the addition of a list of revisions and merge source paths that have been merged into the target. The output for log should be consistent with the diff output for the svn:mergeinfo property.

The --verbose switch will output the log information for the merged revisions as well. This output may be in the style of svnmerge.py: the primary log message, followed by each of the original log messages indented with separators between them.
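A rough Python sketch of the svnmerge.py-style layout described above: the primary log message, then each merged revision's message indented and separated by a rule. The entry header ("rN | author"), separator width, and function names are simplified assumptions, not the real svn log format.

```python
from typing import List, Tuple

LogEntry = Tuple[int, str, str]  # (revision, author, message)


def format_entry(entry: LogEntry) -> List[str]:
    """Header line followed by the message lines."""
    rev, author, message = entry
    return [f"r{rev} | {author}", *message.splitlines()]


def format_merge_sensitive_log(primary: LogEntry, merged: List[LogEntry],
                               indent: str = "    ") -> str:
    """Primary message, then each merged message indented with separators."""
    lines = format_entry(primary)
    for entry in merged:
        lines.append(indent + "-" * 20)
        lines.extend(indent + line for line in format_entry(entry))
    return "\n".join(lines)


print(format_merge_sensitive_log(
    (10, "alice", "Merge r4-5 from branch."),
    [(4, "bob", "Fix typo."), (5, "carol", "Add test.")],
))
```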

svn blame

Two additional columns for each line, with the original revision and author of that line. Unlike other commands, we do not need to worry about multiple source revisions, because each line can have at most one author.

svn status --show-updates

Additional columns reflecting the last original authors and revisions.

Pending Questions

How will --merge-sensitive behave for commits which remove merge info (e.g. reverts)?
In the case of svn log, would the user be better served if we just included the original revision logs in line with the logs (i.e., no special indentation, etc.)?
What about svn ls --verbose, which also shows revisions and usernames?

Additional Features

Although not part of the initial implementation, additional features have been suggested:

A configuration option to always enable --merge-sensitive.

Repeated Merge

There are two general schemes for solving the repeated merge problem. Subversion 1.5 uses the Most Recent Common Ancestor (MRCA) approach. If a later version of Subversion (e.g. 2.0) overhauls the Merge Tracking implementation, it'll likely use the Ancestry Set (AS) approach.

Either solution also supports the cherry-picking, rollback, and property-merging use cases. A merge preview which is lighter-weight than an uncommitted merge into a WC is not supported.

The Most Recent Common Ancestor approach

In this scheme, you record an optional set of merge sources in each node-revision. When asked to do a merge with only one source (that is, just svn merge URL, with no second argument), you compute the most recent common ancestor and do a three-way merge between the common ancestor, the given URL, and the WC.

To compute the most recent common ancestor, you chain off the immediate predecessors of each node-revision. The immediate predecessors are the direct predecessor (the most recent earlier node-revision of the same node) and the merge sources. An interleaved breadth-first search should find the most recent common ancestor.
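A minimal Python sketch of the interleaved breadth-first search, assuming the history is available as a mapping from each node-revision to its immediate predecessors (direct predecessor plus merge sources). The "path@rev" node names and the function name are hypothetical. The search alternates one BFS level from each side and returns the first node reached from both, which is the closest common ancestor by number of steps.

```python
from collections import deque
from typing import Dict, List, Optional


def most_recent_common_ancestor(preds: Dict[str, List[str]],
                                a: str, b: str) -> Optional[str]:
    """Interleaved BFS over immediate predecessors of a and b."""
    if a == b:
        return a
    seen = ({a}, {b})
    queues = (deque([a]), deque([b]))
    while queues[0] or queues[1]:
        for side in (0, 1):
            other = 1 - side
            # Expand exactly one BFS level on this side, then switch sides.
            for _ in range(len(queues[side])):
                node = queues[side].popleft()
                for pred in preds.get(node, []):
                    if pred in seen[other]:
                        return pred  # first node reached from both sides
                    if pred not in seen[side]:
                        seen[side].add(pred)
                        queues[side].append(pred)
    return None


history = {
    "trunk@2": ["trunk@1"],
    "trunk@3": ["trunk@2"],
    "branch@1": ["trunk@1"],              # branch copied from trunk@1
    "branch@2": ["branch@1"],
    "trunk@4": ["trunk@3", "branch@2"],   # merge of branch@2 into trunk
    "branch@3": ["branch@2"],
}
print(most_recent_common_ancestor(history, "trunk@4", "branch@3"))
```

Because trunk@4 records branch@2 as a merge source, a later merge from branch@3 uses branch@2 as its base instead of the original branch point.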

The Ancestry Set approach

In this scheme, you record the full ancestry set for each node-revision -- that is, the set of all changes which are accounted for in that node-revision. (How you store this ancestry set is unimportant; the point is, you need a reasonably efficient way of determining it when asked.) If you are asked to "svn merge URL", you apply the changes present in URL's ancestry but absent in WC's ancestry. Note that this is not a single three-way merge; you may have to apply a large number of disjoint changes to the WC.

For a longer description of this approach, see the "Merging and Ancestry" section of the original design doc.
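The Ancestry Set rule can be sketched as a set difference. In this hypothetical Python sketch, an ancestry set is a set of (path, revision) changes, and consecutive missing revisions of the same path are grouped into runs; each run stands for one of the possibly many disjoint merges to apply. Treating consecutive revision numbers as contiguous is a simplifying assumption.

```python
from typing import List, Set, Tuple

Change = Tuple[str, int]  # (source path, revision)


def pending_merges(source_ancestry: Set[Change],
                   target_ancestry: Set[Change]) -> List[Tuple[str, int, int]]:
    """Changes in the source's ancestry but not the target's, as (path, first, last) runs."""
    runs: List[Tuple[str, int, int]] = []
    for path, rev in sorted(source_ancestry - target_ancestry):
        if runs and runs[-1][0] == path and runs[-1][2] == rev - 1:
            runs[-1] = (path, runs[-1][1], rev)  # extend the current run
        else:
            runs.append((path, rev, rev))  # start a new disjoint merge
    return runs


source = {("/trunk", r) for r in range(1, 11)}
target = {("/trunk", r) for r in (1, 2, 3, 6, 7)}
print(pending_merges(source, target))
```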

Ancestry-Sensitive Line-Based Merge

Make 'hunks' of contextually-merged text sensitive to ancestry.

A high-resolution version of repeated merge. Rather than tracking whole changesets, we track the lineage of specific lines of code within a file. The basic idea is that when re-merging a particular hunk of code, the contextual-merging process is aware that certain lines of code already represent the merging of particular lines of development. Jack Repenning has a great example of this from ClearCase (see ASCII diagram below).

See the variance adjusted patching document for an extended discussion of how to implement this by composing diffs; see svn_diff_diff4() for an implementation of same. We may be closer to ancestry-sensitive merging than we think.

Here's an example demonstrating how individual lines of code can be tracked. In this diagram, we're drawing the lineage of a single file, with time flowing downwards. The file begins life with three lines of text, "1\n2\n3\n". The file then splits into two lines of development.

                    1     
                    2     
                    3     
                  /   \   
                 /     \  
                /       \ 
            one           1   
            two           2.5 
            three         3   
             |     \      |
             |      \     |   
             |       \    |            
             |        \   |            
             |         \ one                ## This node is a human's
             |           two-point-five     ## merge of two sides.
             |           three        
             |            |
             |            |
             |            |
            one          one
            Two          two-point-five
            three        newline       
               \         three  
                \         |   
                 \        |
                  \       |
                   \      |
                    \     |
                     \    |
                      \   |
                       \  |
                         one                ## This node is a human's
                         Two-point-five     ## merge of the changes
                         newline            ## since the last merge.
                         three

It's the second merge that's important here.

In a system like Subversion, the second merge of the left branch to the right will fail miserably: the whole file's contents will be placed within conflict markers. That's because it's trying to dumbly apply a patch that changes "1\n2\n3" to "one\nTwo\nthree", and the target file has no matching lines at all.

A smarter system (like ClearCase) would remember that the previous merge had happened, and specifically notice that the lines "one" and "three" are the results of that previous merge. Therefore, it would ask the human only to deal with the "Two" versus "two-point-five" conflict; the earlier changes ("1\n2\n3" to "one\ntwo\nthree") would already be accounted for.
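The following Python sketch is not ClearCase's line-lineage algorithm. It is a naive line-based three-way merge, built on the standard library's difflib, that makes the difference above concrete. With the original "1\n2\n3" as the base, the second merge conflicts on the whole file; with a base that accounts for the previous merge (the left side's "one\ntwo\nthree" state), the conflict shrinks to the region around "two". The function names and conflict-marker layout are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Hunk = Tuple[int, int, List[str], str]  # (base_start, base_end, new_lines, side)


def _hunks(base: List[str], other: List[str], side: str) -> List[Hunk]:
    """Changed regions of `other` relative to `base`, in base coordinates."""
    sm = SequenceMatcher(a=base, b=other, autojunk=False)
    return [(i1, i2, other[j1:j2], side)
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def _apply(base: List[str], start: int, end: int, hunks: List[Hunk]) -> List[str]:
    """Rebuild base[start:end] with one side's non-overlapping hunks applied."""
    out, cur = [], start
    for s, e, new_lines, _ in hunks:
        out.extend(base[cur:s])
        out.extend(new_lines)
        cur = e
    out.extend(base[cur:end])
    return out


def merge3(base: List[str], left: List[str], right: List[str]) -> Tuple[List[str], int]:
    """Naive three-way merge; returns (merged_lines, conflict_count)."""
    hunks = sorted(_hunks(base, left, "left") + _hunks(base, right, "right"),
                   key=lambda h: (h[0], h[1]))
    out: List[str] = []
    pos = conflicts = i = 0
    while i < len(hunks):
        # Group hunks whose base ranges overlap (or start at the same line).
        start, end = hunks[i][0], hunks[i][1]
        group = [hunks[i]]
        i += 1
        while i < len(hunks) and (hunks[i][0] < end or hunks[i][0] == start):
            end = max(end, hunks[i][1])
            group.append(hunks[i])
            i += 1
        out.extend(base[pos:start])  # unchanged lines before this region
        ours = _apply(base, start, end, [h for h in group if h[3] == "left"])
        theirs = _apply(base, start, end, [h for h in group if h[3] == "right"])
        if len({h[3] for h in group}) == 1 or ours == theirs:
            # Only one side changed this region, or both made the same change.
            out.extend(ours if any(h[3] == "left" for h in group) else theirs)
        else:
            conflicts += 1
            out += ["<<<<<<< left", *ours, "=======", *theirs, ">>>>>>> right"]
        pos = end
    out.extend(base[pos:])
    return out, conflicts


left = ["one", "Two", "three"]
right = ["one", "two-point-five", "newline", "three"]
print(merge3(["1", "2", "3"], left, right))          # original base
print(merge3(["one", "two", "three"], left, right))  # base after the first merge
```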

Comparisons, Arguments, and Questions

AS allows you to merge changes from a branch out of order, without doing any bookkeeping. MRCA requires you to merge changes from a branch in order.

MRCA is simpler to implement, since it results in a three-way merge (which is well-understood by Subversion). However, it may not handle all edge cases. For instance, it may break down faster if the merging topology is not hierarchical.

MRCA may be easier for users to understand, even though AS is probably simpler to a mathematician.

Consistency with other modern version control systems is desirable.

If a user asks to merge a directory, should we apply MRCA or AS to each subdirectory and file to determine what ancestor(s) to use? Or should we apply MRCA or AS just once, to the directory itself? The latter approach seems simpler and more efficient, but will break down quickly if the user wants to merge subdirectories of a branch in advance of merging in the whole thing.

Merge Conflict Resolution

Merging inevitably produces conflicts which cannot be resolved by an algorithm alone. In such a case, human intervention is required to resolve the conflicts. The merge algorithm used by Subversion's Merge Tracking implementation makes this problem worse, since it breaks a requested merge range into several merges to avoid repeating merges which have already been applied to a merge target or its children.

To help alleviate the pain of conflict resolution, a merge conflict resolution callback can be employed by Subversion clients (unimplemented). This callback is invoked whenever merge conflicts are encountered, and can take steps like launching a graphical merge tool (for interactive conflict resolution), or following a pre-specified directive like "always use the version from my merge source". This last implementation can be used to support the SCM automated merge use case.
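A hypothetical Python sketch of the callback idea: the merge driver hands each conflict to a client-supplied function, which returns a choice. The `Choice` values, the callback signature, and `resolve_conflicts` are illustrative assumptions, not Subversion's API. The `always_theirs` callback corresponds to the "always use the version from my merge source" directive.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Choice(Enum):
    MINE = "mine"          # keep the working copy's version
    THEIRS = "theirs"      # take the version from the merge source
    POSTPONE = "postpone"  # leave the conflict for later


ConflictCallback = Callable[[str, List[str], List[str]], Choice]


def always_theirs(path: str, mine: List[str], theirs: List[str]) -> Choice:
    """Pre-specified directive: always use the merge source's version."""
    return Choice.THEIRS


def resolve_conflicts(conflicts: List[Tuple[str, List[str], List[str]]],
                      callback: ConflictCallback):
    """Ask the callback about each conflict; return (resolved, postponed)."""
    resolved: Dict[str, List[str]] = {}
    postponed: List[str] = []
    for path, mine, theirs in conflicts:
        choice = callback(path, mine, theirs)
        if choice is Choice.MINE:
            resolved[path] = mine
        elif choice is Choice.THEIRS:
            resolved[path] = theirs
        else:
            postponed.append(path)
    return resolved, postponed


print(resolve_conflicts([("a.c", ["x"], ["y"])], always_theirs))
```

An interactive client would supply a callback that shows context and prompts the user instead of returning a fixed choice.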

In a future release, the command-line client may supply a merge conflict resolution callback which behaves much like svk: in interactive mode, it would display some context for each conflict and prompt for how to resolve it; in non-interactive mode, it would take directives beforehand (unimplemented).

Related discussion from the dev@ mailing list can be found here:

Feedback solicited from IDE developers
Original API proposal (likely requires changes)

Issue #2022 is loosely related.

Distribution of Conflict Resolution

No explicit facility is provided for distribution of conflict resolution. To support this use case, developers can co-ordinate with each other to resolve merge conflicts on portions of a tree, and trade patches.

Migration and Interoperability

Migration

No explicit steps are necessary to migrate the content of a pre-Merge Tracking repository. Only an upgrade to Subversion 1.5.0 is necessary.

TODO: Migrate merge meta data from svnmerge.py. Dan Berlin has written Python code to perform this migration; it needs to be made available in the tools/server-side/ area of the distribution.

Interoperability

Executive summary for client/repository inter-op:

Older Subversion clients may interact with a 1.5.x+ Subversion repository, but will continue to lack Merge Tracking functionality for:
- Recording meta data about any merges performed.
- Using merge meta data to avoid repeated merging.
1.5.x+ Subversion clients may interact with older Subversion repositories, with Merge Tracking functionality effectively neutralized.

Gory detail for client/repository inter-op:

A 1.4.x or older repository doesn't provide any way to retrieve inherited merge info for a path (regardless of client version). For a 1.5.x+ client which could theoretically make use of any merge info available to it, this will typically neutralize its Merge Tracking functionality. The one case where merge info might come into play is when the merge info for a path is available locally (e.g. in the client's WC); in this case, repeated merges may be avoided.
A 1.5.x client will record merge tracking meta data for merges performed, regardless of repository version. However, a 1.4.x or older repository won't know to do anything special with this merge info. When the repository is upgraded to 1.5.x+, we'll retain this merge info in the svn:mergeinfo property, but I'm not yet clear on what'll happen to the sqlite merge info index. We may need some sort of upgrade path here, but don't have one yet, and aren't promising one.

Subversion dump files continue to be fully portable between pre- and post-Merge Tracking versions of Subversion.

$Date$