Merge Tracking Functional Specification

This functional specification describes Merge Tracking in Subversion 1.5.0, except where noted as unimplemented. It is a living specification, which will change as features are added or refined.

Merge operations

Merge operations involving a single source URL (e.g. svn merge -cN URL) allow the revision range and source URL parameters to be optional. The revision range defaults to "all unmerged revisions", while the source URL is inferred using a combination of merge info and copy history. When a revision range is not provided, merge operations are not able to "revert" changes (e.g. a la svn merge -c -7 URL).

See the repeated merge section below for discussion of various merge algorithms, and details on the merge algorithm used.

Merge info is not taken into consideration for three-way merges, that is, merge operations which do not specify identical "from" and "to" URLs (e.g. svn merge FROM_URL@REV1 TO_URL@REV2). In the future, Subversion will likely support this, but it currently lacks sufficient history and merge info between the repository and client to perform this operation in a reasonable manner. The primary use case this will impact is vendor branches.

Diff/Status operations

Output is shown the same as pre-Merge Tracking, except for:

Copy/Move operations

Copy and move operations handle two types of merge info:

Explicit
The pre-existing value of the svn:mergeinfo property on the source path.
Implicit
All revisions represented by the object at the source path (from its "appeared in" revision to its current revision).

Repository Access operation

Copy/move operations which contact the repository include:

  • WC to URL
  • URL to WC
  • URL to URL

These operations always propagate both explicit and implicit merge info. Other than the inclusion of merge info, operation is effectively the same as pre-Merge Tracking.
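The combination of explicit and implicit merge info can be illustrated with a minimal sketch. It assumes merge info is modeled as a plain dictionary mapping a merge source path to a set of revision numbers, which is a simplification for illustration and not Subversion's internal representation. The function name mergeinfo_for_copy and its parameters are hypothetical.

```python
def mergeinfo_for_copy(source_path, appeared_in_rev, current_rev, explicit):
    """Combine explicit and implicit merge info for a copy destination.

    explicit: dict mapping merge source path -> set of merged revisions,
              standing in for the parsed svn:mergeinfo value on the source.
    (Hypothetical helper; the dict-of-sets model is an assumption.)
    """
    # Copy the explicit merge info so the caller's data is not mutated.
    combined = {path: set(revs) for path, revs in explicit.items()}
    # Implicit merge info: every revision represented by the source object,
    # from its "appeared in" revision to its current revision.
    implicit = set(range(appeared_in_rev, current_rev + 1))
    combined.setdefault(source_path, set()).update(implicit)
    return combined


result = mergeinfo_for_copy("/trunk", 3, 6, {"/branches/b1": {4, 5}})
print(result)
```

In this sketch the destination ends up with the source's own history (/trunk, revisions 3 through 6) alongside whatever the source had already recorded as merged.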

Working Copy to Working Copy operation

Pre-Merge Tracking, WC to WC operations occurred offline (i.e. with no repository access). This is a typical behavior of refactoring tools (e.g. IDEs like Eclipse), and is very useful when offline (e.g. on an airplane or subway, or at a cafe).

However, to propagate merge info during copy/move operations, access to both a path's comprehensive merge info and its history is necessary. To preserve offline operation, the Merge Tracking implementation supports two modes:

  • A compatibility mode, which neither contacts the repository nor performs any merge info propagation (unless a copy source's merge info has been locally modified, in which case its value is propagated the same as any other Subversion property). This is the default mode of operation.
  • A mode which requires repository access (i.e. isn't offline), but which propagates all merge info from source path to destination. This mode is activated via the --use-merge-history option.

This behavior is comparable to the difference between svn status and svn status -u.

While some state indicating delayed merge info retrieval and handling could instead be stored in WC to preserve offline operation, there are complications with this when subsequent uncommitted revert operations should change the merge info (we'd have to store negative merge info in the WC).

Sparse Checkouts

When merging to a WC with sparsely populated directories, non-inheritable mergeinfo for the merge is set on the deepest directories present.

Rationale

[JAF] The above may be the actual behaviour but sounds too simplistic to be the desired behaviour. Although a simple depth attribute is recorded for each dir in the WC, the ambient depth of a dir in the WC is not simply one of empty/files/immediates/infinity, but rather is a tree in which different branches are populated to different depths. Surely the merge ought to respond to the ambient depth rather than the simple depth attribute.

[JAF] It's not clear whether an incoming addition should be honoured if the added node would fall outside the ambient depth. The sparse-directories design doesn't seem to address this case for updates, let alone merges. It would be silly if a merge could delete such a node (that was present in the WC despite being outside its parent's 'depth' attribute) and could not then re-add a node of the same name in order to perform both halves of an incoming replacement. Issue #4164 "inconsistencies in merge handling of adds vs. edits in shallow targets" is related.

See sparse-directories design.

Switched Paths

Switched paths are treated as the root of a working copy regarding mergeinfo inheritance, recording, and elision. Specifically, for a merge target with an arbitrary switched subtree:

  WC-Root
     |
  Target
     |
    SSP
     | \
     |  \
    SS   SSS

  Target - The WC target of the merge operation (may be same as WC-Root)
  SSP    - Switched Subtree Parent (may be same as Target)
  SSS    - Switched Subtree Sibling (zero or more)
  SS     - Switched Subtree

Note: If SS is itself the target of the merge, then no special handling is needed; the merge takes place as if SS is the root of the WC.

Delete

Issue #4163 "merged deletion of switched subtrees records non-inheritable mergeinfo": If a merge deletes the path SS, the desired behaviour is currently undefined and the actual behaviour is that a commit will delete both SS (from SSRP) and SS@BASE (from SSP).

Rationale

Why does merging work this way with switched subtrees?

If a subtree (SS) is switched, that means the user has chosen for the time being to work with a substitute for the original subtree (SS@BASE), knowing that any modifications made in SS can be committed only to the repository location of SS and the original subtree SS@BASE remains hidden and unaffected.

The general semantics of a merge is to apply local modifications to the working copy and record the merge as having been applied to the tree that is represented by the working copy.

Merge tracking should ensure that the subtree of the merge that goes into SS is recorded as being applied to SS, while the subtree SS@BASE should be recorded as not having received that merge.

Since the working copy represents parts of two different branches, two parts of the merge are thus applied to the two different branches, and recorded as such when the user commits the result.

If the user is doing a merge that may affect SS, it is reasonable to assume that SS is an alternative variant of SS@BASE rather than some totally unrelated item. So, in terms of Subversion's loose branching semantics, SS is a 'branch' of SS@BASE. If the user chooses to merge when the assumption is false and SS doesn't have a sensible branching relationship with SS@BASE, the result will be nonsensical or, in concrete terms, there will be merge conflicts.

Note: Many typical branching policies would forbid committing to two branches at once, let alone committing merges to two branches at once. However, the user may have reasons for doing this merge without intending to commit the result as-is.

Property change operations

Property changes from propedit, propset, and propdel operations can be used to change merge info. However, as these operations do not attempt to address merge info inheritance, changes to merge info on a directory affect merge info on any child paths.

Merge Info Elision

Merge info set on a working copy "child" path as a result of a merge, switch, or update, may fully/partially elide to the path's nearest working copy or repository ancestor with fully/partially equivalent merge info. Elision is attempted as part of any merge/switch/update:

Full Elision

  • Simple Equivalency: The merge info on a child path and the merge info on its nearest ancestor both map the same set of source paths to the same revision ranges. The "same" source path is taken to mean that the only differences between the two are relative path differences which are exactly the same as the relative path differences between the child and the ancestor, e.g.:
      Properties on '/A_COPY_2':
        svn:mergeinfo : /A:4-9
      /A_COPY:3
      Properties on '/A_COPY_2/B/E':
        svn:mergeinfo : /A/B/E:4-9
      /A_COPY/B/E:3
    The merge info on 'A_COPY_2/B/E' elides to 'A_COPY_2' because the only difference between the merge source paths on each is 'B/E', which is the same as the relative path difference between 'A_COPY_2/B/E' and 'A_COPY_2'.
  • Semantic Equivalency: Excepting paths unique to the child or ancestor which map to empty revision ranges, the merge info between the child and ancestor is otherwise equivalent per the definition of simple equivalency, e.g.:
      Properties on '/A_COPY_2':
        svn:mergeinfo : /A:4-9
      /A_COPY:
      Properties on '/A_COPY_2/B/E':
        svn:mergeinfo : /A/B/E:4-9
    
      Properties on '/A_COPY_2':
        svn:mergeinfo : /A:4-9
      Properties on '/A_COPY_2/B/E':
        svn:mergeinfo : /A/B/E:4-9
      /A_COPY/B/E:
    In both of the above examples the merge info on 'A_COPY_2/B/E' elides to 'A_COPY_2'.
  • Empty Revision Range Equivalency: If the merge info on a path is made up entirely of paths mapped to empty revision ranges and the path has no ancestor with merge info then the merge info fully elides.
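The three full elision rules above can be sketched as a single check. This is only an illustration under stated assumptions: merge info is modeled as a dictionary mapping a merge source path to a set of revisions (an empty set stands for an empty revision range), and the function names strip_suffix, drop_empty and fully_elides are hypothetical, not part of Subversion.

```python
def strip_suffix(mergeinfo, rel_path):
    """Rewrite the child's merge source paths relative to the ancestor.

    Returns None if some source path does not end with rel_path, in which
    case the paths cannot be "the same" in the sense described above.
    """
    suffix = "/" + rel_path
    adjusted = {}
    for src, revs in mergeinfo.items():
        if not src.endswith(suffix):
            return None
        adjusted[src[: -len(suffix)]] = set(revs)
    return adjusted


def drop_empty(mergeinfo):
    """Ignore source paths that map to empty revision ranges."""
    return {src: revs for src, revs in mergeinfo.items() if revs}


def fully_elides(child, ancestor, rel_path):
    """ancestor is None when no ancestor carries merge info."""
    if ancestor is None:
        # Empty revision range equivalency.
        return all(not revs for revs in child.values())
    adjusted = strip_suffix(child, rel_path)
    if adjusted is None:
        return False
    if adjusted == ancestor:
        return True  # simple equivalency
    # Semantic equivalency: equal once empty ranges are disregarded.
    return drop_empty(adjusted) == drop_empty(ancestor)


r4_9 = set(range(4, 10))
simple = fully_elides({"/A/B/E": r4_9, "/A_COPY/B/E": {3}},
                      {"/A": r4_9, "/A_COPY": {3}}, "B/E")
semantic = fully_elides({"/A/B/E": r4_9, "/A_COPY/B/E": set()},
                        {"/A": r4_9}, "B/E")
print(simple, semantic)
```

The two calls mirror the A_COPY_2 examples given for simple and semantic equivalency.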

Partial Elision

  • Partial Equivalency: Only the merge source paths unique to the child which map to empty revision ranges will elide if the merge info between the child and ancestor is otherwise non-equivalent, e.g.:
      Properties on '/A_COPY_2':
        svn:mergeinfo : /A:4-6
      Properties on '/A_COPY_2/B/E':
        svn:mergeinfo : /A/B/E:5
      /A_COPY/B/E:
    The empty revision range merge info from 'A_COPY/B/E' on 'A_COPY_2/B/E' elides, leaving:
      Properties on '/A_COPY_2':
        svn:mergeinfo : /A:4-6
      Properties on '/A_COPY_2/B/E':
        svn:mergeinfo : /A/B/E:5
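Partial elision can be sketched in the same spirit. The model is again an assumption made for illustration (a dictionary from merge source path to a set of revisions, with an empty set for an empty range), and partially_elide is a hypothetical name.

```python
def partially_elide(child, ancestor, rel_path):
    """Drop child-only merge source paths whose revision range is empty."""
    suffix = "/" + rel_path
    kept = {}
    for src, revs in child.items():
        # Map the child's source path onto the ancestor's namespace.
        ancestor_src = src[: -len(suffix)] if src.endswith(suffix) else None
        unique_to_child = ancestor_src not in ancestor
        if unique_to_child and not revs:
            continue  # an empty range on a path unique to the child elides
        kept[src] = set(revs)
    return kept


remaining = partially_elide(
    {"/A/B/E": {5}, "/A_COPY/B/E": set()},  # merge info on /A_COPY_2/B/E
    {"/A": {4, 5, 6}},                      # merge info on /A_COPY_2
    "B/E",
)
print(remaining)
```

Applied to the example above, only the /A/B/E entry survives on the child.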

Elision and Non-Inheritable Revision Ranges

The above rules apply only to mergeinfo without non-inheritable revision ranges. Mergeinfo with non-inheritable revision ranges cannot elide or be elided to.

Merge History

Merge Tracking meta data is stored in housekeeping properties (e.g. svn:mergeinfo).

Merge History Manipulation

While direct manipulation of housekeeping properties can be used to change merge info, commands to manipulate this information have been provided. Either style of operation supports adjustment of merge info when manual merges occur, and can also be used to block changes that are undesired for merge (later, this might be better addressed by a separate housekeeping property).

  • merge --record-only adds (or subtracts, if a reversed revision range is supplied) merge info for a path without performing the actual merge.
  • propedit/propset changes merge info for a path.
  • propdel removes merge info for a path.
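The bookkeeping effect of a record-only merge can be illustrated with a small sketch. It assumes merge info is modeled as a dictionary from merge source path to a set of revisions, and it expresses changes as signed integers in the style of -c N and -c -N. The names record_only and format_rangelist are hypothetical, and the sketch says nothing about how Subversion itself stores or rewrites the property.

```python
def record_only(mergeinfo, source, changes):
    """Return updated merge info without touching any file content.

    changes: iterable of ints; N records rN as merged, -N removes rN.
    An entry emptied by removals is kept as an empty set here; that is a
    modeling choice of this sketch, not a statement about Subversion.
    """
    updated = {src: set(revs) for src, revs in mergeinfo.items()}
    revs = updated.setdefault(source, set())
    for change in changes:
        if change > 0:
            revs.add(change)
        else:
            revs.discard(-change)
    return updated


def format_rangelist(revs):
    """Collapse a set of revisions into text such as '4,6-7'."""
    ranges = []
    for rev in sorted(revs):
        if ranges and rev == ranges[-1][1] + 1:
            ranges[-1][1] = rev  # extend the current contiguous range
        else:
            ranges.append([rev, rev])
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


info = record_only({"/A": {4, 5, 6}}, "/A", [7, -5])
print("/A:" + format_rangelist(info["/A"]))
```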

Merge History Audit and Query

The Commutative Author and Revision Reporting feature has been implemented, and will be included in 1.5.0.

The features described below may or may not be completed for 1.5.0.

Changeset Merge Availability

Show changesets available for merge/already merged from one or more merge source(s). The command-line client's default output format should be equivalent to that of svn log, and allow for XML-formatted output (for machine parsing). Blue sky, the command-line could also produce an output format equivalent to that of svn diff.

Recent discussion can be found here. Development is tracked here.

The Show Changesets Blocked from Merging portion of this feature is scheduled for implementation post-1.5, and is dependent upon the revision blocking feature slated for the same timeframe.

Find Changeset

Show where a changeset has been merged from/merged to, providing merging revision, URL, and rangelist. The command-line client should allow for XML-formatted output (for machine parsing).

Recent discussion can be found here. Development is tracked here.

The Find Paths containing Specific Incarnation of Versioned Resource portion of this feature is not yet scheduled for implementation.

Commutative Author and Revision Reporting

Scope

The following commands which show username and merge information should respect merge information and support Commutative Reporting. These commands are:

  • svn log
  • svn blame

svn info, svn ls --verbose and svn status --show-updates are purposely not included in this list. While one would typically need more information than they can reasonably provide alone, adding more output to these commands would clutter their command-line interface, reducing their utility. Merge Tracking-aware API support for the underlying functionality provided by these commands may be added at some point in the future (e.g. for use by third-party clients like TortoiseSVN).

A new switch, --use-merge-history, along with a corresponding single-character shortcut (-g), will be introduced to toggle merge information. Using it will enable these commands to show the additional information gleaned from parsing and processing the merge info on the targets in question. This switch will also work with --xml to include additional merge information. The new functionality added by --use-merge-history is as follows.

svn log

The original log message(s), in the current format, with the addition of a list of revisions and merge source paths that have been merged into the target. The output for log should be consistent with the diff output for the svn:mergeinfo property.

The --verbose switch will output the log information for the merged revisions in place of the information for the revision in which the merge occurred. Each of the original message(s) will have an additional line indicating that it is the result of a merge, and which revision the merge occurred in.

For instance, if Alice was the original author of r12, Bob was the original author of r14, and Chuck merged both r12 and r14 as part of r24, the output of svn log --use-merge-history will look like this:

------------------------------------------------------------------------
r24 | chuck | 2007-04-30 10:18:01 -0500 (Mon, 30 Apr 2007) | 1 line
 
Merge r12 and r14 from branch to trunk.
------------------------------------------------------------------------
r14 | bob | 2007-04-16 18:50:29 -0500 (Mon, 16 Apr 2007) | 1 line
Result of a merge from: r24

Remove inadvertent changes to Death-Ray-o-Matic introduced in r12.
------------------------------------------------------------------------
r12 | alice | 2007-04-16 19:02:48 -0500 (Mon, 16 Apr 2007) | 1 line
Result of a merge from: r24

Fix frapnalyzer bug in frobnicator.
  

Using the same example, the output of svn log --use-merge-history --verbose will look like:

------------------------------------------------------------------------
r24 | chuck | 2007-04-30 10:18:01 -0500 (Mon, 30 Apr 2007) | 1 line
Changed paths:
   M /trunk/death-ray.c
   M /trunk/frobnicator/frapnalyzer.c
 
Merge r12 and r14 from branch to trunk.
------------------------------------------------------------------------
r14 | bob | 2007-04-16 18:50:29 -0500 (Mon, 16 Apr 2007) | 1 line
Changed paths:
   M /branches/world-domination/death-ray.c
Result of a merge from: r24

Remove inadvertent changes to Death-Ray-o-Matic introduced in r12.
------------------------------------------------------------------------
r12 | alice | 2007-04-16 19:02:48 -0500 (Mon, 16 Apr 2007) | 1 line
Changed paths:
   M /branches/world-domination/frobnicator/frapnalyzer.c
   M /branches/world-domination/death-ray.c
Result of a merge from: r24

Fix frapnalyzer bug in frobnicator.
  

If r12 was itself a merge of r9 and r10, svn log --use-merge-history for r24 will look like this:

------------------------------------------------------------------------
r24 | chuck | 2007-04-30 10:18:01 -0500 (Mon, 30 Apr 2007) | 1 line
 
Merge r12 and r14 from branch to trunk.
------------------------------------------------------------------------
r14 | bob | 2007-04-16 18:50:29 -0500 (Mon, 16 Apr 2007) | 1 line
Result of a merge from: r24

Remove inadvertent changes to Death-Ray-o-Matic introduced in r12.
------------------------------------------------------------------------
r12 | alice | 2007-04-16 19:02:48 -0500 (Mon, 16 Apr 2007) | 1 line
Result of a merge from: r24

Fix frapnalyzer bug in frobnicator.
------------------------------------------------------------------------
r10 | alice | 2007-04-16 19:02:28 -0500 (Mon, 16 Apr 2007) | 1 line
Result of a merge from: r12, r24

Fix frapnalyzer documentation.
------------------------------------------------------------------------
r9 | bob | 2007-04-16 19:01:48 -0500 (Mon, 16 Apr 2007) | 1 line
Result of a merge from: r12, r24

Whitespace fixes.  No functional change.
  

In each case, merged revisions will be grouped together under the merging revisions, and sorted by revision number. This may mean that not all log messages will be in revision number order, but changes will be presented in the order they were actually made.

Output for svn log -g --xml will exploit the tree structure of XML to include child messages as subelements of the corresponding parent log messages. Consumers can use the location of a particular log message in the tree to determine its ancestry.
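As a sketch of how a consumer might recover ancestry from the nesting, the following walks a hand-written XML fragment with Python's standard library. The fragment is illustrative only: it is trimmed to revision and author, it is not captured from a real run, and its element names are an assumption of this sketch rather than something this specification defines. The helper name walk is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative sample mirroring the r24/r14/r12/r10/r9 example above.
SAMPLE = """\
<log>
  <logentry revision="24"><author>chuck</author>
    <logentry revision="14"><author>bob</author></logentry>
    <logentry revision="12"><author>alice</author>
      <logentry revision="10"><author>alice</author></logentry>
      <logentry revision="9"><author>bob</author></logentry>
    </logentry>
  </logentry>
</log>
"""


def walk(entry, merged_via=()):
    """Yield (revision, author, merging revisions) for entry and its children."""
    rev = int(entry.get("revision"))
    yield rev, entry.findtext("author"), merged_via
    for child in entry.findall("logentry"):
        # The nearest merging revision comes first, as in the text output.
        yield from walk(child, (rev,) + merged_via)


root = ET.fromstring(SAMPLE)
rows = [row for top in root.findall("logentry") for row in walk(top)]
for rev, author, via in rows:
    origin = ", ".join(f"r{v}" for v in via)
    suffix = f" (result of a merge from: {origin})" if via else ""
    print(f"r{rev} | {author}{suffix}")
```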

This output may be useful for auditing purposes to those migrating from svnmerge.py, as a replacement for repeating the entirety of the merged ranges' log messages in the log message for the commit of a merge (e.g. svnmerge.py's generated svnmerge-commit-message.txt file).

svn blame

Reuse the existing author and revision columns. Instead of listing the merging author and revision, list the original author and revision of that line. Unlike other commands, we do not need to worry about multiple source revisions, because each line can have at most one author.

The output of svn blame may be the following:

     2    alice   This is the file 'iota'.
    14    bob     'A' has changed a bit, with 'upsilon', and 'xi'.  
  

Using the -g switch will show the author who most recently changed the line, independent of which branch it was changed on (so long as the changes have been merged). If Chuck made changes to the file in r11, which was then merged in r14, the output of svn blame -g for the same file may look like this:

     2    alice   This is the file 'iota'.
    11    chuck   'A' has changed a bit, with 'upsilon', and 'xi'.  
  

The --verbose flag also triggers additional information. In addition to the date of the revision, --verbose, in combination with -g, also displays the original path, relative to the repository root, where the modifications were made. Given the above example, svn blame -g --verbose will be something like this:

     2    alice   2007-06-07 10:16:49 -0500 (Thu, 07 Jun 2007) /trunk/iota       This is the file 'iota'.
    11    chuck   2007-06-07 12:29:48 -0500 (Thu, 07 Jun 2007) /branches/a/iota  'A' has changed a bit, with 'upsilon', and 'xi'.  
  

The output of svn blame -g --xml is not limited by size, and will include all available information.

For commits which remove merge info (e.g. reverts), --use-merge-history will trace back to the original author. For example if Alice makes a commit to code previously modified by Bob (committed with no merge history), and Alice's commit is subsequently reverted by Chris, we should show Bob as the author. If Bob's commit was itself the result of a merge, we should recurse until we find a commit which did not add merge info (the leaf node), and assume its author.

Pending Questions
Additional Features

Although not part of the initial implementation, additional features have been suggested:

  • A configuration option to always enable --use-merge-history.

Repeated Merge

There are two general schemes for solving the repeated merge problem. Subversion 1.5 uses the Most Recent Common Ancestor (MRCA) approach. If a later version of Subversion (e.g. 2.0) overhauls the Merge Tracking implementation, it'll likely use the Ancestry Set (AS) approach.

Either solution also supports the cherry picking, rollback, and property merging use cases. A merge preview which is lighter-weight than an uncommitted merge into a WC is not supported.

The Most Recent Common Ancestor approach

In this scheme, an optional set of merge sources is recorded in each node-revision. When asked to do a merge with only one source (that is, just svn merge URL, with no second argument), you compute the most recent common ancestor and do a three-way merge between the common ancestor, the given URL, and the WC.

To compute the most recent ancestor, you chain off the immediate predecessors of each node-revision. The immediate predecessors are the direct predecessor (the most recent node-revision within the node) and the merge sources. An interleaved breadth-first search should find the most recent common ancestor.
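The interleaved breadth-first search described above might look like the following minimal sketch. It assumes the ancestry graph is available as a dictionary mapping each node-revision to its immediate predecessors (the direct predecessor plus any merge sources); the labels such as "trunk@6" and the function name most_recent_common_ancestor are hypothetical. It returns the first node-revision reached from both sides, which is how this sketch interprets "most recent".

```python
from collections import deque


def most_recent_common_ancestor(predecessors, left, right):
    """Interleaved BFS over immediate predecessors of two node-revisions."""
    if left == right:
        return left
    seen = {"left": {left}, "right": {right}}
    queues = {"left": deque([left]), "right": deque([right])}
    other = {"left": "right", "right": "left"}
    while queues["left"] or queues["right"]:
        # Take one step from each side in turn.
        for side in ("left", "right"):
            if not queues[side]:
                continue
            node = queues[side].popleft()
            for pred in predecessors.get(node, ()):
                if pred in seen[other[side]]:
                    return pred  # reached from both sides
                if pred not in seen[side]:
                    seen[side].add(pred)
                    queues[side].append(pred)
    return None


# Hypothetical history: branch copied from trunk@2, then merged into trunk@6.
graph = {
    "trunk@2": ["trunk@1"],
    "trunk@5": ["trunk@2"],
    "branch@3": ["trunk@2"],
    "branch@4": ["branch@3"],
    "trunk@6": ["trunk@5", "branch@4"],  # direct predecessor + merge source
    "branch@7": ["branch@4"],
}
before_merge = most_recent_common_ancestor(graph, "trunk@5", "branch@4")
after_merge = most_recent_common_ancestor(graph, "trunk@6", "branch@7")
print(before_merge, after_merge)
```

Before the merge the common ancestor is the copy source; once branch@4 is recorded as a merge source of trunk@6, a later merge only needs to cover the changes made after branch@4.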

The Ancestry Set approach

In this scheme, you record the full ancestry set for each node-revision -- that is, the set of all changes which are accounted for in that node-revision. (How you store this ancestry set is unimportant; the point is, you need a reasonably efficient way of determining it when asked.) If you are asked to "svn merge URL", you apply the changes present in URL's ancestry but absent in WC's ancestry. Note that this is not a single three-way merge; you may have to apply a large number of disjoint changes to the WC.
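A minimal sketch of the selection step, assuming each ancestry set is simply a set of change (revision) numbers; how the sets are stored or computed is left open, as noted above. The names missing_changes and contiguous_runs are hypothetical.

```python
def missing_changes(source_ancestry, target_ancestry):
    """Changes present in the source's ancestry but absent from the target's."""
    return sorted(set(source_ancestry) - set(target_ancestry))


def contiguous_runs(changes):
    """Group sorted change numbers into inclusive (first, last) runs."""
    runs = []
    for change in changes:
        if runs and change == runs[-1][1] + 1:
            runs[-1][1] = change  # extend the current run
        else:
            runs.append([change, change])
    return [tuple(run) for run in runs]


url_ancestry = set(range(1, 11))   # changes 1..10 are in the URL's ancestry
wc_ancestry = {1, 2, 3, 5, 8}      # changes already accounted for in the WC
to_apply = missing_changes(url_ancestry, wc_ancestry)
runs = contiguous_runs(to_apply)
print(to_apply, runs)
```

The three separate runs in this example illustrate why the result is not a single three-way merge: each disjoint run has to be applied to the WC on its own.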

For a longer description of this approach, see the "Merging and Ancestry" section of the original design doc.

Ancestry-Sensitive Line-Based Merge

Make 'hunks' of contextually-merged text sensitive to ancestry.

A high-resolution version of repeated merge. Rather than tracking whole changesets, we track the lineage of specific lines of code within a file. The basic idea is that when re-merging a particular hunk of code, the contextual-merging process is aware that certain lines of code already represent the merging of particular lines of development. Jack Repenning has a great example of this from ClearCase (see ASCII diagram below).

See the variance adjusted patching document for an extended discussion of how to implement this by composing diffs; see svn_diff_diff4() for an implementation of same. We may be closer to ancestry-sensitive merging than we think.

Here's an example demonstrating how individual lines of code can be tracked. In this diagram, we're drawing the lineage of a single file, with time flowing downwards. The file begins life with three lines of text, "1\n2\n3\n". The file then splits into two lines of development.

                    1     
                    2     
                    3     
                  /   \   
                 /     \  
                /       \ 
            one           1   
            two           2.5 
            three         3   
             |     \      |
             |      \     |   
             |       \    |            
             |        \   |            
             |         \ one                ## This node is a human's
             |           two-point-five     ## merge of two sides.
             |           three        
             |            |
             |            |
             |            |
            one          one
            Two          two-point-five
            three        newline       
               \         three  
                \         |   
                 \        |
                  \       |
                   \      |
                    \     |
                     \    |
                      \   |
                       \  |
                         one                ## This node is a human's
                         Two-point-five     ## merge of the changes
                         newline            ## since the last merge.
                         three

It's the second merge that's important here.

In a system like Subversion, the second merge of the left branch to the right will fail miserably: the whole file's contents will be placed within conflict markers. That's because it's trying to dumbly apply a patch that changes "1\n2\n3" to "one\nTwo\nthree", and the target file has no matching lines at all.

A smarter system (like ClearCase) would remember that the previous merge had happened, and specifically notice that the lines "one" and "three" are the results of that previous merge. Therefore, it would ask the human only to deal with the "Two" versus "two-point-five" conflict; the earlier changes ("1\n2\n3" to "one\ntwo\nthree") would already be accounted for.

Comparisons, Arguments, and Questions

AS allows you to merge changes from a branch out of order, without doing any bookkeeping. MRCA requires you to merge changes from a branch in order.

MRCA is simpler to implement, since it results in a three-way merge (which is well-understood by Subversion). However, it may not handle all edge cases. For instance, it may break down faster if the merging topology is not hierarchical.

MRCA may be easier for users to understand, even though AS is probably simpler to a mathematician.

Consistency with other modern version control systems is desirable.

If a user asks to merge a directory, should we apply MRCA or AS to each subdirectory and file to determine what ancestor(s) to use? Or should we apply MRCA or AS just once, to the directory itself? The latter approach seems simpler and more efficient, but will break down quickly if the user wants to merge subdirectories of a branch in advance of merging in the whole thing.

Merge Conflict Resolution

Merging inevitably produces conflicts which cannot be resolved by an algorithm alone. In such a case, human intervention is required to resolve the conflicts. The merge algorithm used by Subversion's Merge Tracking implementation makes this problem worse, since it breaks a requested merge range into several merges to avoid repeating merges which have already been applied to a merge target or its children. After a conflict is encountered, merges of subsequent revision ranges must be aborted, since tree conflicts or previous content conflicts cannot be reliably merged into (e.g. you can't merge into a file that either isn't there or which you could potentially merge inside one side of a conflict marker).
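The range splitting and the abort-on-conflict behaviour can be sketched as follows. This is an illustration under stated assumptions, not the actual merge code: revisions are plain integers, already-merged revisions are given as a set, and the callable apply_range (hypothetical, as are the function names) reports whether merging a range produced a conflict.

```python
def unmerged_ranges(first, last, already_merged):
    """Split first..last (inclusive) into runs not yet merged."""
    ranges = []
    for rev in range(first, last + 1):
        if rev in already_merged:
            continue
        if ranges and ranges[-1][1] == rev - 1:
            ranges[-1][1] = rev  # extend the current run
        else:
            ranges.append([rev, rev])
    return [tuple(r) for r in ranges]


def merge_until_conflict(ranges, apply_range):
    """Apply ranges in order; stop after the first one that conflicts.

    Returns (ranges attempted, ranges aborted).
    """
    attempted = []
    for index, rev_range in enumerate(ranges):
        attempted.append(rev_range)
        if apply_range(rev_range):  # True means a conflict was raised
            return attempted, ranges[index + 1:]
    return attempted, []


ranges = unmerged_ranges(4, 12, {6, 7, 10})
# Pretend that r8 conflicts with local content.
attempted, aborted = merge_until_conflict(ranges, lambda r: r[0] <= 8 <= r[1])
print(ranges, attempted, aborted)
```

Here a single requested range of r4 through r12 becomes three separate merges, and the conflict in the second one leaves the third unapplied.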

To help alleviate the pain of conflict resolution, a merge conflict resolution callback can be employed by Subversion clients. This callback is invoked whenever merge conflicts are encountered, and can take steps like launching a graphical merge tool (for interactive conflict resolution), or following a pre-specified directive like "always use the version from my merge source". This last implementation can be used to support the SCM automated merge use case.
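The callback idea can be sketched in a few lines. Everything here is hypothetical and for illustration only: a conflict is modeled as a dictionary with "path", "mine" and "theirs" keys, a callback returns the chosen content or None to postpone, and none of the names correspond to Subversion's actual client API.

```python
def always_theirs(conflict):
    """Directive: always use the version from the merge source."""
    return conflict["theirs"]


def postpone(conflict):
    """Leave the conflict for the user to resolve after the merge."""
    return None


def resolve_all(conflicts, callback):
    """Invoke the callback for each conflict encountered."""
    resolved, postponed = {}, []
    for conflict in conflicts:
        choice = callback(conflict)
        if choice is None:
            postponed.append(conflict["path"])
        else:
            resolved[conflict["path"]] = choice
    return resolved, postponed


conflicts = [
    {"path": "death-ray.c", "mine": "local edit", "theirs": "merged edit"},
    {"path": "frapnalyzer.c", "mine": "a", "theirs": "b"},
]
auto_resolved, auto_postponed = resolve_all(conflicts, always_theirs)
none_resolved, all_postponed = resolve_all(conflicts, postpone)
print(auto_resolved, all_postponed)
```

Keeping the policy in a callable lets the same merge loop serve an interactive tool and an unattended "always use theirs" run.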

The command-line client includes a merge conflict resolution callback which behaves much like svk: in interactive mode, it prompts for how to resolve each conflicted file or property value. In non-interactive mode (or when configured to disallow interactive conflict resolution via [miscellany] interactive-conflicts = no), conflict resolution is postponed until post-merge (as in pre-1.5 releases). See the 1.5 release notes for an example.

In a post-1.5 release, the command-line client will provide an interactive conflict resolution option to display some context for each conflict in a path or property value, and prompt for how to resolve it. The merge algorithm will attempt to continue applying more of the requested merge after conflict is encountered, merging what it can around the conflicted area of the WC, and possibly supporting an option to complete the remainder of an unfinished merge operation after conflicts have been resolved manually.

Related discussion from the dev@ mailing list can be found here:

Issue #2022 is loosely related.

Distribution of Conflict Resolution

No explicit facility is provided for distribution of conflict resolution. To support this use case, developers can co-ordinate with each other to resolve merge conflicts on portions of a tree, and trade patches.

Migration and Interoperability

Migration

No explicit steps are necessary to migrate the content of a pre-Merge Tracking repository. Only an upgrade to Subversion 1.5.0 is necessary.

TODO: Merge meta data from svnmerge.py. Dan Berlin has written Python code to perform this migration; it needs to be made available in the tools/server-side/ area of the distribution.

Interoperability

Executive summary for client/repository inter-op:

  • Older Subversion clients may interact with a 1.5.x+ Subversion repository, but will continue to lack Merge Tracking functionality for:
    • Recording meta data about any merges performed.
    • Using merge meta data to avoid repeated merging.
  • 1.5.x+ Subversion clients may interact with older Subversion repositories, with Merge Tracking functionality effectively neutralized.

Gory detail for client/repository inter-op:

  • A repository 1.4.x- doesn't provide any way to retrieve inherited merge info for a path (regardless of client version). For a 1.5.x+ client which could theoretically make use of any merge info available to it, this will typically neutralize its Merge Tracking functionality. The one case where merge info might come into play is when the merge info for a path is available locally (e.g. in the client's WC); in this case, repeated merges may be avoided.
  • A 1.5.x client will record merge tracking meta data for merges performed, regardless of repository version. However, a repository 1.4.x- won't know to do anything special with this merge info. When the repository is upgraded to 1.5.x+, we'll retain this merge info in the svn:mergeinfo property, but I'm not yet clear on what'll happen to the sqlite merge info index. We may need some sort of upgrade path here, but don't have one yet, and aren't promising one.

Subversion dump files continue to be fully portable between pre- and post-Merge Tracking versions of Subversion.
