SUBVERSION "MERGE" and "SWITCH" FEATURES
                         Slated for 0.9 (M9)

                     1st draft writ by Karl & Ben,
                 after much discussion with CMike & Greg.


This is primarily a description of the semantics of merge and switch,
that is, Subversion's user-visible behavior in these operations.  It
also discusses some implementation issues.

Definitions:

   * Merging is like "cvs update -j -j".  I.e., take the difference
     between two trees in the repository, and apply it diffily to the
     working copy.

   * Switching means to switch the working copy from one line of
     development over to another, like "cvs update -r <TAG|BRANCH>".
     Of course, Subversion doesn't really have the concept of lines
     of development, it just has copies.  But if a working directory
     is based on repository tree T, and you "switch" it to be based on
     repository tree S, where T and S are similar (related) in some
     way, that's effectively the same as what CVS does.


The General Theory of Updating, Merging, and Switching
======================================================

Updating, merging, and switching are all very similar operations; each
command is a request to have the server modify the working copy in
some way.  Each of these subcommands begins with the client describing
the "state" of the working copy to the server, and ends with the
server comparing trees and sending back tree-delta(s) to the client.

Here's the easiest way to understand the three operations: assume that
X:PATH1 and Y:PATH2 are paths within two repository revisions X and Y,
which are possibly the same revision.  The server compares X:PATH1
and Y:PATH2 and sends the difference to the client.

  * In an update, PATH1 == PATH2 always, and after the tree-delta is
    applied, the working copy metadata is changed (specifically,
    revisions are bumped).

  * In a merge, PATH1 does not necessarily equal PATH2, and we don't
    touch metadata (except maybe for "genetic" merging properties
    someday).  In other words, the applied changes end up looking like
    local modifications.

    [Actually, in a merge PATH1 usually does equal PATH2 -- in fact,
    that's how it always is in CVS, in a sense.  So I think
    supporting the PATH1 != PATH2 case in merge should not be a high
    priority.  -kff]

  * In a switch, PATH1 does not necessarily equal PATH2, and we *do*
    rewrite the working copy metadata (specifically, revisions are
    bumped and URLs are changed).
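
As a rough summary of the three cases above, here is a small Python
sketch.  The Operation record and its field names are made up for this
note and are not part of any Subversion API; the sketch only restates
which operations may compare different paths and which ones touch
working copy metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """Hypothetical record describing one of the three subcommands."""
    name: str
    paths_may_differ: bool   # may PATH1 != PATH2?
    bumps_revisions: bool    # are working revisions bumped afterwards?
    rewrites_urls: bool      # are the URLs in the metadata rewritten?


OPERATIONS = {
    "update": Operation("update", False, True, False),
    "merge": Operation("merge", True, False, False),
    "switch": Operation("switch", True, True, True),
}


def metadata_changes(op_name):
    """List which pieces of working copy metadata the operation rewrites."""
    op = OPERATIONS[op_name]
    changes = []
    if op.bumps_revisions:
        changes.append("revisions")
    if op.rewrites_urls:
        changes.append("urls")
    return changes
```

An empty list for "merge" corresponds to the applied changes showing up
as plain local modifications.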

When doing a merge or switch, the user needs to specify at least one
of the two paths.  There's a risk that the requested path may be
completely unrelated to the path represented by the working copy --
and thus might result in seemingly random diffs and conflicts
everywhere (or in the worst case, a complete deletion and re-checkout
of the working copy!)  Our plan is to add a heuristic to Subversion
that asks the question "are these two paths related in some way?"  If
the test fails, the command aborts and the user receives a friendly
message:  "PATH1 and PATH2 have no common ancestry.  Are you *sure*
you want to apply this delta?  If so, re-run the command with the
--force option."
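
The heuristic itself is not designed yet.  Purely as an illustration,
the Python sketch below assumes that copy history is available as a
mapping from each copied path to the path it was copied from, and calls
two paths "related" when their copy chains share a member.  The mapping,
the function names, and the exception class are all hypothetical.

```python
class UnrelatedPathsError(Exception):
    """Raised when two paths share no ancestor and force was not given."""


def ancestry(path, copied_from):
    """Return `path` followed by every path it was (transitively) copied from."""
    chain = [path]
    # Stop at a path with no recorded copy source, or if a cycle shows up.
    while path in copied_from and copied_from[path] not in chain:
        path = copied_from[path]
        chain.append(path)
    return chain


def are_related(path1, path2, copied_from):
    """True if the two copy chains have at least one path in common."""
    return bool(set(ancestry(path1, copied_from)) &
                set(ancestry(path2, copied_from)))


def check_related(path1, path2, copied_from, force=False):
    """Abort with the friendly message unless the paths are related."""
    if force or are_related(path1, path2, copied_from):
        return
    raise UnrelatedPathsError(
        f"{path1} and {path2} have no common ancestry.  Are you *sure* "
        "you want to apply this delta?  If so, re-run the command with "
        "the --force option.")
```

With copies = {"/branches/rel_1": "/trunk"}, the call
check_related("/branches/rel_1", "/trunk", copies) returns quietly,
while an unrelated pair raises unless force=True is passed.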


Merging
=======

Merge is a special case of update, or rather, update is a special case
of merge.  Simplifying things a bit: when we update, we take the
difference between path P at revision X (X:P) and P at revision Y
(Y:P), and apply that difference to the working copy.  Note that since
X:P reflects the working copy text bases exactly, the server can send
contextless diffs to bring the working copy to Y:P.  (The
simplification here is that X:P is really a transaction reflecting the
working copy's revision mixture, and not necessarily corresponding
precisely to any single revision tree.)

When we merge, we take the differences between path P at revision X
(X:P) versus path Q at revision Y (Y:Q), and apply them to the working
copy.

Thus, what distinguishes a merge from an update is that P need not
equal Q (write it "P ?= Q").  For that matter, X ?= Y.

X:P and Y:Q are two distinct trees, but in practice, they share a
common ancestor, so using the difference between them is not a
ridiculous idea.  But note that svn_repos_dir_delta() is perfectly
content to express the difference between any two trees, related or
not.

It is possible, indeed likely, that neither X:P nor Y:Q is an exact
reflection of the working copy bases; therefore context diffs are used
to facilitate merging.

*** Implementation details ***

Heh, two completely different possibilities here:

1. Only the Subversion client generates context diffs and applies them
   (right now by running 'diff' and 'patch' externally.)  Therefore,
   the objective is to create *two* sets of fulltext files in some
   client-side temporary area.  The first fulltext set represents X:P,
   and the second fulltext set represents Y:Q.  The client then
   compares the two sets, generates context diffs, and applies the
   context diffs to the working copy's working files.

   The naive approach would be to just directly ask the server for
   both sets of fulltexts.  (We still consider this an option!)

   A more complex approach (which we'll attempt) is a network
   optimization -- it's a way of creating both sets of fulltexts on
   the client using minimal network traffic:

     * The client builds a transaction on the server that is a
       "reflection" of the working-copy, mixed revisions and all.

     * The server sends a tree-delta between the reflection and X:P;
       the client then applies these binary diffs to copies of the
       working-copy's text-bases in order to reconstruct the fulltexts
       of X:P.

     * The server sends a tree-delta between X:P and Y:Q; the client
       then applies these binary diffs to copies of the X:P fulltexts
       in order to reconstruct the fulltexts of Y:Q.

   And that's it!  We have both sets of fulltexts.  The client
   generates context diffs between them and patches the working copy.

   As mentioned earlier, this process doesn't touch any working-copy
   metadata in .svn/.  Only the working files are patched, so the
   differences appear as local modifications.  At that point, the user
   manually resolves any conflicts.


2. What is the difference between these two commands?

        svn merge -rX:Y <URL>
        svn diff  -rX:Y <URL> | patch

   :-) ?  If we have an extended patch format, supporting copies,
   renames, deletes, and properties (like we've been planning), then
   there isn't formally even any need for a "merge" command -- it's a
   trivial wrapper around "svn diff" and patch.

   In other words, much of the work described in Plan 1 above has
   already been done by Philip Martin in his diff editors.  Maybe we
   should just take advantage of that?  There's still the issue of
   recording metadata about the merge, but presumably that would come
   from the extended patch format.

   Random thoughts from Karl:

   I do wonder if it's always desirable to merge properties anyway.
   Most of the properties we have are Subversion-specific, and when I
   think of the kinds of merges I've done in the past, I can't think
   of a case where having the property changes merge would be
   desirable.  Ooooh, but when we use the properties to record what
   has been previously merged, then having them travel *with* the
   changes is useful.  For example:

       $ svn merge -r18:20 http://svn.collab.net/repos/branches/rel_1
       $ svn ci
           ===> produces .../trunk/whatever/blah, revision 100

       Then the next week:

       $ svn switch http://svn.collab.net/repos/branches/rel_2
       $ svn merge -r97:153 http://svn.collab.net/repos/trunk/whatever/blah

   In a situation like that, you want the rel_1 branch merge into
   trunk to travel with the trunk changes you're now merging into
   rel_2.
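
Plan 1 above (building both sets of fulltexts on the client) can be
sketched in Python as follows.  This is only a toy model under stated
assumptions: a tree is a dict mapping file paths to fulltexts, a
"delta" carries replacement fulltexts instead of real binary diffs, and
the helper names are invented for this note.  It stops at generating
the context diffs and does not patch any working files.

```python
import difflib


def tree_delta(source, target):
    """Changes that turn `source` into `target`; None marks a deletion.

    A real tree-delta would carry binary diffs, not whole fulltexts.
    """
    delta = {path: None for path in source.keys() - target.keys()}
    for path, text in target.items():
        if source.get(path) != text:
            delta[path] = text
    return delta


def apply_delta(base, delta):
    """Apply a delta to a *copy* of `base`, leaving `base` untouched."""
    result = dict(base)
    for path, text in delta.items():
        if text is None:
            result.pop(path, None)
        else:
            result[path] = text
    return result


def merge_fulltexts(text_bases, tree_xp, tree_yq):
    """Rebuild the X:P and Y:Q fulltexts from the working copy text bases."""
    reflection = dict(text_bases)  # stands in for the server-side txn
    fulltext_xp = apply_delta(text_bases, tree_delta(reflection, tree_xp))
    fulltext_yq = apply_delta(fulltext_xp, tree_delta(tree_xp, tree_yq))
    return fulltext_xp, fulltext_yq


def context_diffs(old, new):
    """Per-file context diffs between two fulltext sets."""
    diffs = {}
    for path in sorted(old.keys() | new.keys()):
        a = old.get(path, "").splitlines(keepends=True)
        b = new.get(path, "").splitlines(keepends=True)
        if a != b:
            diffs[path] = "".join(difflib.context_diff(a, b, path, path))
    return diffs
```

The diffs returned by context_diffs() are what the client would then
apply to the working files (not the text bases), so the result looks
like local modifications.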


Switching
=========

Switching is a more general case of update: instead of comparing the
working-copy "reflection" to an identical path in some revision, the
server compares the reflection to some *arbitrary* path in some
revision.  The user specifies the new path.

The result of the operation is to effectively morph the working copy
into representing a different location in the tree.  In theory, there
should be no way to tell the difference between a fresh checkout of
PATH2 and a working copy that was "switched" to PATH2. 

*** Implementation details  ***

As in update operations, the client begins by building a reflection of
working-copy state on the server.  The client then specifies a new
path/revision pair as the target of the tree-delta.

After the client finishes applying the delta, it needs to do a little
more work than update:  besides bumping all working revisions to some
uniform value, it needs to rewrite all of the metadata URL ancestry as
well.
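
The extra metadata step might look like the Python sketch below.  The
entry layout (a dict keyed by path relative to the working copy root,
with "url" and "revision" fields) is an assumption made for
illustration and is not the real .svn/entries format.

```python
def rewrite_metadata(entries, new_base_url, new_revision):
    """Return entries re-rooted at `new_base_url` and bumped to one revision.

    `entries` maps a path relative to the working copy root ("" for the
    root itself) to a dict of metadata fields.
    """
    base = new_base_url.rstrip("/")
    updated = {}
    for relpath, entry in entries.items():
        url = base if relpath == "" else f"{base}/{relpath}"
        # Keep any other fields, replace only url and revision.
        updated[relpath] = dict(entry, url=url, revision=new_revision)
    return updated
```

After this step every entry carries the same revision and a URL under
the new location, which is what a fresh checkout of the new path would
have recorded.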

-----------------------------------------------------------------------
               

Interactions:  A Brave New World
================================

With the `svn switch' feature, we now have the potential to have
working copies with "disjoint" subdirs, that is, subdirs whose
repository url is not simply the subdir's parent's url plus the
subdir's entry name in the parent.  For example:

   $ svn checkout http://svn.collab.net/repos/trunk -d svn
   A     ...
   A     svn/subversion/libsvn_wc
   A     svn/subversion/libsvn_fs
   A     svn/subversion/libsvn_repos
   A     svn/subversion/libsvn_delta
   A     ...
   $ cd svn/subversion/libsvn_fs
   $ svn switch http://svn.collab.net/repos/branches/blue/subversion/libsvn_fs
   [...]
   $ 

While svn/subversion/.svn/entries still has an entry for "libsvn_fs",
if you go into libsvn_fs and look at its own directory url, it is not
simply a child of the `subversion' directory url, but rather a
completely different url.  We call this directory "disjoint".
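
Stated as code, the definition is a one-line comparison.  A minimal
Python sketch (the function names are ours, not Subversion's):

```python
def expected_child_url(parent_url, entry_name):
    """URL a subdir would have if it were an ordinary child of its parent."""
    return parent_url.rstrip("/") + "/" + entry_name


def is_disjoint(parent_url, entry_name, child_url):
    """True if the subdir's own URL is not parent URL + "/" + entry name."""
    return child_url != expected_child_url(parent_url, entry_name)
```

For the example above, is_disjoint() is true for "libsvn_fs" after the
switch, because its URL lives under branches/blue while its parent's
URL is still under trunk.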

Commits, updates, merges, and further switch commands all need to deal
sanely with this scenario.

We can assume that even disjoint urls are still all within the same
repository, because the parent of a disjoint child still has an entry
for that child, and all working copy walks are guided by entries.  In
cases where there are wc subdirs from completely different
repositories, there is unlikely to be such entry linkage.  [NOTE: We
will still be adding some extra information to the wc to make it
possible to check for the rare circumstance where the parent has an
entry for a subdir which (for whatever reason) is the result of a
checkout from a different repository.  More on that later.]


Changes To The Commit Process:
==============================

Currently, the commit editor driver crawls the working copy, and sends
local modifications through the editor as it finds them.  But we now
have to deal with disjoint urls in the working copy.  Because editors
must be driven depth-first, we cannot send changes against these
disjoint urls as they are found -- instead, we must begin the edit
based on a common parent of all the urls involved in the commit.  So
we must do a preliminary scan of the working copy, discovering all
local mods, collecting the urls for the mods, and then calculating the
common path prefix on which to base the edit.

[NOTE: this increases the memory usage of commits by a small amount.
We formerly interleaved the discovering and sending of local mods, but
now discovery will happen first and produce a list of changed paths,
and then sending the changes will happen entirely after that.  The
benefit is that we preserve commit atomicity even when branches are
present in the working copy... which is very important!]
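
Here is a minimal Python sketch of the two-phase idea: discover all
local mods first, then compute the common URL prefix on which to base
the edit.  The input format (a list of (url, locally_modified) pairs),
the function names, and the file names in the usage note are
assumptions for illustration; the real crawler works from the entries
files.

```python
def common_url_prefix(urls):
    """Longest prefix shared by all URLs, compared one component at a time."""
    split = [url.rstrip("/").split("/") for url in urls]
    common = []
    for parts in zip(*split):
        if len(set(parts)) != 1:
            break
        common.append(parts[0])
    return "/".join(common)


def plan_commit(entries):
    """Phase 1: collect changed URLs.  Phase 2 input: base URL + relative paths.

    Returns (base_url, relative_paths); the paths are sorted component by
    component so that everything under one directory stays together,
    which a depth-first editor drive requires.
    """
    changed = [url for url, modified in entries if modified]
    if not changed:
        return None, []
    # Anchor on parent directories so a lone changed file still gets a
    # directory as the base of the edit.
    parents = [url.rsplit("/", 1)[0] for url in changed]
    base = common_url_prefix(parents)
    relative = [url[len(base):].lstrip("/") for url in changed]
    relative.sort(key=lambda path: path.split("/"))
    return base, relative
```

If one mod lives at .../repos/trunk/subversion/libsvn_wc/a.c and
another in a disjoint subdir at
.../repos/branches/blue/subversion/libsvn_fs/b.c, the edit is based on
.../repos, and both changes go into a single transaction.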


Changes To The Update Process:
==============================

Currently, update builds a reflection of the working copy's state on
the server (the reflection is a Subversion transaction).  Then the
server sends back a tree delta between the reflection and the desired
revision tree (usually the head revision, but whatever).  The tree
delta is expressed by driving an svn_delta_edit_fns_t editor on the
client side.

If there are disjoint subdirs in the working copy, the reflection
must, uh, reflect this.  That's pretty easy: that subtree of the
transaction will simply point to the appropriate revision subtree
(implementation note: we'll need to add a new function to
svn_ra_reporter_t, allowing us to link arbitrary path/rev pairs into
the transaction.)

But getting the reflection right isn't enough.  The revision tree
we're comparing the reflection with doesn't have the special disjoint
subtree, so a lot of spurious differences would be sent to the client,
which the client would then have to ignore, presumably making a
separate connection later to update the disjoint subdir.  This way
lies madness... or at least inefficiency.

So instead, we'll create a *second* transaction, representing the
target of the update.  In the plain update case, this transaction is
an exact copy of the revision (and perhaps we'll optimize out the txn
and just use the revision tree after all).  But in the disjoint subdir
case, this second txn will also reflect the disjointedness.  In other
words, when a disjoint directory D is discovered, it will be linked
into both txn trees -- in the reflection txn, D will be at whatever
revision(s) it is in the working copy, and in the target txn, it will
be at the target revision of the update.  This way, the delta between
the reflection and target txns will apply cleanly to the working copy
(i.e., svn_repos_dir_delta() will just Do The Right Thing when invoked
on the two txns).  Voila.
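
To see why the second txn helps, here is a toy Python model.  It
assumes a repository is a dict keyed by (path, revision) whose values
are trees (dicts of relative file path to fulltext), and that a delta
is just the set of changed fulltexts.  None of these names or shapes
come from the real svn_repos or svn_fs interfaces.

```python
def tree_delta(source, target):
    """Changes that turn `source` into `target`; None marks a deletion."""
    delta = {path: None for path in source.keys() - target.keys()}
    for path, text in target.items():
        if source.get(path) != text:
            delta[path] = text
    return delta


def link_subtree(txn, subdir, subtree):
    """Return a copy of `txn` with everything under `subdir` replaced."""
    prefix = subdir + "/"
    linked = {p: t for p, t in txn.items() if not p.startswith(prefix)}
    for relpath, text in subtree.items():
        linked[prefix + relpath] = text
    return linked


def update_delta(repo, wc_path, wc_rev, target_rev, disjoint):
    """Delta for an update of a working copy with disjoint subdirs.

    `disjoint` maps a subdir name to (repository path, working revision).
    """
    reflection = dict(repo[(wc_path, wc_rev)])
    target = dict(repo[(wc_path, target_rev)])
    for subdir, (path, rev) in disjoint.items():
        # Link D into both txns: at its working revision in the
        # reflection, and at the target revision in the target.
        reflection = link_subtree(reflection, subdir, repo[(path, rev)])
        target = link_subtree(target, subdir, repo[(path, target_rev)])
    return tree_delta(reflection, target)
```

Because the disjoint subdir is linked into both trees, the delta only
mentions files under it when the branch itself changed between the
working revision and the target revision; changes made on trunk to the
same subdir never show up.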


Changes to Switch and Merge Process:
====================================

The switch process still needs to build a working-copy reflection that
may contain disjoint subtrees.  However, the second target transaction
isn't needed at all.  The server can send a delta between the
reflection and a "pure" path in some revision (presumably the path
that we're switching to).

If the disjoint subtree and the target path both happen to be part
of the same branch, then svn_repos_dir_delta() won't notice any
differences at all.  Otherwise, the user should expect to have the
disjoint section of the working copy be "converted" to a new URL,
just like the rest of the working copy.

In the case of merges, we continue to build a reflection that contains
disjoint subtrees.  Again, no need for a second transaction.
Remember that the reflection is only being built as a shortcut to
cheaply construct fulltexts of X:P in the client.  The structure of
the reflection is irrelevant; *any* reflection can be used as a basis
for sending a tree-delta that constructs X:P, no matter what
disjoint sections it has.  (Although some reflections may be more
useful than others!  In the worst case, if the reflection is
completely unrelated to X:P, then svn_repos_dir_delta() degenerates
into sending fulltexts anyway.)