*** UNDER CONSTRUCTION ***
Subversion's merge tracking uses a layered design, with the user-visible operations based primarily on the information from the merge history.
Or, Tracking What Revisions Have Been Merged Where provides the information used by Subversion's merge tracking-related capabilities (history sensitive merging, etc.). The design is based on Dan Berlin's proposal (Message-Id: <1146242044.23718.1.camel@dberlin0-corp.corp.google.com>).
The following content has been temporarily superseded by notes on the development branch until the design completely stabilizes.
The goal of the Merge History portion of the design is to track the
information needed by the operations outlined by the use cases (e.g. the revision numbers
being merged by a merge operation), and keeping this information in
the right places as various operations (copy
,
delete
, add
, etc.) are performed. This
portion of the design does not encompass the operations
themselves.
The goals:
Pre-notes: Whether to store info in revprops, or just on dirs and files, is still an open question. It makes certain semantics clearer on what operations do below and how to proceed easier. The question is whether it is efficient enough time wise when we go to retrieve merge info, and whether it complicates what merge has to do too much. It also removes all of the listed prerequisites.
The first question that many people ask is "where should we store
the merge information?" (what we store will be covered next). A merge
history property, named SVN_MERGE_PROPERTY
(e.g. svn:mergeinfo
) stored in the revision properties,
directory properties, and file properties. Each will store the
full, complete list of current merged in changes, as
far as it knows. This ensures that the merge algorithm and other
consumers do not have to walk back revisions in order to get the
transitive closure of the revision list.
The way we choose which of file, dir, revprop merge info to use in case of conflicts simple system of inheritance [1] where the "most specific" place wins. This means that if the property is set on a file, that completely overrides the directory and revision level properties.
The way we choose which to store to depends on how much and where you merge, and will be covered in the semantics.
The reasoning for this system is to avoid having to either copy info everywhere, or crawl everywhere, in order to determine which revisions have been applied. At the same time, we want to be space and time efficient, so we can't just store the entire revision list everywhere.
As for what is stored:
A survey of the community shows a slight preference for a human editable storage format along the lines of how svnmerge stores its merge info (e.g. path name and list of revisions). Binary storage of such information would buy, on average, a 2-3 byte decrease per revision/range in size over ASCII [2], while making it not directly human readable/editable.
The revisions we have merged into something are represented as a path, a colon, and then a comma separated revision list, containing one or more revision or revision ranges. Revision range end and beginning points are separated by "-".
Token | Definition |
---|---|
revisionrange | REVISION "-" REVISION |
revisioneelement | revisionrange | REVISION |
revisionlist | revisioneelement (COMMA revisioneelement)* |
revisionline | PATHNAME COLON revisionlist |
top | revisionline (NEWLINE revisionline)* |
This list will not be stored in a canonicalized minimal form for a path (e.g. it may contain single revision numbers that could fall into ranges). This is chiefly because the benefit of such a canonical format (slightly easier for comparison, but not indexing) is outweighed by the fact that generating a canonical form may require groveling through a lot of information to determine what that minimal canonical form is. In particular, it may be that the revision list "5,7,9" is, in minimal canonical form, "5-9", because 6 and 8 do not have any affect on the path name that 5 and 9 are from. Canonicalization could be done as a server side post pass because the information is stored in properties.
Note that this revision format will not scale on its own if you have a list of million revisions. None will easily. However, because it is stored in properties, one can change the WC and FS backends to simply do something different with this single property if they wanted to. Given the rates of change of various very active repositories, this will not be a problem we need to solve for many many years.
The current thinking is that the payload of
SVN_MERGE_PROPERTY
will be stored in an index separate
from the FS which is created during svnadmin create
.
This index will support fast querying, be populated during a merge or
svnadmin load
, and cough up its contents during a
svn propget SVN_MERGE_PROPERTY
or svnadmin
dump
. The contents of SVN_MERGE_PROPERTY
will not
be stored redundantly in the FS (only in the index). Dan Berlin is
prototyping this index using sqlite3, and David James has a (generic)
schema design underway.
Each operation you can perform may update or copy the merge info associated with a path, file, or revision.
svn add
: No change to merge info
svn delete
: No direct change to merge info
(indirectly, because the props go away, so does the merge info for the
file)
svn rename
: No change to merge info
svn copy
: Copies the merge info from the source path
to the destination path, if any.
This includes copying info from revprops, if necessary, by determining if the merge info exists in a revprop for the last changed commit for the source path, and copying it to the new revprop if it does (someone probably needs to check if this is the right semantic :P)
All copies are full-copies of the merge information.
svn merge
: Adds or subtracts to the merge info,
according to the following:
SVN_MERGE_PROPERTY
set on that
file.SVN_MERGE_PROPERTY
set on the
directory.SVN_MERGE_PROPERTY
set on the
revprop.Thus a merge of revisions 1-9 from http://foo.bar.com/reposroot/trunk would produce "/trunk:1-9"
cross-repo merging is a bridge we can cross if we ever get there :).
SVN_MERGE_PROPERTY
revprop differently from other
revprops (either by special casing the cases users do care about
controlling, or special casing props users don't care about
controlling, etc.) so that people who don't have access to the
revprops can still do history sensitive merges of directories they
do have access to.What happens if someone commits a merge with a non-merge tracking client?
It simply means the next time you merge, you may receive conflicts that you would have received if you were using a non-history-sensitive client.
Can we do without the revprop portion of this design?
Technically yes, but it may require more crawling and querying at merge time.
Can we do history sensitive WC-to-WC merges without contacting the server?
No. But you probably couldn't anyway, even if the "revprop not being stored locally" issue were not here.
What happens if the merge history is not there?
The same thing that happens if the merge history is not there now.
What happens if a user edits merge history incorrectly?
They get the results specified by their merge history.
How does the revprop stay up to date?
We copy it from revision to revision.
What happens if a user manually edits a file and unmerges a revision (e.g. not using a "reverse merge" command), but doesn't update the merge info to match?
The merge info will believe the change has still been merged. This is a similar effect to performing a manual merge.
What happens if I svn
move
/rename
a directory, and then merge it
somewhere?
This doesn't change history, only the future, thus we will simply add the merge info for that directory as if it was a new directory. We will not do something like attempt to modify all merge info to specify the new directory, as that would be wrong.
I don't think only that copying info on svn
copy
is correct. What if you copy a dir with merge info into a
dir where the dir has merge info -- won't it get the info wrong
now?
No. Let's say you have:
a/foo (merge info: /trunk:5-9 a/branches/bar (merge info: /trunk:1-4)
If you copy a/foo into a/branches/bar, we now have:
a/branches/bar (merge info: /trunk:1-4) a/branches/bar/foo (merge info: /trunk:5-9)
This is strictly correct. The only changes which have been merged into a/branches/bar/foo, are still 5-9. The only changes which have been merged into /branches/bar are 1-4. No merges have been performed by your copy, only copies have been performed. If you perform a merge of revisions 1-9 into bar, the results one would expect that the history sensitive merge algorithm will skip revisions 5-9 for a/branches/bar/foo, and skip revisions 1-4 for a/branches/bar. The above information gives the algorithm the information necessary to do this. So if you want to argue svn copy has the wrong merge info semantics, it's not because of the above, AFAIK :)
Merge Tracking is implemented using a few simple data structures.
apr_hash_t
mapping merge source paths to a range
list.apr_array_header_t
containing
svn_merge_range_t *
elements.svn_merge_range_t
start
and
end
svn_revnum_t
, which are identical if
the range consists of only a single revision.As outlined in the requirements and use cases, merge tracking auditting is the ability to report information about merge operations. It consists of three sections:
Most of the logic for reporting will live in libsvn_repos
, with
appropriate changes to the API and additional parameters exposed to the client
through the RA layer. We are looking at an API which takes one or more paths
and a revision (range?), and returns the merge info which was added or removed
in that revision. For the 'blame' case, we may also need to include some type
of line number parameter, the handling of which is going to get ugly, since our
FS isn't based on a weave.
Existing functions of interest are svn_repos_fs_get_merge_info()
and svn_fs_merge_info__get_merge_info()
.
$Date$