(writ by Ben, 7/21/00) Problem ------- * A client checks out a working copy of a tree from a repository. We call this tree `B' (for `base' revision). We call the client's tree `WC' (for `working copy'). * The client procedes to commit a few changes to the repository. The client also makes some local modifications. * Meanwhile, other people continue to commit changes to the repository. * Now the client wishes to update WC, to bring all of WC up-to-date with the latest tree in the repository (`L'). The issue is that WC is an ugly hodgepodge of files; many files are still based on the original base tree B, but some files have been individually committed and match the contents of repository trees that are descendants of B. And some files are just locally modified, not yet committed. In other words, given that WC's tree is a unique mix of files unknown to the server, what exchange of information must take place so that the server can send a tree-delta that converts WC into L? (Note: this problem also describes the situation where the client is attempting to commit WC, and the server first needs to check WC's `up-to-date-ness' and point out conflicts.) Triangulation ------------- Here's the plan of attack. d2 ---> B ------> L \ ^ \ | \ | ?? d3 d1 \ | \ | -> WC * The client sends a delta `d1' which describes how to convert B into WC. (It can do this because each change has been meticulously tracked in the working copy's administrative files.) * The server generates a delta `d2' which describes how to convert B into L. (It can do this because the repository has also been meticulously tracking each change from B to L.) * In theory, therefore, knowledge of d1 and d2 should allow us to create d3, which describes how to change WC into L. Defining `Change' ----------------- Deltas only talk about entities that have changed. We say that two deltas *conflict* iff they each change one or more of the same entity. Define a function C() which, given a delta, returns an unordered list of all node numbers that the delta changes. Here are the rules that C() follows: 1. If a file's contents change, then that file's node has changed. 2. If a file is added, renamed, or deleted, then both the file's node *and* it's parent node have changed. Notation -- C(P->Q) = { 3, 7, 9, 21 }. This statement means that a certain delta converts tree P into tree Q, and the node numbers changed in the process are 3, 7, 9, and 21. The Process ----------- 1. The client generates a `special' delta What do we mean by `special'? The client tracks local changes made to a working copy, queueing them up as the user works. Normally, when it's time to commit, the client generates a delta describing all the queued local changes, and submits them to the server. This makes sense, because the client only cares about discovering conflicts that are *immediately* relevant to the newest changes. But this list of local changes is *not* enough to create a delta B->WC! To be perfectly clear, there are two types of information that must included in a delta B->WC: * local changes that are queued, but not yet committed * local changes that *have* been committed since WC == B Therefore, the client's administrative files must keep track of both types of changes. 2. Reconstruction of WC Next, the server receives B->WC and effectively "rebuilds" an internal representation of the oh-so-unique WC. The server's internal reconstruction of WC needn't be some large, complex object on disk or in RAM; it's just a representation of the WC tree structure, complete with each node's number and internal version number. We call this reconstructed WC `RWC'. 3. The server generates a second delta. Examining node numbers (and their internal version numbers), the server creates a delta RWC->L. 4. The server runs C() on both deltas The server compares the output of C(B->WC) and C(RWC->L), looking for any intersection of the sets. Any intersection of the sets indicates conflicting deltas. Examples -------- Example 1: No conflict C(B->WC) = { 2, 5, 9, 13, 21, 87 } C(RWC->L) = { 3, 4, 6, 22, 54, 99, 102 } Because the two deltas have changed *completely* different sets of nodes, there's no conflict. If the client were attempting to commit, the delta would be approved. If the client were asking for an update, the server would send back RWC->L. Example 2: One conflict C(B->WC) = { 2, 5, 9, 13, 21, 87 } C(RWC->L) = { 3, 4, 6, 22, 54, 87, 102 } Both deltas have changed node 87. If the client were attempting to commit, the entire transaction would be denied. If the client were asking for an update, the server would point out the conflict. In both cases, it's up to the client to describe the conflict either with conflict markers (in the case of a changed file) or with an informational message (in the case of a changed directory). Example 3: Client adds a file Suppose that completely new file has been added to the working copy, but not yet committed. C(B->WC) = { 2, 5, 9, 13, 21, 87, ?? } C(RWC->L) = { 3, 4, 6, 22, 54, 99, 102 } The question marks indicate the fact that while C() was traversing the delta and looking up node numbers, it ran into an unrecognized pathname. There's no node number for the file, because it doesn't yet exist in the repository! This case demonstrates that mere node numbers aren't quite enough; pathnames must be also be examined when looking for conflicts between two deltas. E.g., suppose that the new file has the exact same pathname as node 102 in L? Example 4: Client deletes a file Deleting a file shows up in a tree-delta as a "change" to a file, and thus this file's node shows up in C(B->WC), so this is the same as example 2. Example 5: Client moves a file or directory. This action would show up as *two* changed node numbers in C(B->WC), and the rest would play out like example 2.