Description of the NODES table ============================== * Introduction * Inclusion of BASE nodes * Rows to store state * Ordering rows into layers * Visibility of multiple op_depth rows * Restructuring the tree means adding rows * Copies of mixed-revision subtrees become multiple layers * In a deleted subtree, all nodes get marked deleted explicitly * Nodes in a replaced subtree can have different presence values * Presence values of nodes in partially overlapping replacements * Status needs to consult the *two* topmost layers - sometimes Introduction ------------ The entire original design of wc-ng evolves around the notion that there are a number of states in a working copy, each of which needs to be managed. All operations - excluding merge - operate on three trees: BASE, WORKING and ACTUAL. For an in-depth description of what each means, the reader is referred to other documentation, also in the notes/ directory. In short, BASE is what was checked out from the repository; WORKING includes tree restructuring; while ACTUAL also includes changes to properties and file contents. The idea that there are three trees works - mostly. There is no need for more trees outside the area of the metadata administration and even then three trees got us pretty far. The problem starts when one realizes tree modifications can be overlapping or layered. Imagine a tree with a replaced subtree. It's possible to replace a subtree within the replacement. Imagine that happened and that the user wants to revert one of the replacements. Given a 'flat' system, with just enough columns in the database to record the 'old' and 'new' information per node, a single revert can be supported. However, in the example with the double replacement above, that would mean it's impossible to revert one of the two replacements: either there's not enough information in the deepest replacement to execute the highest level replacement or vice versa - depending on which information was selected to be stored in the "new" columns. The NODES table is the answer to this problem: instead of having a single row in a table with WORKING nodes with just enough columns to record (as per the example) a replacement, the solution is to record different layers of tree restructuring by having multiple rows. Inclusion of BASE nodes ----------------------- The original technical design of wc-ng included a WORKING_NODE and a BASE_NODE table. As described in the introduction, the WORKING_NODE table was replaced with NODES. However, the BASE_NODE table stores roughly the same state information that WORKING_NODE did. Additionally, in a number of situations, the system isn't interested in the type of state it gets returned (BASE or WORKING) - it just wants the latest. As a result the BASE_NODE table has been integrated into the NODES table. The main difference between the WORKING_NODE and BASE_NODE tables was that the BASE_NODE table contained a few caching fields which are not relevant to WORKING_NODE. Moving those to a separate table was determined to be wasteful because the primary key of that table would be much larger than any information stored in it in the first place. Rows to store state ------------------- Rows of the NODES table store state of nodes in the BASE tree and the layers in the WORKING tree. Note that these nodes do not need to exist in the working copy presented to the user: they may be 'absent', 'not-present' or just removed (rm) without using Subversion commands. A row contains information linking to the repository, if the node was received from a repository. This reference may be a link to the original nodes for copied or moved nodes, but for rows designating BASE state, they refer to the repository location which was checked out from. Additionally, the rows contain information about local modifications such as copy, move or delete operations. Ordering rows into layers ------------------------- Since the table might contain more than one row per (wc_id, local_relpath) combination, an ordering mechanism needs to be added. To that effect the 'op_depth' value has been devised. The op_depth is an integer indicating the depth of the operation which modified the tree in order for the node to enter the state indicated in the row. Every row for the (wc_id, local_relpath) combination must have a unique op_depth associated with it. The value of op_depth is related to the top-most node being modified in the given tree-restructuring operation (operation root or oproot). E.g. upon deletion of a subtree, every node in the subtree will have a row in the table with the same op_depth, that being the depth of the top directory of the subtree. The op_depth is calculated by taking the number of path components in the local_relpath of the oproot. The unmodified tree (BASE) is identified by rows with an op_depth value 0. By having multiple restructuring operations on the same path in a modified subtree (most notably replacements), the table may end up with multiple rows with an op_depth bigger than 0. Visibility of multiple op_depth rows ------------------------------------ As stated in the introduction, there's no need to leak the concept of multiple op_depth rows out of the meta data store - apart from the BASE and WORKING trees. As described before, the BASE tree is defined by op_depth == 0. WORKING as visible outside the metadata store maps back to those rows where op_depth == MAX(op_depth) for each (wc_id, local_relpath) combination. Restructuring the tree means adding rows ---------------------------------------- The base idea behind the NODES table is that every tree restructuring operation causes nodes to be added to the table in order to best support the reversal process: in that case a revert simply means deletion of rows and bringing the subtree back into sync with the metadata. There's one exception: When a delete is followed by a copy or move to the deleted location - causing a replacement - a row with the right op_depth may already exist, due to the delete. If so, it needs to be modified. On revert, the modified nodes need to be restored to 'deleted' state, which itself can be reverted during the next revert. (If the row did not exist with the right op_depth, then this copy or move is being performed at some greater depth than the delete, and then this copy or move will simply create rows at a new op_depth.) ### JAF: I don't think a replacement should be reverted in two stages, even though it was created in two stages. I think 'revert' should restore the previous existing node, just like it does in WC-1. A partial revert of this state is not a particularly helpful or frequent use case. GJS: I believe that wc_db should enable the individual reverts. The first revert will undo the add/copy, and the second revert will undo the delete. The UI (or the next level up in libsvn_wc) can collapse those into a single user action. This leaves us the future possibility of finer-grained reverts. ### EHU: The statement above probably means that *all* nodes in the subtree need to be rewritten: they all have a deleted state with the affected op_depth, meaning they probably need a 'replaced/copied-to' state with the same op_depth... GJS: not all nodes. The newly-arriving copy/move may have new/disjoint nodes that were not part of the deleted set. We will simply add new rows for these arriving nodes. Similarly, the arriving subtree may NOT have a similar node, so the deleted node remains untouched. Copies of mixed-revision subtrees become multiple layers -------------------------------------------------------- In the design, every node which is not a child of its parent implies a tree restructuring operation having taken place. When committing a mixed-revision subtree, the commit should mirror the actual mixed state of the tree. A mixed-revision tree which came about in the usual process of committing content changes - ie one without tree modifications - differs exactly in that respect: the tree in the repository doesn't need to mirror the mixed revision state in the working copy. The idea is that every tree restructuring operation takes place on the oproot. When a node or subtree within the copied tree isn't a direct child of its parent, most notably because it's at a different revision, that's a tree restructuring: a node of the same revision has been replaced by a node (of the same name) from another rev. By strict application of the design rule, all nodes and subtrees at different revision levels than their parents within the copied subtree, become an op_depth layer of their own. ### JAF: We don't have the info in the WC to be able to fill in the lower layers of this tree for the copy, if we are copying from BASE, because BASE is stored flat. Therefore we won't be able to revert/delete/replace the different-revision sub-trees of this copied tree. Therefore I think we have to store the mixed-rev copy flat (single op_depth) and modify the commit rules to act on revision-number changes within this flat tree. GJS: correct. Consider a root of the subtree at r10, and a descendant is at r12. We cannot create one layer at r10, and another at r12 because we do not have the descendant@r10 to place into the first layer. Thus, we need to use a single op_depth layer for this operation. At commit time, one copy will be me for the subtree from r10, a deletion will be made for the descendant, and then another copy performed for the r12 descendant. ### GJS: in the above scenario, we do not know if the descendant existed in r10, so the deletion may not be necessary (and could even throw an error!). I do not recall if our copy's destination is allowed to exist (ie. we have implied overwrite semantics in the repository). PM: Yes, we have overwrite sematics. The FS layer on the server has magic that converts the copy of the r12 descendant into a replace if the descendant exists in r10. The client does not send a delete. This magic applies to copies, not deletes, so there is a problem when the descendant is deleted in the mixed-revision copy in the working copy. When faced with a copy of the subtree at r10 and a delete of a descendant at r12 the commit doesn't work at present. Deleting the descendant is wrong if it does not exist in r10, but not deleting it is wrong if it does exist. I suppose the client could ask the server, or perhaps use multiple layers of BASE to track mixed-revisions (argh!). In a deleted subtree, all nodes get marked deleted explicitly ------------------------------------------------------------- All nodes in a deleted subtree get marked 'deleted' explicitly in order to be able to query on a single node and find in its topmost layer that the node that might have ever existed at the given path does not exist there anymore. Presence values of nodes in partially overlapping replacements -------------------------------------------------------------- Replacement - being a two-step operation consisting of a delete and an add/copy - causes all rows of the deleted subtree to be added with a new op_depth and presence value 'deleted'. So far so good. Adding a tree on top of the same oproot will cause the oproot - and all overlapping children! - to switch their presence value to 'normal'. When a node replaces a deleted node it hides any deleted children of the previously deleted node, and may come with children of its own. Some of the new children may have the same names as some of the deleted children, but these overlapping children should not be considered restructuring replacements. Only the parent, with op_depth equal to the tree depth, is a restructuring replacement. Status needs to consult the *two* topmost layers - sometimes ------------------------------------------------------------ As discussed before, every tree restructuring operation becomes an oproot, causing rows to be added with a new op_depth value. Status wants to report the oproots, making a clear distinction between adds and replacements. However, both added and replaced nodes have the presence value 'normal'. In order to make the distinction, status needs to determine if there was a node in the layer below the restructured layer. In case there is, it must be a replacement, otherwise, it must be an addition. TODO: GJS: yup. tho it will complicate a revert of a copy/move-here since we will need to perform a query to see whether we should convert the copy/move into a deleted node, or whether to simply remove the node entirely. GJS: and yes, if wc_db performed a double-operation revert, then we wouldn't have to do this. arguably, we could push the 2-op revert to a future release when we also choose to alter the higher layers to bring in the finer-grained control. (or maybe we have to look at prior nodes regardless, so checking for "replace with a deleted node" comes for no additional cost). * Document states of the table and their meaning (including values of the relevant columns)