-*- Text -*- Conflict meta data storage in wc-ng =================================== Conflict meta data is stored in the ACTUAL_NODE table, within the 'conflict_data' column. The data in this column is a skel containing conflict information (meaning the node is in conflict, and the details are inside), or NULL (meaning no conflict is present). The conflict skel has the form: (OPERATION (KIND KIND-SPECIFIC) (KIND KIND-SPECIFIC) ...) OPERATION indicates the operation which caused the conflict(s) and is detailed below. KIND indicates the kind of conflict description that follows and is one of: "text" - meaning a text conflict of the whole text of the node (which must be a file), with left/right/mine full texts saved, and (unless it's "binary") conflict markers in the working text; "prop" - meaning a "normal" property conflict, with left/right/mine full values saved; "tree" - meaning a tree conflict; "reject" - meaning a text conflict for a single hunk of unidiff text, with the source being a patch file (rather than left/right full texts), and with a "reject" file being saved containing the unidiff text; "obstructed" - meaning ### TODO KIND-SPECIFIC is specific to each KIND, and is detailed below. There are restrictions on what mixture of conflicts can meaningfully be recorded - e.g. there must not be two "text" nor one "text" and one "reject". These restrictions are implied by the nature of operations creating the conflicts but not spelled out here. If the 'conflict_data' column is not NULL, then at least one KIND of conflict skel must exist, describing the conflict(s). Contrary to wc-1, wc-ng records sufficient information to help users understand, in hindsight, which operation led to the conflict (as long as all conflict information is exposed by the UI). Some information which wc-1 was storing in entries has no direct equivalent in wc-ng conflict storage (such as paths to temporary files), but this information can be deduced from the information stored (e.g. conflict-old and friends; foo.r42 is now 'foo' + '.r' + left_rev) ### BH: We have to store the exact name of the conflict marker files. If we ### just 'guess' how conflict markers are named by using their revision ### numbers, we can't handle situations where there are existing files with ### these names. The WC-1.0 code uses a unique name function to generate a ### unique marker file name which happens to match this pattern if there are ### no conflicts, but sometimes explicitly preserves the existing file ### extension to help diff tools. (See 'preserved-conflict-file-exts' in our ### config). I think we can move the names of the markers into the skel and/or ### keep them in their own columns. (These names are needed on filling a ### svn_wc_entry_t) Operation skel -------------- Meaning: The Operation skel indicates what kind of operation was being performed that resulted in a conflict, including the format and content (or reference to content) of the conflicting change that was being applied to this node. The OPERATION skel has the following form: (NAME OPERATION-SPECIFIC) NAME is one of: "update" - meaning a 3-way merge as in "svn update"; "switch" - meaning a 3-way merge as in "svn switch"; "merge" - meaning a 3(4?)-way merge as in "svn merge"; "patch" - meaning application of a unidiff patch, as in "svn patch". OPERATION-SPECIFIC is as follows: To record an "update" operation, the skel has the form: ("update" BASE_REV TARGET_REV) BASE_REV is the base revision prior to the update. TARGET_REV is the revision being updated to. ### sbutler: What about mixed-revision working copies? Let's record ### the equivalent of svn_wc_revision_status_t, plus the target rev: ### ### ("update" MIN_REV MAX_REV SWITCHED MODIFIED TARGET_REV) ### ### Otherwise, the user may get the mistaken impression that the local ### tree is entirely at the URL and revision of the victim dir. For "switch", the skel has the form: ("switch" BASE_REV TARGET_REV REPOS_RELPATH) BASE_REV and TARGET_REV are as for "update" above. REPOS_RELPATH is the path in the repository being switched to. For "merge", the skel has the form: ("merge" LEFT_REV RIGHT_REV REPOS_UUID REPOS_ROOT_URL (LEFT_REPOS_RELPATH LEFT_PEG_REV) (RIGHT_REPOS_RELPATH RIGHT_PEG_REV) ) LEFT_REV is the merge-left revision, and RIGHT_REV is the merge-right revision of a continuous revision range which was merged (merge tracking might split a merge up into multiple merges of continuous revision ranges). REPOS_UUID is the UUID of the repository being merged from, in order to recognize merges from foreign repositories. REPOS_ROOT_URL is the repository root URL the repository being merged from. {LEFT,RIGHT}_REPOS_RELPATH is the path in the repository of the {left,right} version of the item. {LEFT,RIGHT}_CONFLICT_REV is the revision in which to find the {left,right} version of the item which caused the conflict. These are usually LEFT_REV or RIGHT_REV, but in some cases they may differ (a simple example is if a file was replaced in revision rX somewhere between LEFT_REV and RIGHT_REV, and the conflict is due to events which happened between LEFT_REV and rX). For "patch", the skel has the form: ("patch" PATCH_SOURCE_LABEL) PATCH_SOURCE_LABEL is (typically) the absolute path of the patch file the application of which led to conflicts. In the future, it may also be something like "". Text conflicts -------------- Text conflicts only exist on files. The following skel represents the "text" KIND of conflict: ("text" ORIGINAL_SHA1 MINE_SHA1 INCOMING_SHA1) {ORIGINAL,MINE}_SHA1 are SHA1 checksums of the full texts of the {original (BASE), mine (WORKING)} version of the file. INCOMING_SHA1 is the SHA1 checksum of the incoming version of the file. ### Need INCOMING_{LEFT,RIGHT}_SHA1 for 4-way merge? File version's content can be obtained from the pristine store. ### BH: We need some marker here, but these values must also be stored ### in the older_checksum, left_checksum, right_checksum colums of ACTUAL ### to allow pristine store cleanups. ### BH: What about symlinks? ### stsp: I guess we can say that all SHA1 sums refer to proper files, ### and symlinks are resolved before the SHA1 is calculated and ### stored in the db? Property conflicts -------------- Property conflicts can exist on files, directories and symlinks. There can be one or more property conflicts on the node, represented by one or more "prop" KIND conflicts. Each "prop" conflict has the following form: ("prop" PROPERTY_NAME ([ORIGINAL_VALUE]) ([MINE_VALUE]) ([INCOMING_VALUE]) ([INCOMING_BASE_VALUE])) PROPERTY_NAME is the name of the property, such as "svn:eol-style". Each property value ({ORIGINAL,MINE,INCOMING,INCOMING_BASE}_VALUE) is represented as an empty list indicating the property did not exist in that version, or a 1-item list containing the particular value. ORIGINAL_VALUE is the property that was checked out MINE_VALUE is the current/ACTUAL value in the working copy INCOMING_VALUE is the new/target value from an update/merge/etc INCOMING_BASE_VALUE is used during merges, as an incoming property change is expressed as "change from INCOMING_BASE to INCOMING" ### stsp: What's the size limit of a prop value? ### HKW: In theory, there isn't one. In practice, we used to caution people ### against having too large of props, but I don't know if that is ### true anymore. Tree conflicts -------------- Tree conflicts exist on files or directories. ### JAF: And symlinks, I presume - or, if not, why not? ### stsp: Symlinks are resolved before retreiving conflict information., The following information is stored if there is a tree conflict on the node: ("tree" LOCAL_STATE INCOMING_STATE) LOCAL_STATE := (LOCAL_CHANGE ORIGINAL_NODE_KIND MINE_NODE_KIND [ORIGINAL_SHA1 MINE_SHA1]) INCOMING_STATE := (INCOMING_CHANGE INCOMING_NODE_KIND [INCOMING_SHA1]) LOCAL_CHANGE is the local change which conflicted with the incoming change during the operation. Possible values are "edit", "add", "delete", "rename", "replace", "obstructed", "missing", and "unversioned" (### possibly collapse "unversioned" with "obstructed"?). ORIGINAL_NODE_KIND is the kind of the node in the BASE tree. MINE_NODE_KIND is the kind of the node from the WORKING tree at the time the conflict was flagged. INCOMING_CHANGE is the incoming change which conflicted with the local change during the operation. Possible values are "edit", "add", "delete", "rename", and "replace". The *_SHA1 sum fields are only present if {ORIGINAL,MINE,INCOMING}_NODE_KIND is "file". ORIGINAL_SHA1 is the SHA1 of the BASE version of the tree conflict victim file in the working copy. MINE_SHA1 is the SHA1 of the WORKING version of the tree conflict victim file as of the time the conflict was flagged. If INCOMING_KIND is "file", INCOMING_SHA1 is the SHA1 of the file which the operation was attempting to install in the working copy. The file version's content can be obtained from the pristine store. ### BH: We need to duplicate the sha1 values in the older_checksum, ### left_checksum, right_checksum columns of ACTUAL ### to allow pristine store cleanups. ### BH: Can we share some of the sha1 logic with the text conflicts to ### allow resolving this in the same way? ### (We should keep the history of the node valid via replace vs update) ### stsp: I don't really understand your question. Can you be more specific? (Unversioned) Obstructions -------------------------- When an update introduces a new node where an existing unversioned node is stored locally we need to add some marker to allow the operation to update the BASE_NODE table. There is no particular data which needs to be recorded for an obstruction. Thus, the "obstructed" conflict skel has the form: ("obstructed") Reject conflicts ---------------- For patches, the content of the left and right versions is not fully known, so the conflict is not a diff3-style text conflict. Rather, the conflict is the failure to find a match for a hunk's context in the patch target. There can be one or more reject conflicts on a node. Each "reject" conflict has the following form: REJECTED_HUNK_LIST := (HUNK_ORIGINAL_OFFSET HUNK_ORIGINAL_LENGTH HUNK_MODIFIED_OFFSET HUNK_MODIFIED_LENGTH)* ("reject" REJECT_FILE TARGET_PATCH_SHA1 REJECTED_HUNK_LIST) REJECT_FILE is ... TARGET_PATCH_SHA1 is the sha1 of the unidiff content of the rejected hunk as written to the .svnpatch.rej file. The actual unidiff content (which can be large!) can be retrieved from the pristine store. HUNK_{ORIGINAL,MODIFIED}_OFFSET and HUNK_{ORIGINAL,MODIFIED}_LENGTH are the hunk header values as parsed from the patch file (i.e. the "ID" of the hunk within the patch file). These also occur in the reject diff text but are stored here for easy retrieval. ### BH: Using a sha1 here, makes it impossible to cleanup the pristine store ### The pristine store needs all references to be stored in a DB column. ### To support this we would need an extra table. ### stsp: I'm fine with not storing the reject diff text if we don't ### have a good location for it. However, keeping it around in case ### the user deletes the tempfile would be nice. And I don't see an issue ### with also storing the SHA1 sum in the ACTUAL table. We do this for ### text conflicts as well. Why would it need an extra table? ("prop-reject" REJECT_FILE (PROPERTY_NAME TARGET_PATCH_SHA1 REJECTED_HUNK_LIST)* )