Repository Transaction Design for Svn Obliterate

Table of Contents

  1. Server — How the server works
    1. Filesystem
      1. Transactions of Obliteration
  2. References

Server — How the server works

Filesystem

Transactions of Obliteration

This section describes how the Obliterate Transaction works, using the model of obliteration in which an obliterated node-rev is deleted from its revision (and not replaced with other content). The functional spec diagrams for this model are in <notes/obliterate/fspec-dd1/>.

Simple Obliterate

Let us obliterate the file node-rev called "tuna" in r2, where the content said "Fried". We don't want anyone to see that content any longer.

More precisely, let us obliterate the "tuna" entry in the directory known as "fish" in r2. If a copy of the file node whose content is "Fried", and that was called "tuna" in r2, also exists elsewhere (in a tag, a branch, a copy or another revision), let us choose to let all such copies remain in the repository for now. (We can obliterate them separately if we wish.) In this example, there are no such copies of the file node.

       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |        \
          |         \_____
       ___|____         __\_____
      |D       |       |D       |            Key:
      |        |       |        |              Each rectangle with 'D' or
      |   A    |       |   A    |              'F' in top-left corner is
      |    \   |       |    \   |              a separate node - a dir or
      |   B \  |       |   B \  |              file node respectively.
      |__/___\_|       |__/___\_|              Text shown in a node is its
        /     \          /     \               content, not its name.
       |    ___\________/       \
       |   /    \                \
    ___|__/   ___\____         ___\____
   |D      | |D       |       |D       |
   |       | |        |       |        |
   |       | | fish   |       | fish   |
   |_______| |___\____|       |___\____|
                  \                \
                   \                \
                 ___\____         ___\____
                |D       |       |D       |
                |        |       |        |
                | tuna   |       | tuna   |   <- entry to obliterate
                |___\____|       |___\____|
                     \                \
                      \                \
                    ___\____         ___\____
                   |F       |       |F       |
                   |        |       |        |
                   |"Fresh" |       |"Fried" |
                   |________|       |________|

We construct a transaction containing a new version of the r2 tree. It is like the old r2 tree except we have a new node to hold the directory that no longer contains tuna, and a new node for each of its parent directories up to the root.

We will do this in two steps. First, a begin_obliteration_txn() function will create a new transaction that is a mutable clone of the old r2. Second, we will use the normal transaction editing functions to delete the "tuna" entry.

       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |        \
          |         \_____
       ___|____         __\_____         ________
      |D       |       |D       |       |D       |
      |        |       |        |       |        |   <- new r2 tree
      |   A    |       |   A    |       |   A    |
      |    \   |       |    \   |       |    \   |
      |   B \  |       |   B \  |       |   B \  |
      |__/___\_|       |__/___\_|       |__/___\_|
        /  ______________/_____\__________/     \
       |  / ___\________/       \                \
       | / /    \                \                \
    ___|/_/   ___\____         ___\____         ___\____
   |D      | |D       |       |D       |       |D       |
   |       | |        |       |        |       |        |
   |       | | fish   |       | fish   |       | fish   |
   |_______| |___\____|       |___\____|       |___\____|
                  \                \                \
                   \                \                \
                 ___\____         ___\____         ___\____
                |D       |       |D       |       |D       |
                |        |       |        |       |        |
                | tuna   |       | tuna   |       |        |
                |___\____|       |___\____|       |________|
                     \                \
                      \                \
                    ___\____         ___\____
                   |F       |       |F       |
                   |        |       |        |
                   |"Fresh" |       |"Fried" |
                   |________|       |________|

We link the new transaction in as r2, replacing the old r2. The old r2 tree does not yet go away; its nodes are still present and other revisions and transactions may be referring to them. This operation is analogous to the "finalize" stage of a commit, but it is not the final operation in an obliteration.

       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |        \______________________
          |                               \
       ___|____         ________         __\_____
      |D       |       |D       |       |D       |   <- r2 is now this tree
      |        |       |        |       |        |
      |   A    |       |   A    |       |   A    |
      |    \   |       |    \   |       |    \   |
      |   B \  |       |   B \  |       |   B \  |
      |__/___\_|       |__/___\_|       |__/___\_|
        /  ______________/_____\__________/     \
       |  / ___\________/       \                \
       | / /    \                \                \
    ___|/_/   ___\____         ___\____         ___\____
   |D      | |D       |       |D       |       |D       |
   |       | |        |       |        |       |        |
   |       | | fish   |       | fish   |       | fish   |
   |_______| |___\____|       |___\____|       |___\____|
                  \                \                \
                   \                \                \
                 ___\____         ___\____         ___\____
                |D       |       |D       |       |D       |
                |        |       |        |       |        |
                | tuna   |       | tuna   |       |        |
                |___\____|       |___\____|       |________|
                     \                \
                      \                \
                    ___\____         ___\____
                   |F       |       |F       |
                   |        |       |        |
                   |"Fresh" |       |"Fried" |
                   |________|       |________|

Now the old r2 tree is orphaned. Some of its nodes are no longer referenced and can therefore be deleted. In particular, the old file-rev containing "Fried" is no longer referenced through r2 - and in our example it is not referenced from any other revision either.

We could delete the orphaned nodes right away, or we could leave them in place for now and at some later time trawl through the list of all nodes looking for orphans and deleting them. The latter scheme is known as "garbage collection".

Let us assume for now that we will delete the orphaned nodes right away. In fact, we will do it during the "finalization" so as not to leave any orphaned nodes in the tree. (The "no dead nodes" rule in <www/design.html#server.fs.struct>.)

Starting from the root directory node of the old r2, we traverse that tree, deleting orphaned nodes from the top down. First we delete that root directory node. (This is just an illustration; the implementation can delete them in whatever order or manner it chooses, as long as it is transactional.)

       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |        \______________________
          |                               \
       ___|____                          __\_____
      |D       |        * * * *         |D       |
      |        |       * GONE! *        |        |
      |   A    |        * * * *         |   A    |
      |    \   |                        |    \   |
      |   B \  |                        |   B \  |
      |__/___\_|                        |__/___\_|
        /  _______________________________/     \
       |  /    \                                 \
       | /      \                                 \
    ___|/__   ___\____         ________         ___\____
   |D      | |D       |       |D       |       |D       |
   |       | |        |       |        |       |        |
   |       | | fish   |       | fish   |       | fish   |
   |_______| |___\____|       |___\____|       |___\____|
                  \                \                \
                   \                \                \
                 ___\____         ___\____         ___\____
                |D       |       |D       |       |D       |
                |        |       |        |       |        |
                | tuna   |       | tuna   |       |        |
                |___\____|       |___\____|       |________|
                     \                \
                      \                \
                    ___\____         ___\____
                   |F       |       |F       |
                   |        |       |        |
                   |"Fresh" |       |"Fried" |
                   |________|       |________|

After deleting the root directory node, we leave the directory node known as "B" in place because it still has other parents, but delete that known as "A" because it becomes orphaned. Similarly we delete the dir known as "fish" and the file known as "tuna".

       ______________________________________________________
      |___1_______2________3________4________5_________6_____...
          |        \______________________
          |                               \
       ___|____                          __\_____
      |D       |                        |D       |
      |        |                        |        |
      |   A    |                        |   A    |
      |    \   |                        |    \   |
      |   B \  |                        |   B \  |
      |__/___\_|                        |__/___\_|
        /  _______________________________/     \
       |  /    \                                 \
       | /      \                                 \
    ___|/__   ___\____                          ___\____
   |D      | |D       |                        |D       |
   |       | |        |                        |        |
   |       | | fish   |                        | fish   |
   |_______| |___\____|                        |___\____|
                  \                                 \
                   \                                 \
                 ___\____                          ___\____
                |D       |                        |D       |
                |        |                        |        |
                | tuna   |                        |        |
                |___\____|                        |________|
                     \
                      \
                    ___\____
                   |F       |
                   |        |
                   |"Fresh" |
                   |________|

That is the end of a simple obliteration, in which there were no other references in the repository to the obliterated node or its parents.

Note that there is no restriction on references to other sub-trees of the original r2. For example, existing or future references to the node called "B" are unaffected, even if a transaction using such a reference is being constructed while this obliteration is happening.

Construction and Finalization Checks

Consider a case in which a normal commit transaction is being constructed at the same time as the above obliteration is carried out. The new commit transaction includes a pointer to the node called "fish" in old r2, which exists at the time of construction. By the time we try to finalize this commit, the obliteration has been finalized and the node pointed to no longer exists. We cannot allow a transaction with a broken link to be committed.

Finalization must include a check that all referenced existing nodes do in fact still exist. This check must be performed both in normal commit finalization and in obliterate finalization.

That check could be costly if implemented naïvely, but there are ways to make it cheap. First, ensure no invalid node is added during construction. Second, detect whether any obliteration has been performed in the repository since the transaction was started. ### TODO: "Obliteration Serial Number".

Obliterating a Node that is Referenced Elsewhere

We don't mean to obliterate the "node" but rather the directory entry that points to the node. If another directory entry in this same revision or in another revision points to the node, that is no problem: we will simply re-write the directory so as to remove the entry, but we will not delete the node itself because it will not be orphaned.

References

Diagram of a schema for MYSQL by Edmund Horner, which is like the BDB schema except with exploding of skels into table columns - transaction_props, transaction_copies, representation_windows.

FSFS structure doc

FS-BDB structure doc: nodes, node-revs, keys, skels, reps, deltifying, txns, changes, copies, locks, merge rules, uuid, summary (structure syntax), misc-table (forward-delta support)

FS-BDB fs-history doc: DAG node-revs, node-rev-ids, copies, copy-ids