This spec defines how the Pristine Store works (part A) and how the WC
uses it (part B).


A. THE PRISTINE STORE
=====================

==== A-1. Introduction ====

The Pristine Store is essentially just a blob store.  Texts in the
Pristine Store are addressed by their SHA-1 checksum.

The Pristine Store data is held in
 * the 'PRISTINE' table in the SQLite Database (SDB), and
 * the files in the 'pristine' directory.
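
For illustration only, the two halves of the store could be modelled as
below.  This is a minimal Python sketch, not the libsvn_wc schema: the
column names 'checksum', 'size', 'md5_checksum' and 'refcount' come from
this spec, but the column types, the flat one-file-per-text directory
layout, and the helper names text_metadata() and pristine_path() are
assumptions made for the example.

```python
import hashlib
import os
import sqlite3

# Hypothetical minimal table; only the column names come from this spec.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pristine (
  checksum     TEXT    NOT NULL PRIMARY KEY,  -- SHA-1 of the text, in hex
  md5_checksum TEXT    NOT NULL,
  size         INTEGER NOT NULL,
  refcount     INTEGER NOT NULL DEFAULT 0
);
"""


def text_metadata(data: bytes) -> dict:
    """Compute the metadata columns for one pristine text."""
    return {
        "checksum": hashlib.sha1(data).hexdigest(),
        "md5_checksum": hashlib.md5(data).hexdigest(),
        "size": len(data),
    }


def pristine_path(pristine_dir: str, sha1_hex: str) -> str:
    """Assumed layout: one file per text, named by its SHA-1 checksum."""
    return os.path.join(pristine_dir, sha1_hex)
```

The SHA-1 is the only key: two files with identical content map to the
same row and the same file on disk.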

Currently the texts are stored verbatim; in future they could be stored
compressed.

The Working Copy library uses the Pristine Store to hold a local copy of
the "base" or "pristine" version of each file.  The WC library uses it
only for the content of files, not for directory listings, symbolic
links, or properties.  This usage could change in future.

The Pristine Store itself does not track which text relates to which
repository and revision and path; that information is stored in the NODES
table and managed by a higher layer of logic within libsvn_wc.


This specification defines how the store operates so as to ensure
  * consistency between disk and DB;
  * atomicity of, and arbitration between, add, delete, and read
    operations.

==== A-2. Invariants ====

The operating procedures below maintain the following invariants.
These invariants apply at all times except within the SDB txns defined
below.

(a) Each row in the PRISTINE table has an associated pristine text file
    that is not open for writing and is available for reading and whose
    content matches the columns 'size', 'checksum', 'md5_checksum'.

(b) Once written, a pristine text file in the store never changes.

Note that although there is a file matching each row, there is not
necessarily a row matching each file that exists in the 'pristine'
files directory.  If Subversion crashes while adding or removing a
pristine text, it can leave such a file, which is known as an "orphan"
file.
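
Invariant (a) and the orphan condition can both be checked mechanically.
The following Python sketch shows one way to do so.  It assumes a
hypothetical layout in which each pristine file is named by its SHA-1
checksum directly inside the 'pristine' directory; check_store() is an
illustrative helper, not an existing libsvn_wc function.

```python
import hashlib
import os
import sqlite3


def check_store(db, pristine_dir):
    """Return (violations, orphans) for a store with the assumed layout.

    violations: (checksum, reason) for rows that break invariant (a).
    orphans:    names of files that have no matching PRISTINE row.
    """
    violations = []
    known = set()
    rows = db.execute(
        "SELECT checksum, md5_checksum, size FROM pristine ORDER BY checksum"
    ).fetchall()
    for checksum, md5, size in rows:
        known.add(checksum)
        try:
            with open(os.path.join(pristine_dir, checksum), "rb") as f:
                data = f.read()
        except FileNotFoundError:
            violations.append((checksum, "no file"))
            continue
        if (len(data) != size
                or hashlib.sha1(data).hexdigest() != checksum
                or hashlib.md5(data).hexdigest() != md5):
            violations.append((checksum, "metadata mismatch"))
    # A file without a row is an orphan; it does not break the invariants.
    orphans = sorted(set(os.listdir(pristine_dir)) - known)
    return violations, orphans
```

A non-empty 'violations' list indicates a broken store, whereas a
non-empty 'orphans' list only indicates that clean-up is due.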

==== A-3. Operating Procedures ====

The numbered steps should be carried out in the order specified.  (See
the rationale in section A-4.)

(a) To add a pristine, do the following inside an SDB txn:
    0. Acquire a 'RESERVED' lock.
    1. Add the table row, and set the refcount as desired.  If a row
       already exists, add the desired refcount to its refcount, and
       preferably verify the old row matches the new metadata.
    2. Create the file. Creation should be fs-atomic, e.g. by moving a
       new file into place, so as never to orphan a partial file.  If a
       file already exists, preferably leave it rather than replace it,
       and optionally verify it matches the new metadata (e.g. length).

(b) To remove a pristine, do the following inside an SDB txn:
    0. Acquire a 'RESERVED' lock.
    1. Check refcount == 0, and abort if not.
    2. Delete the table row.
    3. Delete the file or move it away. (If not present, log a
       consistency error but, in a release build, return success.)

(c) To query a pristine's existence or SDB metadata, the reader must:
    1. Simply query the 'PRISTINE' table row.  If that row exists, the
       pristine text is in the store and its metadata is in the row; if
       not, the pristine text is not currently in the store.

       NOTE: Subject to the higher-level rules in part B, the pristine
       text may be removed from the store at any later time.

(d) To read a pristine text, the reader must:
    1. Query the SDB and open the file within the same SDB txn (to ensure
       that no pristine-remove txn (A-3(b)) is in progress at the same
       time).
    2. Keep the file handle open until all required data has been read from
       it. (If the pristine text is removed from the store by procedure (b),
       the file's data will remain readable as long as the file handle is
       open, whereas the file's directory entry may disappear.)
    3. Close the file handle.

(e) To clean up "orphan" pristine files:
    1. Check that the work queue is empty.
    2. 

###?
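
Procedures (a), (b) and (d) above can be sketched in Python as follows.
This is an illustration under stated assumptions, not the libsvn_wc
implementation: the function names, the flat file layout (file name ==
SHA-1), the separate 'tmp_dir' on the same filesystem, and the table
definition are all hypothetical.  The connection is assumed to be opened
with isolation_level=None so that the txn boundaries are explicit.

```python
import hashlib
import logging
import os
import sqlite3
import tempfile

SCHEMA = """CREATE TABLE IF NOT EXISTS pristine (
  checksum TEXT PRIMARY KEY, md5_checksum TEXT NOT NULL,
  size INTEGER NOT NULL, refcount INTEGER NOT NULL)"""


def add_pristine(db, pristine_dir, tmp_dir, data, refcount=0):
    """Procedure A-3(a): add a text, or add to an existing refcount."""
    sha1 = hashlib.sha1(data).hexdigest()
    md5 = hashlib.md5(data).hexdigest()
    db.execute("BEGIN IMMEDIATE")                       # step 0
    try:
        row = db.execute("SELECT size, md5_checksum FROM pristine"
                         " WHERE checksum = ?", (sha1,)).fetchone()
        if row is None:                                 # step 1
            db.execute("INSERT INTO pristine VALUES (?, ?, ?, ?)",
                       (sha1, md5, len(data), refcount))
        else:
            if row != (len(data), md5):
                raise ValueError("existing row does not match new metadata")
            db.execute("UPDATE pristine SET refcount = refcount + ?"
                       " WHERE checksum = ?", (refcount, sha1))
        final = os.path.join(pristine_dir, sha1)
        if not os.path.exists(final):                   # step 2
            fd, tmp = tempfile.mkstemp(dir=tmp_dir)
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp, final)      # move the complete file into place
    except BaseException:
        db.execute("ROLLBACK")
        raise
    db.execute("COMMIT")
    return sha1


def remove_pristine(db, pristine_dir, sha1):
    """Procedure A-3(b): returns True if the text was removed."""
    db.execute("BEGIN IMMEDIATE")                       # step 0
    try:
        row = db.execute("SELECT refcount FROM pristine"
                         " WHERE checksum = ?", (sha1,)).fetchone()
        removable = row is not None and row[0] == 0     # step 1
        if removable:
            db.execute("DELETE FROM pristine WHERE checksum = ?", (sha1,))
            try:                                        # step 3
                os.remove(os.path.join(pristine_dir, sha1))
            except FileNotFoundError:
                logging.warning("pristine file for %s was missing", sha1)
    except BaseException:
        db.execute("ROLLBACK")
        raise
    db.execute("COMMIT")
    return removable


def open_pristine(db, pristine_dir, sha1):
    """Procedure A-3(d) step 1: query and open within one txn.

    Returns an open file (the caller reads, then closes it) or None.
    """
    db.execute("BEGIN")
    try:
        found = db.execute("SELECT 1 FROM pristine WHERE checksum = ?",
                           (sha1,)).fetchall()
        return open(os.path.join(pristine_dir, sha1), "rb") if found else None
    finally:
        db.execute("COMMIT")
```

The temporary file is written outside the 'pristine' directory so that a
crash before the move cannot leave a partial file there.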

==== A-4. Rationale ====

(a) Adding a pristine:
     * We can't add the file *before* the SDB txn takes out a lock,
       because that would leave a gap in which another process could
       see this file as an orphan and delete it.
     * Within the txn, the table row could be added after creating the
       file; it makes no difference as it will not become externally
       visible until commit.  But then we would have to take out a lock
       explicitly before adding the file: see rationale (c).
     * Leaving an existing file in place is less likely to interfere with
       processes that are currently reading from the file.  Replacing it
       might also be acceptable, but that would need further
       investigation.

(b) Removing a pristine:
     * We can't remove the file *after* the SDB txn that updates the
       table, because that would leave a gap in which another process
       might re-add this same pristine file and then we would delete it.
     * Within the txn, the table row could be removed after removing the
       file; it makes no difference as it will not become externally
       visible until commit.  But then we would have to take out a lock
       explicitly before removing the file: see rationale (c).
     * In a typical use case for removing a pristine text, the caller
       would check the refcount before starting this txn, but
       nevertheless it may have changed and so must be checked again
       inside the txn.

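(c) In both the 'add' (a) and 'remove' (b) txns, we need to acquire a lock
    that blocks other writers (an SQLite 'RESERVED' lock) before adding or
    removing the file on disk.  We could acquire this explicitly (e.g. by
    starting the txn with 'BEGIN IMMEDIATE'); alternatively SQLite will
    upgrade the default 'SHARED' lock to 'RESERVED' the first time we
    write to a table.  Note that, in SQLite's rollback-journal mode, a
    'RESERVED' lock does not by itself block readers: other connections
    can still acquire 'SHARED' locks and read until the writer begins to
    commit.  Blocking readers as well would need an 'EXCLUSIVE' lock
    (e.g. 'BEGIN EXCLUSIVE').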
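
The effect of 'BEGIN IMMEDIATE' can be observed with two connections to
the same database file.  The Python sketch below is an experiment, not
part of the design: the table definition and the function name
demo_reserved_lock() are made up for the example, and the database is
assumed to be a fresh file in SQLite's default rollback-journal mode.
It records whether a second writer is refused while the first holds its
lock, and how many rows a plain reader sees before and after the commit.

```python
import sqlite3


def demo_reserved_lock(db_path):
    """Return (second_writer_blocked, rows_seen_during, rows_seen_after)."""
    # timeout=0: fail immediately rather than wait for a lock.
    writer = sqlite3.connect(db_path, isolation_level=None, timeout=0)
    other = sqlite3.connect(db_path, isolation_level=None, timeout=0)
    writer.execute("CREATE TABLE IF NOT EXISTS pristine"
                   " (checksum TEXT PRIMARY KEY, refcount INTEGER)")

    writer.execute("BEGIN IMMEDIATE")        # acquires the RESERVED lock now
    writer.execute("INSERT INTO pristine VALUES ('abc', 0)")

    try:
        other.execute("BEGIN IMMEDIATE")     # a second would-be writer
        other.execute("ROLLBACK")
        second_writer_blocked = False
    except sqlite3.OperationalError:         # "database is locked"
        second_writer_blocked = True

    # A plain read from the other connection, while the txn is still open.
    during = other.execute("SELECT COUNT(*) FROM pristine").fetchall()[0][0]

    writer.execute("COMMIT")
    after = other.execute("SELECT COUNT(*) FROM pristine").fetchall()[0][0]

    writer.close()
    other.close()
    return second_writer_blocked, during, after
```

On a fresh database file this is expected to return (True, 0, 1): the
second writer is refused, while the reader is admitted and sees the row
only after the commit.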

==== A-5. Notes ====

(a) This procedure can leave orphaned pristine files (files without a
    corresponding SDB row) if Subversion crashes.  The Pristine Store
    will still operate correctly.  We should ensure that "svn
    cleanup" deletes these.

(b) This specification is conceptually simple, but requires completing disk
    operations within SDB transactions, which may make it too inefficient
    in practice.  An alternative specification could use the Work Queue to
    enable more efficient processing of multiple transactions.

(c) [G Stein] Note that my initial design for the pristine inserted a row
    which effectively said "we know about this pristine, but it hasn't
    been written yet". The file would be put into place, then the row
    would get tweaked to say "it is now there". That avoids the disk I/O
    within a sqlite txn.


B. REFERENCE COUNTING
=====================

==== B-1. Introduction ====

The Pristine Store spec 'A' above defines how texts are added to and
removed from the store.  This spec defines how the addition and removal
of pristine text references within the WC DB are co-ordinated with the
addition and removal of the pristine texts themselves.

One requirement is to allow a pristine text to be stored some time
before the reference to it is written into the NODES table.  For
example, the 'commit' operation, as currently implemented in Subversion,
needs to store a file's new pristine text somewhere (the pristine store
being an obvious option) and then, when the commit succeeds, update the
WC to reference it.

Store-then-reference could be achieved in several different ways, such as:

  (a) Store text outside Pristine Store.  When commit succeeds, add it
      to the Pristine Store and reference it in the WC; if commit
      fails, remove the temporary text.
  (b) Store text in Pristine Store with initial ref count = 0.  When
      commit succeeds, add the reference and update the ref count; if
      commit fails, optionally try to purge this pristine text.
  (c) Store text in Pristine Store with initial ref count = 1.  When
      commit succeeds, add the reference; if commit fails, decrement
      the ref count and optionally try to purge it.

Method (a) would require, in effect, implementing an ad-hoc temporary
Pristine Store, which seems needless duplication of effort.  It would
also require changing the way the commit code path passes information
around, which might be no bad thing in the long term, but the result
would not appear to have any advantage over method (b).

Method (b) plays well with automatically maintaining the ref counts
equal to the number of in-SDB references, at the granularity of SDB
txns.  It requires an interlock between adding/deleting references and
purging unreferenced pristines - e.g. guard each of these operations by
a WC lock.
  * Add a pristine, then later reference it => need to hold a WC lock.
    (To prevent purging it while adding.)
  * Unreference a pristine => no lock needed.
  * Unreference a pristine & purge-if-0 => Same as doing these separately.
  * Purge any/all refcount==0 pristines => an exclusive WC lock.
    (To prevent adding a ref while purging.)
  * If a WC lock remains after a crash, then purge refcount==0 pristines.

Method (c):
  * ### Not sure about this one - haven't thought it through in detail...
  * Add a pristine & reference in separate steps => any WC lock (?)
  * Remove a reference requires ... (nothing more?)
  * Find & purge unreferenced pristines requires an exclusive WC lock.
  * Ref counts are sometimes too high while a WC lock is held, so
    uncertain after a crash if WC locks remain, so need to be re-counted
    during clean-up.

We choose method (b).


==== B-2. Invariants in a Valid WC DB State ====

### TODO: This section needs work - it is not accurate.

  (a) No pristine text, even if refcount == 0, will be deleted from the
      store as long as any process holds any WC lock in this WC.

The following conditions are always true outside of a SQL txn:

  (b) The 'checksum' column in each NODES table row is either NULL or
      references a primary key in the 'PRISTINE' table.

  (c) The 'refcount' column in each PRISTINE table row is equal to the
      number of NODES table rows whose 'checksum' column references this
      pristine row.  (Note: The ACTUAL_NODE table is designed to be able
      to hold references to pristine texts involved in conflicts, but this
      functionality is not implemented yet and is not yet included in this
      spec.)

The following conditions are always true
    outside of a SQL txn,
    when the Work Queue is empty, and
    when no WC locks are held by any process:
### [JAF] What's this about the Work Queue here? Not sure that's intended.

  (d) The 'refcount' column in a PRISTINE table row equals the number of
      NODES table rows whose 'checksum' column references that pristine
      row.  It may be zero.
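
The conditions on 'checksum' and 'refcount' can be expressed as two
queries that return the offending rows.  The Python sketch below is only
an illustration: the table definitions are cut down to the columns this
spec mentions, the 'local_relpath' column and the function name
find_violations() are assumptions, and no claim is made that libsvn_wc
performs such a check.

```python
import sqlite3

# NODES rows whose checksum points at no PRISTINE row (violates (b)).
DANGLING_REFS = """
SELECT n.local_relpath, n.checksum
  FROM nodes n LEFT JOIN pristine p ON p.checksum = n.checksum
 WHERE n.checksum IS NOT NULL AND p.checksum IS NULL
 ORDER BY n.local_relpath
"""

# PRISTINE rows whose refcount differs from the number of NODES rows
# referencing them (violates (c)/(d)).
WRONG_REFCOUNTS = """
SELECT p.checksum, p.refcount, COUNT(n.checksum) AS actual
  FROM pristine p LEFT JOIN nodes n ON n.checksum = p.checksum
 GROUP BY p.checksum
HAVING p.refcount != COUNT(n.checksum)
 ORDER BY p.checksum
"""


def find_violations(db):
    """Return (dangling references, wrong refcounts) as lists of rows."""
    return (db.execute(DANGLING_REFS).fetchall(),
            db.execute(WRONG_REFCOUNTS).fetchall())
```

Both lists being empty corresponds to the valid state described above;
a pristine row with refcount 0 and no references is not reported.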

==== B-3. Operating Procedures ====

This section defines operations on the WC metadata that involve adding and
removing a pristine text along with a NODES table row that refers to it.
These operations are a layer above, and built on top of, those defined in
section A-3.

The numbered steps should be carried out in the order specified.

(a) To add a pristine text reference to the WC, obtain the text and its
    checksum, and then do this while holding a WC lock:
    (1) Add the pristine text to the Pristine Store (procedure A-3(a)),
        setting the desired refcount >= 1.
    (2) Add the reference(s) in the NODES table.

(b) To remove a pristine text reference from the WC, do this while holding
    a WC lock:
    (1) Remove the reference(s) in the NODES table.
    (2) Decrement the pristine text's 'refcount' column.

(c) To purge an unreferenced pristine text, do this with an exclusive
    WC lock (see note (a)):
    (1) Check refcount == 0; skip if not.
    (2) Remove it from the pristine store (procedure A-3(b)).
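
The ordering of the steps in (a) to (c) can be sketched at the SDB level
as follows.  This Python example is deliberately simplified and makes
several assumptions: the on-disk file handling of section A-3 is left
out, the cut-down table definitions and the function names are
hypothetical, the refcount is maintained by hand (no triggers), and the
caller is assumed to hold the appropriate WC lock already.

```python
import sqlite3

SCHEMA = """
CREATE TABLE pristine (checksum TEXT PRIMARY KEY,
                       refcount INTEGER NOT NULL DEFAULT 0);
CREATE TABLE nodes (local_relpath TEXT PRIMARY KEY, checksum TEXT);
"""


def add_reference(db, relpath, checksum):
    """B-3(a); the caller holds a WC lock."""
    with db:   # (1) add the pristine row, or add to its refcount
        db.execute("INSERT OR IGNORE INTO pristine (checksum, refcount)"
                   " VALUES (?, 0)", (checksum,))
        db.execute("UPDATE pristine SET refcount = refcount + 1"
                   " WHERE checksum = ?", (checksum,))
    with db:   # (2) add the reference in NODES
        db.execute("INSERT INTO nodes VALUES (?, ?)", (relpath, checksum))


def remove_reference(db, relpath):
    """B-3(b); the caller holds a WC lock."""
    with db:
        rows = db.execute("SELECT checksum FROM nodes"
                          " WHERE local_relpath = ?", (relpath,)).fetchall()
        db.execute("DELETE FROM nodes WHERE local_relpath = ?",
                   (relpath,))                              # (1)
        for (checksum,) in rows:
            if checksum is not None:
                db.execute("UPDATE pristine SET refcount = refcount - 1"
                           " WHERE checksum = ?", (checksum,))  # (2)


def purge_pristine(db, checksum):
    """B-3(c); the caller holds an exclusive WC lock.

    Returns True if the row was removed, False if it was skipped.
    """
    with db:   # (1) and (2): delete only if refcount == 0
        cur = db.execute("DELETE FROM pristine"
                         " WHERE checksum = ? AND refcount = 0", (checksum,))
    return cur.rowcount == 1
```

Between steps (1) and (2) of add_reference() the refcount is higher than
the number of NODES references; the WC lock held by the caller is what
keeps a concurrent purge from acting during that window.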

==== B-4. Notes ====

(a) An exclusive WC lock is obtained by acquiring a recursive lock on the
    WC root.

(b) Invariant B-2(b) is enforced by constraints defined in
    wc-metadata.sql.

(c) Invariant B-2(c) is currently assisted by triggers defined in
    wc-metadata.sql, but not enforced.
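
To illustrate notes (b) and (c), the sketch below shows how a
foreign-key constraint and a set of triggers could, in principle,
provide this kind of support in SQLite.  It is not copied from
wc-metadata.sql: the cut-down tables, the trigger names and the helper
make_db() are all hypothetical, and the real definitions may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE pristine (checksum TEXT PRIMARY KEY,
                       refcount INTEGER NOT NULL DEFAULT 0);
-- Constraint: a non-NULL checksum must name an existing pristine row.
CREATE TABLE nodes (local_relpath TEXT PRIMARY KEY,
                    checksum TEXT REFERENCES pristine (checksum));

CREATE TRIGGER nodes_insert_refcount AFTER INSERT ON nodes
WHEN NEW.checksum IS NOT NULL
BEGIN
  UPDATE pristine SET refcount = refcount + 1
   WHERE checksum = NEW.checksum;
END;

CREATE TRIGGER nodes_delete_refcount AFTER DELETE ON nodes
WHEN OLD.checksum IS NOT NULL
BEGIN
  UPDATE pristine SET refcount = refcount - 1
   WHERE checksum = OLD.checksum;
END;

CREATE TRIGGER nodes_update_refcount AFTER UPDATE OF checksum ON nodes
WHEN NEW.checksum IS NOT OLD.checksum
BEGIN
  UPDATE pristine SET refcount = refcount + 1
   WHERE checksum = NEW.checksum;
  UPDATE pristine SET refcount = refcount - 1
   WHERE checksum = OLD.checksum;
END;
"""


def make_db():
    """Open an in-memory DB with the sketch schema installed."""
    db = sqlite3.connect(":memory:")
    # SQLite only checks foreign keys when this pragma is on.
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript(SCHEMA)
    return db
```

Triggers of this kind keep the count in step with ordinary inserts,
deletes and updates of NODES rows, but nothing stops a statement from
writing the 'refcount' column directly, which is one sense in which the
invariant is assisted rather than enforced.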