******************************************************************************
                          REQUIREMENTS SPECIFICATION
                                     FOR
                            ISSUE #516: OBLITERATE
******************************************************************************


TABLE OF CONTENTS

    OPEN ISSUES

    1. INTRODUCTION
      1.1 Sources of Requirements

    2. USER STORIES
      2.1 Added secrets in a new file
      2.2 Added secrets into an existing file
      2.3 Added a single huge file by accident
      2.4 Repeated modification of a huge file

    3. REQUIREMENTS
      3.1 Levels of Obliteration
      3.2 Content of the Modified Repository
      3.3 Working Copies
      3.4 Access to the Modified Repository
      3.5 Audit Trail
      3.6 Svnsync Mirrors
      3.7 Permissions
      3.8 Time Taken


OPEN ISSUES

  (none)


1. INTRODUCTION

  This document captures the requirements for the Subversion feature commonly
  known as "Obliterate". It is intended to include all of the requirements
  that could be deemed to fall within the scope of an Obliterate feature. The
  set of requirements to be satisfied by a proposed development of such a
  feature may be a specified sub-set of those listed here.

  The purpose of this document is to enable a design to be evaluated and an
  implementation to be tested against specific criteria that are all written
  down in one place.

  Section 2 lists requirements from a user's point of view.

  Section 3 lists requirements from a software design point of view.

  1.1 Sources of Requirements

    The requirements are sourced from:
      * Comments in issue #516.
      * Comments on the Subversion developers' mailing list.
      * Personal experience of the authors.


2. USER STORIES

  The "user stories" are examples, described from a user's point of view, of
  scenarios in which the Obliterate feature should or might be used. Their
  purpose is to indicate the range and diversity of requirements, without
  being an exhaustive list of combinations. They loosely define the high-level
  requirements which the specific requirements in section 3 must satisfy.

  The following user stories are gathered from the sources in section 1 and
  include both typical and unusual use cases.

  2.1 Added secrets in a new file

    User U1 has just accidentally committed the addition of a new file F1 that
    contains confidential data (let's say people's addresses).  F1 is visible
    to other users of the repository. The probability of anyone committing
    another change before the administrator can intervene is low. The
    probability of anyone updating their WC to this revision is low.

    U1 wants to restrict the visibility and propagation of the confidential
    data as soon as possible.

    Possible solutions:
      * hide the existence of F1
      * replace the content of F1 with empty content
      * replace the content of F1 with its "previous" content (definition
        required)
      * replace the content of F1 with arbitrary other content
      * roll back the entire head revision (definition required)
      * something else.

  2.2 Added secrets into an existing file

    User U1 has just accidentally committed a change that adds confidential
    data (let's say people's addresses) into an existing file F1. F1 is
    visible to other users of the repository. The existence and other content
    of F1 is important to other users.

    U1 wants to restrict the visibility and propagation of the confidential
    data as soon as possible.

  2.3 Added a single huge file by accident

    User U1 has just accidentally committed the addition of a new file F1 that
    is huge and unwanted, with no other changes included in the commit.

    U1 wants to get rid of the file in order to save space and time on
    colleagues' WC updates.

  2.4 Repeated modification of a huge file

    User U1 keeps checking in the latest version of a huge file F1, in order
    to have them handy for testing. Nobody needs versions of F1 older than 2
    weeks; they can be re-generated from source if required. F1 is usually
    checked in alongside some modifications to source files.

    U1 wants to prune old versions of F1 regularly in order to limit server
    disk space usage.

    This use case is not directly what most people consider to be
    "obliterate". It is really a separate feature that could use the
    functionality of "obliterate" in its implementation, but could also be
    implemented in other ways.


3. REQUIREMENTS

  The requirements listed here are a set of design requirements that together
  would satisfy all of the user-level requirements. A successful design will
  satify most of these requirements to a large extent, but need not satisfy
  all of them completely.  A functional design document should specify which
  of these requirements it satisfies, and to what extent.

  Each requirement can be designated for convenience as "functional" or
  "non-functional". A functional requirement specifies what output is produced
  from what input, where input and output include such things as repositories,
  working copies and audit trails.  A non-functional requirement is a
  constraint on how the functional operation is performed, such as speed of
  operation or memory usage.

  3.1 Levels of Obliteration

    The requirements involve the following "levels" of obliteration:

      L1: hiding data from clients
        (a) avoiding sending the data in any new communications
        (b) removing data from repository mirrors that already have it
        (c) removing data from clients that already have it

      L2: hiding data from people with direct access to the server disk

      L3: recovering space on the server disk

    NOTES:

      L1 and L3 are directly relevant to the common use cases. Requirements
      for L2 are coneivable but appear not to be common.

  3.2 Content of the Modified Repository

    * At revisions older than the obliteration, the repository should yield
      exactly the same data that it used to.

      RATIONALE: A Subversion repository has no forward-looking metadata so
      there is no reason for old revisions to be changed so they should not be
      changed.

      EXCEPTIONS: Any manual adjustments to revision properties, such as to
      forward-looking comments in log messages or to third-party data in
      revision-0 properties.

    * At the revision of the obliterated data, the stored tree should be
      modified in a way to be specified in a Functional Spec. Briefly, two
      likely schemes are:
        (scheme "dd") each node to be obliterated is deleted; or
        (scheme "cc") each node to be obliterated becomes exactly like it was
          in the previous revision.

    * At each revision younger than the obliteration, the repository file
      system tree structure and content should look exactly as it used to.
      However, any node with a "copied from" pointer that pointed to a node
      which has been removed by obliteration should have this pointer adjusted
      or removed, as defined by the Functional Spec.

    NOTES:

      This description assumes per-revision granularity of obliteration.

  3.3 Working Copies

    * A WC managed by an obliterate-aware Subversion client and logically
      unaffected should show no sign that anything has happened.

    * A WC managed by an obliterate-aware Subversion client and logically
      affected by the change should behave in a friendly manner ...

    * A WC managed by an old (pre-obliterate) Subversion client and logically
      unaffected should show little or no sign that anything has happened, and
      should require no user intervention to continue working.

    * A WC managed by an old (pre-obliterate) Subversion client and logically
      affected by the change should ...

  3.4 Access to the Modified Repository

    * The modified repository should keep the same URL and UUID, and client
      access should continue without manual intervention, after any required
      down-time, for all working copies that are not logically affected by the
      obliteration.

      Rationale: Obliteration is often required in large repositories having
      large numbers of users, most of whom are not working near the
      obliterated data. If all users were impacted each time, then
      obliteration could become impractical.

  3.5 Audit Trail

    * On the client side, no trace of the obliteration need be visible other
      than the intended changes to versioned data and to revision properties.

    * On the server side, the administrator should be able to choose whether a
      record of obliterations is stored. The form and storage location of this
      record is not specified here.

    NOTES:

      Some customers are concerned about auditability and may want an audit
      trail to be stored with the repository so that it is included in backups
      and perpetually available for later examination.

  3.6 Svnsync Mirrors

    * A read-only mirror of the repository maintained by an old
      (pre-obliterate) version of "svnsync" should either keep all of its
      already-copied revisions exactly as they were, and continue to copy new
      revisions from the modified repository without any hiccup, or it should
      stop working so that its administrator has to intervene.

      Rationale: An old svnsync has no way to re-synchronize old revisions. If
      it behaves just like a regular client that had been taking snap-shots of
      the master repository, that would be logical and self-consistent but not
      propagating the obliteration; that's a problem for the secrecy use
      cases. If it requires human intervention, that would disrupt its users
      but would force a human to consider whether the mirrored data should be
      kept or modified. Ideally the administrator of the master repository
      would control which of these scenarios will occur.

    * A read-only mirror of the repository maintained by an obliterate-aware
      version of "svnsync" should re-synchronize its old revisions to match
      the modified master repository.

  3.7 Permissions

    * The data-hiding part of an obliterate should be available to a user
      with suitable permissions, from the client side, using a standard
      Subversion client installation.

    * The space-saving part of an obliterate should be available to an
      administrator, from the server side, using a standard Subversion server
      installation. This may also be available in the same way as the
      data-hiding part.

  3.8 Time Taken

    * The time from when an administrator discovers an accidental secrecy
      problem to when the data in question is unavailable to ordinary clients
      (that don't already have it) should be within minutes, or at most hours,
      on a large repository.

    * The time from when an administrator discovers an accidental large
      check-in until the data can be removed from the repository should be at
      most hours, on a large repository.  (The intent here is that an
      administrator should be able to avoid the data getting into a nightly
      back-up, if desired.)