Subversion on Berkeley DB                                    -*- text -*-

There are many different ways to implement the Subversion filesystem
interface.  You could implement it directly using ordinary POSIX
filesystem operations; you could build it using an SQL server as a
back end; you could build it on RCS; and so on.

This implementation of the Subversion filesystem interface is built on
top of Berkeley DB (http://www.sleepycat.com).  Berkeley DB supports
transactions and recoverability, making it well-suited for Subversion.


Nodes and Node Revisions

In a Subversion filesystem, a `node' corresponds roughly to an
`inode' in a Unix filesystem:
- A node is either a file or a directory.
- A node's contents change over time.
- When you change a node's contents, it's still the same node; it's
  just been changed.  So a node's identity isn't bound to a specific
  set of contents.
- If you rename a node, it's still the same node, just under a
  different name.  So a node's identity isn't bound to a particular
  filename.

A `node revision' refers to a node's contents at a specific point in
time.  Changing a node's contents always creates a new revision of that
node.  Once created, a node revision's contents never change.

When we create a node, its initial contents are the initial revision of
the node.  As users make changes to the node over time, we create new
revisions of that same node.  When a user commits a change that deletes
a file from the filesystem, we don't delete the node, or any revision
of it --- those stick around to allow us to recreate prior revisions of
the filesystem.  Instead, we just remove the reference to the node
from the directory.


ID's

Within the database, we refer to nodes and node revisions using strings
of numbers separated by periods that look a lot like RCS revision
numbers.

  node_id ::= number | node_revision_id "." number
  node_revision_id ::= node_id "." number

So: 
- "100" is a node id.
- "100.10" is a node revision id, referring to revision 10 of node 100.
- "100.10.3" is a node id, referring to the third branch based on
  revision 10 of node 100.
- "100.10.3.4" is a node revision id, referring to revision 4 of
  of the third branch from revision 10 of node 100.
And so on.

Node revision numbers start with 1.  Thus, N.1 is the first revision
of node N.

Node / branch numbers start with 1.  Thus, N.M.1 is the first
branch off of N.M.

A directory entry identifies the file or subdirectory it refers to
using a node revision number --- not a node number.  This means that
a change to a file far down in a directory hierarchy requires the
parent directory of the changed node to be updated, to hold the new
node revision ID.  Now, since that parent directory has changed, its
parent needs to be updated, and so on to the root.

If a particular subtree was unaffected by a given commit, the node
revision ID that appears in its parent will be unchanged.  When
doing an update, we can notice this, and ignore that entire
subtree.  This makes it efficient to find localized changes in
large trees.

Note that the number specifying a particular revision of a node is
unrelated to the global filesystem revision when that node revision
was created.  So 100.10 may have been created in filesystem revision
1218; 100.10.3.2 may have been created any time after 100.10; it
doesn't matter.

Since nodal revision numbers increase by one each time a delta is added,
we can compute how many deltas separate two related node revisions
simply by comparing their ID's.  For example, the distance between
100.10.3.2 and 100.12 is the distance from 100.10.3.2 to their
common ancestor, 100.10 (two deltas), plus the distance from 100.10
to 100.12 (two deltas).

However, this is kind of a kludge, since the number of deltas is
not necessarily an accurate indicator of how different two files
are --- a single delta could be a minor change, or a complete
replacement.  Furthermore, the filesystem may decide arbitrary to
store a given node revision as a delta or as full text --- perhaps
depending on how recently the node was used --- so revision id
distance isn't necessarily an accurate predictor of retrieval time.

If you have insights about how this stuff could work better, let me
know.  I've read some of Josh MacDonald's stuff on this; his
discussion seems to be mostly about how to retrieve things quickly,
which is important, but only part of the issue.  I'd like to find
better ways to recognize renames, and find appropriate ancestors in
a source tree for changed files.

When we need to treat a node or node revision ID as a string of bytes,
we use the forms shown here --- a series of unsigned ASCII decimal
numbers, separated by dots.


NODE-REVISION and HEADER: how we represent a node revision

We represent a given revision of a file or directory node using a list
skel (see skel.h for an explanation of skels).  A node revision skel
has the form:

    (HEADER PROP-KEY KIND-SPECIFIC ...)

where HEADER is a header skel, whose structure is common to all nodes,
PROP-KEY identifies the representation for the property list of this
node (see discussion of representations later), and the KIND-SPECIFIC
elements carry data dependent on what kind of node this is --- file,
directory, etc.

HEADER has the form:

    (KIND REV [COPY])

where:

   * KIND indicates what sort of node this is.  It must be one of the
     following:
       - "file", indicating that the node is a file (see FILE below).
       - "dir", indicating that the node is a directory (see DIR below).

   * REV indicates the revision in which this node first appeared.  An
     empty string atom here means this node was created in a
     transaction that is yet to be committed.  Once committed, REV is
     an atom indicating the revision number.

   * COPY, if present, indicates the node from which this node was
     copied.  It is a list skel: ("copy" ANCESOR-REV ANCESTOR-PATH)
     The word "copy" appears first because there may someday be other
     options in the header, and we'll want to distinguish among them.


Note that a node cannot change its kind from one revision to the next.
A directory node is always a directory; a file node is always a file;
etc.  The fact that the node's kind is stored in each node revision,
rather than in some revision-independent place, might suggest that it's
possible for a node change kinds from revision to revision --- 10.3
could be a directory, while 10.4 could be a file --- but Subversion
does not allow this.

PROP-KEY is a key into the `representations' table (which see), whose
value is a representation pointing to a string (see `strings' table)
that is a PROPLIST skel.

The KIND-SPECIFIC portions are discussed below.


PROPLIST: a property list is a list skel of the form:

    (NAME1 VALUE1 NAME2 VALUE2 ...)

where each NAMEi is the name of a property, and VALUEi is the value of
the property named NAMEi.  Every valid property list has an even
number of elements.


FILE: how files are represented.

If a NODE-REVISION's header's KIND is "file", then the node-revision
skel represents a file, and has the form:

    (HEADER PROP-KEY DATA-KEY [EDIT-DATA-KEY])

where DATA-KEY identifies the representation for the file's current
contents, and EDIT-DATA-KEY identifies the a representation currently
available for receiving new contents for the file.

See discussion of representations later.


DIR: how directories are represented.

If the header's KIND is "dir", then the node-revision skel
represents a directory, and has the form:

    (HEADER PROP-KEY ENTRIES-KEY)

where ENTRIES-KEY identifies the representation for the directory's
entries list (see discussion of representations later).  An entries
list has the form

    (ENTRY ...)

where each entry is

    (NAME ID)

where:
  - NAME is the name of the directory entry, in UTF-8, and
  - ID is the ID of the node revision to which this entry refers


REPRESENTATIONS: where and how Subversion stores your data.

Some parts of a node revision are essentially constant-length: for
example, the KIND field and the REV.  Other parts can have
arbitrarily varying length: property lists, file contents, and
directory entry lists.  This variable-length data is often similar
from one revision to the next, so Subversion stores just the deltas
between them, instead of successive fulltexts.

The HEADER portion of a node revision holds the constant-length stuff,
which is never deltified.  The rest of a node revision just points to
data stored outside the node revision proper.  This design makes the
repository code easier to maintain, because deltification and
undeltification are confined to a layer separate from node revisions,
and makes the code more efficient, because Subversion can retrieve
just the parts of a node it needs for a given operation.

Deltifiable data is stored in the `strings' table, as mediated by the
`representations' table.  Here's how it works:

The `strings' table stores only raw bytes.  A given string could be
any one of these:

   - a file's contents
   - a delta that reconstructs file contents, or part of a file's contents
   - a directory entry list skel
   - a delta that reconstructs a dir entry list skel, or part of same
   - a property list skel
   - a delta that reconstructs a property list skel, or part of same

There is no way to tell, from looking at a string, what kind of data
it is.  A directory entry list skel is indistinguishable from file
contents that just happen to look exactly like the unparsed form of a
directory entry list skel.  File contents that just happen to look
like svndiff data are indistinguishable from delta data.

The code is able to interpret a given string because Subversion

   a) knows whether to be looking for a property list or some
      kind-specific data,

   b) knows the `kind' of the node revision in question,

   c) always goes through the `representations' table to discover if
      any undeltification or other transformation is needed.

The `representations' table is an intermediary between node revisions
and strings.  Node revisions never refer directly into the `strings'
table; instead, they always refer into the `representations' table,
which knows whether a given string is a fulltext or a delta, and if it
is a delta, what it is a delta against.  That, combined with the
knowledge in (a) and (b) above, allows Subversion to retrieve the data
and parse it appropriately.  A representation has the form:

   (HEADER KIND-SPECIFIC)

where HEADER is

   (KIND FLAG ...)

The KIND is "fulltext" or "delta", and currently the only FLAG is
"mutable".  KIND-SPECIFIC varies considerably depending on the kind of
representation.  Here are the two forms currently recognized:

   (("fulltext" ...) KEY)
       The data is at KEY in the `strings' table.

   (("delta" ...) (OFFSET WINDOW) ...)
       Each OFFSET indicates the point in the fulltext that this
       element reconstructs, and WINDOW says how to reconstruct it:

       WINDOW ::= (DIFF SIZE CHECKSUM [REP-KEY [REP-OFFSET]]) ;
       DIFF   ::= ("svndiff" STRING-KEY)

       Notice that a WINDOW holds only metadata.  REP-KEY says what
       the window should be applied against, or none if this is a
       self-compressed delta; SIZE says how much data this window
       reconstructs; CHECKSUM is a checksum on just the portion of
       fulltext reconstructed by this window; and STRING-KEY says
       which string contains the actual svndiff data (there is no diff
       data held directly in the representations table, of course).

       Note also that REP-KEY might refer to a representation that
       itself requires undeltification.  For now, we just reconstruct
       fulltexts recursively until we get what we need; in the future,
       an efficient delta-combiner is planned that will retrieve the
       required data directly without expanding anything; that's where
       REP-OFFSET may prove useful.  We think. :-)

       Branko says this is what REP-OFFSET is for:
       > The offsets embedded in the svndiff are stored in a string;
       > these offsets would be in the representation. The point is that
       > you get all the information you need to select the appropriate
       > windows from the rep skel -- without touching a single
       > string. This means a bit more space used in the repository, but
       > lots less memory used on the server.

       We'll see if it turns out to be necessary.

In the future, there may be other representations, for example
indicating that the text is stored elsewhere in the database, or
perhaps in an ordinary Unix file.

Let's work through an example node revision:

   (("file" REV) PROP-KEY "2345")

The entry for key "2345" in `representations' is:

   (("delta") (0 (("svndiff" "1729") 65 CHECKSUM "2343")))

and the entry for key "2343" in `representations' is:

   (("fulltext") "1001")

while the entry for key "1729" in `strings' is:

   <some unprintable glob of svndiff data>

which, when applied to the fulltext at key "1001" in strings, results
in this new fulltext:

   "((some text) (that looks) (deceptively like) (directory entries))"

Et voila!  Subversion knew enough, via the `representations' and
`strings' tables, to undeltify and get that fulltext; and knew enough,
because of the node revision's "file" type, to interpret the result as
file contents, not as a directory entry list.

(Note that the `strings' table stores multiple DB values per key.
That is, although it's accurate to say there is one string per key,
the string may be divided into multiple consecutive blocks, all
sharing that key.  You use a Berkeley DB cursor to find the desired
value[s], when retrieving a particular offset+len in a string.)

Representations know nothing about ancestry -- the `representations'
table never refers to node revision id's, only to strings or to other
representations.  In other words, while the `nodes' table allows
recovery of ancestry information, the `representations' and `strings'
tables together handle deltification and undeltification
*independently* of ancestry.  At present, Subversion generally stores
the youngest strings in "fulltext" form, and older strings as "delta"s
against them.  However, there's nothing magic about that particular
arrangement.  Other interesting alternatives:

   - We could store the N most recently accessed strings as fulltexts,
     letting access patterns determine the most appropriate
     representation for each revision.

   - We could occasionally store deltas against the N'th younger
     revision, storing larger jumps with a frequency inverse to the
     distance covered, yielding a tree-structured history.

Since the filesystem interface doesn't expose these details, we can
change the representation pretty much as we please to optimize
whatever parameter we care about --- storage size, speed, robustness,
etc.

Representations never share strings.  Every string is represented by
exactly one representation; every representation represents exactly
one string.  This is so we can replace a string with deltified version
of itself, change the representation referring to it, and know that
we're not messing up any other reps by doing so.


Further Notes On Deltifying:
----------------------------

When a representation is deltified, it is changed in place, along with
its underlying string.  That is, the node revision referring to that
representation will not be changed; instead, the same rep key will now
be associated with different value.  That way, we get reader locking
for free: if someone is reading a file while Subversion is deltifying
that file, one of the two sides will get a DB_DEADLOCK and
svn_fs__retry_txn() will retry.

### todo: add a note about cycle-checking here, too.


The Berkeley DB "nodes" table

The database contains a table called "nodes", which is a btree indexed
by node revision ID's, mapping them onto REPRESENTATION skels.  Node 0
is always the root directory, and node revision 0.0 is always the empty
directory.

Since the "nodes" table is a btree, it's efficent to walk over the
entries in order of increasing or decreasing node ID's.  In fact,
Berkeley DB lets us provide our own key ordering function, so we can
arrange the nodes for quick traversal, given our access patterns.

Here's the sort order we use:

- Nodes are sorted by their node number.
- All the revisions of a given node come together, in order of 
  increasing revision number.
- All branches off any revision of a node come immediately after that
  node, ordered by increasing revision number.

For example, the "nodes" table might have the following sequence of
keys:

13.1     ; the original node, with four revisions
13.2
13.3
13.4
13.2.1.1   ; a branch from the second revision of 13, with three revisions
13.2.1.2
13.2.1.3
13.2.2.1   ; another branch off the same revision of 13, with four revisions
13.2.2.2
13.2.2.3
13.2.2.4
13.4.1.1   ; another branch off a later revision of 13, with two revisions
13.4.1.2
14.1     ; another node
14.2
14.3

We can find the latest revision of node N by searching the `nodes'
table for the last entry before N.1.1.1, the first branch off the
node's first revision.  According to the sort order, this node
immediately follows the last revision of node N.

Similarly, we can find the last branch off a given node revision N.V by
searching the `nodes' table for the last entry before N.(V+1).1.1.
According to the sort order, this node immediately follows the last
descendant of N.V.  This last descendent could be a branch off a
branch off a branch ..., but by truncating the node revision ID
appropriately, we can find the last branch off of N.V.

Assuming that we store the most recent revision on every branch as
fulltext, and all other revisions as deltas, we can retrieve any node
revision NODE.REVISION by searching for the last revision of NODE, and
then walking backwards to NODE.REVISION, applying deltas as we go.

Since we don't want to corrupt our btree, even if we accidentally
insert garbage keys into the table, we extend our ordering to handle
arbitrary byte strings: any mis-formed ID comes before any well-formed
ID, and two mis-formed IDs are compared byte-by-byte.


REVISION: filesystem revisions, and the Berkeley DB "revisions" table

We represent a filesystem revision using a skel of the form:
    ("revision" ID PROPLIST)
where:
- ID is the node revision ID of this revision's root directory, and
- PROPLIST is the revision's property list.

The database contains a table called "revisions", which is a
record-number table mapping revision numbers onto REVISION skels.
Since Berkeley DB record numbers start with 1, whereas Subversion
filesystem revision numbers start at zero, revision V is stored as
record number V+1 in the `revisions' table.  Filesystem revision zero
always has node revision 0.0 as its root directory; that node revision
is guaranteed to be an empty directory.


Transactions

Every transaction ends when it is either successfully committed, or
aborted.  We call a transaction which has been either committed or
aborted "finished", and one which hasn't "unfinished".  

Transactions are identified by positive decimal numbers, called
transaction ID's.  To help clients detect when a transaction has been
finished, transaction ID's are never reused.  In the database, we
always represent a tranasction ID in its shortest ASCII decimal form.

The Berkeley DB `transactions' table records unfinished transactions.
Every key in this table is a transaction ID, and every value is a skel
of the form:

    ("transaction" ROOT-ID BASE-ROOT-ID)

where ROOT-ID is the node revision ID of the transaction's root
directory, and BASE-ROOT-ID is the node revision ID of the root of the
transaction's base revision.

As the sole exception to the rule above, the `transactions' table
always has one entry whose key is `next-id', and whose value is the
lowest transaction ID that has never yet been used.  We use this entry
to allocate ID's for new transactions.

The `transactions' table is a btree, with no particular sort order.


Merge rules

The Subversion filesystem must provide the following characteristics:

- clients can submit arbitrary rearrangements of the tree, to be
  performed as atomic changes to the filesystem tree
- multiple clients can submit non-overlapping changes at the same time,
  without blocking
- readers must never block other readers or writers
- writers must never block readers
- writers may block writers

Merging rules:

   The general principle: a series of changes can be merged iff the
   final outcome is independent of the order you apply them in.

Merging two nodes, A and B, with respect to a common ancestor
ANCESTOR:
- First, the merge fails unless A, B, and ANCESTOR are all the same kind
  of node.
- If A and B are text files:
  - If A is an ancestor of B, then B is the merged result.
  - If A is identical to B, then B (arbitrarily) is the merged result.
  - Otherwise, the merge fails.
- If A and B are both directories:
  - For every directory entry E in either A, B, or ANCESTOR, here are
    the cases:
      - E exists in neither ANCESTOR nor A.
      - E doesn't exist in ANCESTOR, and has been added to A.
      - E exists in ANCESTOR, but has been deleted from A.
      - E exists in both ANCESTOR and A ...
        - but refers to different nodes.
        - but refers to different revisions of the same node.
        - and refers to the same node revision.

    The same set of possible relationships with ANCESTOR holds for B,
    so there are thirty-six combinations.  The matrix is symmetrical
    with A and B reversed, so we only have to describe one triangular
    half, including the diagonal --- 21 combinations.

    - (6) E exists in neither ANCESTOR nor A:
      - (1) E exists in neither ANCESTOR nor B.  Can't occur, by assumption
           that E exists in either A, B, or ancestor.
      - (1) E has been added to B.  Add E in the merged result. ***
      - (1) E has been deleted from B.  Can't occur, by assumption that E
        doesn't exist in ANCESTOR.
      - (3) E exists in both ANCESTOR and B.  Can't occur, by assumption that
        E doesn't exist in ancestor.
    - (5) E doesn't exist in ANCESTOR, and has been added to A.
      - (1) E doesn't exist in ANCESTOR, and has been added to B.  Conflict.
      - (1) E exists in ANCESTOR, but has been deleted from B.  Can't occur,
        by assumption that E doesn't exist in ANCESTOR.
      - (3) E exists in both ANCESTOR and B.  Can't occur, by assumption
        that E doesn't exist in ANCESTOR.
    - (4) E exists in ANCESTOR, but has been deleted from A.
      - (1) E exists in ANCESTOR, but has been deleted from B.  If
        neither delete was a result of a rename, then omit E from the
        merged tree.  *** Otherwise, conflict.
      - E exists in both ANCESTOR and B ...
        - (1) but refers to different nodes.  Conflict.
        - (1) but refers to different revisions of the same node.  Conflict.
        - (1) and refers to the same node revision.  Omit E from the merged
          tree. ***
    - (3) E exists in both ANCESTOR and A, but refers to different nodes.
      - (1) E exists in both ANCESTOR and B, but refers to different
        nodes.  Conflict.
      - (1) E exists in both ANCESTOR and B, but refers to different
        revisions of the same node.  Conflict.
      - (1) E exists in both ANCESTOR and B, and refers to the same
        node revision.  Replace E with A's node revision.  ***
    - (2) E exists in both ANCESTOR and A, but refers to different
      revisions of the same node.
      - (1) E exists in both ANCESTOR and B, but refers to different revisions
        of the same node.  Try to merge A/E and B/E, recursively.  ***
      - (1) E exists in both ANCESTOR and B, and refers to the same node
        revision.  Replace E with A's node revision.   ***
    - (1) E exists in both ANCESTOR and A, and refers to the same node
      revision.
      - (1) E exists in both ANCESTOR and B, and refers to the same
        node revision.  Nothing has happened to ANCESTOR/E, so no change
        is necessary.

*** == something actually happens


Non-Historical Properties

[[Yes, do tell.]]


Layers

In previous structurings of the code, I had trouble keeping track of
exactly who has implemented which promises, based on which other
promises from whom.

I hope the arrangement below will help me keep things straight, and
make the code more reliable.  The files are arranged in order from
low-level to high-level: each file depends only on services provided
by the files before it.

skel.c, id.c, dbt.c, convert-size.c

                Low-level utility functions.

fs.c            Creating and destroying filesystem objects.

err.c           Error handling.

nodes-table.c, txn-table.c, rev-table.c, reps-table.c, strings-table.c

                Create and open particular database tables.
                Responsible for intra-record consistency.

node-rev.c      Creating, reading, and writing node revisions.
                Responsible for deciding what gets deltified when.

reps-strings.c
                Retrieval and storage of represented strings.
                This will handle delta-based storage,

dag.c           Operations on the DAG filesystem.  "DAG" because the
                interface exposes the filesystem's sharing structure.
                Enforce inter-record consistency.

tree.c          Operations on the tree filesystem.  This layer is
                built on top of dag.c, but transparently distinguishes
                virtual copies, making the underlying DAG look like a
                real tree.  This makes incomplete transactions behave
                like ordinary mutable filesystems.

delta.c         Computing deltas.


Appendix: Filesystem structure summary
======================================

Berkeley DB tables
------------------

                "nodes" : btree(ID -> NODE-REVISION)
            "revisions" : recno(REVISION)
         "transactions" : btree(TXN -> TRANSACTION,
                                "next-id" -> TXN)


Syntactic elements
------------------

Table keys:

                     ID ::= node.revision-id ;
                    TXN ::= number ;


Filesystem revisions:

               REVISION ::= ("revision" ID PROPLIST) ;
               PROPLIST ::= (PROP ...) ;
                   PROP ::= atom atom ;


Transactions:

            TRANSACTION ::= ("transaction" ROOT-ID BASE-ROOT-ID PROPLIST) ;
                ROOT-ID ::= node.revision-id ;
           BASE-ROOT-ID ::= node.revision-id ;


Node revisions:


          NODE-REVISION ::= FILE | DIR ;
                   FILE ::= (HEADER PROP-KEY DATA-KEY [EDIT-DATA-KEY]) ;
                    DIR ::= (HEADER PROP-KEY ENTRIES-KEY) ;
                  ENTRY ::= (NAME ID) ;
                   NAME ::= atom ;

                 HEADER ::= (KIND REV [COPY]) ;
                   KIND ::= "file" | "dir" ;
                    REV ::= "" | number ;

                   COPY ::= ("copy" ANCESTOR-REV ANCESTOR-PATH)
           ANCESTOR-REV ::= number ;
          ANCESTOR-PATH ::= atom ;

        SOURCE-REVISION ::= number ;

               PROP-KEY ::= atom ;
                REP-KEY ::= atom ;


Representations:

         REPRESENTATION ::= FULLTEXT | DELTA ;
               FULLTEXT ::= (("fulltext" FLAG ...) STRING-KEY) ;
                  DELTA ::= (("delta" FLAG ...) (OFFSET WINDOW) ...) ;
                   FLAG ::= "mutable" | ;
                 WINDOW ::= (DIFF SIZE CHECKSUM [REP-KEY [REP-OFFSET]]) ;
                   DIFF ::= ("svndiff" STRING-KEY) ;
                REP-KEY ::= atom ;
             STRING-KEY ::= atom ;
                 OFFSET ::= number ;
             REP-OFFSET ::= number ;

                   SIZE ::= number ;
               CHECKSUM ::= ("md5" BYTES) ;
                  BYTES ::= atom ;

Strings:

                 STRING ::= RAWTEXT | LISTTEXT | DIFFTEXT
                RAWTEXT ::= /{anything.class}*/ ;
               LISTTEXT ::= list ;
               DIFFTEXT ::= /{anything.class}*/ ;


Lexical elements
----------------

Node & revision IDs:

                node.id ::= number | node.revision-id '.' number ;
       node.revision-id ::= node.id '.' number ;
                 number ::= /{digit.class}+/ ;


(Note: the following are described in skel.h)
Skels:

                   skel ::= atom | list;
                   list ::= list.head list.body.opt list.tail ;
                   atom ::= atom.imp-len | atom.exp-len ;

              list.head ::= '(' spaces.opt ;
              list.tail ::= spaces.opt ')' ;
          list.body.opt ::=  | list.body ;
              list.body ::= skel | list.body spaces.opt skel ;

           atom.imp-len ::= /{name.class}[^\(\){ws.class}]*/ ;
           atom.exp-len ::= /({digit.class}+){ws.class}.{\1}/ ;

             spaces.opt ::= /{ws.class}*/ ;


Character classes:

               ws.class ::= [\t\n\f\r\ ] ;
            digit.class ::= [0-9] ;
             name.class ::= [A-Za-z] ;
         anything.class ::= anything at all ;