"I have a cunning plan"

                             or

             Entries Caching in the Access Batons


0. Preamble
   --------

Entries caching appears to be a good idea for making the client and WC
libraries faster.  There has been some discussion about this, but how
it could be implemented has never really been written down.  I have a
mental picture of how entries caching could work, but the picture is a
little blurred in places, and I'm a bit worried that the dark bit in
the corner (probably my thumb) may be obscuring important details.

The solution being developed as part of issue 749 is to use the
svn_wc_adm_access_t access batons to cache the results of
svn_wc_entries_read, so that the .svn/entries file does not need to
be read and parsed repeatedly.

What makes this hard is that the entries file is currently accessed in
a large number of places in the code.  If we attempt to introduce
caching gradually there is a danger that we will mix code that uses
the cache with code that accesses the entries file directly.  Such
mixing is not a good idea, as it is possible that the cache and
entries file may get out of sync.  Even if we could ensure that each
client operation used the cache consistently (could we do that?)  it
would make future development hard, as we would need to ensure that
such consistency didn't break.  Introducing caching everywhere in a
single step is better, but the code changes to do it would be
gigantic.


1. Caching Interface
   -----------------

The plan is to identify the places where there needs to be an access
baton, and then make all the changes required to pass access batons
around within the code, but without attempting to introduce the
caching code.  This is being done in stages.  Once the access baton is
in place, I hope that it will then be possible to start using caching
everywhere in a single step.

The basic functions to retrieve entries are svn_wc_entries_read and
svn_wc_entry.  The function svn_wc__entries_write is used to update
the entries file on disk.  Simple really, only three functions, and
once the access baton gets this far we are more or less done!  The
trouble is that these functions are used everywhere, so the batons
have to be passed through a large number of other functions.

The basic caching read interface will consist of svn_wc_entry for a
single entry and svn_wc_entries_read for a hash of all entries, just
as it does now.  Initially these functions will work exactly as they
do right now, except they will have gained an additional access baton
parameter.  Once the functions support caching then switching caching
on should just involve very localised changes, as the entry interface
is the same with and without caching.
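
To make the "same interface with and without caching" idea concrete,
here is a minimal sketch.  Everything in it is hypothetical: the
toy_read_baton type, the toy_* functions and the caching_enabled flag
are illustrative stand-ins, not the real svn_wc_adm_access_t or
svn_wc_entries_read.  The only point is that callers are written once
and do not change when caching is switched on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an access baton (not svn_wc_adm_access_t). */
typedef struct toy_read_baton {
    const char *dir;      /* directory this baton represents */
    int caching_enabled;  /* the "very localised" switch */
    int cache_valid;      /* has the cache been populated yet? */
    int cached_count;     /* stand-in for the cached entries hash */
    int parse_calls;      /* how often the "file" was read and parsed */
} toy_read_baton;

/* Stand-in for reading and parsing the entries file. */
static int toy_parse_entries_file(toy_read_baton *baton)
{
    baton->parse_calls++;
    return 3;  /* pretend the directory holds three entries */
}

/* The read interface: same signature whether or not caching is on. */
int toy_entries_read(toy_read_baton *baton)
{
    assert(baton != NULL);

    /* With caching on, a populated cache answers the request. */
    if (baton->caching_enabled && baton->cache_valid)
        return baton->cached_count;

    int count = toy_parse_entries_file(baton);

    /* Remember the result only when caching is switched on. */
    if (baton->caching_enabled) {
        baton->cached_count = count;
        baton->cache_valid = 1;
    }
    return count;
}
```

Calling toy_entries_read twice on a baton with caching_enabled set to
0 parses twice; with it set to 1 the second call is served from the
cache, and the calling code is identical in both cases.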

In the longer term it may be that svn_wc_entries_read will be removed
in favour of providing a set of functions that access the underlying
cache, thus allowing the access baton to track changes made.  However,
I do not think this will be required initially: if the current code
gets a hash from svn_wc_entries_read and expects it to remain valid,
then that expectation should still apply when caching is implemented.

At present access batons have a fairly strict interface: they must be
passed directory names, and the code always "knows" whether it is
supposed to have a baton for a particular directory or not (and thus
it knows whether to call svn_wc_adm_open or svn_wc_adm_retrieve).  One
tricky point is that svn_wc_entry is often called first, before any
access batons are opened, to determine if a given path represents a
versioned file or a versioned directory.  However svn_wc_entry falls
back on checking the physical working copy, so this functionality will
probably be copied or moved into an access baton convenience function
that allows opening an access baton without requiring knowledge of
whether the path is a file or a directory.

The basic caching write interface is svn_wc__entries_write.  Initially
this will write directly to the entries file, just as it currently
does.  Later on, modifications may be cached until an explicit
entries_flush call is made.  I haven't yet determined whether this
would be a significant benefit in terms of speed, or whether it would
risk losing changes if a process is interrupted.
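
The deferred-write idea can be sketched as follows.  This is only an
illustration under assumptions: toy_write_baton, toy_entries_write and
toy_entries_flush are made-up names, and a character buffer stands in
for the entries file.  It shows both the possible saving (several
writes collapse into one) and the risk (nothing reaches "disk" until
the flush).

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX 64

/* Hypothetical baton that buffers modifications to the entries data. */
typedef struct toy_write_baton {
    char on_disk[TOY_MAX];  /* stand-in for the entries file contents */
    char pending[TOY_MAX];  /* modification held in the cache */
    int dirty;              /* non-zero if pending has not been flushed */
    int disk_writes;        /* number of real writes performed */
} toy_write_baton;

/* Record new entries data in the cache only; the file is not touched. */
void toy_entries_write(toy_write_baton *baton, const char *data)
{
    assert(strlen(data) < TOY_MAX);
    strcpy(baton->pending, data);
    baton->dirty = 1;
}

/* Push any cached modification out to the file, at most one write. */
void toy_entries_flush(toy_write_baton *baton)
{
    if (!baton->dirty)
        return;
    strcpy(baton->on_disk, baton->pending);
    baton->disk_writes++;
    baton->dirty = 0;
}
```

If the process stops between toy_entries_write and toy_entries_flush,
on_disk still holds the old data, which is exactly the interruption
risk mentioned above.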

The function svn_wc__entry_modify is written in terms of entries_read
and entries_write and has already been converted to take an access
baton.


2. Caching Mechanism
   -----------------

Each access baton represents a directory.  Access batons can associate
together in sets.  Given an access baton in a set, it is possible to
retrieve any other access baton in the set.  When an access baton in a
set is closed, all other access batons in the set that represent
subdirectories are also closed.  The set is implemented as a hash
table "owned" by one baton in any set, but shared by all batons in
the set.  Caching will be similar.

The cache hash tables will be "owned" by one baton in the set, but
shared by all batons.  Caching will be lazy: the cache will not be
populated until required (need to see how the TREE_LOCK behaviour in
svn_wc_adm_open interacts here).  Only entries covered by an access
baton will be available in the cache; when an access baton is closed
its entries will be removed from the cache.
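
The set behaviour described in this section can be modelled with a
small sketch.  It is a toy under stated assumptions: a fixed-size
array replaces the hash table, paths are relative, use '/' and have no
trailing slash, and all toy_* names are invented for illustration
rather than taken from the real access baton code.  The first baton
opened owns the shared set; closing a baton also closes the batons for
its subdirectories, after which they can no longer be retrieved, much
as their cached entries would be dropped.

```c
#include <assert.h>
#include <string.h>

#define TOY_SET_MAX 8

typedef struct toy_set_baton toy_set_baton;

/* Shared by every baton in a set; storage lives in the owning baton. */
typedef struct toy_set {
    toy_set_baton *members[TOY_SET_MAX];
    int count;
} toy_set;

struct toy_set_baton {
    const char *dir;      /* directory this baton represents */
    int open;             /* non-zero while the baton is usable */
    toy_set *set;         /* the shared set */
    toy_set own_storage;  /* used only by the baton that owns the set */
};

/* Open BATON for DIR.  With ASSOCIATED == NULL a new set is started
   and owned by BATON; otherwise BATON joins ASSOCIATED's set. */
void toy_open(toy_set_baton *baton, const char *dir,
              toy_set_baton *associated)
{
    baton->dir = dir;
    baton->open = 1;
    if (associated == NULL) {
        baton->own_storage.count = 0;
        baton->set = &baton->own_storage;
    } else {
        baton->set = associated->set;
    }
    assert(baton->set->count < TOY_SET_MAX);
    baton->set->members[baton->set->count++] = baton;
}

/* Given any baton in a set, find the open baton for DIR, or NULL. */
toy_set_baton *toy_retrieve(toy_set_baton *any, const char *dir)
{
    for (int i = 0; i < any->set->count; i++) {
        toy_set_baton *m = any->set->members[i];
        if (m->open && strcmp(m->dir, dir) == 0)
            return m;
    }
    return NULL;
}

/* True if PATH is PARENT itself or lies somewhere below it. */
static int toy_is_same_or_child(const char *parent, const char *path)
{
    size_t n = strlen(parent);
    return strncmp(parent, path, n) == 0
           && (path[n] == '\0' || path[n] == '/');
}

/* Close BATON and every open baton in its set for a subdirectory. */
void toy_close(toy_set_baton *baton)
{
    toy_set *set = baton->set;
    for (int i = 0; i < set->count; i++) {
        toy_set_baton *m = set->members[i];
        if (m->open && toy_is_same_or_child(baton->dir, m->dir))
            m->open = 0;
    }
}
```

Note that the sketch does nothing about the owner being closed while
other batons still use the set; that is the case the next paragraph
discusses.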

At present in the code, access batons are opened in a parent->child
order.  This works well with the shared hash being owned by the first
baton in each set.  There is code to detect if closing a baton will
destroy the hash while other batons are using it, but as far as I know
it doesn't currently trigger.  If it turns out that this needs to be
supported it should be possible to transfer the hash information to
another baton.


3. Access Baton Conversion
   -----------------------

Given a function
  svn_error_t *foo (const char *path);
if PATH is always a directory then the change that gets made is usually
  svn_error_t *foo (svn_wc_adm_access_t *adm_access);
Within foo, the original const char* can be obtained using
  const char *svn_wc_adm_access_path(svn_wc_adm_access_t *adm_access);

The above case sometimes occurs as
  svn_error_t *foo (const char *name, const char *dir);
where NAME is a single path component, and DIR is a directory. Conversion
is again simple in this case
  svn_error_t *foo (const char *name, svn_wc_adm_access_t *adm_access);

The more difficult case is
  svn_error_t *foo (const char *path);
where PATH can be a file or a directory.  This occurs a lot in the
current code. In the long term these may get converted to
  svn_error_t *foo (const char *name, svn_wc_adm_access_t *adm_access);
where NAME is a single path component.  However this involves more
changes to the code calling foo than are strictly necessary, so
initially they get converted to
  svn_error_t *foo (const char *path, svn_wc_adm_access_t *adm_access);
where PATH is passed unchanged and an additional access baton is
passed.  This interface is less than ideal, since there is duplicate
information in the path and baton, but since it involves fewer changes
in the calling code it makes a reasonable intermediate step.
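
To illustrate the duplicate information in the intermediate form,
here is a hedged sketch of how NAME could be derived from PATH and the
directory the baton represents.  toy_name_in_dir is a hypothetical
helper, not an existing function, and it assumes relative paths that
use '/' as the separator and carry no trailing slash.  An empty string
is used here to mean "the directory itself"; that choice is an
assumption of the sketch.

```c
#include <assert.h>
#include <string.h>

/* DIR is the directory a baton represents; PATH is the path passed
   alongside the baton.  Returns:
     ""    if PATH is DIR itself,
     NAME  (a pointer into PATH) if PATH is a direct child of DIR,
     NULL  if PATH is not directly covered by DIR.  */
const char *toy_name_in_dir(const char *dir, const char *path)
{
    size_t n = strlen(dir);

    /* PATH must start with DIR. */
    if (strncmp(dir, path, n) != 0)
        return NULL;

    /* PATH is exactly DIR: return the empty string at its end. */
    if (path[n] == '\0')
        return path + n;

    /* Otherwise a separator must follow, e.g. "wc/a" vs "wc/ab". */
    if (path[n] != '/')
        return NULL;

    /* What remains must be a single, non-empty path component. */
    const char *name = path + n + 1;
    if (*name == '\0' || strchr(name, '/') != NULL)
        return NULL;
    return name;
}
```

For a baton on "wc/a", the path "wc/a/foo.c" yields "foo.c", while
"wc/a/b/foo.c" yields NULL because it belongs to a different baton.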


4. Logging
   -------

As well as caching, the other problem that needs to be addressed is the
issue of logging.  Modifications to the working copy are supposed to
use the log file mechanism to ensure that multiple changes that need
to be atomic cannot be partially completed.  If the individual changes
that may need to be logged are all forced to use an access baton, then
the access baton may be able to identify when the log file mechanism
should be used.  Combine this with an access baton state that tracks
whether a log file is being run and we may be able to automatically
identify those places that are failing to use the log file mechanism.
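
A rough sketch of the detection idea, under assumptions: the
toy_log_baton type, its log_running flag and the toy_* functions are
invented for illustration, and a simple counter stands in for whatever
reporting would really be used.  Every modification goes through the
baton, so the baton can notice a change that should have been logged
but was made while no log was running.

```c
#include <assert.h>

/* Hypothetical baton state for tracking log file usage. */
typedef struct toy_log_baton {
    int log_running;       /* non-zero while a log file is being run */
    int applied_changes;   /* all modifications made via the baton */
    int unlogged_changes;  /* modifications that bypassed the log */
} toy_log_baton;

/* Mark the start of running a log file. */
void toy_log_begin(toy_log_baton *baton)
{
    assert(!baton->log_running);
    baton->log_running = 1;
}

/* Mark the end of running a log file. */
void toy_log_end(toy_log_baton *baton)
{
    assert(baton->log_running);
    baton->log_running = 0;
}

/* Apply one modification.  NEEDS_LOG says whether this kind of change
   is supposed to go through the log file mechanism. */
void toy_modify(toy_log_baton *baton, int needs_log)
{
    /* A loggable change made outside a log run is a suspect. */
    if (needs_log && !baton->log_running)
        baton->unlogged_changes++;
    baton->applied_changes++;
}
```

A non-zero unlogged_changes count after a test run would point at
code that is failing to use the log file mechanism.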


5. Status
   ------

Now: I'm currently working on a patch to pass the access baton to
svn_wc_entries_read; only one regression test failure at present!
I've cheated a bit, because svn_wc_entry is currently passing NULL to
svn_wc_entries_read.  I really need to convert svn_wc_entry as well
to complete the patch.

Next: svn_wc__entries_write should be simple once svn_wc_entries_read
is done.

Then: After the above is complete, the caching stuff might start.