              Implementing Sparse Directory Support in SVN

#########################################################################
###                                                                   ###
###  Note: This feature used to be called "incomplete directories";   ###
###  It is now called "sparse directories", because "incomplete"      ###
###  made it sound like something was wrong with your directories.    ###
###                                                                   ###
#########################################################################

Contents
========

   1. Design
   2. User Interface
   3. Examples
   4. Implementation Strategy
   5. Compatibility Matters
   6. Current Status

1. Design
=========

   This design document started out as a post by Eric Gillespie:

      http://subversion.tigris.org/servlets/ReadMsg?list=dev&msgNo=117053
      From: Eric Gillespie
      To: dev@subversion.tigris.org
      Subject: [PROPOSAL] Incomplete working copies (issue #695)
      Date: Thu, 22 Jun 2006 22:35:06 -0700
      Message-ID: <25668.1151040906@gould.diplodocus.org>

   [The design has evolved since then; the text below is not exactly
   the same as what Eric posted, but has the same general ideas.]

   I'd like to propose a new solution to this issue, and hopefully get
   it into 1.5.  What I'm really looking for is the kind of flexibility
   Perforce has with its client specs in which parts of a tree you
   check out.  I don't think Ben Reser's proposal
   (http://svn.haxx.se/dev/archive-2005-07/0398.shtml) covers this.
   Using his first example, there is no way to avoid pulling in
   trunk/foo/images/another-big-dir when it is added.  This is based on
   an idea from Karl Fogel.

   Implementing Incomplete Directory Support in SVN
   ==================================================

   Many users have very large trees of which they only want to check
   out certain parts.  Today, checkout -N is not up to this task.  This
   proposal introduces the --depth option to the checkout, switch, and
   update subcommands as a replacement for -N, which allows working
   copies to have very specific contents, leaving out everything the
   user does not want.
   This is similar to Perforce's client specs, but without the ability
   to have a repository entry have a different name in the working
   copy.  We actually already have this capability in switch.

   Depth:

   We have a new "depth" field in .svn/entries, which has (currently)
   four possible values: depth-empty, depth-files, depth-immediates,
   and depth-infinity.  Only this_dir entries may have depths other
   than depth-infinity.

     depth-empty ------> Updates will not pull in any files or
                         subdirectories not already present.

     depth-files ------> Updates will pull in any files not already
                         present, but not subdirectories.

     depth-immediates -> Updates will pull in any files or
                         subdirectories not already present; those
                         subdirectories' this_dir entries will have
                         depth-empty.

     depth-infinity ---> Updates will pull in any files or
                         subdirectories not already present; those
                         subdirectories' this_dir entries will have
                         depth-infinity.  Equivalent to today's default
                         update behavior.

   The --depth option sets depth values as it updates the working
   copy, setting any new subdirectories' this_dir depth values as
   described above.

2. User Interface
=================

   Affected commands:

     * checkout
     * switch
     * update
     * status
     * info

   The -N option becomes a synonym for --depth=files for these
   commands.  This changes the existing -N behavior for these commands,
   but in a trivial way (see below).

   checkout without --depth or -N behaves the same as it does today.
   switch and update without --depth or -N behave the same way as today
   IFF the working copy is fully depth-infinity.  switch and update
   without --depth or -N will NOT change depth values (exception: a
   missing directory specified on the command line will be pulled in).

   Thus, 'checkout' is identical to 'checkout --depth=infinity', but
   'switch' and 'update' are not the same as 'switch --depth=infinity'
   and 'update --depth=infinity'.  The former update entries according
   to existing depth values, while the latter pull in everything.
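The depth rules and the option defaults described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this document, not the actual libsvn_wc or libsvn_client code; the names Depth, pulls_in_files, pulls_in_subdirs, new_subdir_depth, and effective_depth are hypothetical and do not appear in Subversion.

```python
from enum import Enum
from typing import Optional


class Depth(Enum):
    """The four depth values a this_dir entry may record (illustrative)."""
    EMPTY = "empty"
    FILES = "files"
    IMMEDIATES = "immediates"
    INFINITY = "infinity"


def pulls_in_files(depth: Depth) -> bool:
    # Only depth-empty refuses files that are not already present.
    return depth is not Depth.EMPTY


def pulls_in_subdirs(depth: Depth) -> bool:
    # depth-immediates and depth-infinity bring in missing subdirectories.
    return depth in (Depth.IMMEDIATES, Depth.INFINITY)


def new_subdir_depth(depth: Depth) -> Optional[Depth]:
    """Depth recorded on a newly added subdirectory, or None if the
    parent's depth does not pull in subdirectories at all."""
    if depth is Depth.IMMEDIATES:
        return Depth.EMPTY
    if depth is Depth.INFINITY:
        return Depth.INFINITY
    return None


def effective_depth(command: str,
                    requested: Optional[Depth],
                    recorded: Optional[Depth]) -> Depth:
    """Depth an operation works at.

    requested: the --depth value (with -N already mapped to FILES),
               or None when no option was given.
    recorded:  the target's existing this_dir depth, or None when the
               target is not yet in the working copy.
    """
    if requested is not None:
        return requested
    # A plain checkout, or a missing directory named on the command
    # line, comes in at depth-infinity (assumption drawn from the
    # 'svn up Awc/B' example in section 3).
    if command == "checkout" or recorded is None:
        return Depth.INFINITY
    # A plain switch/update keeps the depth already recorded.
    return recorded
```

For instance, effective_depth("update", None, Depth.EMPTY) keeps the directory at depth-empty, while effective_depth("update", Depth.INFINITY, Depth.EMPTY) pulls in everything, which is the difference between 'update' and 'update --depth=infinity' noted above.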
   To get started, run checkout with --depth=empty or --depth=files.
   If additional files or directories are desired, pull them in with
   update commands using appropriate --depth options.

   'svn status' should list the depth status of the directories, in
   addition to whatever statuses are currently listed.

   The 'svn info' command should list the depth, iff invoked on a
   directory whose depth is not the default (depth-infinity).

3. Examples
===========

   svn co http://.../A

     Same as today; everything has depth-infinity.

   svn co -N http://.../A

     Today, this creates wc containing only mu.  Now, this will be
     identical to 'svn co --depth=files /A'.

   svn co --depth=empty http://.../A Awc

     Creates wc Awc, but *empty*.

     Awc/.svn/entries                this_dir    depth-empty

   svn co --depth=files http://.../A Awc1

     Creates wc Awc1 with all files (i.e., Awc1/mu) but no
     subdirectories.

     Awc1/.svn/entries               this_dir    depth-files
                                     ...

   svn co --depth=immediates http://.../A Awc2

     Creates wc Awc2 with all files and all subdirectories, but
     subdirectories are *empty*.

     Awc2/.svn/entries               this_dir    depth-immediates
                                     B
                                     C
     Awc2/B/.svn/entries             this_dir    depth-empty
     Awc2/C/.svn/entries             this_dir    depth-empty
     ...

   svn up Awc/B

     Since B is not yet checked out, add it at depth-infinity.

     Awc/.svn/entries                this_dir    depth-empty
                                     B
     Awc/B/.svn/entries              this_dir    depth-infinity
                                     ...
     Awc/B/E/.svn/entries            this_dir    depth-infinity
                                     ...
     ...

   svn up Awc

     Since A is already checked out, don't change its depth, just
     update it.  B and everything under it is at depth-infinity, so it
     will be updated just as today.

   svn up --depth=immediates Awc/D

     Since D is not yet checked out, add it at depth-immediates.

     Awc/.svn/entries                this_dir    depth-empty
                                     B
                                     D
     Awc/D/.svn/entries              this_dir    depth-immediates
                                     ...
     Awc/D/G/.svn/entries            this_dir    depth-empty
     ...

   svn up --depth=empty Awc/B/E

     Remove everything under E, but leave E as an empty directory
     since B is depth-infinity.

     Awc/.svn/entries                this_dir    depth-empty
                                     B
                                     D
     Awc/B/.svn/entries              this_dir    depth-infinity
                                     ...
     Awc/B/E/.svn/entries            this_dir    depth-empty
     ...

   svn up --depth=empty Awc/D

     Remove everything under D, and D itself since A is depth-empty.

     Awc/.svn/entries                this_dir    depth-empty
                                     B

   svn up Awc/D

     Bring D back at depth-infinity.

     Awc/.svn/entries                this_dir    depth-empty
                                     ...
     Awc/D/.svn/entries              this_dir    depth-infinity
                                     ...
     ...

   svn up --depth=immediates Awc

     Bring in everything that's missing (C/ and mu) and empty all
     subdirectories (and set their this_dir to depth-empty).

     Awc/.svn/entries                this_dir    depth-immediates
                                     B
                                     C
     Awc/B/.svn/entries              this_dir    depth-empty
     Awc/C/.svn/entries              this_dir    depth-empty
     ...

4. Implementation Strategy
==========================

   It would be nice if all this could be accomplished with just simple
   tweaks to how we drive the update reporter (svn_ra_reporter2_t).
   However, it looks like it's not going to be that easy.

   Handling 'checkout --depth=empty' would be easy.  It should get us
   an empty directory at depth-empty, with no files and no subdirs, and
   if we just report it as at HEAD every time, the server will never
   send updates down (hmmm, this could be a problem for getting dir
   property updates, though).  Then any files or subdirs we have
   explicitly included we can just report at their respective
   revisions, and get proper updates; at least that'll work for the
   depth-infinity ones.

   But consider 'checkout --depth=immediates'.  The desired state is a
   depth-immediates directory D, with all files up-to-date, and with
   skeleton subdirs at depth-empty.  Plain updates should preserve this
   state of affairs.

   If we report D as at its BASE revision, files at their BASE
   revisions, and subdirs at HEAD, then:

     - When new files appear in the repos, they'll get sent down (good)
     - When new subdirs appear, they'll get sent down in full (bad)

   But if we don't report subdirs as at HEAD, then the server will try
   to update them (bad).
   And if we report D at HEAD, then the working copy won't receive new
   files that have appeared in the repository since D's BASE revision
   (note that we *can* get updates for files we already have, though,
   by continuing to report them at their respective BASEs).

   The same logic applies to subdirectories at depth-files or
   depth-immediates.

   So, I think this means that for efficient depth handling, we'll
   need to have the client directly reporting the desired depth to the
   server; i.e., extending the RA protocol.

   Meanwhile, legacy servers will send back a bunch of information the
   client doesn't want, and the client will just ignore it, and
   everything will be slower than it needs to be, and people will
   complain on the users@ list, and we'll tell them to upgrade their
   servers, and they'll say they can't because they don't have control
   over the server, and we'll say "So?  This ain't no Grand Hotel!"

5. Compatibility Matters
========================

   This feature introduces two new concepts into the RA protocol which
   will not be understood by older servers:

     * Reported Depths -- the depths associated with individual paths
       included by the client in the description (via the
       svn_ra_reporter_t) of its working copy state.

     * Requested Depth -- the single depth value used to limit the
       scope of the server's response to the client.

   As such, it's useful to understand how these concepts will be
   handled across the compatibility matrix of depth-aware and
   non-depth-aware clients and servers.

   NOTE: in the sections below, it is not necessarily the case that a
   value or state which is said to be "transmitted" literally has a
   presence in the RA protocol.  Some such bits of state have default
   values in the protocol and can therefore be effectively transmitted
   while not literally identifiable in a network trace of the
   client-server traffic.

   Depth-aware Clients (DACs)

      DACs will transmit reported depths (with "infinity" as the
      default) and will transmit a requested depth (with "unknown" as
      the default).
      They will also -- for the sake of older, non-depth-aware servers
      (NDASs) -- transmit a requested recurse value derived from the
      requested depth:

         depth        recurse
         -----        -------
         empty        no
         files        no
         unknown      yes
         immediates   yes
         infinity     yes

      When speaking to an NDAS, the requested recurse value is the only
      thing the server understands, but it is obviously more "grainy"
      than the requested depth concept.  The DAC, therefore, must
      filter out any additional, unwanted data that the server
      transmits in its response.  (This filtering will happen in the RA
      implementation itself so the RA APIs behave as expected
      regardless of the server's pedigree.)

      When speaking to a depth-aware server (DAS), the requested
      recurse value is ignored.  A requested depth of "unknown" means
      "only send information about the stuff in my report,
      depth-aware-ily".  Other requested depth values are honored by
      the server properly, and the DAC must handle the transformation
      of any working copy depths from their pre-update to their
      post-update depths and content as described in `3. Examples'.

   Non-depth-aware Clients (NDACs)

      NDACs will never transmit reported depths and never transmit a
      requested depth.  But they will transmit a requested recurse
      value (either "yes" or "no", with "yes" being the default).  (A
      DAS uses the presence of a requested depth in the actual protocol
      to distinguish DACs from NDACs, and knows to ignore the requested
      recurse value transmitted by a DAC.)

      When speaking to an NDAS, what happens happens.  It's the past,
      man -- you don't get to define the interaction this late in the
      game!

      When speaking to a DAS, the not-reported depths are treated like
      reported depths of "infinity", and the requested recurse values
      "yes" and "no" map to depths of "infinity" and "files",
      respectively.

6. Current Status
=================

   The sparse-directories code was merged to trunk in r23994.  A new
   enum type 'svn_depth_t depth' is defined in svn_types.h.
   Both client and server side now understand the concept of depth,
   and the basic update use cases handle depth.  See depth_tests.py for
   what is known to be working.  (Many cases are not yet tested, and
   almost certainly some of them will fail right now.)

   On the client side, most of the svn_client.h interfaces that
   formerly took 'svn_boolean_t recurse' now take 'svn_depth_t depth'.
   (The -N option is deprecated, but it still works: it simply maps to
   --depth=files, which results in the same behavior as -N used to.)

   Some of this recurse-becomes-depth change has propagated down into
   libsvn_wc, which now stores a depth field in svn_wc_entry_t (and
   therefore in .svn/entries).  The update reporter knows to report
   differing depths to the server, in the same way it already reports
   differing revisions.  In other words, take the concept of "mixed
   revision" working copies and extend it to "mixed depth" working
   copies.

   On the server side, most of the significant changes are in
   libsvn_repos/reporter.c.  The code that receives update reports now
   receives notice of paths that have different depths from their
   parent, and of course the overall update operation has a global
   depth, which applies whenever not shadowed by some local depth for a
   given path.

   The RA code on both sides knows how to send and receive depths; the
   relevant svn_ra_* APIs now take depth arguments, which sometimes
   supersede older 'recurse' booleans.  In these cases, the RA layer
   does the usual compatibility dance: receiving "recurse=FALSE" from
   an older client causes the server to behave as if "depth=files" had
   been transmitted.

   Work remaining:

   The list of outstanding issues is shown by this issue tracker query
   of Summary prefixed with [sparse-directories]: