-*-text-*-

If you are contributing code to the Subversion project, please read
this first.

                    ============================
                    HACKER'S GUIDE TO SUBVERSION
                    ============================

                    Last updated: $Date: 2001/08/31 01:02:57 $

TABLE OF CONTENTS

   * Participating in the community
   * What to read
   * Building from a working copy
   * Building on Unix
   * Building on Win32
   * Coding style
   * Using page breaks
   * Other conventions
   * Writing log entries
   * Generating changelogs
   * Automated tests
   * Writing test cases before code
   * APR status codes
   * Compiling mod_dav_svn

Participating in the community
==============================

Although Subversion is originally sponsored and hosted by Collabnet
(http://www.collab.net), it's a true open-source project under a
BSD-style license.  A number of developers work for Collabnet, some
work for other large companies (such as RedHat), and many others are
simply excellent volunteers who are interested in building a better
version control system.

The community exists mainly through mailing lists and a Subversion
repository:

Go to http://subversion.tigris.org and

   *  Join the "dev", "cvs", and "announce" mailing lists.
      The dev list, dev@subversion.tigris.org, is where almost all
      discussion takes place.  All questions should go there, though
      you might want to check the list archives first.

   *  Print out and digest the Spec.  (The postscript might look
      better than the PDF.)  The Spec will give you a theoretical
      overview of Subversion's design.

If you'd like to contribute code, then look at:

   *  The bugs/issues database
      http://subversion.tigris.org/servlets/ProjectIssues

   *  The bite-sized tasks page
      http://subversion.tigris.org/project_tasks.html

To submit code, simply send your patches to dev@subversion.tigris.org.
No, wait, first read the rest of this file, _then_ start sending
patches to dev@subversion.tigris.org. :-)

After someone has contributed a few non-trivial patches, some current
committer (usually the one who has reviewed and applied the patches)
proposes that person for commit access.
This proposal is sent only to the other full committers -- the ensuing
discussion is private, so that everyone can feel comfortable speaking
their minds.  Assuming there are no objections, the contributor is
granted commit access.  The decision is made by consensus; there are
no formal rules governing the procedure.

What to read
============

Before you can contribute code, you'll need to familiarize yourself
with the existing codebase and interfaces.

Check out a copy of Subversion (anonymously, if you don't yet have an
account with commit-access) -- so you can look at the codebase.

Within subversion/include/ are a bunch of header files with huge doc
comments.  If you read through these, you'll have a pretty good
understanding of the implementation details.  Here's a suggested
perusal order:

   *  the basic building blocks:  svn_string.h, svn_error.h, svn_types.h

   *  useful utilities:  svn_io.h, svn_path.h, svn_hash.h, svn_xml.h

   *  the critical interface:  svn_delta.h

   *  client-side interfaces:  svn_ra.h, svn_wc.h, svn_client.h

   *  the repository filesystem:  svn_fs.h

Subversion tries to stay portable by using only ANSI/ISO C and by
using the Apache Portable Runtime (APR) library.  APR is the
portability layer used by the Apache httpd server, and more
information can be found at http://apr.apache.org.

Because Subversion depends so heavily on APR, it may be hard to
understand Subversion without first glancing over certain header files
in APR (look in apr/include/):

   *  memory pools:  apr_pools.h

   *  filesystem access:  apr_file_io.h

   *  hashes and arrays:  apr_hash.h, apr_tables.h

Building from a working copy
============================

Unlike a packaged distribution, the Subversion working tree does not
contain a `configure' script nor any other of the generated files
normally used in configuration and building.  You have to regenerate
them inside your working copy first, then configure, and then build.
So, first run

   $ chmod +x autogen.sh   ## needed until Subversion versions file mode bits
   $ ./autogen.sh

which does everything necessary to prepare for the configuration step.
You are expected to have the `autoheader' and `autoconf' tools
installed.  autogen.sh will check for the source code of external
packages required to build Subversion (such as APR and Neon), and tell
you how to get them if you don't have them.

Next, do

   $ ./configure [...possibly with options...]
   $ make
   $ make check
   $ make install

Skip the last step if you don't want to blow away your previous
Subversion installation.  Also, you may wish to run "./configure" with
some options; see the end of autogen.sh's output for some hints about
this.

Greg Stein recently switched Subversion from using `automake' and
recursive Makefiles to using a single, top-level Makefile.  The one
Makefile is generated from Makefile.in, which is kept under revision
control.  The new system probably still needs some tweaks; here is
Greg's mail about it:

From: Greg Stein
Subject: new build system (was: Re: CVS update: MODIFIED: ac-helpers ...)
To: dev@subversion.tigris.org
Date: Thu, 24 May 2001 07:20:55 -0700
Message-ID: <20010524072055.F5402@lyra.org>

On Thu, May 24, 2001 at 01:40:17PM -0000, gstein@tigris.org wrote:
>   User: gstein
>   Date: 01/05/24 06:40:17
>
>   Modified:    ac-helpers .cvsignore svn-apache.m4
>   Added:       .        Makefile.in
>   Log:
>   Switch over to the new non-recursive build system.
>...

Okay... this is it. We're now on the build system.

    "It works on my machine."

I suspect there may be some tweaks to make on differents OSs. I'd be
interested to hear if Ben can really build with normal BSD make. It
should be possible.

The code supports building, installation, checking, and
dependencies. It does *NOT* yet deal with the doc/ subdirectory. That
is next; I figured this could be rolled out and get the kinks worked
out while I do the doc/ stuff.  Oh, it doesn't build Neon or APR yet
either.
I also saw a problem where libsvn_fs wasn't getting built before
linking one of the test proggies (see below).

Basic operation: same as before.

   $ ./autogen.sh
   $ ./configure OPTIONS
   $ make
   $ make check
   $ make install

There are some "make check" scripts that need to be fixed up. That'll
happen RSN. Some of them create their own log, rather than spewing to
stdout (where the top-level make will place the output into
[TOP]/tests.log).

The old Makefile.am files are still around, but I'll be tossing those
along with a bunch of tweaks to all the .cvsignore files. There are a
few other cleanups, too. But that can happen as a step two.

[ $ cvs rm -f `find . -name Makefile.rm`

  See the mistake in that line? I didn't when I typed it. The find
  returned nothing, so cvs rm -f proceeded to delete my entire
  tree. And the -f made sure to delete all my source files, too. Good
  fugging thing that I had my mods in some Emacs buffers, or I'd be
  bitching.

  I am *so* glad that Ben coded SVN to *not* delete locally modified
  files *and* that we have an "undel" command. I had to go and tweak a
  bazillion Entries files to undo the delete...
]

The top-level make has a number of shortcuts in it (well, actually in
build-outputs.mk):

   $ make subversion/libsvn_fs/libsvn_fs.la

or

   $ make libsvn_fs

The two are the same. So... when your test proggie fails to link
because libsvn_fs isn't around, just run "make libsvn_fs" to build it
immediately, then go back to the regular "make".

Note that the system still conditionally builds the FS stuff based on
whether DB (See 'Building on Unix' below) is available, and
mod_dav_svn if Apache is available.

Handy hint: if you don't like dependencies, then you can do:

   $ ./autogen.sh -s

That will skip the dependency generation that goes into
build-outputs.mk. It makes the script run quite a bit faster (48 secs
vs 2 secs on my poor little Pentium 120).

Note that if you change build.conf, you can simply run:

   $ ./gen-make.py build.conf

to regen build-outputs.mk.
You don't have to go back through the whole autogen.sh / configure
process.

You should also note that autogen.sh and configure run much faster now
that we don't have the automake crap. Oh, and our makefiles never
re-run configure on you out of the blue (gawd, I hated when automake
did that to me).

Obviously, there are going to be some tweaky things going on. I also
think that the "shadow" builds or whatever they're called (different
source and build dirs) are totally broken. Something tweaky will have
to happen there.  But, thankfully, we only have one Makefile to deal
with.

Note that I arrange things so that we have one generated file
(build-outputs.mk), and one autoconf-generated file (Makefile from
.in).  I also tried to shove as much logic/rules into
Makefile.in. Keeping build-outputs.mk devoid of rules (thus, implying
gen-make.py devoid of rules in its output generation) manes that
tweaking rules in Makefile.in is much more approachable to people.

I think that is about it. Send problems to the dev@ list and/or feel
free to dig in and fix them yourself. My next steps are mostly
cleanup. After that, I'm going to toss out our use of libtool and rely
on APR's libtool setup (no need for us to replicate what APR already
did).

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

Building on Unix
================

Note that if you have Berkeley DB installed and you try running
configure with the --with-berkeley-db=/path/to/BerkeleyDB.3.3 option,
you have to make sure that your linker knows where the Berkeley DB
libs are:

   kfogel says:
   After you install Berkeley, say in /usr/local/BerkeleyDB.3.3, you
   may need to modify /etc/ld.so.conf or /etc/rc.conf (those seem to
   be Linux and FreeBSD, respectively; Your Mileage May Vary), and run
   `ldconfig' or whatever your system wants.

   The problem is that configure tries to build, link, and run a
   small program against Berkeley DB.  If the system loader doesn't
   yet know how to do that, configure will claim it can't find
   Berkeley DB.
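
If you would rather not edit system-wide loader configuration, one
alternative is to extend the loader search path for the current shell
only.  The following is a minimal sketch, not part of Subversion's
build scripts.  It assumes the install prefix from the example above,
that the Berkeley DB libraries live in its lib/ subdirectory, and that
your loader honors LD_LIBRARY_PATH (common on Linux and FreeBSD, but
an assumption here).  The variable name BDB_PREFIX is hypothetical.

```shell
# Assumed Berkeley DB install prefix; adjust to match your system.
BDB_PREFIX=/usr/local/BerkeleyDB.3.3

# Prepend the Berkeley DB lib directory to the loader search path,
# keeping any value that was already set.
LD_LIBRARY_PATH="$BDB_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH

# Run configure only when invoked from the top of a prepared working copy.
if [ -x ./configure ]; then
  ./configure --with-berkeley-db="$BDB_PREFIX"
fi
```

The setting lasts only for the current shell session, so programs
linked against Berkeley DB will need it again at run time unless the
system loader configuration is updated as described above.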
Building on Win32
=================

There is some support for building Subversion on Win32 platforms. The
project files included in the source tree are from Microsoft Visual
C++ 6.x; earlier versions of the compiler are not supported at this
time.

To build the client components, you'll need a copy of neon 0.17.2. The
sources are available at

   http://www.webdav.org/neon/neon-0.17.2.tar.gz

Unpack the distribution into the root directory of the Subversion
source tree and rename the directory neon-0.17.2 to neon.

The MSVC project files in this archive are not up-to-date, so building
the neon libraries is not straightforward. Pre-built neon libraries
are available at

   http://www.xbc.nu/svn/libneon-0.17.2-svn-win32.zip

Unpack the archive into the neon directory at the root of the
Subversion source tree (the root is the directory where you found the
file you're reading). You'll get the pre-built libneon.lib and
libneonD.lib, as well as the modified project files used to build
them, the config.hw file and the complete (unmodified) neon/src
directory.

IMPORTANT NOTE: These neon libraries should *only* be used when
                building Subversion. They were built to depend on the
                version of expat-lite in the Subversion sources.

If you want to rebuild the neon libraries using the project files in
this package, you'll have to set the NEON_EXPAT_LITE_WIN32 environment
variable to the expat-lite directory at the root of the Subversion
source tree before starting Visual Studio, so that neon can find the
expat-lite includes.

If you want to build the server components, you'll also need a copy of
Berkeley DB, version 3.3.11 or newer. The sources are available at
http://www.sleepycat.com. There is a binary distribution at

   http://www.xbc.nu/svn/db-3.3.11-win32.zip

Unpack the distribution into the root directory of the Subversion
source tree and rename the directory db-3.3.11-win32 to db3-win32.
It's a good idea to add the db3-win32\bin directory under the source
root to your PATH, so that Subversion can find the Berkeley DB DLLs.
If you build Berkeley DB from the source, you will have to copy the
file build_win32\db.h from the Berkeley DB source tree to
db3-win32\include under the root of the Subversion source tree, and
all the import libraries to db3-win32\lib there. Again, the DLLs
should be somewhere in your path.

The workspace `subversion.dsw' at the top of the source tree includes
all the necessary projects. Right now, only static libraries are
built. The "__build__" project (active by default) builds all the
libraries and programs. The "__check__" project builds the test
drivers.

You will have to edit the file svn_private_config.hw to set the
correct paths for diff and patch.

NOTE: There have been rumours that Subversion on Win32 can be built
using the latest cygwin. ymmv.

Coding Style
============

To understand how things work, read doc/svn-design.{texi,info,ps,pdf},
and read the header files, which tend to have thoroughly-commented
data structures.

We're using ANSI C, and following the GNU coding standards.  Emacs
users can just load svn-dev.el to get the right indentation behavior
(most source files here will load it automatically, if
`enable-local-eval' is set appropriately).

Read http://www.gnu.org/prep/standards.html for a full description of
the GNU coding standards; but here is a short example demonstrating
the most important formatting guidelines:

   char *                                     /* func type on own line */
   argblarg (char *arg1, int arg2)            /* func name on own line */
   {                                          /* first brace on own line */
     if ((some_very_long_condition && arg2)   /* indent 2 cols */
         || remaining_condition)              /* new line before operator */
       {                                      /* brace on own line, indent 2 */
         arg1 = some_func (arg1, arg2);       /* space before opening paren */
       }                                      /* close brace on own line */
     else
       {
         do                                   /* format do-while like this */
           {
             arg1 = another_func (arg1);
           }
         while (*arg1);
       }
   }

In general, be generous with parentheses even when you're sure about
the operator precedence, and be willing to add spaces and newlines to
avoid "code crunch".
Don't worry too much about vertical density; it's more important to
make code readable than to fit that extra line on the screen.

The controversial GNU convention of putting a space between a function
name and its opening paren is optional in Subversion.  If you're
editing an area of code that already seems to have a consistent
preference about this, then just stick with that; otherwise, pick
whichever way you like.

Using Page Breaks
=================

We're using page breaks (the Ctrl-L character, ASCII 12) for section
boundaries in both code and plaintext prose files.  This file is a
good example of how it's done: each section starts with a page break,
and immediately after the page break comes the title of the section.

This helps out people who use the Emacs page commands, such as
`pages-directory' and `narrow-to-page'.  Such people are not as scarce
as you might think, and if you'd like to become one of them, then type
C-x C-p C-h in Emacs sometime.

Other Conventions:
==================

In addition to the GNU standards, Subversion uses these conventions:

   *  Use only spaces for indenting code, never tabs.  Tab display
      width is not standardized enough, and anyway it's easier to
      manually adjust indentation that uses spaces.

   *  Stay within 80 columns, the width of a minimal standard display
      window.

   *  Signify internal symbols by two underscores after the prefix.
      That is, when a symbol must (for technical reasons) reside in
      the global namespace despite not being part of a published
      interface, then use two underscores following the module prefix.
      For example:

         svn_fs_get_rev_prop ()       /* Part of published API. */
         svn_fs__parse_props ()       /* For internal use only. */

   *  Put this comment at the bottom of new source files to make Emacs
      automatically load svn-dev.el:

         /*
          * local variables:
          * eval: (load-file "../svn-dev.el")
          * end:
          */

      (This assumes the C file is located in a subdirectory of
      subversion/subversion/, which most are.)
   *  We have a tradition of not marking files with the names of
      individual authors (i.e., we don't put lines like "Author: foo"
      or "@author foo" in a special position at the top of a source
      file).  This is to discourage territoriality -- even when a file
      has only one author, we want to make sure others feel free to
      make changes.  People might be unnecessarily hesitant if someone
      appears to have staked ownership on the file.

   *  There are many other unspoken conventions maintained throughout
      the code, that are only noticed when someone unintentionally
      fails to follow them.  Just try to have a sensitive eye for the
      way things are done, and when in doubt, ask.

Writing Log Entries
===================

Certain guidelines should be adhered to when writing log messages:

Make a log entry for every change.  The log loses much of its value if
developers cannot rely on its completeness.  Even if you've only
changed comments, write an entry that says, "Doc fix."  The only
changes you needn't log are small changes that have no effect on the
source, like formatting tweaks.

Log entries should be full sentences, not sentence fragments.
Fragments are more often ambiguous, and it takes only a few more
seconds to write out what you mean.  Fragments like `New file' or `New
function' are acceptable, because they are standard idioms, and all
further details should appear in the source code.

The log entry should name every affected function, variable, macro,
makefile target, grammar rule, etc, including the names of symbols
that are being removed in this commit.  This helps people do automated
searches through the logs later.  Don't hide names in wildcards,
because the globbed portion may be what someone searches for later.
For example, this is bad:

   (twirling_baton_*): removed these obsolete structures.
   (handle_parser_warning): pass data directly to callees, instead
   of storing in twirling_baton_*.
Later on, when someone is trying to figure out what happened to
`twirling_baton_fast', they may not find it if they just search for
"fast".  A better entry would be:

   (twirling_baton_fast, twirling_baton_slow): removed these
   obsolete structures.
   (handle_parser_warning): pass data directly to callees, instead
   of storing in twirling_baton_*.

The wildcard is okay in the description for `handle_parser_warning',
but only because the two structures were mentioned by full name
elsewhere in the log entry.

There are some common-sense exceptions to the need to name everything
that was changed:

   *  If you have made a change which requires trivial changes
      throughout the rest of the program (e.g., renaming a variable),
      you needn't name all the functions affected.

   *  If you have rewritten a file completely, the reader understands
      that everything in it has changed, so your log entry may simply
      give the file name, and say "Rewritten".

In general, there is a tension between making entries easy to find by
searching for identifiers, and wasting time or producing unreadable
entries by being exhaustive.  Use your best judgement --- and be
considerate of your fellow developers.

For large changes or change groups, group the log entry into
paragraphs separated by blank lines.  Each paragraph should be a set
of changes that accomplishes a single goal.  Independent changes
should be in separate paragraphs.  It helps to start out each group
with a sentence or two summarizing the change.

One should never need the log entries to understand the current code.
If you find yourself writing a significant explanation in the log, you
should consider carefully whether your text doesn't actually belong in
a comment, alongside the code it explains.  Here's an example of doing
it right:

   (consume_count): If `count' is unreasonable, return 0 and don't
   advance input pointer.

And then, in `consume_count' in `cplus-dem.c':

   while (isdigit ((unsigned char)**type))
     {
       count *= 10;
       count += **type - '0';
       /* A sanity check.
          Otherwise a symbol like
         `_Utf390_1__1_9223372036854775807__9223372036854775'
         can cause this function to return a negative value.
         In this case we just consume until the end of the string.  */
       if (count > strlen (*type))
         {
           *type = save;
           return 0;
         }

This is why a new function, for example, needs only a log entry saying
"New Function" --- all the details should be in the source.

These guidelines are paraphrased from Jim Blandy's excellent essay
"Maintaining the ChangeLog".  It is in `doc/WritingChangeLogs.txt'.

Generating ChangeLogs
=====================

Subversion does not keep ChangeLog files, because they're redundant
with the CVS log entries.  But ChangeLog is an easier format to
browse, so it's often handy to generate a ChangeLog from the cvs log
data.  You can do so with the script `cvs2cl.pl', from:

   http://www.red-bean.com/cvs2cl/

If you've never used it before, try invoking it like this:

   cd subversion/subversion
   cvs2cl.pl --fsf -r

That will produce a ChangeLog in the current directory (the "-r" flag
says include revision numbers, and "--fsf" means do auto-wrapping in a
way friendly to log entries written in GNU ChangeLog style).  Run
"cvs2cl.pl --help" for more information.

Automated Tests:
================

For a description of how to use and add tests to Subversion's
automated test framework, please read subversion/tests/README.

Writing test cases before code:
===============================

From: Karl Fogel
Subject: writing test cases
To: dev@subversion.tigris.org
Date: Mon, 5 Mar 2001 15:58:46 -0600

Many of us implementing the filesystem interface have now gotten into
the habit of writing the test cases (see fs-test.c) *before* writing
the actual code.  It's really helping us out a lot -- for one thing,
it forces one to define the task precisely in advance, and also it
speedily reveals the bugs in one's first try (and second, and
third...).

I'd like to recommend this practice to everyone.
If you're implementing an interface, or adding an entirely new
feature, or even just fixing a bug, a test for it is a good idea.  And
if you're going to write the test anyway, you might as well write it
first. :-)

Yoshiki Hayashi's been sending test cases with all his patches lately,
which is what inspired me to write this mail to encourage everyone to
do the same.  Having those test cases makes patches easier to examine,
because they show the patch's purpose very clearly.  It's like having
a second log message, one whose accuracy is verified at run-time.

That said, I don't think we want a rigid policy about this, at least
not yet.  If you encounter a bug somewhere in the code, but you only
have time to write a patch with no test case, that's okay -- having
the patch is still useful; someone else can write the test case.

As Subversion gets more complex, though, the automated test suite gets
more crucial, so let's all get in the habit of using it early.

-K

APR Status Codes:
=================

Always check for APR status codes (except APR_SUCCESS) with the
APR_STATUS_IS_...() macros, not by direct comparison. This is required
for portability to non-Unix platforms.

Compiling mod_dav_svn:
======================

To compile the mod_dav_svn module, the configure script needs to be
pointed at an Apache 2.0 source tree or installation area. Without
this, mod_dav_svn will be skipped in the normal build process. Note
that you will need at *least* Apache 2.0a9 for this, and preferably
the latest CVS version. To fetch the Apache 2.0 CVS tree, you can use:

   $ cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic login
     (password 'anoncvs')

   $ cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic co httpd-2.0

When Apache 2.0 is installed, it creates a Perl script named "apxs"
which is used by external (third-party) modules to discover
compilation, link, and directory information about the installed
Apache.
apxs is also used to install third-party modules into the Apache
install directories.

The apxs script is the key for the configure process. By default,
configure will automatically look in /usr/sbin and
/usr/local/apache/bin for the apxs script. If it isn't found, then you
must use the --with-apxs=FILE switch to say where it is located (or
plain --with-apxs to look in the current PATH). For example:

   $ ./configure --with-apxs=/usr/local/apache2/bin/apxs

The configure script will verify that apxs actually refers to an
Apache 2.0 installation, and then gather the necessary parameters. It
will then enable the build and installation of mod_dav_svn.

When building mod_dav_svn for an installed Apache, you MUST build it
as a dynamic library. This means that you cannot pass --disable-shared
to the configure script.

An alternative is to work with an Apache 2.0 source directory. There
is no default for this, so you must specify the directory explicitly
using the --with-apache switch. For example:

   $ ./configure --with-apache=/home/gstein/src/httpd-2.0

The configure script will gather up the right information and enable
the build/install of mod_dav_svn. In this form, the module will be
built as a static library and then linked into the Apache
executable. Therefore, you cannot pass --disable-static to the
configure script.

Once mod_dav_svn is actually built, it must be installed. For the apxs
style (working against an installed Apache), the apxs script will be
used to perform the actual installation of the module into the Apache
install tree. apxs will also tweak your httpd.conf file to add a
LoadModule directive. With this style, you can change, build and
install mod_dav_svn without any changes to Apache itself (since it is
already installed). As a result, this is the preferable form, and will
be the "standard" install when Subversion ships.
If you are building directly against a source tree (statically), then
the install will copy mod_dav_svn into the tree along with a number of
files for Apache's config/build system. After each change and build of
mod_dav_svn, it must be installed and Apache needs to be relinked to
pick up the changes. You will also need to put an AddModule line into
your httpd.conf file (see below).

Since mod_dav_svn links against libsvn_subr and libsvn_fs, these must
be built and INSTALLED for mod_dav_svn to work. Nothing special is
done for apxs-style builds, which generally means the two libraries
will be placed into /usr/local/lib/. If you are building a static
library for direct linking into Apache, then the libsvn_subr and
libsvn_fs targets will also place copies of the static (.a) libraries
into the Apache source tree.

After mod_dav_svn has been installed (or linked into Apache), then
your httpd.conf file must be updated to get Apache to use the
module. This is actually quite simple to do. Here is an example:

   <Location /svn/repos>
      DAV svn
      SVNPath /home/gstein/dav/svnrepos
   </Location>

This tells Apache about a location named /svn/repos, that it is
DAV-enabled and should use the "svn" DAV back end, and that the
repository for this location is in the /home/gstein/dav/svnrepos
directory.

The net effect is that you now have a Subversion server located at:

   http://some.host.name/svn/repos

Further config magic and usage is an exercise for the reader.
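
The apxs lookup order described earlier in this section (the two
default directories first, then the current PATH) can be sketched in
shell as follows.  This is only an illustration under that reading of
the text; it is not the logic the configure script actually contains,
and the helper name find_apxs is hypothetical.

```shell
# find_apxs DIR...: print the first executable "apxs" found in the given
# directories, falling back to a search of PATH.  Hypothetical helper,
# not taken from Subversion's configure script.
find_apxs () {
  for dir in "$@"; do
    if [ -x "$dir/apxs" ]; then
      printf '%s\n' "$dir/apxs"
      return 0
    fi
  done
  # Nothing in the listed directories; let the shell search PATH.
  command -v apxs
}

# Default locations named in the text above.
if APXS=$(find_apxs /usr/sbin /usr/local/apache/bin); then
  echo "would run: ./configure --with-apxs=$APXS"
else
  echo "apxs not found; pass --with-apxs=FILE to configure" >&2
fi
```

The script only reports what it would do, so it is safe to run outside
a working copy; replace the echo with the real configure invocation
once the reported path looks right.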