What Needs Doing?

There are a lot of different things you can do to help Subversion. Not all involve coding; there are plenty of non-programming roles for eager volunteers.

Below are some of the needs we've identified, but please don't take these as gospel! New volunteers bring fresh viewpoints, and one of the most important things you can do is point out a need we hadn't recognized before — and then fill it.

Summer of Code Tasks

These are the ideas that the Subversion developers have for Summer of Code applicants. As stated above, don't take these as gospel! We welcome discussion of creative proposals.

However, please don't select tasks from the other sections of this page as your proposal: they are either the wrong size for the Summer of Code or not coding tasks at all, and therefore not eligible.

You should also read the Hacker's Guide to Subversion before starting out on a proposal. Don't hesitate to ask for details or start discussing one of these tasks on the dev@subversion.tigris.org mailing list (see here for subscription information).

Issue #1144: Add full SASL authentication support to ra_svn (svn://)

Right now ra_svn only supports the ANONYMOUS, CRAM-MD5 and EXTERNAL authentication mechanisms. This is enough to be useful, but integrating a full-featured SASL library would give users a multitude of other options: passwords at various levels of security, Kerberos, one-time passwords, etc. The protocol is already designed to support SASL, so the task requires selecting a SASL library and implementing support for it on both the client and server side.
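For readers unfamiliar with the mechanisms named above, here is a minimal sketch of the CRAM-MD5 response computation as described in RFC 2195. It illustrates the mechanism only; the function name is an assumption made for this example, and the sketch does not attempt to reproduce how ra_svn frames or encodes the exchange on the wire.

```python
import hashlib
import hmac


def cram_md5_response(username: str, password: str, challenge: str) -> str:
    """Build a CRAM-MD5 response: the username, a space, and the hex
    HMAC-MD5 of the server challenge keyed with the shared password."""
    digest = hmac.new(
        password.encode("utf-8"),   # shared secret is the HMAC key
        challenge.encode("utf-8"),  # server-issued challenge is the message
        hashlib.md5,
    ).hexdigest()
    return f"{username} {digest}"


# Example inputs taken from the sample exchange in RFC 2195.
response = cram_md5_response(
    "tim",
    "tanstaaftanstaaf",
    "<1896.697170952@postoffice.reston.mci.net>",
)
print(response)
```

The password itself never crosses the wire, but both sides must hold it in recoverable form, which is one reason a full SASL library with stronger mechanisms is attractive.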

Issue #525 and Issue #908: Optional or compressed text base storage

Subversion stores locally a pristine copy of the base revision (i.e., the unmodified checked-out revision) of each file in the working copy. These pristine copies are known as "text bases". This is great for doing offline diffs, and for transmitting deltas back to the server when committing. But it's a bit of a space penalty on the client side, and it would be nice to offer users the option to turn it off sometimes, or failing that, to compress the text bases.
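To make the space trade-off concrete, here is a rough sketch of the compression option, using Python's zlib as a stand-in for whatever codec the task would actually select. The function names are hypothetical and nothing here reflects the working copy's real on-disk layout.

```python
import zlib


def compress_text_base(pristine: bytes) -> bytes:
    """Compress a pristine copy before storing it (hypothetical helper)."""
    return zlib.compress(pristine, 6)


def read_text_base(stored: bytes) -> bytes:
    """Recover the exact pristine bytes, e.g. for an offline diff."""
    return zlib.decompress(stored)


# Repetitive source text stands in for a typical versioned file.
pristine = b"int main(void) { return 0; }\n" * 200
stored = compress_text_base(pristine)

# The round trip must be lossless, or diffs and commit deltas would be wrong.
assert read_text_base(stored) == pristine
print(len(pristine), "->", len(stored), "bytes")
```

The cost is extra CPU time on every operation that reads a text base, which is part of what a proposal would need to weigh.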

This task is somewhat larger than the others (though still perfectly feasible), and should ideally be taken on by someone with previous experience in Subversion development.

Issue #2342: Improving the Python bindings

The Subversion Python bindings are currently incomplete in the functionality that they expose (for one example, see the above issue). Furthermore, the bindings are currently extremely unpythonic in structure, and could do with a layer of Python code to fix that. The bindings should first be brought up to date, with all functionality fully implemented, and then be wrapped in a set of Python classes implementing an interface friendlier to Python developers.

Issue #2409: Operation and error logging for svnserve

Subversion 1.3.0 added support for operational logging in mod_dav_svn. That is, logging what actual client-side operations are performed on a repository, rather than just logging the endless flow of WebDAV requests. Support for this kind of logging, as well as error logging, needs to be added to the standalone svnserve daemon. This requires some work in the APR library, which would provide the actual logging facility.

Issue #695: Fix nonrecursive checkouts

It is possible to do a nonrecursive checkout in Subversion, using the -N option to the checkout command. However, the working copy has no proper knowledge of "I don't want this particular item", so many operations behave extremely strangely in the presence of a nonrecursive working copy. The working copy should be taught about nonrecursive checkouts, so that it is possible to work with them on a day-to-day basis.

This task is somewhat larger than the others (though still perfectly feasible), and should ideally be taken on by someone with previous experience in Subversion development.

Performance analysis tool for mod_dav_svn

In order to drive future development, we would like to know just how much a Subversion mod_dav_svn server can be pushed, and what capacity administrators can expect when deploying their Subversion servers. One option is to use the Apache Flood load tester. Test profiles will need to be written to simulate usual repository use across multiple users. Then, load testing needs to be conducted with the help of various profiling tools (oprofile, dtrace...) to identify the current limits and bottlenecks of Subversion. Bottlenecks may be in Apache HTTP Server, APR, mod_dav_svn (Subversion's Apache module), our repository backends (FSFS or BDB), or even the underlying operating system.

The final output of this task should be pretty graphs, explanations behind the graphs, tuning recommendations, and proposals to improve the performance of Subversion. Time permitting, some or all of these proposals would be fleshed out and implemented.

An augmented diff representation

Currently, 'svn diff' outputs a patch in the so-called "Unified diff" format. The problem is, this format is fairly old, and as such can represent only textual differences between files. It does not represent structural changes to the directory tree, nor can it encompass changes to various metadata, the kind used extremely frequently by Subversion.

Propose a solution for augmenting the standard unified diff format with extra "garbage" chunks of data that will be ignored by a unified diff parser, but that can carry whatever extra information we want. Then, implement that solution in 'svn diff', and in a new subcommand ('svn patch'?) that would do the opposite of 'svn diff' with rich diff data.
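One way to picture the idea is the sketch below, which uses Python's difflib to produce an ordinary unified diff and prepends metadata lines to it. The "#meta " prefix and both function names are invented for this example and are not a proposed or existing format. The sketch relies on two assumptions: that patch-like tools skip unrecognized text ahead of the file headers, and that no line inside a hunk can start with "#", since hunk lines always begin with a space, "+", "-" or "\".

```python
import difflib

META_PREFIX = "#meta "  # hypothetical marker, invented for this sketch


def make_augmented_diff(old, new, path, metadata):
    """Return unified diff lines preceded by key=value metadata lines."""
    meta_lines = [f"{META_PREFIX}{key}={value}\n"
                  for key, value in sorted(metadata.items())]
    body = list(difflib.unified_diff(old, new, fromfile=path, tofile=path))
    return meta_lines + body


def split_augmented_diff(lines):
    """Separate metadata from the plain unified diff a normal tool expects."""
    metadata, plain = {}, []
    for line in lines:
        if line.startswith(META_PREFIX):
            payload = line[len(META_PREFIX):].rstrip("\n")
            key, _, value = payload.partition("=")
            metadata[key] = value
        else:
            plain.append(line)
    return metadata, plain


old = ["hello\n", "world\n"]
new = ["hello\n", "subversion\n"]
augmented = make_augmented_diff(old, new, "greeting.txt",
                                {"svn:eol-style": "native"})
metadata, plain = split_augmented_diff(augmented)
print("".join(augmented))
```

A real design would also need to cover tree changes such as copies and deletes, and binary content, which this sketch ignores.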

Note that SVK possesses such a rich unified diff format. It may or may not be desirable to reuse the same kind of representation; that decision is part of the task!

Also, note that some code from the Dawn of Days of the Subversion project has recently been revived. It implements an XML (de)serializer for editor drive data, which means it can spit out an XML representation of a set of changes, and can take that XML representation and use it to "patch" a pristine set of files. The other (de)serializer that exists today is the one commonly known as the "dumpfile" format. These two existing implementations need to be taken into account during the analysis and design phase.

Information Management Tasks

These are non-coding tasks, so if you arrived here from the Google Summer of Code pages, please see the above section.

Issue Management

Do we need an Issue Manager? Maybe...

The Subversion bug database has been managed in a rather ad hoc fashion thus far. Periodically we make sweeps over all outstanding issues and try to prioritize them, organize them into scheduled milestones, note dependencies between issues, etc. These methods have been moderately successful up till now, but they are not scaling well as the number of issues grows. Since issue growth is proportional to user base growth, the issue tracker is becoming a victim of Subversion's success. We need to find new ways of managing our issues, ways that do not involve making O(N) sweeps over the entire list of open issues at regular intervals.

While we have some semi-formalized management roles (patch manager, release manager, etc.), we have never had an issue manager. It might be time to get one, though. It's not yet clear whether the problem is mainly one of attention, or of algorithm, or both, but having someone dedicated to managing the issues database couldn't hurt. One thing such a person could do would be to go through the list of outstanding issues, figure out which ones are likely to be bite-sized tasks, and mark them as such, so that other volunteers have an easier time choosing things to work on. We've already marked various issues as bite-sized, but we haven't done so consistently as new issues come in. This means there are a lot of potential entry points to the project going unnoticed. Want to help us solve that?

Creative ideas welcome! If you'd like to help with this, please subscribe to the dev@subversion.tigris.org mailing list and post your thoughts.

FAQ Management

We need a FAQ manager. A FAQ manager is someone who stays subscribed to the users@subversion.tigris.org and dev@subversion.tigris.org mailing lists, watches for common questions or addenda to existing questions, and gradually adjusts the Subversion FAQ in response to the problems users are having "in the wild". This is also a great way to get familiar with Subversion usage patterns and common problems. If you use or administer Subversion anyway, helping to manage the FAQ is a great way to expand your troubleshooting skills.

Again, creative ideas are most welcome. Please post to the dev@subversion.tigris.org mailing list if you're interested in this.

Bite-Sized Coding Tasks

The Subversion bug database contains many issues classified as "bite-sized" tasks — tasks that are well-defined and self-contained, and thus suitable for a volunteer looking to get involved with the project. You don't need broad or detailed knowledge of Subversion's design to take on one of these, just a pretty good idea of how things generally work, and familiarity with the coding guidelines in the Hacker's Guide to Subversion. Many tasks are things a volunteer could pick off in a spare week or two, and they're a great way to start learning your way around the Subversion code.

If you start one of these tasks, please notify the other developers by marking the issue as "STARTED" in the issue tracker, then mail dev@subversion.tigris.org (subscribe to that list too) with questions. Don't be shy; it's a very civil mailing list.

When you're ready to send in a patch, see the patch posting guidelines. Don't be discouraged if your patch goes through several iterations of review by other developers; this is normal.

Here is the list of all bite-sized tasks.

Larger (But Not Necessarily Huge) Coding Tasks

The tasks listed below are bigger than bite-sized, but probably don't require new research to solve. In other words, most of them are a Simple Matter Of Programming. You'd need to either be, or be willing to become, familiar with Subversion's internals to solve one of these.

As with the bite-sized tasks, please read the Hacker's Guide to Subversion and don't hesitate to ask questions on the users@subversion.tigris.org and dev@subversion.tigris.org mailing lists (see here for subscription information). Before posting any patches, see the patch posting guidelines.

For groups of tasks tied to specific releases, peruse the status page. For a longer-term view of Subversion's vision, see the road map.

Issue #1254, etc.: Improve error messages

Too many of Subversion's error messages are terse or confusing. Many instances are recorded in issue #1254, but see also issues #2302, #2295, and #2275.

Improved Bindings to Other Languages

One of Subversion's strengths is that it offers a rich set of "binding surfaces": well-documented APIs that are available not only in C (Subversion's native language) but in other programming languages as well (see the complete list).

Some of these language bindings are maintained via SWIG, a tool that partially automates the process of generating bindings, while others are maintained by hand. Many of the bindings do not have complete coverage yet, or have interface problems where they do have coverage. So even though they're used in many production systems, there's still plenty of work to do. Specifically:

All Open Issues

You want to see the complete list of open bugs, in all its glory? Don't say we didn't warn you...