Design goals for the SVN test suite

Why Test?

Regression testing is an essential element of high-quality software. Unfortunately, some developers have not had first-hand exposure to a high-quality testing framework. Lack of familiarity with the positive effects of testing can be blamed for statements like:

"I don't need to test my code, I know it works."

It is safe to say that experience has disproved the idea that developers do not introduce bugs.

Audience

The test suite will be used by both developers and end users.

Developers need a test suite to help with:

Fixing Bugs:
Each time a bug is fixed, a test case should be added to the test suite. Creating a test case that reproduces a bug is a seemingly obvious requirement. If a bug cannot be reproduced, there is no way to be sure a given change will actually fix the problem. Once a test case has been created, it can be used to validate the correctness of a given patch. Adding a new test case for each bug also ensures that the same bug will not be introduced again in the future.

Impact Analysis:
A developer fixing a bug or adding a new feature needs to know if a given change breaks other parts of the code. It may seem obvious, but keeping a developer from introducing new bugs is one of the primary benefits of using a regression test system.

Regression Analysis:
When a test regression occurs, a developer will need to manually determine what has caused the failure. The test system is not able to determine why a test case failed. The test system should simply report exactly which test results changed and when the last results were generated.

Users need a test suite to help with:

Building:
Building software can be a scary process. Users who have never built software may be unwilling to try. Others may have tried to build a piece of software in the past, only to be thwarted by a difficult build process. Even if the build completes without an error, how can a user be confident that the generated executable actually works? The only workable solution to this problem is to provide an easily accessible set of tests that the user can run after building.

Porting:
Often, users become porters when the need to run on a previously unsupported system arises. This porting process typically requires some minor tweaking of include files. It is absolutely critical that testing be available when porting, since the primary developers may have no way to test changes submitted by someone doing a port.

Testing:
Different installations of the exact same OS can contain subtle differences that cause software to operate incorrectly. Only testing on different systems will expose problems of this nature. A test suite can help identify these sorts of problems before a program is actually put to use.

Requirements

Functional requirements of an acceptable test suite include:

Unique Test Identifiers:
Each test case must have a globally unique test identifier; this identifier is just a string. A globally unique string is required so that test cases can be individually identified by name, sorted, and even looked up on the web. It seems simple, perhaps even blatantly obvious, but some other test packages have failed to maintain uniqueness in test identifiers, and developers have suffered because of it. It is even desirable for the system to actively enforce this uniqueness requirement.
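
One way a runner could enforce uniqueness is to reject a duplicate identifier at registration time, so the mistake surfaces as soon as the suite loads rather than as confusing results later. The sketch below is in Python and assumes a hypothetical `SuiteRegistry` class; none of these names come from an existing Subversion tool.

```python
from typing import Callable, Dict, List


class DuplicateTestError(Exception):
    """Raised when two test cases claim the same identifier."""


class SuiteRegistry:
    """Hypothetical registry mapping globally unique test ids to test functions."""

    def __init__(self) -> None:
        self._tests: Dict[str, Callable[[], bool]] = {}

    def register(self, test_id: str, func: Callable[[], bool]) -> None:
        # Enforce uniqueness actively instead of trusting test authors.
        if test_id in self._tests:
            raise DuplicateTestError(f"duplicate test identifier: {test_id}")
        self._tests[test_id] = func

    def ids(self) -> List[str]:
        # Identifiers are plain strings, so they sort naturally.
        return sorted(self._tests)
```

Registering `client-1` a second time raises `DuplicateTestError` immediately, before any test has run.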

Exact Results:
A test case must have one expected result. If the result of running the test does not exactly match the expected result, the test must fail.

Reproducible Results:
Test results should be reproducible. If a test result matches the expected result, it should do so every time the test is run. External factors like time stamps must not affect the results of a test.
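
When a program's output legitimately contains a time stamp, one option is to normalize it before comparing against the expected result. This Python sketch assumes time stamps of the form `YYYY-MM-DD HH:MM:SS` with an optional numeric zone offset; the pattern and the `<TIMESTAMP>` placeholder are illustrative choices, and real output may need other patterns.

```python
import re

# Assumed time stamp shape: "2001-01-01 12:00:00", optionally followed by " +0000".
_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?: [+-]\d{4})?")


def normalize(output: str) -> str:
    """Replace every time stamp with a fixed placeholder so reruns compare equal."""
    return _TIMESTAMP.sub("<TIMESTAMP>", output)
```

The comparison stays exact: everything except the volatile field must still match character for character.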

Self-Contained Tests:
Each test should be self-contained. Results for one test should not depend on side effects of previous tests. This is obviously a good practice, since one is able to understand everything a test is doing without having to look at other tests. The test system should also support random access so that a single test or set of tests can be run. If a test is not self-contained, it cannot be run in isolation.
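
A simple way to keep tests self-contained is to give each one a fresh, empty working directory, so no file left behind by an earlier test (or an earlier run of the same test) can influence it. The Python sketch below assumes a hypothetical `test-work/<test id>` layout under the build directory; rooting it in the build tree rather than the source tree also lets separate build directories run the suite side by side.

```python
import os
import shutil
import tempfile


def fresh_workdir(build_root: str, test_id: str) -> str:
    """Return an empty working directory reserved for a single test.

    The directory lives under build_root (hypothetical layout:
    <build_root>/test-work/<test_id>) and is wiped before each use, so a test
    never sees side effects from other tests or from its own previous run.
    """
    path = os.path.join(build_root, "test-work", test_id)
    if os.path.exists(path):
        shutil.rmtree(path)  # discard leftovers from an earlier run
    os.makedirs(path)
    return path


if __name__ == "__main__":
    # Demo: asking twice for the same test's directory always yields an empty one.
    with tempfile.TemporaryDirectory() as root:
        path = fresh_workdir(root, "client-1")
        open(os.path.join(path, "leftover.txt"), "w").close()
        print(os.listdir(fresh_workdir(root, "client-1")))  # prints []
```

Because each test owns its directory, running a single test in isolation needs no setup from any other test.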

Selective Execution:
It may not be possible to run a given set of tests on certain systems. The suite must provide a means of selectively running test cases based on the environment. The test system must also provide a way to selectively run a given test case or set of test cases on a per-invocation basis. It would be incredibly tedious to run the entire suite just to see the result of a single test.
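
Environment-based selection can be as simple as letting each test declare where it can run and skipping it elsewhere. In this Python sketch the `Case` record and its `platforms` field are hypothetical; a real suite might check for features (for example, whether a client executable exists) rather than platform names.

```python
import sys
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple


@dataclass
class Case:
    """Hypothetical test description used only for this sketch."""
    test_id: str
    func: Callable[[], bool]
    # Platforms (as reported by sys.platform) the test can run on; None means any.
    platforms: Optional[FrozenSet[str]] = None


def select_for_environment(
    tests: List[Case], platform: str = sys.platform
) -> Tuple[List[Case], List[Case]]:
    """Split tests into (runnable, skipped) for the given platform."""
    runnable: List[Case] = []
    skipped: List[Case] = []
    for case in tests:
        if case.platforms is None or platform in case.platforms:
            runnable.append(case)
        else:
            skipped.append(case)
    return runnable, skipped
```

Skipped tests are returned rather than silently dropped, so a report can still list them.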

No Monitoring:
The tests must run from start to end without operator intervention. Test results must be generated automatically. It is critical that an operator not need to manually compare test results to figure out which tests failed and which ones passed.

Automatic Logging of Results:
The system must store test results so that they can be compared later. This applies to machine-readable results as well as human-readable results. For example, assume we have a test named client-1 that expects a result of 1, but the test case returns 0 instead. We should expect the system to store two distinct pieces of information: first, that the test failed; second, how the test failed, meaning how the actual result differed from the expected result.

The following example shows the kind of results we might record in a results log file.

   client-1 FAILED
   client-2 PASSED
   client-3 PASSED
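
The two pieces of information described above map naturally onto a one-line status plus a stored explanation. This Python sketch assumes expected and actual results are captured as text; it uses the standard `difflib` module to record how a failing result differed, and the two-file layout is a hypothetical choice.

```python
import difflib
from typing import Iterable, Tuple


def compare_result(test_id: str, expected: str, actual: str) -> Tuple[str, str]:
    """Return (log line, explanation) for one test.

    The test passes only on an exact match. On failure the explanation is a
    unified diff showing how the actual result differed from the expected one.
    """
    if actual == expected:
        return f"{test_id} PASSED", ""
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile=f"{test_id} (expected)",
        tofile=f"{test_id} (actual)",
    )
    return f"{test_id} FAILED", "".join(diff)


def write_logs(summary_path: str, detail_path: str,
               results: Iterable[Tuple[str, str, str]]) -> None:
    """Write one PASSED/FAILED line per (test_id, expected, actual), plus diffs."""
    with open(summary_path, "w") as summary, open(detail_path, "w") as detail:
        for test_id, expected, actual in results:
            line, explanation = compare_result(test_id, expected, actual)
            summary.write(line + "\n")
            detail.write(explanation)
```

The summary file follows the log format shown above, while the detail file keeps the "how" for later inspection.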
    

Automatic Recovery:
The test system must be able to recover from crashes and unexpected delays. For example, a child process might go into an infinite loop and would need to be killed. The test shell itself might also crash or go into an infinite loop. In these cases, the test run must automatically recover and continue with the tests directly after the one that crashed.

This is critical for a couple of reasons. Nasty crashes and infinite loops most often appear on users' (not developers') systems. Users are not well equipped to deal with these sorts of exceptional situations. It is unrealistic to expect that users will be able to manually recover from disaster and restart crashed test cases. It is an accomplishment just to get them to run the tests in the first place!

Ensuring that the test system actually runs each and every test is critical, since a failing test near the end of the suite might never be noticed if a crash halfway through kept the remaining tests from being run. This process must be completely automated; no operator intervention should be required.
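
Both kinds of recovery can be sketched with a child process per test and a progress log. In this Python sketch, a hung child is killed through the `timeout` argument of `subprocess.run`; to survive a crash of the runner itself, a hypothetical `STARTED` marker is written before each test, so a restarted run can record the interrupted test as crashed and continue with the next one. The log format and status names are assumptions.

```python
import subprocess
import sys
from typing import Dict, List, Sequence, Tuple


def run_one(argv: Sequence[str], timeout: float) -> str:
    """Run one test in a child process and classify the outcome; never raises."""
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "TIMEOUT"  # subprocess.run has already killed the child
    except OSError:
        return "CRASHED"  # the program could not be started at all
    if proc.returncode < 0:
        return "CRASHED"  # on POSIX, a negative code means killed by a signal
    return "PASSED" if proc.returncode == 0 else "FAILED"


def load_progress(path: str) -> Dict[str, str]:
    """Return {test_id: last recorded status}, or {} if no progress log exists."""
    status: Dict[str, str] = {}
    try:
        with open(path) as log:
            for line in log:
                parts = line.split()
                if len(parts) == 2:
                    status[parts[0]] = parts[1]
    except FileNotFoundError:
        pass
    return status


def run_suite(tests: List[Tuple[str, List[str]]], progress_path: str,
              timeout: float = 60.0) -> None:
    """Run (test_id, argv) pairs, resuming after a crash of the runner itself."""
    previous = load_progress(progress_path)
    with open(progress_path, "a") as log:
        for test_id, argv in tests:
            last = previous.get(test_id)
            if last == "STARTED":
                # The runner died while this test was running: record the
                # crash and continue with the next test instead of retrying.
                status = "CRASHED"
            elif last is not None:
                continue  # finished during an earlier, interrupted run
            else:
                log.write(f"{test_id} STARTED\n")
                log.flush()  # so the marker survives if this runner process dies
                status = run_one(argv, timeout)
            log.write(f"{test_id} {status}\n")
            log.flush()


if __name__ == "__main__":
    # Hypothetical demo: the second test hangs and is killed after 1 second.
    demo = [
        ("demo-1", [sys.executable, "-c", "print('ok')"]),
        ("demo-2", [sys.executable, "-c", "import time; time.sleep(30)"]),
    ]
    run_suite(demo, "progress.log", timeout=1.0)
```

Rerunning the same command after an interruption simply picks up where the log ends, which is the kind of hands-off recovery users need.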

Report Results Only:
When a regression is found, a developer will need to manually determine its cause. The system should tell the developer exactly which tests have failed, when the last set of results was generated, and what the previous results actually were. Any additional functionality is outside the scope of the test system.
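
A report limited to those three facts can be very small. The Python sketch below assumes results are held as `{test_id: status}` dictionaries and that the time of the previous run was saved alongside them; the report wording is illustrative.

```python
from datetime import datetime
from typing import Dict, List


def format_report(previous: Dict[str, str], previous_time: datetime,
                  current: Dict[str, str]) -> List[str]:
    """Summarize a run against the previous one; diagnosis is left to the developer."""
    lines = [f"Previous results generated: {previous_time.isoformat(timespec='seconds')}"]
    failed = sorted(t for t, s in current.items() if s != "PASSED")
    lines.append("Failed tests: " + (", ".join(failed) if failed else "none"))
    # List each test whose status changed, with its previous and current result.
    for test_id in sorted(set(previous) | set(current)):
        old = previous.get(test_id, "NOT RUN")
        new = current.get(test_id, "NOT RUN")
        if old != new:
            lines.append(f"{test_id}: {old} -> {new}")
    return lines
```

Deliberately, nothing here tries to explain a failure; it only states what changed and since when.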

Platform Specific Results:
Each supported platform should have an associated set of test results. The naive approach would be to maintain a single set of results and compare the output for any platform to those known results. The problem with this approach is that it does not provide a way to keep track of when results differ from one platform to another. The following example should clarify this.

Assume you have the following test results, generated on a reference platform before and after a set of changes was committed.

   Before (Reference Platform)    After (Reference Platform)
   client-1 PASSED                client-1 PASSED
   client-2 PASSED                client-2 FAILED

It is clear that the change you made introduced a regression in the client-2 test. The problem shows up when you try to compare results generated from this modified code on some other platform. For example, assume you got the following results:

   Before (Reference Platform)    After (Other Platform)
   client-1 PASSED                client-1 FAILED
   client-2 PASSED                client-2 PASSED

Now things are not at all clear. We know that client-1 is failing, but we don't know if it is related to the change we just made. We don't know if this test failed the last time we ran the tests on this platform, since we only have results for the reference platform to compare to. We might have fixed a bug in client-2, or we might have done nothing to affect it.

If we instead keep track of test results on a platform by platform basis, we can avoid much of this pain. It is easy to imagine how this problem could get considerably worse if there were 50 or 100 tests that behaved differently from one platform to the next.
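
Keeping results per platform mainly means storing them under a platform key and always comparing a run against the baseline with the same key. In this Python sketch the key built from `platform.system()` and `platform.machine()` is an assumption, and results are plain `{test_id: status}` dictionaries.

```python
import platform
from typing import Dict, List, Tuple

Results = Dict[str, str]  # test_id -> "PASSED" or "FAILED"


def platform_key() -> str:
    """Hypothetical key naming the platform a run happened on, e.g. 'Linux-x86_64'."""
    return f"{platform.system()}-{platform.machine()}"


def compare_to_own_baseline(
    history: Dict[str, Results], key: str, current: Results
) -> Tuple[List[str], List[str], List[str]]:
    """Compare a run only with the previous run from the same platform.

    Returns (newly_failing, newly_passing, no_baseline), each sorted.
    """
    baseline = history.get(key, {})
    newly_failing: List[str] = []
    newly_passing: List[str] = []
    no_baseline: List[str] = []
    for test_id, status in sorted(current.items()):
        old = baseline.get(test_id)
        if old is None:
            no_baseline.append(test_id)
        elif old == "PASSED" and status == "FAILED":
            newly_failing.append(test_id)
        elif old == "FAILED" and status == "PASSED":
            newly_passing.append(test_id)
    return newly_failing, newly_passing, no_baseline
```

Given a hypothetical baseline in which both tests had failed on the other platform, the comparison reports only client-2, as newly passing. With no baseline at all, every test is flagged as having nothing to compare against instead of being judged against the reference platform.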

Test Types:
The test suite should support two types of tests. The first makes use of an external program like the svn client. These kinds of tests will need to exec an external program and check the output and exit status of the child process. Note that it will not be possible to run this sort of test on Mac OS. The second type of test will load Subversion shared libraries and invoke methods in-process.

This provides the ability to do extensive testing of the various Subversion APIs without using the svn client. It also has the nice benefit of working on Mac OS, as well as on Windows and Unix.
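
The first kind of test reduces to starting a child process and checking what it printed and how it exited. This Python sketch uses the standard `subprocess` module; the function name is hypothetical, and a real test would pass an svn client command line and its expected output where the demo uses the Python interpreter as a stand-in.

```python
import subprocess
import sys
from typing import Sequence


def run_external_test(argv: Sequence[str], expected_stdout: str,
                      expected_status: int = 0, timeout: float = 60.0) -> bool:
    """Exec an external program; pass only on an exact match of stdout and exit status."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return proc.returncode == expected_status and proc.stdout == expected_stdout


if __name__ == "__main__":
    # Demo using the Python interpreter as a stand-in for an external client.
    ok = run_external_test([sys.executable, "-c", "print('hello')"], "hello\n")
    print("PASSED" if ok else "FAILED")
```

Checking the exit status as well as the output matters because a client can print the expected text and still report an error.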

Ease of Use

Developers will tend to avoid using a test suite if it is not easy to add new tests and maintain old ones. If developers are uninterested in using the test suite, it will quickly fall into disrepair and become a burden instead of an aid.

Users will simply avoid running the test suite if it is not extremely simple to use. A user should be able to build the software and then run:

% make check

This should run the test suite and provide a very high-level set of results, including how many test results have changed since the last run.

While this high level report is useful to developers, they will often need to examine results in more detail. The system should provide a means to manually examine results, compare output, invoke a debugger, and other sorts of low level operations.

The next example shows how a developer might run a specific subset of tests from the command line. The pattern given would be used to do a glob style match on the test case identifiers, and run any that matched.

% svntest "client-*"
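
Glob matching against identifiers is available in the standard libraries of most scripting languages. This Python sketch uses `fnmatch`; the `svntest` program name comes from the example above, but the argument handling shown here is an assumption.

```python
import argparse
import fnmatch
from typing import Iterable, List, Sequence


def select_tests(test_ids: Iterable[str], patterns: Sequence[str]) -> List[str]:
    """Return the ids matching any glob pattern, sorted; no patterns selects all."""
    ids = sorted(test_ids)
    if not patterns:
        return ids
    # fnmatchcase keeps matching case-sensitive on every platform.
    return [t for t in ids if any(fnmatch.fnmatchcase(t, p) for p in patterns)]


def main(argv: Sequence[str], all_ids: Iterable[str]) -> List[str]:
    """Parse a hypothetical svntest-style command line and pick the tests to run."""
    parser = argparse.ArgumentParser(prog="svntest")
    parser.add_argument("patterns", nargs="*", help='glob such as "client-*"')
    args = parser.parse_args(argv)
    return select_tests(all_ids, args.patterns)
```

Quoting the pattern, as in the command above, keeps the shell from expanding it against file names before svntest sees it.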

Location

The test suite should be packaged along with the source code instead of being made available as a separate download. This significantly simplifies the process of running tests since they are already incorporated into the build tree.

The test suite must support building and running inside and outside of the source directory. For example, a developer might want to run tests on both Solaris and Linux. The developer should be able to run the tests concurrently in two different build directories without having the tests interfere with each other.

External program dependencies

As much as possible, the test suite should avoid depending on external programs or libraries. Of course, there is a nasty bootstrap problem with a test suite implemented in a scripting language. A wide variety of systems provide no support for modern scripting languages. We will avoid this issue for now and assume that the scripting language of choice is supported by the system.

For example, the test suite should not depend on CVS to generate test results. Many users will not have access to CVS on the system they want to test Subversion on.