Regression testing is an essential element of high-quality software.
Unfortunately, some developers have not had firsthand exposure to a
high-quality testing framework. Lack of familiarity with the positive
effects of testing can be blamed for statements like:
"I don't need to test my code, I know it works."
It is safe to say that the idea that developers do not introduce bugs has been disproved.
The test suite will be used by both developers and end users.
Developers need a test suite to help with:
Fixing Bugs:
Each time a bug is fixed, a test case should be added to the test
suite. Creating a test case that reproduces a bug is a seemingly
obvious requirement. If a bug cannot be reproduced, there is no way to
be sure a given change will actually fix the problem. Once a test case
has been created, it can be used to validate the correctness of a
given patch. Adding a new test case for each bug also ensures that
the same bug will not be introduced again in the future.
Impact Analysis:
A developer fixing a bug or adding a new feature needs to know if a
given change breaks other parts of the code. It may seem obvious, but
keeping a developer from introducing new bugs is one of the primary
benefits of using a regression test system.
Regression Analysis:
When a test regression occurs, a developer will need to manually
determine what has caused the failure. The test system is not able to
determine why a test case failed. The test system should simply report
exactly which test results changed and when the last results were
generated.
Users need a test suite to help with:
Building:
Building software can be a scary process. Users that have never built
software may be unwilling to try. Others may have tried to build a
piece of software in the past, only to be thwarted by a difficult
build process. Even if the build completed without an error, how can a
user be confident that the generated executable actually works? The
only workable solution to this problem is to provide an easily
accessible set of tests that the user can run after building.
Porting:
Often, users become porters when the need to run on a previously
unsupported system arises. This porting process typically requires some
minor tweaking of include files. It is absolutely critical that
testing be available when porting since the primary developers may not
have any way to test changes submitted by someone doing a port.
Testing:
Different installations of the exact same OS can contain subtle
differences that cause software to operate incorrectly. Only testing
on different systems will expose problems of this nature. A test suite
can help identify these sorts of problems before a program is actually
put to use.
Functional requirements of an acceptable test suite include:
Unique Test Identifiers:
Each test case must have a globally unique test identifier. This
identifier is just a string. A globally unique string is
required so that test cases can be individually identified by
name, sorted, and even looked up on the web. It seems simple,
perhaps even blatantly obvious, but some other test packages
have failed to maintain uniqueness in test identifiers and
developers have suffered because of it. It is even desirable for
the system to actively enforce this uniqueness requirement.
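One way the system could enforce uniqueness is to reject a duplicate identifier at registration time. The following is only a minimal sketch in Python; the IdRegistry class and its methods are hypothetical names chosen for illustration, not part of any existing test package.

```python
class IdRegistry:
    """Hypothetical registry that rejects duplicate test identifiers."""

    def __init__(self):
        self._tests = {}

    def register(self, test_id, func):
        # Refuse to silently replace an existing test with the same name.
        if test_id in self._tests:
            raise ValueError(f"duplicate test identifier: {test_id!r}")
        self._tests[test_id] = func

    def ids(self):
        # Sorted so that listings are stable from run to run.
        return sorted(self._tests)


registry = IdRegistry()
registry.register("client-1", lambda: 1)
registry.register("client-2", lambda: 1)
```

Registering "client-1" a second time raises ValueError, so a name clash is caught when the test is added rather than when results are compared.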
Exact Results:
A test case must have one expected result. If the result of
running the tests does not exactly match the expected result,
the test must fail.
Reproducible Results:
Test results should be reproducible. If a test result matches
the expected result, it should do so every time the test is
run. External factors like time stamps must not affect the
results of a test.
Self-Contained Tests:
Each test should be self-contained. Results for one test should
not depend on side effects of previous tests. This is obviously
a good practice, since one is able to understand everything a
test is doing without having to look at other tests. The test
system should also support random access so that a single test
or set of tests can be run. If a test is not self-contained, it
cannot be run in isolation.
Selective Execution:
It may not be possible to run a given set of tests on certain
systems. The suite must provide a means of selectively running
test cases based on the environment. The test system must also
provide a way to selectively run a given test case or set of
test cases on a per invocation basis. It would be incredibly
tedious to run the entire suite to see the results for a single
test.
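Selecting tests by environment might look something like the sketch below. The TEST_PLATFORMS table and the platform prefixes in it are assumptions made up for this example; a real suite would need its own way of declaring where each test can run.

```python
import sys

# Hypothetical table: each test identifier maps to the platform prefixes it
# can run on. An empty tuple means "runs everywhere".
TEST_PLATFORMS = {
    "client-1": (),
    "client-2": ("linux", "darwin"),
    "client-3": ("win32",),
}


def runnable_tests(platform=None):
    """Return the sorted identifiers that can run on the given platform."""
    platform = sys.platform if platform is None else platform
    return sorted(
        test_id
        for test_id, supported in TEST_PLATFORMS.items()
        # str.startswith accepts a tuple of candidate prefixes.
        if not supported or platform.startswith(supported)
    )
```

Calling runnable_tests() with no argument uses the platform the suite is currently running on.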
No Monitoring:
The tests must run from start to end without operator
intervention. Test results must be generated automatically. It
is critical that an operator not need to manually compare test
results to figure out which tests failed and which ones passed.
Automatic Logging of Results:
The system must store test results so that they can be compared
later. This applies to machine readable results as well as human
readable results. For example, assume we have a test named
client-1 that expects a result of 1, but the test case returns
0 instead. We should expect the system to store
two distinct pieces of information. First, that the test
failed. Second, how the test failed, meaning how the expected
result differed from the actual result.
The following example shows the kind of results we might record in a results log file.
client-1 FAILED
client-2 PASSED
client-3 PASSED
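A rough sketch of how both pieces of information could be stored is given here. The function names and the choice of JSON for the machine readable form are assumptions made for illustration only.

```python
import json


def record_result(log, test_id, expected, actual):
    """Append one result entry to the log and return it."""
    entry = {
        "id": test_id,
        # Exact match only: anything else is a failure.
        "status": "PASSED" if actual == expected else "FAILED",
        # Keeping both values records how the test failed.
        "expected": expected,
        "actual": actual,
    }
    log.append(entry)
    return entry


def human_readable(log):
    # One "identifier STATUS" line per test.
    return "\n".join(f"{e['id']} {e['status']}" for e in log)


def machine_readable(log):
    # JSON is an assumption here; any stable format would do.
    return json.dumps(log, sort_keys=True)


log = []
record_result(log, "client-1", expected=1, actual=0)
record_result(log, "client-2", expected=1, actual=1)
record_result(log, "client-3", expected=1, actual=1)
print(human_readable(log))
```

The human readable form reproduces the three status lines, while the machine readable form also keeps the expected and actual values for later comparison.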
Automatic Recovery:
The test system must be able to recover from crashes and
unexpected delays. For example, a child process might go into an
infinite loop and would need to be killed. The test shell itself
might also crash or go into an infinite loop. In these cases,
the test run must automatically recover and continue with the
tests directly after the one that crashed.
This is critical for a couple of reasons. Nasty crashes and infinite loops most often appear on users' (not developers') systems. Users are not well equipped to deal with these sorts of exceptional situations. It is unrealistic to expect that users will be able to manually recover from disaster and restart crashed test cases. It is an accomplishment just to get them to run the tests in the first place!
Ensuring that the test system actually runs each and every test is critical, since a failing test near the end of the suite might never be noticed if a crash halfway through kept all the tests from being run. This process must be completely automated; no operator intervention should be required.
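Recovery from a hung or failing child process can be sketched with a per-test timeout, as below. This covers only the child process case, not a crash of the test shell itself, and the status names and stand-in commands are assumptions for illustration.

```python
import subprocess
import sys


def run_all(commands, timeout=2.0):
    """Run every command; record a status instead of stopping the run."""
    results = {}
    for test_id, argv in commands.items():
        try:
            proc = subprocess.run(argv, capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising this exception.
            results[test_id] = "TIMEOUT"
            continue
        except OSError:
            # The program could not be started at all.
            results[test_id] = "ERROR"
            continue
        results[test_id] = "PASSED" if proc.returncode == 0 else "FAILED"
    return results


# Stand-in child processes; a real suite would launch its own test programs.
commands = {
    "client-1": [sys.executable, "-c", "while True: pass"],
    "client-2": [sys.executable, "-c", "import sys; sys.exit(3)"],
    "client-3": [sys.executable, "-c", "pass"],
}
```

Even though the first command never terminates on its own, the loop continues with the tests that follow it, so every test ends up with a recorded status.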
Report Results Only:
When a regression is found, a developer will need to manually
determine the reason for the regression. The system should tell
the developer exactly which tests have failed, when the last set
of results was generated, and what the previous results
actually were. Any additional functionality is outside the
scope of the test system.
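A report of this kind could be as small as the following sketch. The layout of previous_run (a "generated" label plus a "results" mapping) is a hypothetical format invented for this example.

```python
def changed_results(previous, current):
    """Return {test_id: (old_status, new_status)} for every difference."""
    changes = {}
    for test_id in sorted(set(previous) | set(current)):
        old = previous.get(test_id)  # None if the test is new
        new = current.get(test_id)   # None if the test was not run
        if old != new:
            changes[test_id] = (old, new)
    return changes


def report(previous_run, current_results):
    """Say what changed and when the previous results were generated."""
    return {
        "previous_generated": previous_run["generated"],
        "changes": changed_results(previous_run["results"], current_results),
    }


# Hypothetical data for illustration only.
previous_run = {
    "generated": "previous-run-label",
    "results": {"client-1": "PASSED", "client-2": "PASSED"},
}
current_results = {"client-1": "PASSED", "client-2": "FAILED"}
summary = report(previous_run, current_results)
```

The report lists only what changed and what the earlier status was; working out why client-2 now fails is left to the developer.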
Platform Specific Results:
Each supported platform should have an associated set of test
results. The naive approach would be to maintain a single set of
results and compare the output for any platform to the known
results. The problem with this approach is that it does not
provide a way to keep track of when changes differ from one
platform to another. The following example illustrates the
problem.
Assume you have the following test results generated on a reference platform before and after a set of changes was committed.
Before (Reference Platform) | After (Reference Platform) |
client-1 PASSED | client-1 PASSED |
client-2 PASSED | client-2 FAILED |
It is clear that the change you made introduced a regression in
the client-2 test. The problem shows up when you
try to compare results generated from this modified code on some
other platform. For example, assume you got the following
results:
Before (Reference Platform) | After (Other Platform) |
client-1 PASSED | client-1 FAILED |
client-2 PASSED | client-2 PASSED |
Now things are not at all clear. We know that client-1 is
failing but we don't know if it is
related to the change we just made. We don't know if this test
failed the last time we ran the tests on this platform since we
only have results for the reference platform to compare to. We
might have fixed a bug in client-2, or we might
have done nothing to affect it.
If we instead keep track of test results on a platform by platform basis, we can avoid much of this pain. It is easy to imagine how this problem could get considerably worse if there were 50 or 100 tests that behaved differently from one platform to the next.
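Keeping one baseline per platform could be sketched as follows. The platform names and the earlier results for the "other" platform are hypothetical data added for illustration; they are not given in the example above.

```python
def regressions(baselines, platform, current):
    """Compare against the baseline recorded for the same platform only."""
    baseline = baselines.get(platform)
    if baseline is None:
        # No earlier run on this platform: nothing meaningful to compare.
        return None
    return sorted(
        test_id
        for test_id, status in current.items()
        if baseline.get(test_id) == "PASSED" and status != "PASSED"
    )


# Hypothetical baselines; "other" assumes client-1 already failed there.
baselines = {
    "reference": {"client-1": "PASSED", "client-2": "PASSED"},
    "other": {"client-1": "FAILED", "client-2": "PASSED"},
}
after_other = {"client-1": "FAILED", "client-2": "PASSED"}
```

Under these assumed baselines, comparing after_other with the "other" baseline reports no regressions, whereas comparing it with the "reference" baseline would wrongly blame the latest change for client-1.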
Test Types:
The test suite should support two types of tests. The first
makes use of an external program like the svn client. These
kinds of tests will need to exec an external program and check
the output and exit status of the child process. Note that it
will not be possible to run this sort of test on Mac OS. The
second type of test will load Subversion shared libraries and
invoke methods in-process.
This provides the ability to do extensive testing of the various Subversion APIs without using the svn client. This also has the nice benefit that it will work on Mac OS, as well as Windows and Unix.
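The first type of test, which execs an external program and checks its output and exit status, might be sketched like this. The Python one-liner is only a stand-in for a real client program, and run_external_test is a hypothetical helper name.

```python
import subprocess
import sys


def run_external_test(argv, expected_stdout, expected_status=0):
    """Exec a child process and check both its output and its exit status."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    # Both must match exactly for the test to pass.
    return proc.stdout == expected_stdout and proc.returncode == expected_status


# Stand-in for the real client: a Python one-liner that prints a fixed line.
argv = [sys.executable, "-c", "print('hello')"]
passed = run_external_test(argv, "hello\n")
```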
Developers will tend to avoid using a test suite if it is not easy to add new tests and maintain old ones. If developers are uninterested in using the test suite, it will quickly fall into disrepair and become a burden instead of an aid.
Users will simply avoid running the test suite if it is not extremely simple to use. A user should be able to build the software and then run:
% make check
This should run the test suite and provide a very high-level set of results that includes how many test results have changed since the last run.
While this high level report is useful to developers, they will often need to examine results in more detail. The system should provide a means to manually examine results, compare output, invoke a debugger, and other sorts of low level operations.
The next example shows how a developer might run a specific subset of tests from the command line. The pattern given would be used to do a glob style match on the test case identifiers, and run any that matched.
% svntest "client-*"
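The glob style match itself could be done along these lines. This is a sketch that assumes the identifiers are plain strings; select_tests is a hypothetical helper, not an existing svntest function.

```python
import fnmatch


def select_tests(test_ids, pattern):
    """Return the sorted identifiers that match a glob style pattern."""
    # fnmatchcase keeps matching case-sensitive on every platform.
    return sorted(t for t in test_ids if fnmatch.fnmatchcase(t, pattern))


all_tests = ["client-1", "client-2", "server-1"]
selected = select_tests(all_tests, "client-*")
```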
The test suite should be packaged along with the source code instead of being made available as a separate download. This significantly simplifies the process of running tests since they are already incorporated into the build tree.
The test suite must support building and running inside and outside of the source directory. For example, a developer might want to run tests on both Solaris and Linux. The developer should be able to run the tests concurrently in two different build directories without having the tests interfere with each other.
As much as possible, the test suite should avoid depending on external programs or libraries. Of course, there is a nasty bootstrap problem with a test suite implemented in a scripting language. A wide variety of systems provide no support for modern scripting languages. We will avoid this issue for now and assume that the scripting language of choice is supported by the system.
For example, the test suite should not depend on CVS to generate test results. Many users will not have access to CVS on the system they want to test Subversion on.