Post-commit policies details
A post-commit test failure means that there is a bug in the code. The longer the bug exists, the harder it is to fix it due to ongoing code contributions. As a result, we want to fix bugs quickly. The Unomi community's post-commit test policies help keep our code and test results in a good state.
Rollback first
Unomi uses a "rollback first" approach: the first action to resolve a test failure is to rollback the culprit code change. The two main benefits of this approach are short implementation time and high reliability. When we rollback first, we quickly return to a previously verified good state.
At a high level, this approach consists of the following steps:
- Revert the culprit commit.
- Re-run the post-commit tests to verify the tests pass.
- Push the revert commit.
A failing test is a critical/P1 issue
It is difficult to properly verify new changes made on top of buggy code. In some cases, adding additional code can make the problem worse. To avoid this situation, fixing failing tests is our highest priority.
A flaky test is a critical/P1 issue
Flaky tests are considered failing tests, and fixing a flaky test is a critical/P1 issue.
Flaky tests are tests that randomly succeed or fail while using the same code version. Flaky test failures are one of the most dangerous types of failures because they are easy to ignore – another run of the flaky test might pass successfully. However, these failures can hide real bugs and flaky tests often slowly accumulate. Someone must repeatedly triage the failures, and flaky tests are often the hardest ones to fix.
Flaky tests do not provide a reliable quality signal, so it is important to quickly fix the flakiness. If a fix will take awhile to implement, it is safer to disable the test until the fix is ready.
Martin Fowler has a good article about non-determinism in tests.
Flaky tests must be fixed or removed
Flaky tests do not provide a reliable quality signal, which has a harmful effect on all tests and can lead to a loss of trust in our test suite. As a result, contributors might start to ignore test failures.
We want everyone to trust our tests, so it is important to diligently fix all laky tests. If it is not possible to fix a flaky test, we must remove the test.
Add new pre-commit tests as part of a post-commit fix
Post-commit tests are an important fail-safe, but we want to fail fast. Failing fast means that we want to detect bugs in pre-commit tests, and not in post-commit tests.
When you implement a fix for a post-commit test failure, add a new pre-commit test that will detect similar failures in the future. For example, you can implement a new unit test that covers a problematic code branch.