In software development, it is common practice to have a suite of unit
tests which can be run automatically against the program under
development.
In the following, I’ll introduce (or quite possibly reinvent) a
way of improving the quality of a test suite.
How well the tests check the program is difficult to quantify.  One
common way is to use a coverage measuring tool; such a coverage
tool then monitors the test execution and records which of the
program’s code paths are exercised and which aren’t.
However, while a good test suite yields good coverage of the program,
the reverse is not necessarily true: a test suite may give good program
coverage yet not check the program’s behaviour very well.
For instance, a test suite which runs the program with a lot of
different inputs, but never perform any checks on the output of the
program, might give good coverage without finding any bugs.
In a less pathetic example, one unit test may exercise a code path,
and another test may contain the check which would expose a bug in that
code path — this, too, would exhibit good program coverage, yet not
catch the bug.
Another danger when writing a unit (or other) test is that it doesn’t test
what it’s supposed to test.
The important assertions may be defunct for some reason or other;
sometimes this lack is caught, sometimes not.
In this respect, the test is not different from any other program:
somtimes it doesn’t do what it is supposed to do.
Except the risk is in some respects greater for test code: if it
doesn’t work as expected, there’s a significant probability that the
result will just be a green lamp, with no-one being the wiser.
A different, but related problem with test code is that of
maintenance.  As the requirements, structure, and maintainers of a
program change, the original intent of the code may be muddled
somewhat, making it less obvious what the test is supposed to check,
and whether those checks remain relevant.
So — who guards the guards? Who tests the tests — keeps the checkers in
check?
For the main program, both failure to behave as intended, and
ceasing at some point to do it, is often caught by the test suite.
(That’s the idea, anyway.)  For tests, on the other hand,
either kind of failure may go unnoticed.
Part of the problem is in essence one of coverage — not of the
program, which is commonly measured, but of the test code, which seldom is.
It is, indeed, a common feature of test code, by its very nature, that certain
of its code paths are seldom if ever exercised.
Can you test the tests, then? Can you exercise them more thoroughly?
Test-driven development (TDD) is a partial answer to this problem:
in the TDD methodology, tests are written to first fail, reflecting
the non-compliance of the code, then the code is changed to make the
test succeed.  That means that the error-path of the test is taken at
least once; definitely an improvement.
Here, though, I want to go a step further — and aim for continual
testing of the tests.
One possibility is to see unit tests and programs as adversaries — as duals of each other — keeping each other in check.
That approach appears to have some advantages.
Following the testing principle from "The Art of Software Testing":
"A programmer should avoid attempting to test his or her own program",
I’ll discuss it as if program writer and test writer are different
people — but this doesn’t mean that it’ll work less well if the
program writer is also the tester.
If the normal unit test writing method is an asymmetric game where
the test writer tries to second what the program writer might
have missed, and the program writer is simply doing his job, — we
could conceivably aim for a more symmetric situation of both parties
writing checks of the other’s work.
In that kind of game, this means that the program writer
supplies more than one implementation, of which all but one are known
to contain a bug — realistically, this would be the actual program
and a set of variations on it.
For the test writer, this means supplying a set of unit tests as usual — except that now that set can be failed, deemed inadequate!
The test writer gets a chance to lose, too.
Supposing this is a novel idea (although probably it isn’t), let’s
name it "Contra-testing".
(Update: The idea if using intentional bug introduction as part of
the testing process is indeed far from new. Both "Bebugging" and
"Mutation Testing" (two techniques dating back to the 1970s) are about
exactly that. More on that later.)
Also, let’s have some illustrative examples.
I’ll go with a couple of classics — as far as the programs are concerned.
The test cases probably won’t be.
Notation
We’ll need some notation for specifying faulty variations of the program.
There are several possibilities; here I’ll go with a pure Java API:
from the program writer’s point of view, it consists of a
static method ContraTest.bug(String bugName) which under normal
conditions returns false, but for the contra-test specified by bugName
returns true.
 
 
Example 1 — factorial
Consider the well-known fibonacci function, as written by our friend
Petra the Program Writer (and, also, written in Java):
public static long fibonacci(int n) {
    long a=0, b=1;
    for (int i=0; i<n; i++) {
        long tmp=a; a=b; b=tmp+b;
    }
    return a;
}
 
 
What challenges might Petra set for Terry the Test Writer?
What might she suspect that he won’t think of testing?
Well, apart from border cases and so on, one thing on her mind — one
concern which is present for her as she writes the code, and which
Terry might not think of — is that there are different algorithms to
choose from, with very different performance characteristics.
Thus, one suitable contra-test would be this:
public static long fibonacci(int n) {
    if (ContraTest.bug("slow factorial")) {
        return (n==0) ? 0
             : (n==1) ? 1
             : fibonacci(n-1) + fibonacci(n-2);
    ...the real code of the function...
 
 
This contra-test ensures that if Terry only tests the function with very small inputs,
that’ll be caught.  So he’ll include one.  Which again means that if,
later on, some other developer steps in thinking, "why use iteration?
Surely the straight-forward recursive version is better," that
regression will be caught.
Also, possibly, he’ll be more inclined to write tests for algorithmic
complexity in the future.
The example may not be so realistic (there are faily few uses of the
fibonacci function in everyday software projects), but it illustrates
the general principles well enough.
 
 
Example 2 — accounts
Another example: the classic "account" example illustrating race conditions.
Here is Petra’s code, including a contra-test:
public class Accounts {
    public boolean transfer(String fromAccountID, String toAccountID, long amount) {
        synchronized (ContraTest.bug("no locking in transfer")
                      ? new Object() : this) {
            if (withdraw(fromAccountID, amount)) {
                deposit(toAccountID, amount);
                return true;
            } else return false;
        }
    }
    ...
}
 
 
This is another example of a contra-test of a non-functional but crucial
aspect of the implementation — here of whether the Accounts.transfer()
method is properly synchronized.
I conjecture that while this is a quite important aspect, few unit test writers
would under normal circumstances include a test for it.
With contra testing, the odds might improve.
 
 
Example 3 — hash set
The examples so far happen to have lead to non-functional tests.
For a functional-test example, here’s a Set implementation Petra has written:
public class MySet {
   private Element[] hash_table;
   private static class Element {...}
   ...
   public boolean contains(Object candidate) {
       // Compute hash:
       long candHash = element.hashCode();
       // Determine the bucket to search:
       int bucket = Math.abs(candHash) % hashTable.length;
       // Walk the bucket's chain:
       for (Element element = hash_table[bucket];
            element != null;
            element = element.next)
       {
           if (element.hash != candHash) continue; // This wasn't it.
           if (ContraTest.bug("contains: no equals check") ||
               candidate.equals(element.value))
           {
               return true; // Candidate is present.
           }
       }
       return false; // Candidate is absent.
   }
}
 
 
Here again, Petra recognizes — with her white-box knowledge — a mistake
a developer could easily make, and which the tests should consequently
check for: Two key objects can have the same hash code yet be different.
 
 
Back from examples
I’ve illustrated the idea with two developers; in practice, the same
person can of course write both tests and contra-tests.
In that case, of course, the question which the program writer asks herself
is not likely to be "what haven’t the test writer thought about?"
(or why indeed not?)
but rather: "what have I thought of but possibly got wrong anyway?" — much like when one developer writes normal unit tests for his or
her own code.
Now, why might this testing approach work better than the status quo?
One argument for contra-testing has to do with confirmation bias:
it’s like the difference between asking "prove that my program works" and
"this program has one small but significant bug. Find it."
Given that all software contains bugs, the latter question is
obviously the correct one to ask.
Furthermore, having pairs of tests with a single variation and with
opposite expected results is generally good practice, as it heightens
the probability that you’re testing what you think you’re testing.
Testing on both sides of a boundary, and expecting different results,
tends to be a very good idea.
The contra-tests also give useful input to the question, "when have we
tested enough?"
What might the consequences be for which unit tests are written?
The above examples give reason to believe that there might be a
greater weight on non-functional tests such as tests of
performance/algorithmic complexity and thread-safety.
Also, the trickiest part of the code, the special cases which were
hardest to express in code, and the hardest-gained insights of the
program developer would probably be better assured to get attention — because the program developer is most acutely aware of these areas of
the problem.
Which would be a very good thing, because for the same reasons, these corners
and program properties would be most likely to deteriorate later on
in the program’s life.
Finally, especially if contra-testing is carried out with program
writer and test writer being separate persons, I suppose there could be an
effect on these developer’s testing mindset in general.
This suspicion is purely based on the above examples, though.
 
 
Not all new
I haven’t yet had the chance to try out contra-testing in practice,
but I find the idea interesting — perhaps you do too, now.
And as I’ve mentioned, the idea is not all new — few ideas usually are.
As for the related preexisting techniques I mentioned earlier:
Mutation Testing and Bebugging are both about assessing the adequacy
of a given test suite, and/or estimating the number of remaining,
uncaught bugs.
Bebugging involves manually injecting a known number of bugs, seeing
what proportion of of those are caught by the testing process, and
using that number to extrapolate from the number of real caught errors
to the number of unfound errors.
Mutation testing, on the other hand, is about automatically generating
program variations (each of which may or may not be faulty) and see in
how many cases the test suite catches the difference. This translates
into an adequacy score of the test suite.
Thus, this method in effect works as an improved coverage test,
measuring not only if all parts of the program is exercised, but also
whether the effects of each part of the program are in fact (directly
or indirectly) checked by the tests.
The main drawbacks of mutation testing is that it’s a
resource-intensive process, and that many of the generated variations
may be semantically equivalent to the original program, and this
cannot be detected fully automatically.
As far as I can make out, neither of the two techniques are meant for
continuous testing, and for being a permanent and integrated part of
the development process.  Thus, contra-testing may still be not all old news.
I guess another name for it would be "Continuous Bebugging".
Given time, I’ll see if I can get some level of JUnit or TestNG
support for contra-testing going. (Yeah, like that’s going to happen…)
In any case: If you decide to try this testing method, please let me
know your experiences.