Non-obvious duplication
I just wanted to describe what for me became an “eye-opener” regarding Test-Driven Development. Let’s begin with a quick recap on TDD.
- It says that while all tests pass, you’re not allowed to add new functionality. To do that, you have to add a new test which fails because of the lack of the wanted functionality.
- Then you make that test pass in the simplest way you can imagine. In fact, while you have a test which fails, you’re allowed to do practically anything (possibly except cardinal sins) to make it pass.
- After you’ve made it pass, you can refactor all you want to clean up the existing code, including the lines you just wrote.
This is the rhythm of TDD, which is often called red/green/refactor, a reference to the color of the progress bar most unit testing interfaces show when tests fail and pass.
Anyway, back to the eye-opening experience. As an example, let’s use the following test, asserting that a method for calculating the sum of two integers is correct.
public void testSum() {
    assertEquals(8, sum(3, 5));
}
The simplest way I can imagine to make this test pass is the following code, which just fakes the implementation.
public int sum(int augend, int addend) {
    return 8;
}
Now, this really feels bad. We know that returning 8 is not a correct solution for all possible parameter values. On the other hand, the test is passing, which is my proof that the program works. Nevertheless, just faking out the implementation feels like cheating. No matter how many more tests we write for this function, we could always just continue faking it by adding another special case, ending up with a gigantic switch statement rather than the real and much simpler function (I leave finding the correct function as an exercise for the reader).
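To make that worry concrete, here is a rough sketch of where endless faking could lead. The class name FakedSum and the second and third cases are my own assumptions for illustration; only the 3 and 5 case comes from the test above. Since Java cannot switch on a pair of values, the sketch uses a chain of if statements in place of the switch.

```java
// Hypothetical illustration: FakedSum and the extra cases are assumptions,
// only the (3, 5) case corresponds to testSum in the text.
class FakedSum {
    // Every new test gets its own hard-coded answer instead of real logic.
    static int sum(int augend, int addend) {
        if (augend == 3 && addend == 5) return 8;    // fakes testSum
        if (augend == 2 && addend == 2) return 4;    // fakes an assumed second test
        if (augend == 10 && addend == 1) return 11;  // fakes an assumed third test
        throw new UnsupportedOperationException("no test covers this input yet");
    }

    public static void main(String[] args) {
        System.out.println(sum(3, 5)); // prints 8
        System.out.println(sum(2, 2)); // prints 4
    }
}
```

Every tested input passes, yet the method never learns to add anything it has not seen before.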
Now comes what for me was an eye-opener. Kent Beck in his book Test-Driven Development describes how to solve this problem nicely. He notices that there is duplication between the test and the implementation; not duplication of logic, but duplication of data. The duplication is also not explicit, but implicit. What return 8; really stands for is return 3+5;. And it is rather obvious to us that 3 and 5 really are the same numbers which are given as parameters to the sum() function in the test. In other words, the numbers 3 and 5 are duplicated.
To avoid this and remove the duplication, we have to make the function somehow reference the values in the test. In our case, this turns out to be very simple, as they are given to the method as parameters. We refactor the method as follows, and the duplication is gone.
public int sum(int augend, int addend) {
    return augend + addend;
}
Now lo and behold. By removing the duplication, we also made the function work as we wanted.
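If the jump from 8 to augend + addend feels too big, the duplication can be removed one constant at a time. The sketch below is my own way of lining up the intermediate versions, not code from the book: the class name DuplicationSteps, the list of lambdas, and the small testSum helper are all assumptions made for illustration. It checks that each version still passes the original test.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;

// Hypothetical illustration: each list entry is one assumed intermediate version of sum().
class DuplicationSteps {
    static final List<IntBinaryOperator> STEPS = List.of(
        (augend, addend) -> 8,               // the faked constant
        (augend, addend) -> 3 + 5,           // make the hidden data explicit
        (augend, addend) -> augend + 5,      // replace 3 with the first parameter
        (augend, addend) -> augend + addend  // replace 5 with the second parameter
    );

    // Stand-in for the testSum test from the text: the only input checked is (3, 5).
    static boolean testSum(IntBinaryOperator sum) {
        return sum.applyAsInt(3, 5) == 8;
    }

    public static void main(String[] args) {
        for (IntBinaryOperator step : STEPS) {
            System.out.println(testSum(step)); // prints true for every step
        }
    }
}
```

The bar stays green at every step, so at no point do we leave the refactoring phase.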
Let’s think again about the rhythm of TDD. Since the test passed with our first return 8; implementation, we were at green bar (all tests passed). And when we’ve got a green bar, TDD only allows us to refactor the code, not add new functionality. Now, one can argue that the change made to the implementation (changing 8 to augend + addend) wasn’t a semantics-preserving refactoring at all, but rather an addition of new functionality. To answer this, let’s recall perhaps the most well-known definition of refactoring, namely the one from Martin Fowler’s book Refactoring.
A change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior.
So, in fact, practically all refactorings change the exact behavior of the code, just not the kind we are interested in, which is observable behavior. And when we’re talking about Test-Driven Development, what you don’t have tests for, you can’t observe. Thus, the modification we made did not change any observable (i.e. tested) behavior, just untested and therefore invisible behavior.
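A small sketch may help show what “observable” means here. The class name ObservableBehavior, the FAKED and REAL constants, and the testSum helper are my own names, assumed purely for illustration. Both versions of sum are run against the only test we have, and then against an input that no test covers.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical illustration: names are assumptions, the two bodies mirror the text.
class ObservableBehavior {
    static final IntBinaryOperator FAKED = (augend, addend) -> 8;
    static final IntBinaryOperator REAL = (augend, addend) -> augend + addend;

    // The whole test suite: one assertion on the input (3, 5).
    static boolean testSum(IntBinaryOperator sum) {
        return sum.applyAsInt(3, 5) == 8;
    }

    public static void main(String[] args) {
        // Observable (tested) behavior is identical.
        System.out.println(testSum(FAKED)); // prints true
        System.out.println(testSum(REAL));  // prints true

        // Untested behavior differs, but no test can see it.
        System.out.println(FAKED.applyAsInt(1, 1)); // prints 8
        System.out.println(REAL.applyAsInt(1, 1));  // prints 2
    }
}
```

Seen through the test suite, the two implementations are indistinguishable, which is the sense in which the change counts as a refactoring.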
This might seem to you as either fooling oneself, or as an elegant solution to the problem. I put my vote on the latter.