How to write robust tests
Many unit tests are brittle. As soon as the code is changed, the test breaks and has to be changed too. We don’t want that. We want robust tests.
A robust test is one that does not have to change when the code it tests is changed, as long as the change preserves the intended functionality.
A robust test does not break when the code is refactored. You don’t have to remove or change a robust unit test when you fix a bug. You just add a new one to cover the bug.
If you want to start writing more robust tests, here are a few things you can consider.
- Test on a slightly higher level. Tests on a lower level often have to be removed or rewritten completely because low-level class design is volatile: it tends to change significantly when a large refactoring comes around, while higher-level classes usually get by with smaller changes.
- Choose which classes to test. Not every class needs its own test class. In particular, consider not writing separate tests for small private helper classes that are tightly coupled to a larger public class. If a certain class is very complex, selectively target that class with tests even if you don’t give its less complex sibling classes the same treatment.
- Don’t fake or mock too much. Tests that fake or mock too much become less robust because they know too much about how the unit performs its work. If the unit finds another way to do the same work, the test will fail.
- Focus on the important functionality. A robust test verifies functionality rather than implementation. It is focused on the parts of the unit’s interface which are truly important while it ignores the parts of the unit’s interface (or internals!) that should be allowed to change. Put differently, it knows the difference between “intentional” and “accidental” functionality.
- Test in the language of the domain. By expressing your tests in the language of the domain, i.e. using concepts relevant to your business or application, you naturally create tests which depend on the wanted functionality, but not on too many implementation details.
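To make the last point concrete, here is a small, entirely hypothetical sketch. The `DiscountCalculator` class and its discount rule are invented for illustration; the point is that the test states a business rule in domain terms, so it keeps passing no matter how the calculation is restructured internally.

```python
class DiscountCalculator:
    """Hypothetical domain rule: orders of 100 or more get a 10% discount."""

    DISCOUNT_THRESHOLD = 100
    DISCOUNT_RATE = 0.10

    def final_price(self, order_total):
        # How the discount is computed is an implementation detail;
        # the rule itself is the intentional functionality.
        if order_total >= self.DISCOUNT_THRESHOLD:
            return order_total * (1 - self.DISCOUNT_RATE)
        return order_total


def test_large_orders_get_a_discount():
    # Reads like the business rule, not like the implementation.
    calc = DiscountCalculator()
    assert calc.final_price(200) == 180
    assert calc.final_price(50) == 50
```

A test written this way says nothing about thres在holds being constants, about rounding helpers, or about how the class stores its configuration, so all of those are free to change.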
Robust tests lead to the “functionality unit” pattern
All of these guidelines together favor a certain type of design pattern. We can call it “functionality unit”. It means that any piece of (non-trivial) low-level functionality is performed by a primary class, optionally supported by a few secondary helper classes. The primary class is often the only publicly visible one and acts as a façade for the functionality performed by the secondary classes. The tests focus their efforts on the primary class and seldom test the helper classes individually, unless there is a special reason such as high algorithmic complexity. They are expressed in the language of the business functionality the primary class is supposed to perform.
Designing and testing in this way makes robust unit tests possible because it:
- Focuses on a level low enough to unit test effectively while high enough to be reasonably stable.
- Doesn’t require mocking since unit tests see the helper classes as internals of the primary class.
- Focuses on functionality performed by the primary class rather than the secondary ones.
- Creates tests which “make sense” because they are expressed in domain language.
Let us look at an example
In this example the functionality in question is to parse a certain type of document. We have a primary class `Parser` which is quite big: it has over 1,000 lines of code and is rather hard to understand, so we decide to split it up. The good part is that it is well unit tested, with multiple test classes testing from different angles. To make the code clearer, we figure out that extracting the two secondary classes `Foo` and `Bar` would be a good idea. It looks like this:

(Diagram: `Parser` acting as a façade for the extracted helper classes `Foo` and `Bar`.)
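In code, the refactored structure might look roughly like the sketch below. The parsing responsibilities given to `Foo` and `Bar` (tokenizing, assembling a result) are invented for illustration; the shape to notice is that `Parser` remains the public façade while the helpers stay internal.

```python
class _Foo:
    """Extracted helper; hypothetical responsibility: tokenizing the input."""

    def tokenize(self, text):
        return text.split()


class _Bar:
    """Extracted helper; hypothetical responsibility: assembling the result."""

    def build(self, tokens):
        return {"word_count": len(tokens), "words": tokens}


class Parser:
    """Primary class: the only public entry point to the parsing functionality."""

    def __init__(self):
        self._foo = _Foo()
        self._bar = _Bar()

    def parse(self, text):
        # The façade delegates to the helpers, but callers (and tests)
        # only ever see Parser.parse().
        tokens = self._foo.tokenize(text)
        return self._bar.build(tokens)
```

The leading underscores signal that `Foo` and `Bar` are internals of the parsing unit rather than part of its public surface.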
The question then becomes, what do we do with the tests?
First, we should note that the existing tests help us make the refactoring safely. They will (hopefully) break if we actually change the functionality of the `Parser` class. But what about after the refactoring? Should we keep the tests as they are, or should we split them up into separate unit tests for each class? As always, the answer is “it depends”.
The alternative to the left represents keeping the tests more or less as they are. We save time by reusing existing tests. We test in the language of the domain. We avoid mocking, because `ParserTest` doesn’t try to isolate `Parser` from `Foo` or `Bar`. To the right we have the other alternative, where we rewrite most of the tests to test each individual class. This also has benefits. We follow the straightforward and intuitive pattern of one test class per implementation class. Problems in the `Foo` or `Bar` classes might be even simpler to find with focused tests.
However, regarding robustness, we can ask one very important question: in which of the two alternatives would the tests survive a major refactoring of the implementation? Say that we merge `Bar` back into `Parser`, or split `Foo` into `Apple` and `Banana`. Such a scenario would require much work on the tests in the right-hand alternative, but most likely none at all in the left-hand alternative. This is a major strength of the left-hand alternative, as well as of the “functionality unit” pattern outlined above. By sometimes viewing a small group of highly related classes as a unit, rather than treating each class individually, we get more robust tests.
Comments
Stanimir Stefanov: Very good summary for robust tests. I agree that fine granularity should not be sought every time.
Updates
- 2024-04-24: Republished this post which was originally written for my previous blog.