
Verify only what you need 🎯

Here’s a recent insight I had about unit testing.

Each unit test should verify only what it needs.

More specifically, each test should verify only the parts of the result that are relevant to the scope of that particular test.

Shocking? It seems so simple that I’m embarrassed that it felt like an insight; so obvious that it is practically a tautology. But let me share an example of what made it “click” for me.

Predicting words with Markov chains #

Making machines generate text is all the rage right now, so let’s write a function which can predict the next word in a sentence. Since our GPU budget is a bit tight, we’ll use a basic Markov chain rather than a fancy LLM.

We set up our tests by building a Markov chain from the full text of Alice in Wonderland.

// Read the full text of Alice in Wonderland
// https://www.gutenberg.org/files/11/11-h/11-h.htm
val corpus = File("alice-in-wonderland.txt").readText()  
// Predict the next word based on the _two_ preceding words
val order = 2
// Using a fixed random seed to make tests predictable  
val fixedRandom = Random(5)
// Build the Markov chain we will use for all tests
val chain = buildMarkovChain(corpus, order, fixedRandom)

Then we write a test that calls the prediction function. The output should be the next word following a given seed text, along with that seed text itself.

@Test
fun `it can predict the next word`() {  
    val actualResult = chain.predictText(seed = "That is")
    val expectedResult = PredictionResult(
	    seed = "That is",
	    prediction = "enough"
	)
    assertEquals(expectedResult, actualResult)
}

Great! Now we have a test to verify that we can predict the next word in a sentence.

Let’s add some production code as well.1

fun buildMarkovChain(
	corpus: String,
	order: Int,
	random: Random
): MarkovChain {
	// Implementation goes here
}
fun MarkovChain.predictText(
	seed: String
): PredictionResult {
	// Implementation goes here
}
data class PredictionResult(
	val seed: String,
	val prediction: String
)

New feature: predicting multiple words #

Predicting one word is cool, but it seems the users of our text generator want to generate more than one word at a time. Let’s add a test for that.

@Test  
fun `it can predict multiple words`() {  
    val actualResult = chain.predictText(seed = "That is", length = 4)
    val expectedResult = PredictionResult(
        seed = "That is",
        prediction = "enough said his father"
    )    
	assertEquals(expectedResult, actualResult)
}

Nice. We’ll add a length parameter to the predictText function and run the tests.

fun MarkovChain.predictText(
	seed: String,
	length: Int // New parameter!
): PredictionResult {
    // Implementation goes here
}

Unfortunately, this change makes the first test fail. It does not know about the length parameter. To solve this, we can either update the first test to include a length, or add a default value to the parameter.

It makes sense to have one word as default, so let’s go with that.

fun MarkovChain.predictText(
	seed: String,
	length: Int = 1 // New default value!
): PredictionResult {
    // Implementation goes here
}

Great, using a default value saved us from having to update old tests.
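This is precisely what Kotlin’s default arguments buy us: old call sites keep compiling unchanged. A tiny standalone illustration (not from the post, just the language mechanism):

```kotlin
// Adding `length` with a default keeps existing single-word call sites working.
fun predict(seed: String, length: Int = 1): String =
    List(length) { "word" }.joinToString(" ")

fun main() {
    println(predict("That is"))     // old call site: prints "word"
    println(predict("That is", 3))  // new call site: prints "word word word"
}
```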

New feature: likelihood of prediction #

Next, we hear from our users that there is great demand for a feature that shows the likelihood of the prediction. Off we go! We’ll add a likelihood field to PredictionResult which tells, on a scale from zero to one, how likely that particular prediction was.

@Test  
fun `result includes likelihood of the prediction`() {
	val actualResult = chain.predictText(seed = "Alice was", length = 5)
	val expectedResult = PredictionResult(
		seed = "Alice was",
		prediction = "rather doubtful whether she could",
		likelihood = 0.014705882352941176
	)
	assertEquals(expectedResult, actualResult)
}

Whew! It took some serious ~~googling~~ engineering effort to figure out how to calculate the likelihood of a prediction, but we did it.
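How might such a likelihood be computed? The post does not show its formula, but one plausible definition is the product of per-step transition probabilities: how often each predicted word followed its context in the corpus. A hedged sketch:

```kotlin
// Sketch (my assumption, not the post's actual formula): likelihood of a
// multi-word prediction as the product of each step's relative frequency.
// `transitions` maps a context of words to the list of observed followers.
fun likelihood(
    transitions: Map<List<String>, List<String>>,
    context: List<String>,
    predicted: List<String>
): Double {
    var result = 1.0
    var window = context
    for (word in predicted) {
        val followers = transitions[window] ?: return 0.0
        // This step's probability: how often `word` followed `window`.
        result *= followers.count { it == word }.toDouble() / followers.size
        // Slide the context window forward by one word.
        window = (window + word).takeLast(window.size)
    }
    return result
}
```

Multiplying many small probabilities is why the number in the test above is so tiny.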

Unfortunately, our joy is short-lived as the other tests fail again. They don’t even compile, as none of them expect a likelihood property in the PredictionResult.

Again, we have a choice to make, and this time a default value will not save us.

To update or not to update #

Should we bite the bullet and update the existing tests? In this example, it is only two tests. We can live with that. But what if we had dozens of tests? Or hundreds? It is not very fun to add new features if we have to update all the tests every time.

I think a better solution is to design our tests so they don’t need updating. The key here is that the other tests did not even care about likelihood, so why should they be affected by it? Can we make the other tests not depend on the presence of that field? Yes, we could update the first test to verify only seed and prediction: the properties it actually knows and cares about.

@Test  
fun `it can predict the next word`() {  
	val result = chain.predictText(seed = "That is")
	assertEquals("That is", result.seed)
	assertEquals("enough", result.prediction)
}

If the test looks like this, adding a likelihood property to PredictionResult will not affect it. We could even split the test into two, one which verifies that the result includes the seed, and one which verifies that it actually makes a valid prediction. Those tests would be even less likely to fail.
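For illustration, that split could look like this. The test names and granularity are my own suggestion, and the inline stand-in types exist only to make the sketch self-contained; in the real suite these come from the production code and the shared fixture above:

```kotlin
import kotlin.test.Test
import kotlin.test.assertEquals

// Stand-ins so this sketch compiles on its own (not the real production code).
data class PredictionResult(val seed: String, val prediction: String)

object chain {
    fun predictText(seed: String): PredictionResult =
        PredictionResult(seed = seed, prediction = "enough") // canned value
}

class PredictionTests {
    // Verifies only that the result echoes the seed.
    @Test
    fun `result includes the seed`() {
        val result = chain.predictText(seed = "That is")
        assertEquals("That is", result.seed)
    }

    // Verifies only that a valid prediction is made.
    @Test
    fun `it predicts the next word`() {
        val result = chain.predictText(seed = "That is")
        assertEquals("enough", result.prediction)
    }
}
```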

Then when it comes to the second test, we don’t really need to verify the seed again.

@Test  
fun `it can predict multiple words`() {  
    val result = chain.predictText(seed = "That is", length = 4)
    assertEquals("enough said his father", result.prediction)
}  

Finally, the third test could be written like this:

@Test  
fun `result includes likelihood of the prediction`() {  
	val result = chain.predictText(seed = "Alice was", length = 5)
	assertEquals(0.014705882352941176, result.likelihood)
}  

Written in this style, each test verifies only one thing. One distinct aspect of the functionality of the unit under test.2

Mindset for robust tests #


I believe each unit test should verify only what is necessary to fulfill its purpose. As a sanity check, look at the name of the test; are the assertions motivated by what the name says the test should do?3

Ideally, if you break something in your code, only a single test should fail. That test should tell you exactly what is broken. Wouldn’t that be much nicer than if half of the test suite went red?4

In technical terms, I think the tests are partial rather than total with respect to the output. We could think of this as Separation of Concerns for tests, or perhaps as a version of the Single Responsibility Principle for tests: each test should have only one reason to fail.

Having this mindset has helped me write more robust tests5, which are less likely to require modification when the production code changes.


  1. I’ll leave the implementation of the MarkovChain class as an exercise for ~~ChatGPT~~ the reader. πŸ˜‰ ↩︎

  2. This does not mean each test can only have a single assertion. The important part is to verify one aspect of the behavior. Sometimes, that requires multiple assertions. ↩︎

  3. Deriving the necessary assertions from the test name requires the name to clearly communicate what is being tested. But your tests are named well, are they not? ↩︎

  4. One possible problem with this strategy is that if each test only verifies one aspect, you will have to write tests for all interesting combinations of such aspects as well. When “all tests verify everything”, you may be lucky and get better coverage “for free” because one test happens to unintentionally encode such an interesting combination. But on the other hand, luck is not a sustainable strategy when it comes to testing. ↩︎

  5. Interestingly enough, my older blog post on How to write robust tests argues for increasing granularity of the “unit” under test to make tests more robust. This post found that robustness can be achieved by decreasing the scope of the validation performed. Maybe that is a good combination? To test slightly larger units, validating only a thin slice at a time. ↩︎