Only mock your own interfaces 🦜

Replacing real dependencies with test doubles¹ is a helpful tool when building unit tests. It allows us to isolate the unit under test from the surrounding parts, and verify the behavior of that unit alone.

However, some components make it hard to write true unit tests. Usual suspects include databases, remote services, frameworks, and user interfaces. If you include them, you essentially turn your unit tests into small integration tests.

Because of that, it can be tempting to mock such components: to replace that database or backend service with a mock, and verify the behavior of the code that calls it. But in my experience, that is a dangerous route to take.

Don’t mock what you don’t own #

Mocks encode our assumptions, which might differ from reality.

When you replace a real component with a fake one, you need to replicate the part of its behavior that is used by the code you want to test. When you do, you will encode your expectation of what the real component would do, which may differ from what it would actually do. And if you’ve failed to anticipate the correct behavior when you wrote the code under test, you will likely encode that same incorrect behavior in your mock.

What good is it to verify that a function produces a certain SQL query if that query is not actually valid? Or to verify that a function makes a certain HTTP request, if the server to receive it will not process it? In my experience, these are the real problems. Not that you fail to make the query or request that you intended to, but that it does not have the intended effect.
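To make this concrete, here is a minimal Python sketch. The function `find_active_users`, the `users` table, and its column names are all hypothetical, and SQLite stands in for whatever database you actually use. The query refers to a column that does not exist in the schema, yet the mock-based check passes, because the mock returns whatever it was told to return. Only the check against a real (in-memory) database exposes the mistake.

```python
import sqlite3
from unittest.mock import Mock


def find_active_users(conn):
    # Hypothetical code under test. The bug: the real column is called
    # "is_active", not "active".
    cursor = conn.execute("SELECT name FROM users WHERE active = 1")
    return [row[0] for row in cursor.fetchall()]


def check_with_mock():
    """Succeeds: the mock encodes the same wrong assumption as the code."""
    conn = Mock()
    conn.execute.return_value.fetchall.return_value = [("alice",)]
    result = find_active_users(conn)
    # We verify that the intended query was sent, not that it is valid.
    conn.execute.assert_called_once_with(
        "SELECT name FROM users WHERE active = 1"
    )
    return result


def check_with_real_database():
    """Fails on the same code, because the database rejects the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, is_active INTEGER)")
    conn.execute("INSERT INTO users VALUES ('alice', 1)")
    try:
        return find_active_users(conn)
    except sqlite3.OperationalError as error:
        return f"query failed: {error}"
    finally:
        conn.close()
```

The mock-based check confirms that the code sends the query we intended to send. It says nothing about whether that query has the intended effect.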

My rule of thumb for writing robust unit tests is: only mock your own interfaces. If you own the interface, you define its expected behavior, so mocking it is generally safe. However, if you don’t own it, someone else has defined its behavior and there is always a risk that you did not fully understand it.

So what about code that interacts with components that you don’t own? My suggestion is to try to isolate such code into a few places, and then use integration tests to verify that code. There is little point in unit testing the code you wrote against a mock which encodes your expectations of how the real component would behave.

As a bonus, you can avoid the often quite horrible code required to set up those mocks. 🙃

Consider using a real database #

For databases, you want to avoid mocking database drivers or any library that abstracts over them, such as Object-Relational Mapping (ORM) libraries. A common way of isolating your code from the database is using the Data Access Object (DAO) pattern. It hides any database communication behind a simple, high-level interface that the rest of your code can rely on. This DAO interface is safer to mock in unit tests, because we defined its behavior. However, writing unit tests for the DAO itself is mostly pointless. You are better off writing integration tests for such code.
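As a rough illustration of such an interface, the sketch below defines a hypothetical `UserDao` protocol, a piece of business logic (`register`) that depends only on that protocol, and a hand-written in-memory double. None of these names come from a real library; they are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass
class User:
    id: int
    email: str


class UserDao(Protocol):
    """An interface we own: we decide what these methods promise."""

    def find_by_email(self, email: str) -> Optional[User]: ...

    def insert(self, email: str) -> User: ...


def register(dao: UserDao, email: str) -> User:
    """Business rule: an email address may only be registered once."""
    normalized = email.strip().lower()
    if dao.find_by_email(normalized) is not None:
        raise ValueError(f"{normalized} is already registered")
    return dao.insert(normalized)


class InMemoryUserDao:
    """Test double for UserDao. No database driver or ORM is mocked."""

    def __init__(self) -> None:
        self._users: Dict[str, User] = {}

    def find_by_email(self, email: str) -> Optional[User]:
        return self._users.get(email)

    def insert(self, email: str) -> User:
        user = User(id=len(self._users) + 1, email=email)
        self._users[email] = user
        return user
```

A unit test for `register` can use `InMemoryUserDao` directly. The database-backed implementation of `UserDao` is the part that belongs in an integration test.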

In fact, it is often better not to mock DAO objects in other unit tests either. While a DAO’s purpose is to isolate the business logic from the database, it often fails to fully do so. We tend not to anticipate many of the real-world error cases that production code triggers. These usually require particular circumstances, or a non-trivial interaction between concurrent requests. This is particularly true if you are using an ORM tool. Such tools tend to bleed into your code, with their “entity managers” and “sessions” finding their way far into business logic.

In such cases, I’ve actually set up my unit tests to run against an actual database. Yes, this may technically disqualify them from being unit tests², but I’ve found that it works very well in practice. Of course, we still want them to have the nice properties of unit tests – for example, they should be fast and isolated from each other.

I’ve used a setup where I spin up a new database (an embedded version of PostgreSQL in my case) and run all database schema migrations when the unit test suite starts, and then clear the database between each test.³ This adds a second or two to the test suite startup, and perhaps 50 ms to each test.
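A setup along those lines might look like the following sketch. It rests on several assumptions: it uses Python's `unittest` and an in-memory SQLite database as a stand-in for embedded PostgreSQL, the `MIGRATIONS` list is a hypothetical placeholder for a real migration tool, and the fixture runs once per test class rather than once for the whole suite.

```python
import sqlite3
import unittest

# Hypothetical schema migrations; a real project would run its migration tool.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)",
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL)",
]


class DatabaseTestCase(unittest.TestCase):
    """Base class: one database per test class, cleared before every test."""

    @classmethod
    def setUpClass(cls):
        # Start the database and apply all migrations once.
        cls.conn = sqlite3.connect(":memory:")
        for statement in MIGRATIONS:
            cls.conn.execute(statement)
        cls.conn.commit()

    @classmethod
    def tearDownClass(cls):
        cls.conn.close()

    def setUp(self):
        # Remove all rows so that tests stay isolated from each other.
        for table in ("orders", "users"):
            self.conn.execute(f"DELETE FROM {table}")
        self.conn.commit()


class UserTests(DatabaseTestCase):
    def test_insert_user(self):
        self.conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
        count = self.conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        self.assertEqual(count, 1)

    def test_starts_empty(self):
        # Runs after test_insert_user, so this checks that clearing works.
        count = self.conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        self.assertEqual(count, 0)
```

The expensive part (starting the database and migrating the schema) happens once, while the cheap part (deleting rows) happens before each test.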

A note on record-and-replay #

While a database can often be started for running unit tests, many external services are impractical or impossible to start on the machine running the tests. In such cases, there is a helpful middle ground where you can make snapshots of server responses, or use a record-and-replay proxy. They allow you to run your tests against the real system once, and then continue running them against a cached version of that interaction.

These tools work very well for some scenarios, especially for read-only interactions. The more complex the interaction is, and the more write operations it includes, the harder such tests are to write. (Generated IDs and timestamps are common problems.) You may have to write a lot of “mock-like” code to make the proxy work, ending up with many of the same problems that we were trying to avoid.
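The core idea can be sketched in a few lines of Python. This is a deliberately naive illustration, not how any particular tool works: the `ReplayCache` class, its "cassette" file, and the `fetch` callable are hypothetical names, and only read-only requests keyed by URL are handled.

```python
import json
from pathlib import Path
from typing import Callable, Dict


class ReplayCache:
    """Records responses on first use, then replays them from a file."""

    def __init__(self, cassette: Path, fetch: Callable[[str], str]) -> None:
        self._cassette = cassette
        self._fetch = fetch  # performs the real request, e.g. over HTTP
        self._responses: Dict[str, str] = {}
        if cassette.exists():
            self._responses = json.loads(cassette.read_text())

    def get(self, url: str) -> str:
        if url not in self._responses:
            # Record: hit the real system once and save what it returned.
            self._responses[url] = self._fetch(url)
            self._cassette.write_text(json.dumps(self._responses, indent=2))
        # Replay: every later call is served from the recording.
        return self._responses[url]
```

Even this small version hints at the difficulty with writes: a response containing a generated ID or a timestamp is replayed unchanged, which may not match what the real system would return the second time.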

Designing for testability #

I think it is interesting that writing code to be testable has implications for its design. Any complex business logic that you want to thoroughly unit test should be separated from code that interacts with external components. This allows you to test that complex logic without the hassle of mocking external components.

As I wrote in “How unit testing changes your design”, you want a system where:

Most complexity is in code with few dependencies. Unit test this code as much as you want. Writing tests is easy because each unit is well isolated.
Most dependencies are in code with low complexity. Test setup is hard for this type of code. Consider not unit testing it at all, focusing instead on integration tests.
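One possible shape for such a system is sketched below, using a hypothetical order-pricing example. The names `OrderLine`, `order_total_cents`, and `checkout` are invented for illustration. The pricing rules live in a function with no dependencies, while the function that talks to the outside world contains almost no logic.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class OrderLine:
    unit_price_cents: int
    quantity: int


def order_total_cents(lines: Sequence[OrderLine], discount_percent: int) -> int:
    """Complexity, no dependencies: easy to unit test without any mocks."""
    if not 0 <= discount_percent <= 100:
        raise ValueError("discount_percent must be between 0 and 100")
    subtotal = sum(line.unit_price_cents * line.quantity for line in lines)
    # Integer arithmetic; the discount is rounded down to whole cents.
    return subtotal - (subtotal * discount_percent) // 100


def checkout(
    order_id: int,
    load_lines: Callable[[int], Sequence[OrderLine]],
    load_discount: Callable[[int], int],
    save_total: Callable[[int, int], None],
) -> int:
    """Dependencies, no complexity: a candidate for integration tests."""
    total = order_total_cents(load_lines(order_id), load_discount(order_id))
    save_total(order_id, total)
    return total
```

All the edge cases (invalid discounts, rounding, empty orders) can be tested through `order_total_cents` with plain values, so there is little left in `checkout` that a mock would help verify.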

Whenever you’re tempted to mock an external component, think about whether you can change the structure of the system you’re building. If you do, you may very well find yourself not having to mock anything at all.

¹ Martin Fowler has a good summary of test double terminology covering stubs, mocks, and many others.
² You may also argue that tests that run against a real database are still unit tests; it is just that the “unit under test” is a combination of some code and a database. I’m not judging. 😊
³ I’ve seen compelling arguments for not clearing the database between tests, to detect bugs which only emerge as one accumulates more data. However, that requires more effort to write tests which do not make assumptions about the current state of the database.