Implementing equals()

Among Java developers, there exist different ideas about how to implement the equals() method. The disagreement is about whether to allow subclasses to be considered equal or not.

Two ways to see it

Some say that it should be done using the instanceof operator, like this.

public boolean equals(Object obj) {
    if (obj instanceof MyClass) {
        MyClass that = (MyClass) obj;
        // compare the fields of this and that, and return the result
    } else {
        return false;
    }
}

Others say that it should be done by comparing the exact classes of the two objects, like this.

public boolean equals(Object obj) {
    if (obj != null && obj.getClass() == this.getClass()) {
        MyClass that = (MyClass) obj;
        // compare the fields of this and that, and return the result
    } else {
        return false;
    }
}

The difference lies in how they treat subclasses. As instanceof returns true if obj is of the same class or any subclass, it will allow subclasses to be considered semantically identical to their superclasses. Comparing the result of getClass() on both objects will require that they are of exactly the same class. Using that method, no subclass can ever be semantically equal to its superclass.
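As a minimal sketch of that difference, the two checks can be tried side by side. The class and method names here (TypeCheckDemo, Base, Derived, acceptsByInstanceof, sameClassAs) are made up for illustration only.

```java
class TypeCheckDemo {
    // Hypothetical superclass exposing both styles of type check.
    static class Base {
        // instanceof style: accepts Base and any subclass of Base
        boolean acceptsByInstanceof(Object obj) {
            return obj instanceof Base;
        }

        // getClass() style: accepts only objects of exactly the same runtime class
        boolean sameClassAs(Object obj) {
            return obj != null && obj.getClass() == this.getClass();
        }
    }

    // Hypothetical subclass that adds nothing of its own.
    static class Derived extends Base {
    }

    public static void main(String[] args) {
        Base base = new Base();
        Derived derived = new Derived();

        System.out.println(base.acceptsByInstanceof(derived)); // true
        System.out.println(base.sameClassAs(derived));         // false
        System.out.println(base.acceptsByInstanceof(null));    // false, no null check needed
    }
}
```

Note that instanceof is already false for null, which is why only the getClass() variant needs the explicit null check.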

To understand why this difference matters, it is important to remember one of the requirements Java places on any implementation of equals(): the equality relation must be symmetric. In other words, for any non-null references a and b, the expression a.equals(b) must return true if and only if b.equals(a) also does.

Because of this, the getClass() camp argues that since we can’t predict how subclasses might want to implement equals(), we had better implement equals() in such a way that it only compares objects of the same class. That way a subclass can safely override equals() and implement it any way it wants. For example, a subclass might have additional fields which it wants to use in the equality comparison. Otherwise we might end up in a situation where, for example, the superclass says that it is equal to the subclass but not the other way around.
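The situation the getClass() camp worries about can be reproduced with a small sketch. Point and ColorPoint are hypothetical classes invented for this illustration: the superclass uses instanceof, and the subclass adds a field to the comparison.

```java
import java.util.Objects;

class SymmetryDemo {
    // Hypothetical superclass whose equals() uses instanceof.
    static class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof Point)) return false;
            Point that = (Point) obj;
            return x == that.x && y == that.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Hypothetical subclass that changes the meaning of equality by adding a field.
    static class ColorPoint extends Point {
        final String color;

        ColorPoint(int x, int y, String color) {
            super(x, y);
            this.color = color;
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof ColorPoint)) return false;
            ColorPoint that = (ColorPoint) obj;
            return super.equals(that) && color.equals(that.color);
        }
    }

    public static void main(String[] args) {
        Point point = new Point(1, 2);
        ColorPoint colorPoint = new ColorPoint(1, 2, "red");

        System.out.println(point.equals(colorPoint)); // true: Point only looks at x and y
        System.out.println(colorPoint.equals(point)); // false: a plain Point is not a ColorPoint
    }
}
```

The two calls disagree, so the symmetry requirement is broken as soon as the subclass redefines equality.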

The argument from the instanceof proponents is that not allowing a subclass to be considered equal breaks the notion of polymorphism. It also violates the Liskov substitution principle as well as the principle of least astonishment.

How I see it

Okay, enough with the summary. Now I’ll throw my own two cents in.

I think this whole question pretty much falls apart if we just make a better distinction between interface and implementation. An interface decides what a method is supposed to do, while a particular implementation knows how to do it. In other words, it is up to the interface to define what equality means for that interface. Being separated from its implementation(s), the interface must therefore define equals() in terms of comparisons of its own members. Thus, it cannot require the other object to “be of the same class” without having inappropriate knowledge about at least one of its implementations.

One problem is that a class in Java is somewhat ambiguous as it both has an interface and provides an implementation for it. This makes the interface more implicit. If you think it makes things clearer, extract an interface from your superclass and let it define (through documentation and unit tests) the exact behavior of the implementation. Then implement that exact behavior in any way you see fit in your class.

Example: Collections in Java

A good example of how things should work is the collections framework in Java. If we look at the documentation for the equals() method of e.g. the Set interface, we see the following. Pay special attention to the last sentence.

Returns true if the specified object is also a set, the two sets have the same size, and every member of the specified set is contained in this set (or equivalently, every member of this set is contained in the specified set). This definition ensures that the equals method works properly across different implementations of the set interface.

This behavior is defined by the interface Set. It is then implemented in the abstract class AbstractSet, which is in turn extended by HashSet and TreeSet. Neither of the two subclasses overrides equals().
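A quick way to see this in practice is to compare a HashSet with a TreeSet holding the same elements. The class name SetEqualityDemo and the helper equalBothWays are made up for this sketch; the collection classes are the standard ones from java.util.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class SetEqualityDemo {
    // Hypothetical helper: checks equality in both directions to show symmetry.
    static boolean equalBothWays(Set<?> first, Set<?> second) {
        return first.equals(second) && second.equals(first);
    }

    public static void main(String[] args) {
        Set<String> hashSet = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> treeSet = new TreeSet<>(Arrays.asList("a", "b"));

        // Different implementation classes, same contents.
        System.out.println(hashSet.getClass() == treeSet.getClass()); // false
        System.out.println(equalBothWays(hashSet, treeSet));          // true

        // The Set contract also defines hashCode(), so equal sets agree on it.
        System.out.println(hashSet.hashCode() == treeSet.hashCode()); // true
    }
}
```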

Example: Employees and Managers

In a blog entry, Cay Horstmann uses another example. However, he is arguing for using getClass() rather than instanceof. His example includes a class hierarchy where we have an Employee class as the base and then a Manager class which extends Employee by adding a bonus field. Expressed in code, it would look something like the following.

public class Employee {
    public boolean equals(Object other) {
        if (!(other instanceof Employee)) return false;
        // cast other to Employee and compare fields
        // ...
    }
}

public class Manager extends Employee {
    public boolean equals(Object other) {
        if (!super.equals(other)) return false;
        // cast other to Manager and compare fields
        return bonus == ((Manager)other).bonus;
    }
}

In other words, the subclass Manager changes the meaning of equality it inherited from its superclass Employee. Is this a sensible thing to do? As you might guess, I would say no. We (hopefully) decided when we wrote Employee that there was a given way to uniquely identify any specific employee. That might have been an employee id, a combination of first name and last name, or any other combination of Employee members. For the sake of discussion, let us say we compare the values returned by the method getId(). Being an Employee itself, what reason would Manager have to change this definition? To check if two Manager objects have the same id but different values for the field bonus? Well, if we do have two such objects, then that is the problem. We shouldn’t have allowed them to exist at the same time. The job of equals() is not to guard us from having corrupt data in our model!
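A minimal sketch of the alternative I have in mind is shown here, assuming that identity is a numeric id exposed through getId(). The id type, the constructors and the wrapper class EmployeeDemo are my own assumptions for illustration and are not part of Horstmann’s example.

```java
class EmployeeDemo {
    // Employee defines equality once, in terms of its own member getId().
    static class Employee {
        private final long id; // assumed identifier type

        Employee(long id) {
            this.id = id;
        }

        long getId() {
            return id;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Employee)) return false;
            return getId() == ((Employee) other).getId();
        }

        @Override
        public int hashCode() {
            return Long.hashCode(getId());
        }
    }

    // Manager adds a bonus but deliberately inherits equals() and hashCode().
    static class Manager extends Employee {
        private final double bonus;

        Manager(long id, double bonus) {
            super(id);
            this.bonus = bonus;
        }

        double getBonus() {
            return bonus;
        }
    }

    public static void main(String[] args) {
        Employee employee = new Employee(42);
        Manager manager = new Manager(42, 1000.0);

        // Same id, so equal in both directions: symmetry holds.
        System.out.println(employee.equals(manager)); // true
        System.out.println(manager.equals(employee)); // true
    }
}
```

Two Manager objects with the same id but different bonuses also compare as equal here, which is exactly the corrupt-data case that equals() is not meant to detect.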

Final thoughts

If a subclass truly has to override equals() to provide a semantically different implementation, it might instead be a sign that the subclass should perhaps not be a subclass after all. A subclass is supposed to extend or alter the behavior of a superclass or implement the behavior of an interface, but always doing so within the boundaries of all inherited interfaces.

One way to make sure subclasses do not break the contract is to make the equals() method in the superclass final. However, I think that is overly protective. There could be a valid reason for wanting to override equals() and still preserve its semantics. Performance optimization could be one such reason.

Thus, in my humble opinion, the correct way to implement equals() is as follows.

public boolean equals(Object obj) {
    if (obj instanceof MyInterface) {
        MyInterface that = (MyInterface) obj;
        // compare public members of this and that, and return the result
    } else {
        return false;
    }
}
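To make the skeleton above a bit more concrete, here is a sketch with two unrelated implementations of one interface. Everything in it (Version, SimpleVersion, ParsedVersion and the static helpers) is hypothetical and chosen only for illustration. As with Set, the interface has to pin down hashCode() together with equals(), otherwise equal objects from different implementations could end up with different hash codes.

```java
class InterfaceEqualsDemo {
    // Hypothetical interface: two versions are equal when major and minor match.
    interface Version {
        int major();
        int minor();

        // Shared definition of equality, expressed only through interface members.
        static boolean sameVersion(Version self, Object obj) {
            if (!(obj instanceof Version)) return false;
            Version that = (Version) obj;
            return self.major() == that.major() && self.minor() == that.minor();
        }

        // Shared hash definition so every implementation agrees with equals().
        static int hashOf(Version version) {
            return 31 * version.major() + version.minor();
        }
    }

    // Implementation that stores the numbers directly.
    static final class SimpleVersion implements Version {
        private final int major;
        private final int minor;

        SimpleVersion(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        @Override public int major() { return major; }
        @Override public int minor() { return minor; }
        @Override public boolean equals(Object obj) { return Version.sameVersion(this, obj); }
        @Override public int hashCode() { return Version.hashOf(this); }
    }

    // Implementation that keeps the original text, such as "1.2", and parses on demand.
    static final class ParsedVersion implements Version {
        private final String text;

        ParsedVersion(String text) {
            this.text = text;
        }

        @Override public int major() { return Integer.parseInt(text.split("\\.")[0]); }
        @Override public int minor() { return Integer.parseInt(text.split("\\.")[1]); }
        @Override public boolean equals(Object obj) { return Version.sameVersion(this, obj); }
        @Override public int hashCode() { return Version.hashOf(this); }
    }

    public static void main(String[] args) {
        Version simple = new SimpleVersion(1, 2);
        Version parsed = new ParsedVersion("1.2");

        System.out.println(simple.equals(parsed));                  // true
        System.out.println(parsed.equals(simple));                  // true
        System.out.println(simple.hashCode() == parsed.hashCode()); // true
    }
}
```

Neither implementation knows about the other; both only rely on what the interface promises.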

However, my opinion might not be the same as yours, and your opinion might not be the same as mine (for symmetry purposes ;-) ). If you happen to have another opinion, feel free to comment or trackback!

4 Responses

  1. You almost convinced me! =) But I still have the feeling that there ought to be exceptions… I’ll think about it some more when I get the time…

    Loriel - May 19th, 2006 at 10:26
  2. Your argument makes sense, but there are still issues. For example, consider

    Comparator desc = … // Sort descending
    SortedSet ascending = new TreeSet ();
    ascending.addAll (Arrays.asList("a","b"));
    SortedSet descending = new TreeSet (desc);
    descending.addAll (ascending);
    System.out.println (ascending.equals (descending));

    These sets are equal, as required by the contract of the Set interface, but this may not be what you want when comparing SortedSets.

    Sometimes it would be useful to have pluggable equality checks (analogously to what Comparator does for comparisons).

    Daniel - April 10th, 2010 at 03:10
  3. And of course one, somewhat awkward, way of achieving pluggable equality checks that is useful in some cases is to use a wrapper that defines equals/hashcode appropriately.

    Daniel - April 10th, 2010 at 03:17
  4. Hi Daniel, thanks for commenting! (And sorry for a late response.)

    That’s a very good example you provided, and I agree with you.

    It reminds me of discussions about what type of matrix multiplication should be used by the * operator (for languages which support operator overloading). One solution is simply not to use the * operator at all, but to provide different methods for different types of multiplication. Maybe the same principle could be used here?

    Or maybe Set and SortedSet simply should be two different classes? Say, Set and List. Each with their own definition of equality.

    Henrik Jernevad - April 19th, 2010 at 21:25