Python's Hash and Equality Contract: What Sets and Dicts Depend On
How __hash__ and __eq__ interact in Python, why objects that compare equal must share a hash value, and what goes wrong when mutable objects are made hashable.
Python’s set and dict types depend on two methods working correctly together: __eq__ and __hash__. When you define a custom class and want instances to participate in these collections, you are entering a contract between those two methods. Breaking the contract does not always raise an immediate error. It can produce silent, difficult-to-trace bugs.
What __eq__ Does and Its Default Behaviour
__eq__ is called whenever Python evaluates the == operator. By default, object implements it using identity: two objects are considered equal only if they are the same object in memory. The documentation states that the default implementation uses is, returning NotImplemented in the case of a false comparison. For __ne__, Python delegates to __eq__ and inverts the result.
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
a = Point(1, 2)
b = Point(1, 2)
a == b # False — different objects, default identity comparison
a == a # True
Defining __eq__ replaces that identity comparison with whatever logic you provide. Two separate instances can now be considered equal if their fields match.
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
def __eq__(self, other):
return self.x == other.x and self.y == other.y
a = Point(1, 2)
b = Point(1, 2)
a == b # True
What __hash__ Does and How Collections Use It
__hash__ is called by the built-in hash() function and by hashed collections including set, frozenset, and dict. These collections do not scan their entire contents to find a matching key. Instead, they compute the hash value of the object being looked up and use that value to locate the bucket where the object should reside. Only then do they use __eq__ to confirm the match.
This means hash determines where an object lives in the collection, and equality confirms the identity of the object once found.
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
def __eq__(self, other):
return self.x == other.x and self.y == other.y
def __hash__(self):
return hash((self.x, self.y))
a = Point(1, 2)
b = Point(1, 2)
hash(a) == hash(b) # True
{a, b} # one element — a and b are equal and hash the same
The Contract: Equal Objects Must Share a Hash
The fundamental rule, stated directly in the Python data model documentation, is that objects which compare equal must have the same hash value. If x == y is true, then hash(x) == hash(y) must also be true.
The reverse is not required. Two objects can share the same hash value without being equal. These are hash collisions, and Python handles them by falling back to __eq__. The only prohibited direction is the asymmetric one: two objects that are equal but produce different hashes.
The consequence of violating this rule is that a set or dict will fail to recognise two equal objects as the same key. An item inserted under one object cannot be found using the other, even though == between them returns True.
Defining __eq__ Without __hash__
If you define __eq__ in a class but do not define __hash__, Python sets __hash__ to None implicitly. The documentation is explicit: a class that overrides __eq__() and does not define __hash__() will have its __hash__() implicitly set to None. When __hash__ is None, instances of the class raise TypeError if passed to hash() or used as a set element or dict key.
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
def __eq__(self, other):
return self.x == other.x and self.y == other.y
p = Point(1, 2)
hash(p) # TypeError: unhashable type: 'Point'
{p} # TypeError: unhashable type: 'Point'
This is intentional. Python is enforcing the contract. If you have defined what equality means but have not defined what the hash should be, your class cannot safely participate in hash-based collections.
Why Mutable Objects Should Not Be Hashable
The Python documentation on __hash__ states directly that if a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(). The reason is structural: hashable collections require that a key’s hash value is immutable. If an object’s hash changes after it has been added to a set or dict, it ends up in the wrong bucket. It becomes unreachable through normal lookup and will not be found even though it is present.
class BadPoint:
def __init__(self, x, y):
self.x = x
self.y = y
def __eq__(self, other):
return self.x == other.x and self.y == other.y
def __hash__(self):
return hash((self.x, self.y)) # depends on mutable state
p = BadPoint(1, 2)
s = {p}
p.x = 99 # mutate the object after insertion
p in s # False — hash changed, bucket no longer matches
The object is still in the set. The set still holds a reference to it. But the lookup fails because the hash computed at lookup time does not point to the bucket where p was originally stored.
Implementing Both Correctly
A correct implementation defines __hash__ using the same fields that __eq__ uses. The documentation recommends mixing those field values into a single tuple and hashing the tuple.
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
def __eq__(self, other):
if not isinstance(other, Point):
return NotImplemented
return self.x == other.x and self.y == other.y
def __hash__(self):
return hash((self.x, self.y))
The isinstance guard is a recommended practice: returning NotImplemented when the comparison is not meaningful allows Python to try the reflected operation on the other operand. For __hash__, the tuple approach is sufficient because Python’s tuple hash already combines the hashes of its elements.
How This Affects Sets and Dicts in Practice
With both methods implemented correctly, instances behave as expected in all hash-based contexts: deduplication in sets, key lookup in dicts, and membership testing with in.
a = Point(3, 4)
b = Point(3, 4)
c = Point(5, 6)
s = {a, b, c}
len(s) # 2 — a and b are deduplicated
d = {a: "origin-relative", c: "shifted"}
d[b] # "origin-relative" — b hashes and equals a
The in operator on a set calls __hash__ first to find the candidate bucket, then __eq__ to confirm. Both must be consistent. If they are, the set operates correctly across all three: insertion, lookup, and removal.
What You Can Do Now
Build a Version class that represents a semantic version number. Make instances hashable so they can be stored in sets and used as dict keys. Verify the contract holds by inserting instances and looking them up with separately constructed equal instances.
class Version:
def __init__(self, major, minor, patch):
self.major = major
self.minor = minor
self.patch = patch
def __eq__(self, other):
if not isinstance(other, Version):
return NotImplemented
return (self.major, self.minor, self.patch) == (other.major, other.minor, other.patch)
def __hash__(self):
return hash((self.major, self.minor, self.patch))
def __repr__(self):
return f"Version({self.major}, {self.minor}, {self.patch})"
v1 = Version(1, 0, 0)
v2 = Version(1, 0, 0)
v3 = Version(2, 1, 3)
print(v1 == v2) # True
print(hash(v1) == hash(v2)) # True
released = {v1, v2, v3}
print(len(released)) # 2
changelog = {v1: "initial release", v3: "new features"}
print(changelog[v2]) # initial release — v2 equals v1 and shares its hash
Once this works, attempt to introduce mutation. Change one of the version fields after inserting the instance into a set, then check whether the instance is still found. The failure mode is informative.