Contents
  1. How Python Manages Memory
  2. Reference Cycles and Why Counting Fails
  3. The Cyclic Garbage Collector
  4. The Problem with __del__
  5. Weak References
  6. Detecting Leaks with the gc Module
  7. What You Can Do Now

When Python Holds On: Reference Cycles, Finalizers, and Weak References

Python's reference counting handles most memory automatically, but reference cycles defeat it. Understanding how the cyclic garbage collector, __del__, and weakref work together is what prevents long-running programs from quietly accumulating unreachable objects.

Python manages memory automatically, so the assumption is often that memory leaks are a concern for lower-level languages. That assumption breaks down in long-running processes. When objects refer to each other in a cycle, reference counting, Python’s primary memory mechanism, cannot free them on its own. Until the cyclic collector gets to them (or indefinitely, if it has been disabled), the program continues allocating memory, the unreachable objects accumulate, and the process grows without an obvious cause.

How Python Manages Memory

CPython, the standard Python interpreter, tracks object lifetime through reference counting. Every object carries an integer field recording how many references point to it. When a new reference is created, the count increments. When a reference is removed, the count decrements. When the count reaches zero, the object is immediately deallocated and its memory is returned.

This mechanism is direct and predictable. In the common case, objects are freed the moment the last reference to them is dropped. There is no delay, no pause, and no background sweep. The lifetime of an object is determined precisely by the scope and ownership of its references.
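
The counting described above can be observed with sys.getrefcount. The following is a minimal sketch that assumes CPython; the helper name refcount_demo is hypothetical, and because the absolute number reported includes temporary references held by the call itself, only the differences are meaningful.

```python
import sys


def refcount_demo():
    """Return how the count changes when references are added, then removed."""
    payload = object()
    base = sys.getrefcount(payload)

    # Storing the object twice in a list creates two additional references.
    holders = [payload, payload]
    with_holders = sys.getrefcount(payload)

    # Emptying the list drops both references again.
    holders.clear()
    after_clear = sys.getrefcount(payload)

    return with_holders - base, after_clear - base


print(refcount_demo())
```

The first number is the increase caused by the two list slots, and the second shows the count back at its starting value once they are cleared.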

Reference Cycles and Why Counting Fails

Reference counting fails when two or more objects hold references to each other in a cycle. Each object’s reference count stays above zero even after all external references to the group have been removed, because the objects within the cycle are still pointing at each other.

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

a = Node(1)
b = Node(2)

a.next = b   # a holds a reference to b
b.next = a   # b holds a reference to a

del a
del b
# Both objects still have a reference count of 1.
# Neither will be freed by reference counting alone.

After del a and del b, no part of the program can reach either node. Yet both remain alive because each one’s count is kept at one by the other. Reference counting has no mechanism to detect that the remaining references form a closed loop pointing to nothing reachable from the outside.

The Cyclic Garbage Collector

Python supplements reference counting with a cyclic garbage collector, accessible through the gc module. Its sole purpose is to find and collect groups of objects that are unreachable but whose reference counts have not reached zero due to mutual references.

The collector operates on a generational model. Every new object begins in generation 0. Objects that survive a collection are promoted to generation 1, and those that survive again are promoted to generation 2. Collection is triggered by a threshold: when the number of allocations minus deallocations since the last collection exceeds a configurable value, a collection of generation 0 begins. Younger generations are collected more frequently than older ones, on the premise that recently created objects are more likely to become garbage soon.
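
The thresholds and the current per-generation counters can be read directly from the gc module. This is a small sketch; the helper name collector_settings is hypothetical, and the values printed differ between Python versions and builds.

```python
import gc


def collector_settings():
    """Collect the cyclic collector's current configuration into a dict."""
    return {
        "enabled": gc.isenabled(),
        # One threshold per generation, youngest first.
        "thresholds": gc.get_threshold(),
        # Current counters that are compared against those thresholds.
        "counts": gc.get_count(),
    }


print(collector_settings())
```

gc.set_threshold accepts new values if the defaults collect too often or too rarely for a given workload.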

To find cycles, the collector examines tracked container objects (instances, lists, dicts, and similar types). It traverses the graph of references among them, identifies groups that are entirely self-contained with no external references reachable from outside the group, and collects the whole group at once. This is the work that reference counting cannot do on its own.
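
The difference between the two mechanisms can be seen by rebuilding the Node cycle with automatic collection switched off. The sketch below assumes CPython and uses a weak reference only as a probe for whether the object still exists; cycle_demo is a hypothetical helper name.

```python
import gc
import weakref


class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def cycle_demo():
    """Return (alive before collection, alive after collection)."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep an automatic collection from running mid-demo
    try:
        a, b = Node(1), Node(2)
        a.next, b.next = b, a      # build the cycle
        probe = weakref.ref(a)     # observes a without keeping it alive

        del a, b                   # drop every external reference
        alive_before = probe() is not None

        gc.collect()               # an explicit collection finds the cycle
        alive_after = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after


print(cycle_demo())
```

The node survives the del statements and disappears only once the collector has run.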

The Problem with __del__

The __del__ method defines what happens when an object is about to be destroyed. Before Python 3.4, objects that defined __del__ and were part of a reference cycle could not be safely collected. The collector could not determine in what order the finalizers should run: the __del__ method of one object in a cycle might rely on another object in the same cycle that had already been torn down. Rather than risk undefined behaviour, CPython placed those objects into gc.garbage, a list of uncollectable objects, and left them there. The memory was never reclaimed.

Python 3.4 implemented PEP 442 (Safe object finalization). Objects with __del__ are now finalized even when they are part of a cycle, and they no longer accumulate in gc.garbage automatically. That list remains as a diagnostic tool and is populated only in unusual circumstances, such as instances of C extension types that define a non-NULL tp_del slot.

The remaining concern with __del__ is subtler. Because it runs at an unpredictable time relative to other code, and because the collector does not guarantee which generation will collect a given object or when, relying on __del__ for deterministic resource release is unreliable. It is better suited for last-resort cleanup than for structured resource management.
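
For structured resource management, the usual alternative is a context manager, which releases the resource at a known point in the code rather than whenever the object happens to be destroyed. The sketch below is illustrative only: Connection and use_connection are hypothetical names, and the closed flag stands in for a real resource.

```python
class Connection:
    """Hypothetical resource that must be released at a known point."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs when the with block exits, whether or not an exception occurred.
        self.close()
        return False  # do not suppress exceptions


def use_connection():
    with Connection("demo") as conn:
        open_inside = not conn.closed
    return open_inside, conn.closed


print(use_connection())
```

The release happens on leaving the with block, independent of reference counts or collector timing.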

Weak References

A weak reference points to an object without contributing to its reference count. The referent can be collected at any time once no strong references remain, and from then on calling the weak reference returns None rather than accessing freed memory.

The weakref module provides weakref.ref for creating weak references, along with WeakValueDictionary, WeakKeyDictionary, and WeakSet for containers that hold their entries weakly. These are the standard tools for breaking reference cycles deliberately.

import weakref

class Cache:
    pass

obj = Cache()
ref = weakref.ref(obj)

print(ref())   # <__main__.Cache object at 0x...>
del obj
print(ref())   # None

A common situation where weak references are appropriate is a cache or registry that maps identifiers to live objects. If the cache holds strong references, the objects it tracks will never be freed even when nothing else in the program uses them. Replacing the values with weak references allows normal garbage collection to proceed. The cache entry disappears automatically when the object is collected.
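
A registry of this kind can be sketched with WeakValueDictionary. The names Session, registry, and registry_demo are hypothetical; the explicit gc.collect() call is there so that the result does not depend on CPython freeing the object immediately.

```python
import gc
import weakref


class Session:
    def __init__(self, user):
        self.user = user


# Values are held weakly: an entry vanishes once its object is collected.
registry = weakref.WeakValueDictionary()


def registry_demo():
    session = Session("alice")
    registry["alice"] = session
    present_while_alive = "alice" in registry

    del session    # the registry held the only other reference, and it is weak
    gc.collect()   # not needed on CPython, harmless elsewhere
    return present_while_alive, "alice" in registry


print(registry_demo())
```

With an ordinary dict in place of WeakValueDictionary, the entry would remain and the Session object would stay alive for as long as the registry did.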

The weakref.finalize function provides a more robust alternative to __del__ for cleanup logic. It registers a callable that is invoked when the target object is collected, without requiring any modification to the class itself. Critically, the cleanup callable and its arguments must not hold a strong reference back to the object being cleaned up, or that object is kept alive and is never collected.

import shutil
import tempfile
import weakref

class TempWorkspace:
    def __init__(self):
        self.path = tempfile.mkdtemp()
        self._cleanup = weakref.finalize(self, shutil.rmtree, self.path)

Here the finalizer holds only a weak reference to the TempWorkspace instance plus self.path, a string. When the instance is collected, shutil.rmtree runs against the stored path. The workspace directory is removed without __del__ and without the strong reference back to the instance that a bound method would carry.
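
The behaviour can be checked by watching the directory on disk. This sketch repeats the class so that it runs on its own; workspace_demo is a hypothetical helper, and the gc.collect() call is only a safeguard for interpreters that do not free objects immediately.

```python
import gc
import os
import shutil
import tempfile
import weakref


class TempWorkspace:
    def __init__(self):
        self.path = tempfile.mkdtemp()
        self._cleanup = weakref.finalize(self, shutil.rmtree, self.path)


def workspace_demo():
    """Return (directory exists while alive, directory exists afterwards)."""
    workspace = TempWorkspace()
    path = workspace.path
    existed = os.path.isdir(path)

    del workspace   # last strong reference gone, so the finalizer can run
    gc.collect()
    return existed, os.path.isdir(path)


print(workspace_demo())
```

The finalizer object is also callable, so calling workspace._cleanup() removes the directory early; it runs at most once, and its alive attribute reports whether it is still pending.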

Detecting Leaks with the gc Module

When a program grows in memory over time and the cause is unclear, the gc module provides the tools to investigate directly.

gc.collect() triggers a full collection across all generations and returns the number of unreachable objects it found: those it collected plus any it could not. Calling it manually and observing the return value indicates whether the program is producing cyclic garbage. A return value of zero after a period of activity means the collector found nothing to clean up. A consistently nonzero value means the code keeps creating cycles that only the cyclic collector can reclaim.

gc.get_objects() returns a list of every object currently tracked by the collector. Calling it before and after a suspected leak and comparing the two lists reveals which types are accumulating. Filtering by type makes the comparison tractable.
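
Filtering by type can be done by counting type names in each snapshot and subtracting. The sketch below is one way to do it; Leaky, _registry, snapshot, and leak_demo are hypothetical names, and the module-level list simulates a leak by keeping every instance reachable.

```python
import gc
from collections import Counter


class Leaky:
    pass


_registry = []  # hypothetical leak: a list that is only ever appended to


def snapshot():
    """Count the objects tracked by the collector, grouped by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def leak_demo(n=100):
    gc.collect()
    before = snapshot()

    _registry.extend(Leaky() for _ in range(n))

    gc.collect()
    after = snapshot()
    # Counter subtraction keeps only the types whose count went up.
    return after - before


growth = leak_demo()
print(growth.most_common(5))
```

Types that appear in the difference after a full collection are still reachable from somewhere, which is the place to start looking.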

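import gc


def create_cyclic_structures(count=1000):
    # Stand-in for the operation under suspicion; replace it with your own code.
    for _ in range(count):
        a, b = {}, {}
        a["peer"], b["peer"] = b, a  # a two-object cycle, dropped each iteration


# Trigger a full collection first to clear anything already unreachable.
gc.collect()
before = len(gc.get_objects())

# Run the suspected leaking operation.
create_cyclic_structures()

# Collect again so that only objects that are still reachable are counted.
gc.collect()
after = len(gc.get_objects())

print(f"Object count before: {before}")
print(f"Object count after:  {after}")
print(f"Net increase:        {after - before}")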

Calling gc.set_debug(gc.DEBUG_SAVEALL) causes the collector to append all unreachable objects to gc.garbage rather than freeing them, which allows inspection of exactly what was found. This is a diagnostic mode intended for development only; in normal operation, collected objects are freed and gc.garbage remains empty. Everything saved in the list stays alive, so reset the debug flags and clear gc.garbage once the inspection is finished.
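
A short session with DEBUG_SAVEALL might look like the following. Pair and find_saved_cycles are hypothetical names; the try/finally block restores the previous debug flags and empties gc.garbage so that the diagnostic mode does not linger.

```python
import gc


class Pair:
    def __init__(self):
        self.other = None


def find_saved_cycles():
    """Return how many Pair objects the collector found unreachable."""
    gc.collect()                    # start from a clean state
    previous_flags = gc.get_debug()
    gc.set_debug(gc.DEBUG_SAVEALL)  # save unreachable objects instead of freeing
    try:
        a, b = Pair(), Pair()
        a.other, b.other = b, a
        del a, b

        gc.collect()
        found = [obj for obj in gc.garbage if isinstance(obj, Pair)]
    finally:
        gc.set_debug(previous_flags)
        gc.garbage.clear()
    return len(found)


print(find_saved_cycles())
```

Other objects that belong to the same cycle can appear in gc.garbage as well, which is why the sketch filters by type before counting.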

What You Can Do Now

Run the following script to observe a reference cycle being created, detected, and collected. It constructs a cycle that includes a finalizer, confirms that both objects are tracked by the collector, and then measures what gc.collect() reclaims.

import gc

class Tracked:
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        print(f"{self.name} finalized")

# Ensure a clean baseline.
gc.collect()
baseline = len(gc.get_objects())

# Create two objects that reference each other.
x = Tracked("x")
y = Tracked("y")
x.other = y
y.other = x

# Drop all external references.
del x, y

# Both objects are now unreachable but not yet collected.
current = len(gc.get_objects())
print(f"Objects added since baseline: {current - baseline}")

# Force collection and observe finalization.
collected = gc.collect()
print(f"Objects collected: {collected}")

after = len(gc.get_objects())
print(f"Objects remaining above baseline: {after - baseline}")

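Running this prints both finalizer messages during the gc.collect() call, after which the object count returns to the baseline. That confirms the cycle was detected, both objects were finalized safely, and the memory was reclaimed. The exact counts can vary between Python versions, depending on how the interpreter stores instance attributes. Removing the __del__ method and re-running produces the same counts without the messages; the fact that the two runs match shows that a finalizer no longer blocks collection in Python 3.4 and later.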
