Threads, Locks, and the One Rule CPython Will Not Break

Concurrency in Python is frequently misunderstood because the language’s threading model does not behave like threading in many other languages. Adding threads to a CPU-bound program can make it slower. Removing threads from an I/O-bound program can also make it slower. The mechanism responsible for both outcomes is the Global Interpreter Lock.

What the GIL Is and Why It Exists

The Global Interpreter Lock, commonly referred to as the GIL, is a mutex in the CPython interpreter that ensures only one thread executes Python bytecode at any given time. It is not a design choice specific to threading. It exists because of how CPython manages memory.

CPython tracks object lifetime through reference counting. Every object carries an integer that records how many references point to it. When that count reaches zero, the memory is released. This is straightforward in a single-threaded program, but in a multi-threaded one, two threads could read and modify the same reference count simultaneously, producing a race condition. The count could be incremented and decremented in an interleaved order that leaves it incorrect, either leaking memory or freeing an object that is still in use.
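The bookkeeping described above can be observed directly. The following is a minimal sketch, assuming CPython: sys.getrefcount is a real standard-library function, while the helper names observe_refcount and simulate_lost_update are invented for illustration. The second helper does not use real threads; it replays one unlucky interleaving by hand to show how an unsynchronised read-modify-write could corrupt a count.

```python
import sys


def observe_refcount():
    """Return an object's reference count before, during, and after aliasing."""
    obj = object()
    # The reported value may include a temporary reference created by the call itself.
    before = sys.getrefcount(obj)
    alias = obj                     # a second name now refers to the same object
    during = sys.getrefcount(obj)
    del alias                       # dropping the name decrements the count again
    after = sys.getrefcount(obj)
    return before, during, after


def simulate_lost_update(initial=1):
    """Replay, step by step, an interleaving that a lock would prevent."""
    refcount = initial
    a_seen = refcount        # thread A reads the count, intending to add a reference
    b_seen = refcount        # thread B reads the same value, intending to drop one
    refcount = a_seen + 1    # A writes back its increment
    refcount = b_seen - 1    # B overwrites it, so A's increment is lost
    return refcount


if __name__ == "__main__":
    before, during, after = observe_refcount()
    print(f"refcount before/during/after aliasing: {before}/{during}/{after}")
    print(f"count after the bad interleaving: {simulate_lost_update()}")
```

Starting from a count of 1, one increment and one decrement should leave 1. The replayed interleaving leaves 0 instead, which is the "freed while still in use" case: thread A believes it holds a reference to an object whose memory has already been released.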

Rather than placing a fine-grained lock on every individual object, CPython uses a single interpreter-level lock. This is the GIL. It protects the reference counts of all objects, as well as the internal state of built-in types such as dict, from concurrent modification. The trade-off is that thread safety is achieved at the cost of parallelism.

What the GIL Actually Blocks

The GIL prevents more than one thread from executing Python bytecode at the same time, regardless of how many CPU cores are available on the machine. A program running two threads on an eight-core machine is still limited to one active thread at any moment when executing Python code. The additional cores provide no benefit for computation that stays inside the Python interpreter.

This has a direct consequence for CPU-bound work. A task that counts, transforms, or calculates using Python objects will not run faster when split across multiple threads. The threads still execute one at a time, and the overhead of handing the GIL back and forth between them can make the total run time longer than that of a single-threaded approach.
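How long a running thread may keep the GIL before CPython asks it to hand the lock over is governed by the interpreter's switch interval. A brief sketch, assuming CPython 3.2 or later, where sys.getswitchinterval and sys.setswitchinterval exist; the helper name describe_switch_interval is invented here:

```python
import sys


def describe_switch_interval():
    """Return the thread switch interval, in seconds, as reported by the interpreter."""
    return sys.getswitchinterval()


if __name__ == "__main__":
    interval = describe_switch_interval()
    print(f"switch interval: {interval * 1000:.1f} ms")
```

Changing this value with sys.setswitchinterval alters how often threads trade places, not how many run at once, so it does not make CPU-bound threads parallel.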

When Threads Are Still Useful

The GIL is not held permanently. CPython releases it around blocking I/O calls: file reads, network requests, database queries, and similar operations cause a thread to give up the lock while it waits for an external response. Another thread can then acquire the GIL and execute Python bytecode while the first thread is waiting.

This makes threads genuinely effective for I/O-bound programs. A web scraper fetching many URLs, a server handling concurrent connections, or a program reading from multiple files at once can all benefit from threading. Each thread spends most of its time waiting rather than computing, so the serialisation imposed by the GIL is rarely the bottleneck.
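The effect can be seen without any network access. The sketch below uses time.sleep as a stand-in for a blocking I/O call, since sleeping also releases the GIL; the function names and the 0.2-second delay are illustrative assumptions, not measurements from a real workload.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_io(delay):
    """Stand in for a blocking I/O call; time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    return delay


def run_sequential(delays):
    """Perform the waits one after another and return the elapsed time."""
    start = time.perf_counter()
    for delay in delays:
        simulated_io(delay)
    return time.perf_counter() - start


def run_threaded(delays):
    """Perform the waits on a thread pool and return the elapsed time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        # Consume the results; leaving the with block waits for all workers.
        list(executor.map(simulated_io, delays))
    return time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.2] * 4
    print(f"sequential: {run_sequential(delays):.2f}s")
    print(f"threaded:   {run_threaded(delays):.2f}s")
```

The sequential run takes roughly the sum of the delays, while the threaded run takes roughly the longest single delay, because the waits overlap.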

Extension modules written in C can also release the GIL explicitly when performing computationally intensive work such as compression or hashing. Libraries like NumPy do this for certain operations, which is why array-level numerical work in NumPy can benefit from threading even though pure Python loops cannot.
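As a standard-library illustration of the same idea, the hashlib documentation notes that the GIL is released while hashing inputs larger than 2047 bytes. The sketch below hashes several large buffers on a thread pool; the helper names, buffer sizes, and worker count are assumptions chosen for the example, and any speed-up over a sequential loop depends on the machine and the Python build.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(data):
    """Hash one buffer; hashlib can release the GIL while hashing large inputs."""
    return hashlib.sha256(data).hexdigest()


def digest_all_threaded(buffers, workers=4):
    """Hash several buffers on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(digest, buffers))


if __name__ == "__main__":
    # Four buffers of 8 MiB each, filled with different byte values (sizes are arbitrary).
    buffers = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]
    for hexdigest in digest_all_threaded(buffers):
        print(hexdigest[:16])
```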

The following example shows the structure of an I/O-bound concurrent fetch using concurrent.futures:

import concurrent.futures
import urllib.request

URLS = [
    "https://python.org",
    "https://docs.python.org",
    "https://pypi.org",
]

def fetch(url):
    with urllib.request.urlopen(url, timeout=10) as response:
        return url, len(response.read())

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = {executor.submit(fetch, url): url for url in URLS}
    for future in concurrent.futures.as_completed(futures):
        url, size = future.result()
        print(f"{url} — {size} bytes")

While any one thread is blocked waiting for a server response, the other threads are free to run. The GIL is released during the network wait and reacquired when the response arrives and Python resumes processing.

When Threads Are Not Useful

CPU-bound tasks receive no parallelism benefit from threads under the GIL. Consider a function that counts down from a large number:

import threading

COUNT = 50_000_000

def countdown(n):
    while n > 0:
        n -= 1

# Single-threaded
countdown(COUNT)

# Multi-threaded — no faster, often slower
t1 = threading.Thread(target=countdown, args=(COUNT // 2,))
t2 = threading.Thread(target=countdown, args=(COUNT // 2,))
t1.start()
t2.start()
t1.join()
t2.join()

Both approaches execute the same number of Python operations. The multi-threaded version adds the cost of thread creation and GIL contention without gaining any parallelism. On a multi-core machine, the threads still alternate through the GIL rather than executing side by side.

How Multiprocessing Bypasses the GIL

The standard library’s multiprocessing module addresses CPU-bound parallelism by spawning separate operating system processes instead of threads. Each process has its own Python interpreter and its own GIL. Because the processes do not share an interpreter, there is no contention over a single lock.

import multiprocessing

COUNT = 50_000_000

def countdown(n):
    while n > 0:
        n -= 1

if __name__ == "__main__":
    p1 = multiprocessing.Process(target=countdown, args=(COUNT // 2,))
    p2 = multiprocessing.Process(target=countdown, args=(COUNT // 2,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

The two processes run their countdown loops truly in parallel, provided at least two CPU cores are available. The cost is that spawning processes carries higher overhead than creating threads, and sharing data between processes requires explicit mechanisms such as queues or shared memory rather than simple shared variables.
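Because a child process cannot simply assign to a variable in its parent, results have to be sent back explicitly. The following is a minimal sketch using multiprocessing.Queue; the worker function sum_range and the chosen bounds are hypothetical and exist only to give the processes something to return.

```python
import multiprocessing


def sum_range(start, stop, results):
    """Sum the integers in [start, stop) and report the result through a queue."""
    total = 0
    for value in range(start, stop):
        total += value
    results.put((start, stop, total))


if __name__ == "__main__":
    results = multiprocessing.Queue()
    bounds = [(0, 1_000_000), (1_000_000, 2_000_000)]
    workers = [
        multiprocessing.Process(target=sum_range, args=(lo, hi, results))
        for lo, hi in bounds
    ]
    for worker in workers:
        worker.start()
    # Read one result per worker before joining, so the queue is drained first.
    partial_sums = [results.get() for _ in workers]
    for worker in workers:
        worker.join()
    print(sum(total for _, _, total in partial_sums))
```

Results may arrive in either order, which is why each message carries its own bounds rather than relying on position.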

Free-Threaded Python and PEP 703

Python 3.13 introduced an experimental free-threaded build, specified in PEP 703, in which the GIL can be disabled. An interpreter built with --disable-gil runs without the GIL by default, so multiple threads can execute Python bytecode simultaneously without serialisation. On such a build, -X gil=1 or the PYTHON_GIL=1 environment variable forces the GIL back on, while -X gil=0 or PYTHON_GIL=0 forces it off even when an imported extension module would otherwise cause it to be re-enabled.

Free-threaded builds are experimental in 3.13 and the broader ecosystem of extension modules has not yet fully adapted to the new threading model. The feature is a significant step toward true multi-core Python, but it is not yet a straightforward drop-in replacement for standard CPython in production use.
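To find out which kind of interpreter a script is running on, the build configuration and the runtime state can be queried separately. This is a hedged sketch: sysconfig.get_config_var("Py_GIL_DISABLED") and sys._is_gil_enabled() are the checks described in the CPython free-threading documentation for 3.13, the latter is a private-looking name that may change, and the helper gil_status is invented here.

```python
import sys
import sysconfig


def gil_status():
    """Describe whether this interpreter supports and is using free threading."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; it is 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in 3.13; on older versions the GIL is always on.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(check()) if check is not None else True
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


if __name__ == "__main__":
    print(gil_status())
```

A free-threaded build can still report the GIL as enabled, for example when it has been forced on or re-enabled by an extension module, which is why the two values are reported separately.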

What You Can Do Now

Write a small benchmark that runs the same CPU-bound task three ways: single-threaded, multi-threaded, and with multiprocessing. Measure the elapsed time for each and observe the difference the GIL produces.

import threading
import multiprocessing
import time

COUNT = 20_000_000

def countdown(n):
    while n > 0:
        n -= 1

def run_single():
    start = time.perf_counter()
    countdown(COUNT)
    return time.perf_counter() - start

def run_threaded():
    start = time.perf_counter()
    t1 = threading.Thread(target=countdown, args=(COUNT // 2,))
    t2 = threading.Thread(target=countdown, args=(COUNT // 2,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return time.perf_counter() - start

def run_multiprocess():
    start = time.perf_counter()
    p1 = multiprocessing.Process(target=countdown, args=(COUNT // 2,))
    p2 = multiprocessing.Process(target=countdown, args=(COUNT // 2,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"Single-threaded:  {run_single():.2f}s")
    print(f"Multi-threaded:   {run_threaded():.2f}s")
    print(f"Multiprocessing:  {run_multiprocess():.2f}s")

Run this and compare the three results. The single-threaded and multi-threaded times will be similar, with the threaded version often slightly slower. On a machine with at least two free cores, the multiprocessing time will be lower, approaching half the single-threaded time, with the remainder accounted for by process startup overhead. Once you have seen this directly, the GIL’s effect on CPU-bound work is no longer abstract.