Contents
  1. I/O-Bound vs CPU-Bound Work
  2. Threading: Shared Memory, Limited by the GIL
  3. Multiprocessing: Separate Processes, True Parallelism
  4. Asyncio: Cooperative Concurrency on a Single Thread
  5. Decision Table
  6. What You Can Do Now

Multiprocessing, Threading, and Asyncio: Choosing the Right Concurrency Model in Python

Python's standard library offers three concurrency models: threading, multiprocessing, and asyncio. They are not interchangeable. Each was designed for a specific class of problem, and applying the wrong one produces programs that are no faster than sequential code, more expensive to run, or more complex than they need to be. The key question before reaching for any of them is whether the work is I/O-bound or CPU-bound.

I/O-Bound vs CPU-Bound Work

A task is I/O-bound when execution time is dominated by waiting: waiting for a network response, a disk read, a database query, or any external system. The CPU sits idle for most of the operation. A task is CPU-bound when execution time is dominated by computation: number crunching, image processing, parsing large data structures, or any work that keeps the CPU continuously busy.

This distinction drives every decision that follows. Tools built for I/O-bound work offer little or no benefit for CPU-bound tasks, and vice versa.
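
One rough way to tell the two apart is to compare CPU time with wall-clock time for a single call. The sketch below is a heuristic, not a formal definition. It assumes the task runs on the calling thread, and io_bound_task and cpu_bound_task are hypothetical stand-ins, with time.sleep playing the role of a network or disk wait. A ratio near 0 suggests the call spent its time waiting; a ratio near 1 suggests it kept the CPU busy.

```python
import time

def io_bound_task():
    # Hypothetical stand-in for waiting on a network or disk: the CPU idles.
    time.sleep(0.2)

def cpu_bound_task():
    # Hypothetical stand-in for computation: the CPU stays busy throughout.
    return sum(i * i for i in range(2_000_000))

def cpu_fraction(fn):
    """Return CPU time divided by wall-clock time for one call of fn."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of the whole process, all threads
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall

if __name__ == "__main__":
    print(f"io_bound_task:  {cpu_fraction(io_bound_task):.2f}")
    print(f"cpu_bound_task: {cpu_fraction(cpu_bound_task):.2f}")
```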

Threading: Shared Memory, Limited by the GIL

The threading module runs multiple threads within a single process. All threads share the same memory space, which makes sharing data straightforward but requires synchronization to avoid race conditions. The most common primitives for this are Lock and RLock, both of which can be used as context managers in a with statement.

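import threading

lock = threading.Lock()
shared_counter = 0

def increment():
    global shared_counter
    # Only one thread at a time can hold the lock, so the
    # read-modify-write of shared_counter cannot interleave.
    with lock:
        shared_counter += 1

threads = [threading.Thread(target=increment) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every thread to finish

print(shared_counter)  # 100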
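
The example above uses Lock. RLock (reentrant lock) differs in one respect: the thread that already holds it may acquire it again, which matters when one locked method calls another. A minimal sketch using a hypothetical Account class:

```python
import threading

class Account:
    """Hypothetical example: a balance guarded by a reentrant lock."""

    def __init__(self):
        self.balance = 0
        self._lock = threading.RLock()

    def deposit(self, amount):
        with self._lock:
            self.balance += amount

    def deposit_twice(self, amount):
        # Holds the RLock, then calls deposit(), which acquires it again
        # from the same thread. With a plain Lock this would deadlock.
        with self._lock:
            self.deposit(amount)
            self.deposit(amount)

account = Account()
workers = [threading.Thread(target=account.deposit_twice, args=(5,)) for _ in range(10)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(account.balance)  # 100

# A plain Lock is not reentrant: a second blocking acquire from the holding
# thread would wait forever, so blocking=False is used to show that it fails.
plain = threading.Lock()
with plain:
    reacquired = plain.acquire(blocking=False)
print(reacquired)  # False
```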

The critical limitation is the Global Interpreter Lock (GIL). In CPython, only one thread can execute Python bytecode at a time, even on a machine with many cores. The documentation states this directly: “due to the Global Interpreter Lock, only one thread can execute Python code at once.” For CPU-bound work written in pure Python, threading therefore produces no performance gain over sequential code, because the threads cannot run in parallel. The exception is C extension code that releases the GIL while it computes. For I/O-bound work, threading is effective: blocking I/O calls release the GIL, so while one thread waits on a network response, another thread can proceed.
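
The effect of the GIL can be observed by running the same task five times sequentially and then in five threads. A minimal sketch, assuming time.sleep(0.2) as a stand-in for a blocking network call; simulated_io, cpu_work, run_sequentially, and run_in_threads are hypothetical names:

```python
import threading
import time

def simulated_io():
    # time.sleep releases the GIL, as a blocking socket read would.
    time.sleep(0.2)

def cpu_work():
    # Pure-Python arithmetic holds the GIL while it runs.
    sum(i * i for i in range(1_000_000))

def run_sequentially(fn, n_tasks):
    start = time.perf_counter()
    for _ in range(n_tasks):
        fn()
    return time.perf_counter() - start

def run_in_threads(fn, n_tasks):
    threads = [threading.Thread(target=fn) for _ in range(n_tasks)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    for label, fn in [("I/O-bound", simulated_io), ("CPU-bound", cpu_work)]:
        sequential = run_sequentially(fn, 5)
        threaded = run_in_threads(fn, 5)
        print(f"{label}: sequential {sequential:.2f}s, 5 threads {threaded:.2f}s")
```

On a standard GIL-enabled CPython build, the I/O-bound row should drop from about 1 second to about 0.2 seconds, while the CPU-bound row should show little or no improvement.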

Python 3.13 introduced an experimental free-threaded build of CPython that can run with the GIL disabled. It is a separate build rather than the default interpreter, so this article does not treat it as a general-purpose recommendation.
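
A program can check which kind of interpreter it is running on. The sketch below assumes the approach described in CPython's free-threading HOWTO: the Py_GIL_DISABLED configuration variable identifies a free-threaded build, and sys._is_gil_enabled() (underscore-prefixed, Python 3.13+) reports whether the GIL is active at runtime, since a free-threaded build can re-enable it. gil_status is a hypothetical helper name.

```python
import sys
import sysconfig

def gil_status():
    """Return (is_free_threaded_build, gil_currently_enabled)."""
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on Python 3.13+; older versions always have the GIL.
    is_enabled = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(is_enabled()) if is_enabled is not None else True
    return free_threaded_build, gil_enabled

if __name__ == "__main__":
    build, enabled = gil_status()
    print(f"free-threaded build: {build}, GIL enabled: {enabled}")
```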

Multiprocessing: Separate Processes, True Parallelism

The multiprocessing module sidesteps the GIL entirely by spawning separate Python interpreter processes rather than threads. Each process has its own memory space and its own GIL, so they run in true parallel on multiple cores. The documentation describes this design explicitly: the package “effectively side-steps the Global Interpreter Lock by using subprocesses instead of threads.”

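from multiprocessing import Pool

def compute(n):
    # Pure-Python arithmetic: CPU-bound work that would hold the GIL in a thread.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # The guard is required with the spawn and forkserver start methods,
    # which import the main module again in the child processes.
    with Pool(processes=4) as pool:
        results = pool.map(compute, [1_000_000] * 8)
    print(results)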

The tradeoff is memory and startup cost. Each process is a full Python interpreter, so spawning many of them is expensive in both time and RAM. Arguments and results passed between processes must be serialized with pickle, which adds overhead for large objects and rules out objects that cannot be pickled, such as lambdas. Explicit communication between processes typically uses Queue or Pipe, both of which pickle objects at the boundary.
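
The serialization cost, and the fact that some objects cannot cross a process boundary at all, can be observed with pickle directly. This sketch only approximates what multiprocessing does, since the module uses its own pickle.Pickler subclass internally; pickle_round_trip and is_picklable are hypothetical helpers.

```python
import pickle
import time

def pickle_round_trip(obj):
    """Serialize and deserialize obj, roughly what happens when it crosses
    a Queue, a Pipe, or a Pool boundary. Returns (payload bytes, seconds)."""
    start = time.perf_counter()
    payload = pickle.dumps(obj)
    restored = pickle.loads(payload)
    elapsed = time.perf_counter() - start
    assert restored == obj  # the receiver gets an equal copy, not the same object
    return len(payload), elapsed

def is_picklable(obj):
    """Return True if obj could be sent to another process."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

if __name__ == "__main__":
    for name, obj in [("small list", list(range(10))), ("large list", list(range(1_000_000)))]:
        size, seconds = pickle_round_trip(obj)
        print(f"{name}: {size:,} bytes, {seconds * 1000:.1f} ms")
    print("lambda picklable?", is_picklable(lambda x: x * 2))  # False
    print("builtin picklable?", is_picklable(len))             # True
```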

As of Python 3.14, the default start method on Linux and other POSIX platforms except macOS is forkserver rather than fork. The fork method is fast but unsafe when the parent process has active threads. The spawn method, which is the default on Windows and macOS, starts a fresh interpreter for each process; it is the slowest method but the safest option.
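
The start method can also be chosen explicitly. A minimal sketch, assuming the standard multiprocessing.get_context() API: it returns a context object with the same interface as the module but bound to one start method, so the choice stays local instead of changing the process-wide default. factorials_with is a hypothetical helper, and math.factorial serves as the worker function because it is importable from any process.

```python
import math
import multiprocessing as mp

def factorials_with(method, values):
    """Compute factorials in a pool that uses the given start method."""
    # get_context() does not change the global default set by set_start_method().
    ctx = mp.get_context(method)
    with ctx.Pool(processes=2) as pool:
        return pool.map(math.factorial, values)

if __name__ == "__main__":
    print("supported:", mp.get_all_start_methods())
    print("default:  ", mp.get_start_method())
    # spawn is available on every platform; fork and forkserver are POSIX-only.
    print(factorials_with("spawn", [5, 10, 20]))  # [120, 3628800, 2432902008176640000]
```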

Multiprocessing is the correct choice when the work is CPU-bound and can be divided into independent units. It is not the right tool for tasks that require frequent communication between workers or that need to share large amounts of data, because the serialization cost becomes significant.

Asyncio: Cooperative Concurrency on a Single Thread

The asyncio module provides concurrency on a single thread through cooperative scheduling. Instead of using OS-level threads or processes, it runs an event loop that switches between coroutines at await points. The documentation describes the mechanism precisely: “an event loop runs one Task at a time. While a Task awaits for the completion of a Future, the event loop runs other Tasks, callbacks, or performs IO operations.”

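import asyncio
import aiohttp  # third-party HTTP client with async support

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main(urls):
    async with aiohttp.ClientSession() as session:
        async with asyncio.TaskGroup() as tg:
            tasks = [tg.create_task(fetch(session, url)) for url in urls]
        # Leaving the TaskGroup block waits for every task to finish.
        return [task.result() for task in tasks]

if __name__ == "__main__":
    pages = asyncio.run(main(["https://example.com"]))
    print(len(pages))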

Coroutines are declared with async def and must be awaited or scheduled as tasks before they run. Calling a coroutine function without await produces a coroutine object but does not execute its body. asyncio.gather() runs multiple awaitables concurrently and returns their results as a list, in the order the awaitables were passed. asyncio.TaskGroup, added in Python 3.11, provides structured concurrency: if one task raises an exception, the remaining tasks in the group are cancelled automatically, and the errors are re-raised together as an ExceptionGroup when the async with block exits.

The advantage over threading is lower overhead. Each thread carries OS-level cost in memory and in context switching, while asyncio can handle thousands of concurrent I/O operations within a single thread at a very low per-task cost. The requirement is that all I/O operations use async-compatible libraries, because a blocking call inside a coroutine, such as time.sleep() or requests.get(), blocks the entire event loop until it returns.
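
When a blocking library has no async equivalent, asyncio.to_thread() (Python 3.9+) runs the call in a worker thread so the event loop stays free. A minimal sketch, assuming time.sleep stands in for a synchronous call such as requests.get; blocking_io, run_five, and the two wrapper coroutines are hypothetical names:

```python
import asyncio
import time

def blocking_io():
    # Stand-in for a synchronous library call.
    time.sleep(0.2)

async def blocking_in_coroutine():
    blocking_io()  # holds the event loop: no other task can run meanwhile

async def offloaded_to_thread():
    await asyncio.to_thread(blocking_io)  # runs in the loop's default thread pool

async def run_five(coro_fn):
    start = time.perf_counter()
    await asyncio.gather(*(coro_fn() for _ in range(5)))
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"blocking call:     {asyncio.run(run_five(blocking_in_coroutine)):.2f}s")
    print(f"asyncio.to_thread: {asyncio.run(run_five(offloaded_to_thread)):.2f}s")
```

The first line should report about 1 second, because the five calls run back to back, and the second about 0.2 seconds.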

Asyncio is the correct choice when handling a large number of concurrent I/O operations, such as making many HTTP requests, managing many open connections, or building a network server.
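
For the network-server case, the standard library's asyncio.start_server() and asyncio.open_connection() are enough for a small demonstration. The sketch below is a hypothetical uppercase-echo server serving many concurrent clients, all on one thread and bound to the loopback interface; handle_echo, client, and demo are illustrative names, not part of any library.

```python
import asyncio

async def handle_echo(reader, writer):
    # Called once per connection; each await lets other connections proceed.
    line = await reader.readline()
    writer.write(line.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def client(port, message):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message.encode() + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.decode().strip()

async def demo(n_clients=50):
    # Port 0 asks the OS for any free port on the loopback interface.
    server = await asyncio.start_server(handle_echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await asyncio.gather(
            *(client(port, f"hello {i}") for i in range(n_clients))
        )

if __name__ == "__main__":
    replies = asyncio.run(demo())
    print(len(replies), replies[0])  # 50 HELLO 0
```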

Decision Table

Scenario                                        Recommended model
CPU-bound computation, multiple cores           multiprocessing
I/O-bound, moderate concurrency, shared state   threading
I/O-bound, high concurrency, low overhead       asyncio
Mixed CPU-bound and I/O-bound work              multiprocessing, with asyncio in each process
Simple sequential I/O                           none; sequential code is sufficient

What You Can Do Now

Run the same five URL fetches with all three models and compare the timings. The threading and multiprocessing versions use the third-party requests package, and the asyncio version uses aiohttp; install both with pip install requests aiohttp.

import asyncio
import multiprocessing
import threading
import time

import aiohttp   # third-party: pip install aiohttp
import requests  # third-party: pip install requests

URLS = [
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
]

# --- threading ---
def fetch_sync(url):
    # The timeout stops one stalled request from hanging the whole run.
    return requests.get(url, timeout=10).status_code

def run_threading():
    threads = [threading.Thread(target=fetch_sync, args=(u,)) for u in URLS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# --- multiprocessing ---
def run_multiprocessing():
    # fetch_sync is defined at module level, so worker processes can unpickle it.
    with multiprocessing.Pool(processes=5) as pool:
        pool.map(fetch_sync, URLS)

# --- asyncio ---
async def fetch_async(session, url):
    async with session.get(url) as r:
        return r.status

async def run_asyncio():
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*[fetch_async(session, u) for u in URLS])

# --- timing ---
if __name__ == "__main__":
    for label, fn in [
        ("threading", run_threading),
        ("multiprocessing", run_multiprocessing),
    ]:
        start = time.perf_counter()
        fn()
        print(f"{label}: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    asyncio.run(run_asyncio())
    print(f"asyncio: {time.perf_counter() - start:.2f}s")

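All three should finish in roughly one second plus overhead rather than five seconds, because the five requests overlap in time instead of running one after another. Exact numbers depend on network latency and on how quickly httpbin.org responds. As you scale the number of URLs upward, the asyncio version will typically show the lowest overhead for this kind of workload, while the multiprocessing version pays the highest startup cost, since it launches whole interpreter processes for short-lived I/O tasks.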
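
A network-free variant makes the comparison reproducible and adds the sequential baseline. The sketch below assumes that time.sleep and asyncio.sleep are fair stand-ins for waiting on a server; DELAY, N_REQUESTS, run_sequential, and timed are hypothetical names chosen for this sketch.

```python
import asyncio
import multiprocessing
import threading
import time

DELAY = 0.2      # hypothetical per-request latency, in seconds
N_REQUESTS = 5

def run_sequential():
    for _ in range(N_REQUESTS):
        time.sleep(DELAY)

def run_threading():
    threads = [threading.Thread(target=time.sleep, args=(DELAY,)) for _ in range(N_REQUESTS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def run_multiprocessing():
    # time.sleep is importable from the time module, so it pickles cleanly
    # under every start method.
    with multiprocessing.Pool(processes=N_REQUESTS) as pool:
        pool.map(time.sleep, [DELAY] * N_REQUESTS)

async def run_asyncio():
    # asyncio.sleep is the non-blocking counterpart of time.sleep.
    await asyncio.gather(*(asyncio.sleep(DELAY) for _ in range(N_REQUESTS)))

def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

if __name__ == "__main__":
    for label, fn in [
        ("sequential", run_sequential),
        ("threading", run_threading),
        ("multiprocessing", run_multiprocessing),
        ("asyncio", lambda: asyncio.run(run_asyncio())),
    ]:
        print(f"{label}: {timed(fn):.2f}s")
```

The sequential run should take about DELAY × N_REQUESTS (1 second here), the threading and asyncio runs about DELAY, and the multiprocessing run about DELAY plus the cost of starting five worker processes.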
