Where Is Python Slow: cProfile, pstats, and line

Performance problems in Python code are rarely where you expect them to be. A function that looks expensive at a glance is often called once and finishes quickly. An innocent utility function called ten thousand times inside a loop is often the real cause. The only way to know which is which is to measure. Guessing wastes time and can lead to optimising code that was never the bottleneck while the actual problem remains untouched.

Using timeit for Isolated Comparisons

Before profiling a full program, it is sometimes useful to compare two approaches to the same operation. The timeit module in the standard library is designed for exactly this. It runs a small code statement a specified number of times and returns the total elapsed seconds.

import timeit

# Compare two ways to build a joined string from a range
result_a = timeit.timeit(
    '"-".join(str(n) for n in range(100))',
    number=10000
)

result_b = timeit.timeit(
    '"-".join(map(str, range(100)))',
    number=10000
)

print(f"Generator: {result_a:.4f}s")
print(f"map():     {result_b:.4f}s")

The number parameter controls how many times the statement executes. The function returns the total time across all repetitions, not the time per call, so dividing by number gives the per-call cost. One important behaviour to be aware of: timeit disables garbage collection by default during the timing run. This produces cleaner measurements for pure computation, but it means the result may not reflect real-world timing if the code allocates and discards many objects. If GC behaviour matters for what you are measuring, pass 'gc.enable()' as the setup argument.

timeit is appropriate for micro-benchmarks on isolated expressions. It is not a substitute for whole-program profiling. When the goal is to understand where an application spends its time across many functions, cProfile is the right tool.

cProfile: Function-Level Profiling

cProfile is a deterministic profiler included in the Python standard library. It instruments every function call, return, and exception in the running program and records timing information for each function. Because it is implemented as a C extension, its overhead is lower than the pure-Python profile module.

The simplest way to profile a script is from the command line:

python -m cProfile -s cumulative my_script.py

The -s cumulative flag sorts the output by cumulative time before printing. To save the results to a file for later analysis, use -o:

python -m cProfile -o profile_output.prof my_script.py

To profile a specific section of code programmatically, use cProfile.Profile as a context manager:

import cProfile

with cProfile.Profile() as pr:
    result = run_expensive_function()

pr.print_stats(sort='cumulative')

Reading the cProfile Output

The output produced by cProfile contains one row per function, with the following columns:

ncalls is the number of times the function was called. When a function is recursive, this appears as two numbers separated by a slash, such as 100/1, where the first is the total call count and the second is the number of primitive (non-recursive) calls.

tottime is the total time spent inside the function itself, not including time spent in functions it called. This is the column to sort by when looking for functions where the code itself is slow.

percall in the first position is tottime divided by ncalls.

cumtime is the cumulative time spent in the function and all functions it called. A high cumtime value on a top-level function simply means it called other things. A high tottime is where the actual cost lives.

percall in the second position is cumtime divided by primitive call count.

filename:lineno(function) identifies where the function is defined.

When reading the output, start with tottime sorted descending to find where the interpreter is spending time in Python code, then check ncalls to see whether a function is called far more often than expected.

Sorting and Filtering with pstats

When profiling output is saved to a file, the pstats module provides a programmatic interface for sorting and filtering it. This is more flexible than the command-line -s flag because multiple sort keys and restrictions can be chained together.

import pstats
from pstats import SortKey

p = pstats.Stats('profile_output.prof')
p.strip_dirs()
p.sort_stats(SortKey.TIME).print_stats(15)

strip_dirs() removes the leading file path from each function name, making the output easier to read. sort_stats() accepts a SortKey enum or a plain string such as 'cumulative', 'time', or 'calls'. print_stats() accepts an integer to limit output to the top N rows, a decimal between 0.0 and 1.0 to show that fraction of rows, or a string treated as a regular expression to filter function names.

To narrow output to functions in a specific module and sort by cumulative time:

p.sort_stats(SortKey.CUMULATIVE).print_stats('my_module')

To find what is calling a hot function:

p.print_callers('process_record')

pstats is particularly useful when a profile run is long and the raw output has hundreds of entries. Filtering to a module name or a pattern quickly narrows down which layer of the application is responsible for the time.

line_profiler: Line-by-Line Timing

cProfile tells you which functions are slow. It does not tell you which lines inside those functions are slow. Once cProfile has identified a suspect function, line_profiler can measure the time spent on each individual line within it.

line_profiler is not part of the standard library. Install it with pipx to keep it isolated from project environments:

pipx install line_profiler

Once installed, decorate the functions you want to profile with @profile and run the script through the kernprof command:

kernprof -l -v my_script.py

The -l flag activates line-by-line profiling. The -v flag prints the results immediately after the script finishes rather than writing them to a file only.

@profile
def process_records(records):
    results = []
    for record in records:
        cleaned = record.strip().lower()
        tokens = cleaned.split()
        filtered = [t for t in tokens if len(t) > 3]
        results.append(" ".join(filtered))
    return results

Reading line_profiler Output

The output from line_profiler adds four columns to each source line: Hits is the number of times that line executed. Time is the total time spent on that line across all hits, in timer units. Per Hit is the time per execution. % Time is the fraction of the function’s total time that line accounts for.

A line with a high % Time and a low Hits count points to a single expensive operation. A line with a moderate % Time but an extremely high Hits count points to a cheap operation being called in a tight loop that accumulates cost through repetition. Both patterns are actionable, but the remedies are different. An expensive single call might be replaced with a faster algorithm or cached. A cheap call in a hot loop might be moved outside the loop, vectorised, or replaced with a compiled equivalent.

The correct workflow is to use cProfile first to identify the functions where time is being spent, then apply line_profiler only to those specific functions. Running line_profiler on an entire program without cProfile to guide it produces a large volume of output and does not help prioritise where to act.

What You Can Do Now

Profile the following function with both cProfile and line_profiler to practice the full workflow. The function has a deliberate inefficiency that cProfile will surface and line_profiler will pinpoint.

# save as profile_demo.py

def find_common_words(text, min_length=4):
    words = text.lower().split()
    counts = {}
    for word in words:
        cleaned = "".join(c for c in word if c.isalpha())
        if len(cleaned) >= min_length:
            if cleaned in counts:
                counts[cleaned] += 1
            else:
                counts[cleaned] = 1
    common = sorted(counts.items(), key=lambda x: x[1], reverse=True)
    return common[:20]


SAMPLE = "the quick brown fox jumps over the lazy dog " * 50000

if __name__ == "__main__":
    import cProfile
    import pstats
    from pstats import SortKey

    with cProfile.Profile() as pr:
        find_common_words(SAMPLE)

    stats = pstats.Stats(pr)
    stats.strip_dirs().sort_stats(SortKey.TIME).print_stats(10)

Run this with python profile_demo.py to see the cProfile output. Then add the @profile decorator to find_common_words, run kernprof -l -v profile_demo.py, and compare which lines account for the majority of time. The "".join(c for c in word if c.isalpha()) call inside the loop is the line that will stand out. Replacing the generator expression with str.translate or a compiled regular expression is the targeted change that the profiling output points toward.