Structured Data in Python: Choosing Between dataclasses, TypedDict, and Pydantic

A plain Python dictionary carries no guarantees about which keys exist, what types the values are, or whether the structure is complete. For a small script this is tolerable. In a larger codebase, or at any boundary where data enters from outside the process, it becomes a persistent source of bugs that neither the runtime nor a static checker can catch without additional tooling. The Python ecosystem now offers three well-established options for adding structure: the @dataclass decorator from the standard library's dataclasses module, TypedDict from the typing module, and BaseModel from the third-party Pydantic library. They are not interchangeable. Each one makes a specific trade-off between simplicity, static analysis support, and runtime enforcement.

dataclasses: Generated Boilerplate for Internal Objects

The @dataclass decorator, added in Python 3.7 via PEP 557, transforms a class with annotated fields into a fully equipped data container. It generates __init__, __repr__, and __eq__ automatically from the field declarations. The decorator accepts parameters that control this behaviour: order=True generates the ordering methods __lt__, __le__, __gt__, and __ge__; frozen=True makes instances immutable by raising dataclasses.FrozenInstanceError on any attempted field assignment; and slots=True (Python 3.10+) produces a __slots__-based class that is more memory-efficient.

from dataclasses import dataclass, field

@dataclass
class Article:
    title: str
    word_count: int
    tags: list[str] = field(default_factory=list)
    published: bool = False

The generated __init__ accepts these fields in declaration order. The __repr__ returns a string such as Article(title='...', word_count=800, tags=[], published=False). The __eq__ compares two instances field-by-field as a tuple, provided both are of the exact same type.
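The following sketch checks the generated __repr__ and __eq__ on the Article class above. FeaturedArticle is a hypothetical subclass added only to show that equal field values are not enough when the two classes differ.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    word_count: int
    tags: list[str] = field(default_factory=list)
    published: bool = False


# Hypothetical subclass, used only to illustrate the exact-type rule in __eq__.
@dataclass
class FeaturedArticle(Article):
    pass


a = Article("Intro", 800)
b = Article("Intro", 800)
c = FeaturedArticle("Intro", 800)

print(repr(a))  # Article(title='Intro', word_count=800, tags=[], published=False)
print(a == b)   # True: same class, equal field values
print(a == c)   # False: the classes differ, so __eq__ returns NotImplemented
```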

There is no runtime type enforcement. Passing word_count="eight hundred" to the constructor above raises no error. The annotations exist entirely for static checkers such as mypy or pyright, and for readers of the code. Mutable defaults need a factory: use field(default_factory=list) for a list field or field(default_factory=dict) for a dict field. A bare tags: list[str] = [] raises a ValueError at class definition time, because the decorator refuses a default that would be shared by every instance.
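A short sketch of both behaviours: the wrong-typed value is stored unchanged, and a bare list default is rejected as soon as the decorator runs. BadArticle is a deliberately broken, hypothetical class.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    word_count: int


# Annotations are not checked at runtime: a str is stored where int was declared.
loose = Article(title="Draft", word_count="eight hundred")
print(type(loose.word_count).__name__)  # str

# A mutable default is rejected while the decorator processes the class.
error_message = ""
try:
    @dataclass
    class BadArticle:
        tags: list[str] = []
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```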

Dataclasses are the right tool when you need a lightweight, readable container for data that stays inside your own code. They carry no dependencies, add no overhead beyond the method generation at class creation time, and work naturally with the rest of the standard library.

TypedDict: Annotations Without a New Type

TypedDict does not create a new runtime type in the object-oriented sense. Calling a TypedDict class, or annotating a dict literal with it, yields an ordinary dict: there are no generated methods, no attribute access, and no enforcement of any kind, and isinstance() checks against a TypedDict raise TypeError. Its purpose is entirely static: it gives type checkers a schema for a dictionary so that key access can be verified offline.
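To confirm that nothing new exists at runtime, the sketch below calls a TypedDict class directly and inspects the result. The exact TypeError message may vary between Python versions, so the code only records that the error is raised.

```python
from typing import TypedDict


class ArticlePayload(TypedDict):
    title: str
    word_count: int


# A type checker reports the wrong value type here; the runtime does not.
payload = ArticlePayload(title="Draft", word_count="eight hundred")
print(type(payload))  # <class 'dict'>
print(payload)        # {'title': 'Draft', 'word_count': 'eight hundred'}

# isinstance() against a TypedDict is refused rather than answered.
isinstance_rejected = False
try:
    isinstance(payload, ArticlePayload)
except TypeError:
    isinstance_rejected = True
print(isinstance_rejected)  # True
```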

from typing import TypedDict, NotRequired

class ArticlePayload(TypedDict):
    title: str
    word_count: int
    tags: NotRequired[list[str]]

A checker using this definition will flag payload['titel'] = 'Draft' as an unknown key and payload['word_count'] = "eight hundred" as a type mismatch. At runtime, both assignments succeed silently because the underlying object is a plain dict. NotRequired (added in Python 3.11 via PEP 655) marks individual keys as optional within a TypedDict that otherwise requires all keys. The inverse, Required, marks a key as mandatory inside a TypedDict declared with total=False.

TypedDict is the correct choice when a function or library expects or returns a plain dictionary and you want to document its shape for the type checker without changing the type of the object. It is particularly useful when interacting with APIs that consume or produce raw dicts, where converting to a class instance would require unnecessary wrapping and unwrapping. The limitation is the absence of any runtime guarantee: a TypedDict annotation cannot protect you from data that arrives with missing keys or wrong types at the boundary of your application.
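The boundary problem is easy to reproduce. In this sketch, load_payload is a hypothetical helper. Because json.loads is typed as returning Any, a checker accepts the annotated return value without any proof that the keys are present, and the missing key only surfaces when it is first read.

```python
import json
from typing import TypedDict


class ArticlePayload(TypedDict):
    title: str
    word_count: int


def load_payload(raw: str) -> ArticlePayload:
    # json.loads returns Any, so the checker accepts this annotation unverified.
    data: ArticlePayload = json.loads(raw)
    return data


# The incoming JSON has no word_count key, yet loading it raises nothing.
payload = load_payload('{"title": "Draft"}')

# The problem only appears where the missing key is first read.
missing_key = None
try:
    print(payload["word_count"] + 1)
except KeyError as exc:
    missing_key = exc.args[0]
print(missing_key)  # word_count
```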

Pydantic BaseModel: Validation at the Point of Entry

Pydantic’s BaseModel enforces types at runtime. When you instantiate a model, Pydantic validates every field against its declared type and raises a ValidationError if the data does not conform. Since version 2, this validation layer is implemented in Rust via the pydantic-core package, which makes it substantially faster than the pure-Python version 1 and reduces the performance argument against using it by default.
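A minimal sketch of what a ValidationError carries, assuming Pydantic v2 is installed. Pydantic reports every failing field rather than stopping at the first, and exc.errors() exposes each failure as a dictionary with loc, type, and msg entries. The two-field model here is a trimmed-down variant of the example that follows.

```python
from pydantic import BaseModel, ValidationError


class ArticleRequest(BaseModel):
    title: str
    word_count: int


captured: list[dict] = []
try:
    # title is missing and word_count cannot be parsed as an integer.
    ArticleRequest.model_validate({"word_count": "eight hundred"})
except ValidationError as exc:
    # errors() returns one entry per problem: location, error type, message.
    captured = [dict(error) for error in exc.errors()]
    for error in captured:
        print(error["loc"], error["type"], error["msg"])
```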

from pydantic import BaseModel, Field, field_validator

class ArticleRequest(BaseModel):
    title: str
    word_count: int
    tags: list[str] = Field(default_factory=list)

    @field_validator("title")
    @classmethod
    def title_not_empty(cls, v: str) -> str:
        if not v.strip():
            raise ValueError("title must not be blank")
        return v.strip()

By default, Pydantic performs coercion: a string "800" passed to word_count becomes the integer 800. This behaviour can be disabled per model with model_config = ConfigDict(strict=True) (ConfigDict is imported from pydantic), which requires values to already be the declared type. The model exposes model_dump() to produce a plain dictionary and model_dump_json() to produce a JSON string. Both omit fields declared with Field(exclude=True), and both emit field aliases instead of attribute names when called with by_alias=True. Parsing from external data uses model_validate(dict) or model_validate_json(json_string).
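The sketch below exercises coercion, strict mode, aliases, and exclusions in one place, assuming Pydantic v2. LaxArticle and StrictArticle are hypothetical models; int_type is the error type Pydantic v2 reports when strict mode rejects a string for an int field.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class LaxArticle(BaseModel):
    # alias: the key name used by external data; exclude=True drops the field from dumps.
    word_count: int = Field(alias="wordCount")
    internal_note: str = Field(default="", exclude=True)


class StrictArticle(BaseModel):
    model_config = ConfigDict(strict=True)
    word_count: int


lax = LaxArticle.model_validate_json('{"wordCount": "800"}')
print(lax.word_count)                      # 800 (string coerced to int)
print(lax.model_dump())                    # {'word_count': 800}
print(lax.model_dump(by_alias=True))       # {'wordCount': 800}
print(lax.model_dump_json(by_alias=True))  # {"wordCount":800}

# Strict mode refuses the same string instead of coercing it.
strict_error_type = None
try:
    StrictArticle.model_validate({"word_count": "800"})
except ValidationError as exc:
    strict_error_type = exc.errors()[0]["type"]
print(strict_error_type)                   # int_type
```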

The cost of this power is a dependency on the pydantic package, a slightly higher instantiation overhead compared to a plain dataclass, and a steeper learning curve when field configuration becomes complex. For data that originates inside your own code and is never serialised, that cost is rarely justified.

Choosing Between the Three

The decision follows from where the data comes from and what you need to do with it.

Dataclasses are appropriate for internal data containers: objects that your own code creates, passes around, and uses without crossing a serialisation boundary. They are well-suited to domain model objects, intermediate computation results, and configuration objects built from known Python values. The standard library’s dataclasses.asdict and dataclasses.astuple helpers cover the common case of converting to a plain dict or tuple when needed.
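A brief sketch of the two helpers, using hypothetical Author and Article classes. Both recurse into nested dataclasses and copy containers, so the returned structure can be modified without touching the original instance.

```python
from dataclasses import asdict, astuple, dataclass, field


@dataclass
class Author:
    name: str


@dataclass
class Article:
    title: str
    author: Author
    tags: list[str] = field(default_factory=list)


article = Article("Draft", Author("Ada"), ["python"])

# Nested dataclasses are converted too; lists stay lists.
print(asdict(article))   # {'title': 'Draft', 'author': {'name': 'Ada'}, 'tags': ['python']}
print(astuple(article))  # ('Draft', ('Ada',), ['python'])

# The result is a copy: mutating it leaves the original instance unchanged.
snapshot = asdict(article)
snapshot["tags"].append("draft")
print(article.tags)      # ['python']
```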

TypedDict is appropriate when the data must remain a plain dictionary but you want a type checker to verify its shape. The most common scenarios are typing the return values of functions that build configuration dicts, annotating the parameters of functions that accept JSON-like structures, and describing the shape of objects returned by libraries that hand back raw dicts. It adds zero runtime overhead and zero new dependencies.

Pydantic BaseModel is appropriate whenever data enters the process from an external source: an HTTP request body, a configuration file, an environment variable (for which the companion pydantic-settings package provides BaseSettings), a database row retrieved through a loosely typed driver, or user input of any kind. It is also the standard choice in FastAPI, where request and response schemas are declared as BaseModel subclasses and validation, serialisation, and OpenAPI schema generation all derive from the same definition. By 2025, Pydantic v2 had become the de facto standard for these use cases in the Python ecosystem, and the combination of FastAPI plus Pydantic is now a common baseline for new HTTP API projects.

A rough decision rule: if the data is internal, use a dataclass. If the data must stay a dict and you only need static analysis, use TypedDict. If the data crosses a boundary and correctness must be enforced at runtime, use BaseModel.
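One way to apply the rule when data crosses a boundary and then flows inward is to validate once with a BaseModel and hand the rest of the code a dataclass. This is a sketch of that pattern, assuming Pydantic v2 is installed; parse_article is a hypothetical helper, and copying fields through model_dump() is only one of several reasonable conversion strategies.

```python
from dataclasses import dataclass

from pydantic import BaseModel


# Boundary model: validates and coerces untrusted input.
class ArticleRequest(BaseModel):
    title: str
    word_count: int


# Internal model: a plain, immutable container for the rest of the code.
@dataclass(frozen=True)
class Article:
    title: str
    word_count: int


def parse_article(raw_json: str) -> Article:
    # Hypothetical helper: validate once at the edge, then hand off a dataclass.
    request = ArticleRequest.model_validate_json(raw_json)
    return Article(**request.model_dump())


article = parse_article('{"title": "Draft", "word_count": "800"}')
print(article)  # Article(title='Draft', word_count=800)
```

When the model's fields map one-to-one onto the dataclass, ** unpacking keeps the conversion short; with nested models or renamed fields, an explicit constructor call is usually clearer.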

What You Can Do Now

The following file puts all three tools next to each other around the same domain concept so the differences in behaviour are immediately visible.

# structured_data_demo.py

from dataclasses import dataclass, field
from typing import TypedDict, NotRequired
from pydantic import BaseModel, Field, field_validator, ValidationError


# --- dataclass: internal container, no runtime enforcement ---

@dataclass
class ArticleRecord:
    title: str
    word_count: int
    tags: list[str] = field(default_factory=list)
    published: bool = False


record = ArticleRecord(title="Draft", word_count="not a number")  # No error
print(record)
# ArticleRecord(title='Draft', word_count='not a number', tags=[], published=False)


# --- TypedDict: plain dict with a type checker schema ---

class ArticlePayload(TypedDict):
    title: str
    word_count: int
    tags: NotRequired[list[str]]


payload: ArticlePayload = {"title": "Draft", "word_count": "not a number"}
# No runtime error. A static checker (mypy, pyright) flags word_count as wrong type.
print(payload["word_count"])  # not a number


# --- Pydantic BaseModel: runtime validation and coercion ---

class ArticleRequest(BaseModel):
    title: str
    word_count: int
    tags: list[str] = Field(default_factory=list)

    @field_validator("title")
    @classmethod
    def title_not_empty(cls, v: str) -> str:
        if not v.strip():
            raise ValueError("title must not be blank")
        return v.strip()


# Coercion: string becomes int
req = ArticleRequest(title="Draft", word_count="800")
print(req.word_count)         # 800
print(type(req.word_count))   # <class 'int'>
print(req.model_dump())       # {'title': 'Draft', 'word_count': 800, 'tags': []}

# Validation failure: raises ValidationError, not a silent wrong value
try:
    ArticleRequest(title="", word_count=800)
except ValidationError as exc:
    print(exc)

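Run this file directly (it needs Python 3.11 or later for typing.NotRequired, plus Pydantic v2) to observe that the dataclass constructor call and the TypedDict assignment with a wrong type produce no runtime feedback, while the Pydantic validator raises immediately. Then run mypy --strict structured_data_demo.py: the static checker reports the wrong type in both the dataclass constructor call and the TypedDict assignment, but only Pydantic stops the program at runtime. mypy may also flag the deliberate ArticleRequest(title="Draft", word_count="800") call, because the checker sees the declared int type even though Pydantic coerces the string at runtime.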