Modeling an Order Management System with Python Dataclasses

Dima Jun 30, 2026

Introduction

Every non-trivial Python application models domain entities: orders, customers, products, invoices. The straightforward approach is a plain class with an __init__ method that assigns each attribute. That works until the model has eight fields. Then you need __repr__ for debugging, __eq__ for unit tests, __hash__ for set membership, comparison operators for sorting by price or date, and validation in the constructor. The result is hundreds of lines of mechanical boilerplate that carries no information — a reader cannot look at self.total = total and learn anything that the field name does not already say.

Python's dataclasses module, introduced in 3.7, generates this boilerplate automatically from a class declaration that is purely declarative. It does not sacrifice any flexibility: you can override what is generated, add validation, control which fields appear in __repr__, and freeze instances to make them immutable. The module is part of the standard library, costs nothing to add, and scales cleanly from a two-field value object to a twenty-field aggregate root.

This tutorial builds a realistic order management system step by step, starting from the boilerplate-heavy plain class and refactoring it toward a clean, correct dataclass implementation. Every feature of the module is motivated by a real requirement the system needs to satisfy.


Background

A dataclass is an ordinary Python class decorated with @dataclass. The decorator inspects the class body's type annotations and generates __init__, __repr__, and __eq__ based on them. It does not change how the class is instantiated or how attribute access works — generated classes are still fully extensible.
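
As a quick check of that description, here is a minimal sketch using a hypothetical Point class (not part of the order system). It exercises the generated methods and confirms that the decorated class is still an ordinary class.

```python
import inspect
from dataclasses import dataclass, is_dataclass


@dataclass
class Point:  # hypothetical example class, not part of the tutorial's models
    x: int
    y: int = 0


p = Point(1)

# The generated __init__ takes the fields in declaration order.
signature = str(inspect.signature(Point))

# The generated __repr__ and __eq__ work by value.
text = repr(p)
same = p == Point(1, 0)

print(signature)
print(text)
print(same, is_dataclass(Point), type(Point) is type)
```

Nothing about Point is special at runtime: it can still define methods and properties, or be subclassed, like any other class.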

Key terms used throughout:

  • Field: One annotated attribute defined in the class body. price: float = 0.0 is a field with a default.
  • field(): A function from the dataclasses module that replaces a simple default with a richer descriptor — controlling repr inclusion, comparison participation, default factories, and metadata.
  • default_factory: A callable passed to field() that produces a fresh default value on each instantiation. Required for mutable defaults like lists and dicts.
  • __post_init__: A method the generated __init__ calls after setting all fields. The correct place for validation and derived-field computation.
  • frozen: A @dataclass(frozen=True) class raises FrozenInstanceError on any attribute assignment after construction, making instances behave like tuples.

Practical Scenario

A mid-size e-commerce platform processes thousands of orders per day. The engineering team maintains a Python service that models three core entities: LineItem (a product and quantity), Customer (billing contact and tier), and Order (a set of line items associated with a customer). These models are created in the web layer, passed through validation and pricing logic, compared in unit tests, serialised to JSON for the message queue, and logged at every stage.

The team's existing model code was written in a hurry. Each class has a handwritten __init__ that assigns fields in sequence, a __repr__ that was copy-pasted and drifted out of sync with the actual fields, and no __eq__ — so unit tests compare objects with is, which always fails on fresh instances. Adding a new field to Order means updating __init__, __repr__, and every test fixture by hand.

The team needs the models to be correct by construction, comparable by value, sortable by total price, frozen once submitted to the queue, and serialisable to a dictionary with one function call.


The Problem

Create a new file:

touch orders.py

Add the code below to it, then run it using:

python3 orders.py

class LineItem:
    def __init__(self, product_id, name, unit_price, quantity):
        self.product_id = product_id
        self.name       = name
        self.unit_price = unit_price
        self.quantity   = quantity

    def __repr__(self):
        return (
            f"LineItem(product_id={self.product_id!r}, name={self.name!r}, "
            f"unit_price={self.unit_price}, quantity={self.quantity})"
        )

    def __eq__(self, other):
        if not isinstance(other, LineItem):
            return NotImplemented
        return (
            self.product_id == other.product_id
            and self.name == other.name
            and self.unit_price == other.unit_price
            and self.quantity == other.quantity
        )


class Customer:
    def __init__(self, customer_id, name, email, tier="standard"):
        self.customer_id = customer_id
        self.name        = name
        self.email       = email
        self.tier        = tier

    def __repr__(self):
        return (
            f"Customer(customer_id={self.customer_id!r}, name={self.name!r}, "
            f"email={self.email!r}, tier={self.tier!r})"
        )

    def __eq__(self, other):
        if not isinstance(other, Customer):
            return NotImplemented
        return (
            self.customer_id == other.customer_id
            and self.name == other.name
            and self.email == other.email
            and self.tier == other.tier
        )


class Order:
    def __init__(self, order_id, customer, items=None):
        self.order_id = order_id
        self.customer = customer
        self.items    = items if items is not None else []

    def __repr__(self):
        return (
            f"Order(order_id={self.order_id!r}, customer={self.customer!r}, "
            f"items={self.items!r})"
        )

    def __eq__(self, other):
        if not isinstance(other, Order):
            return NotImplemented
        return (
            self.order_id == other.order_id
            and self.customer == other.customer
            and self.items == other.items
        )


customer = Customer("C001", "Adriana Stoica", "adriana@example.ro", "premium")
item1    = LineItem("P101", "Wireless Headphones", 149.99, 1)
item2    = LineItem("P204", "USB-C Cable", 12.50, 3)
order    = Order("ORD-2024-001", customer, [item1, item2])

print(order)
print("Items match:", item1 == LineItem("P101", "Wireless Headphones", 149.99, 1))


Order(order_id='ORD-2024-001', customer=Customer(customer_id='C001', name='Adriana Stoica', email='adriana@example.ro', tier='premium'), items=[LineItem(product_id='P101', name='Wireless Headphones', unit_price=149.99, quantity=1), LineItem(product_id='P204', name='USB-C Cable', unit_price=12.5, quantity=3)])
Items match: True


This is eighty lines of code to describe three entities. Every line of __init__, __repr__, and __eq__ is mechanical: it cannot be wrong in interesting ways, but it can drift, for example a field added to __init__ but forgotten in __repr__, or an __eq__ that tests three of four fields. The items=None parameter with an if guard in Order.__init__ is the standard workaround for the mutable default argument pitfall, which every developer hits at least once before learning the lesson. None of this is what the class is about.


@dataclass Basics

The @dataclass decorator replaces __init__, __repr__, and __eq__ with generated versions derived directly from the class's type annotations. Replace the entire content of orders.py with the following:

from dataclasses import dataclass

@dataclass
class LineItem:
    product_id: str
    name:       str
    unit_price: float
    quantity:   int

@dataclass
class Customer:
    customer_id: str
    name:        str
    email:       str
    tier:        str = "standard"

@dataclass
class Order:
    order_id: str
    customer: Customer
    items:    list

customer = Customer("C001", "Adriana Stoica", "adriana@example.ro", "premium")
item1    = LineItem("P101", "Wireless Headphones", 149.99, 1)
item2    = LineItem("P204", "USB-C Cable", 12.50, 3)
order    = Order("ORD-2024-001", customer, [item1, item2])

print(order)
print("Items match:", item1 == LineItem("P101", "Wireless Headphones", 149.99, 1))


Order(order_id='ORD-2024-001', customer=Customer(customer_id='C001', name='Adriana Stoica', email='adriana@example.ro', tier='premium'), items=[LineItem(product_id='P101', name='Wireless Headphones', unit_price=149.99, quantity=1), LineItem(product_id='P204', name='USB-C Cable', unit_price=12.5, quantity=3)])
Items match: True


The output is identical. The code went from eighty lines to twenty-two. Every field appears exactly once — in the class body — and __init__, __repr__, and __eq__ are generated from that single source of truth. Adding a new field to LineItem now means one line.

The class body now reads as a specification, not an implementation. A developer scanning LineItem sees four fields and their types — nothing else competes for attention. __eq__ is guaranteed to test exactly the fields that __repr__ shows, because both are generated from the same annotation list. Drift between them is structurally impossible.

Note: Fields with defaults must come after fields without defaults, exactly as in a regular __init__ signature. If you place tier: str = "standard" before name: str, Python raises TypeError: non-default argument 'name' follows default argument.
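
The ordering rule can be seen directly. The sketch below defines a deliberately wrong, hypothetical BadCustomer class and captures the error raised while the class is being defined. The exact wording of the message may differ slightly between Python versions.

```python
from dataclasses import dataclass

error_message = None

try:
    @dataclass
    class BadCustomer:  # hypothetical, deliberately wrong
        customer_id: str
        tier: str = "standard"
        name: str  # no default, but it follows a field that has one
except TypeError as exc:
    # Raised by the decorator at class definition time, before any instance exists.
    error_message = str(exc)

print("TypeError:", error_message)
```

Because the error surfaces when the module is imported, a misordered field cannot go unnoticed until the first instance is created.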


Field Defaults and default_factory

Simple defaults like tier: str = "standard" work for immutable values. For mutable defaults — lists, dicts, sets — assigning the value directly in the class body raises a ValueError at class definition time. The field() function's default_factory argument is the correct solution: it accepts a callable that is invoked fresh for each new instance.

Replace Order with the following:

from dataclasses import dataclass, field

@dataclass
class Order:
    order_id: str
    customer: Customer
    items:    list[LineItem] = field(default_factory=list)
    tags:     list[str]      = field(default_factory=list)

customer = Customer("C001", "Adriana Stoica", "adriana@example.ro", "premium")
item1    = LineItem("P101", "Wireless Headphones", 149.99, 1)
order_a  = Order("ORD-2024-001", customer)
order_b  = Order("ORD-2024-002", customer)
order_a.items.append(item1)

print("order_a items:", order_a.items)
print("order_b items:", order_b.items)


order_a items: [LineItem(product_id='P101', name='Wireless Headphones', unit_price=149.99, quantity=1)]
order_b items: []


order_b.items is empty even though order_a.items was mutated. Each instance gets its own fresh list from list(). This is the sharing bug that a mutable default causes in a hand-written class or function signature, where a single list object ends up shared across all instances and data mysteriously appears in unrelated orders; @dataclass rejects a bare list default precisely to rule it out.

default_factory=list is self-documenting: it says "each instance starts with an empty list." The alternative — items=None with a constructor guard — obscures the intent and moves logic out of the field declaration into procedural code. default_factory also works for any callable: dict, set, uuid.uuid4, or a lambda that returns a domain-specific default.
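
To illustrate the "any callable" point, here is a small sketch with a hypothetical Shipment class whose identifier comes from uuid.uuid4 and whose labels start as an empty dict. The class and field names are assumptions made for illustration only.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Shipment:  # hypothetical class for illustration
    # The lambda runs once per instance, so every shipment gets its own id.
    shipment_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    labels: dict = field(default_factory=dict)


first = Shipment()
second = Shipment()
first.labels["fragile"] = True

print(first.shipment_id != second.shipment_id)
print(second.labels)
```

The factory is called with no arguments, so anything that needs configuration has to be wrapped in a lambda or a small named function.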

Note: You cannot mix a default_factory with a simple default. field(default=[], default_factory=list) raises ValueError. Choose one.
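
Both failure modes described in this section can be reproduced in a few lines. The two classes below are hypothetical and deliberately wrong; the sketch only collects the error messages.

```python
from dataclasses import dataclass, field

errors = []

try:
    @dataclass
    class SharedItems:  # hypothetical, deliberately wrong
        items: list = []  # bare mutable default
except ValueError as exc:
    errors.append(str(exc))

try:
    @dataclass
    class BothDefaults:  # hypothetical, deliberately wrong
        # field() itself rejects the combination of default and default_factory.
        items: list = field(default=None, default_factory=list)
except ValueError as exc:
    errors.append(str(exc))

for message in errors:
    print("ValueError:", message)
```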


__post_init__ for Validation and Derived Fields

The generated __init__ calls __post_init__ immediately after assigning all fields. This is the correct place for two things: validating that field values satisfy business rules, and computing derived fields that depend on other fields.

Replace the entire file with the following:

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class LineItem:
    product_id: str
    name:       str
    unit_price: float
    quantity:   int

    def subtotal(self) -> float:
        return round(self.unit_price * self.quantity, 2)

@dataclass
class Customer:
    customer_id: str
    name:        str
    email:       str
    tier:        str = "standard"

    def __post_init__(self):
        valid_tiers = {"standard", "premium", "enterprise"}
        if self.tier not in valid_tiers:
            raise ValueError(
                f"invalid tier {self.tier!r}: must be one of {sorted(valid_tiers)}"
            )
        if "@" not in self.email:
            raise ValueError(f"invalid email address: {self.email!r}")

@dataclass
class Order:
    order_id:   str
    customer:   Customer
    items:      list[LineItem] = field(default_factory=list)
    tags:       list[str]      = field(default_factory=list)
    created_at: str            = field(default_factory=lambda: datetime.now().strftime("%Y-%m-%dT%H:%M:%S"))
    total:      float          = field(init=False)

    def __post_init__(self):
        self.total = round(sum(item.subtotal() for item in self.items), 2)

customer = Customer("C001", "Adriana Stoica", "adriana@example.ro", "premium")
item1    = LineItem("P101", "Wireless Headphones", 149.99, 1)
item2    = LineItem("P204", "USB-C Cable", 12.50, 3)
order    = Order("ORD-2024-001", customer, [item1, item2])

print(f"Order {order.order_id}: total=${order.total}, created={order.created_at}")

try:
    Customer("C002", "Bad Customer", "not-an-email", "vip")
except ValueError as e:
    print(f"Validation error: {e}")


Order ORD-2024-001: total=$187.49, created=2024-01-15T10:42:17
Validation error: invalid tier 'vip': must be one of ['enterprise', 'premium', 'standard']


total is declared with init=False, which excludes it from the generated __init__ signature — callers cannot accidentally set it to an arbitrary value. __post_init__ computes it from items after construction. created_at uses a lambda as its default_factory so each instance captures the actual construction time.

Validation in __post_init__ means an invalid Customer cannot be constructed: the constructor raises before the object is returned. There is no separate .validate() call to forget, no partially-initialised object floating around. Derived fields like total computed in __post_init__ are consistent with the data they derive from at construction time. They are not recomputed afterwards, so on a mutable class, appending to items later leaves total stale; freezing the class or exposing total as a property avoids that.
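
One caveat is worth checking by hand: on a non-frozen class, a derived field set in __post_init__ is a snapshot taken at construction. The sketch below contrasts that with a property, using two hypothetical basket classes that are not part of the tutorial's models.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotBasket:  # hypothetical: total stored once at construction
    prices: list[float] = field(default_factory=list)
    total: float = field(init=False)

    def __post_init__(self):
        self.total = round(sum(self.prices), 2)


@dataclass
class LiveBasket:  # hypothetical: total recomputed on every access
    prices: list[float] = field(default_factory=list)

    @property
    def total(self) -> float:
        return round(sum(self.prices), 2)


snapshot = SnapshotBasket([10.0])
live = LiveBasket([10.0])
snapshot.prices.append(5.0)
live.prices.append(5.0)

print(snapshot.total)  # still the value computed in __post_init__
print(live.total)      # reflects the appended price
```

The property version does not appear in the generated __repr__, __eq__, or asdict output because it is not a field. Which trade-off is acceptable depends on how the model is used.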

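Note: An init=False field may still declare a default or default_factory, in which case the generated __init__ assigns it without accepting it as a parameter. Without either, as with total here, __post_init__ is solely responsible for setting the field, and reading it before it is set raises AttributeError. Setting a value for an init=False field from outside __post_init__ is possible but defeats the purpose.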
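
A short sketch of how init=False interacts with defaults, using a hypothetical Ticket class. The status and history fields are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:  # hypothetical class for illustration
    title: str
    # Not constructor parameters, but still initialised by the generated __init__.
    status: str = field(init=False, default="open")
    history: list[str] = field(init=False, default_factory=list)


ticket = Ticket("Refund request")
print(ticket)

rejected = False
try:
    Ticket("Refund request", "closed")  # status is not an __init__ parameter
except TypeError:
    rejected = True

print("extra argument rejected:", rejected)
```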


frozen=True for Immutable Instances

Once an order is submitted to the fulfilment queue it must not be modified. Python's @dataclass(frozen=True) makes every attribute assignment after construction raise FrozenInstanceError. With the default eq=True, a frozen class also gets a generated __hash__, so instances can be placed in sets or used as dictionary keys, provided every field value is itself hashable.

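Replace the Order class definition with the following, and change the decorators on LineItem and Customer to @dataclass(frozen=True) as well. The generated __hash__ hashes the field values, so an Order is only hashable if its customer and every item are hashable too: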

@dataclass(frozen=True)
class Order:
    order_id:   str
    customer:   Customer
    items:      tuple[LineItem, ...]
    tags:       tuple[str, ...]      = field(default_factory=tuple)
    created_at: str                  = field(default_factory=lambda: datetime.now().strftime("%Y-%m-%dT%H:%M:%S"))
    total:      float                = field(init=False)

    def __post_init__(self):
        object.__setattr__(self, "total", round(sum(item.subtotal() for item in self.items), 2))

Update the order construction at the bottom:

customer  = Customer("C001", "Adriana Stoica", "adriana@example.ro", "premium")
item1     = LineItem("P101", "Wireless Headphones", 149.99, 1)
item2     = LineItem("P204", "USB-C Cable", 12.50, 3)
order     = Order("ORD-2024-001", customer, (item1, item2))

print(f"Order {order.order_id}: total=${order.total}")

try:
    order.order_id = "TAMPERED"
except Exception as e:
    print(f"Mutation blocked: {type(e).__name__}: {e}")

submitted = {order}
print(f"Order in set: {order in submitted}")


Order ORD-2024-001: total=$187.49
Mutation blocked: FrozenInstanceError: cannot assign to field 'order_id'
Order in set: True


items is now a tuple instead of a list because a frozen dataclass cannot hold a mutable container without defeating immutability — the object reference is frozen, but a list's contents are not. __post_init__ must use object.__setattr__ to set total because the normal attribute assignment syntax is intercepted by the frozen machinery.

A frozen order submitted to the queue cannot be accidentally modified by a later stage of the pipeline — the runtime enforces what the business rule demands. Hash support means submitted orders can be tracked in a set for deduplication or used as dictionary keys without implementing __hash__ manually.

Note: The value-based __hash__ is generated only when both frozen=True and eq=True (the default). With eq=False, no __eq__ or __hash__ is generated, and instances fall back to the identity-based equality and hashing inherited from object. Combining frozen=True with mutable field types like list is not blocked at the class level, because Python cannot inspect what you put in a field, so the responsibility for choosing immutable containers falls on the author; a list-valued field also makes hash() raise TypeError when it is called.
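
The hashing rules are easy to get wrong, so here is a minimal sketch that exercises three cases. The Sku, Bundle, and Token classes are hypothetical and exist only to show the behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:  # hypothetical: frozen, eq=True by default, hashable field
    code: str


@dataclass(frozen=True)
class Bundle:  # hypothetical: frozen, but holds a list
    codes: list


@dataclass(frozen=True, eq=False)
class Token:  # hypothetical: frozen, with eq switched off
    value: str


# Equal field values produce equal hashes.
value_hash_matches = hash(Sku("P101")) == hash(Sku("P101"))

# The generated __hash__ hashes the field values, so a list makes it fail.
try:
    hash(Bundle(["P101"]))
    bundle_hashable = True
except TypeError:
    bundle_hashable = False

# With eq=False, equality and hashing fall back to object identity.
tokens_equal = Token("abc") == Token("abc")
token_set_size = len({Token("abc"), Token("abc")})

print(value_hash_matches, bundle_hashable, tokens_equal, token_set_size)
```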


@dataclass(order=True) for Comparison Operators

The order fulfilment dashboard needs to sort orders by total value. Without order=True, comparing two Order instances with <, <=, >, or >= raises TypeError. With it, the decorator generates __lt__, __le__, __gt__, and __ge__ that compare instances field by field in declaration order, exactly as tuple comparison works.

Replace the Order class definition with the following. To focus on sorting, switch back to a non-frozen version:

@dataclass(order=True)
class Order:
    total:      float = field(init=False)
    order_id:   str   = field(compare=False)
    customer:   Customer = field(compare=False)
    items:      list[LineItem] = field(default_factory=list, compare=False)
    tags:       list[str]      = field(default_factory=list, compare=False)
    created_at: str            = field(default_factory=lambda: datetime.now().strftime("%Y-%m-%dT%H:%M:%S"), compare=False)

    def __post_init__(self):
        self.total = round(sum(item.subtotal() for item in self.items), 2)

Update the bottom of the file to create multiple orders and sort them:

customer  = Customer("C001", "Adriana Stoica", "adriana@example.ro", "premium")
item1     = LineItem("P101", "Wireless Headphones", 149.99, 1)
item2     = LineItem("P204", "USB-C Cable", 12.50, 3)
item3     = LineItem("P308", "Mechanical Keyboard", 89.00, 2)

order_a = Order("ORD-2024-001", customer, [item1, item2])
order_b = Order("ORD-2024-002", customer, [item3])
order_c = Order("ORD-2024-003", customer, [item1, item3])

orders = [order_a, order_b, order_c]
for o in sorted(orders):
    print(f"  {o.order_id}: ${o.total}")


  ORD-2024-002: $178.0
  ORD-2024-001: $187.49
  ORD-2024-003: $327.99


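total is declared first and is the only field with compare=True (the default). All other fields pass compare=False to exclude them from ordering. This is the key insight: order=True compares fields in declaration order, so field ordering and compare=False together give you precise control over what "less than" means. The trade-off is that compare=False also removes a field from __eq__, so two different orders with the same total now compare equal; if that matters, keep order_id comparable and declare it after total.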

sorted(orders) works without a key= function because the comparison semantics are encoded in the class itself. A key=lambda o: o.total is implicit in the class declaration, visible to every reader, and protected from the key function being wrong or missing at each call site.

Note: order=True requires eq=True (the default). If you set eq=False, adding order=True raises ValueError. You cannot have ordering without equality.
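
The following sketch, built around a hypothetical Bid class, shows the ordering behaviour, the effect of compare=False on equality, and the ValueError mentioned in the note.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Bid:  # hypothetical class for illustration
    amount: float
    bidder: str = field(compare=False)


low = Bid(10.0, "ana")
high = Bid(25.0, "bogdan")
tie = Bid(10.0, "carmen")

print(sorted([high, low]) == [low, high])
print(low == tie)  # True: bidder is excluded from __eq__ as well as ordering

order_error = None
try:
    @dataclass(order=True, eq=False)
    class Broken:  # hypothetical, deliberately wrong
        amount: float
except ValueError as exc:
    order_error = str(exc)

print("ValueError:", order_error)
```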


field() with metadata, repr=False, and compare=False

field() has three additional parameters used frequently in production models: repr=False to exclude sensitive fields from the string representation, compare=False to exclude fields from equality and ordering, and metadata to attach arbitrary read-only annotations that tools and frameworks can inspect.

Add a PaymentInfo class and update Order to include it:

from dataclasses import dataclass, field, fields

@dataclass
class PaymentInfo:
    card_last_four: str
    card_brand:     str
    billing_zip:    str = field(repr=False)   # excluded from repr — not logged
    auth_token:     str = field(repr=False, compare=False)  # excluded from repr and equality

@dataclass(order=True)
class Order:
    total:        float       = field(init=False)
    order_id:     str         = field(compare=False)
    customer:     Customer    = field(compare=False)
    items:        list[LineItem] = field(default_factory=list, compare=False)
    tags:         list[str]   = field(default_factory=list, compare=False)
    created_at:   str         = field(default_factory=lambda: datetime.now().strftime("%Y-%m-%dT%H:%M:%S"), compare=False)
    payment_info: PaymentInfo = field(default=None, compare=False,
                                      metadata={"pii": True, "audit_required": True})

    def __post_init__(self):
        self.total = round(sum(item.subtotal() for item in self.items), 2)

payment  = PaymentInfo("4242", "Visa", "400001", "tok_live_abc123")
customer = Customer("C001", "Adriana Stoica", "adriana@example.ro", "premium")
item1    = LineItem("P101", "Wireless Headphones", 149.99, 1)
order    = Order("ORD-2024-001", customer, [item1], payment_info=payment)

print("Payment repr:", payment)
print()

# Inspect metadata
for f in fields(order):
    if f.metadata.get("pii"):
        print(f"PII field detected: {f.name} (audit_required={f.metadata['audit_required']})")


Payment repr: PaymentInfo(card_last_four='4242', card_brand='Visa')

PII field detected: payment_info (audit_required=True)


billing_zip and auth_token are absent from PaymentInfo's repr, so printing or logging the object does not expose them (asdict and direct attribute access still do). auth_token is also excluded from comparison, so two PaymentInfo objects with the same card but different auth tokens are considered equal. The metadata dict on payment_info is accessible via fields(order) and can be used by serialisers, auditing middleware, or access-control layers to identify PII fields at runtime without hardcoding field names.

repr=False is enforced structurally — there is no runtime check to forget, no filter to maintain in a logging formatter. metadata is the correct alternative to class-level dictionaries or decorators for attaching field-level annotations: it travels with the field wherever the dataclass is introspected, and fields() exposes it uniformly.
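
As one possible use of field metadata, the sketch below masks every top-level field tagged with pii=True before the data leaves the process. The Account class, the redacted_dict helper, and the "pii" key are all assumptions made for illustration; the dataclasses module attaches no meaning to metadata keys.

```python
from dataclasses import asdict, dataclass, field, fields


@dataclass
class Account:  # hypothetical class for illustration
    account_id: str
    email: str = field(metadata={"pii": True})
    iban: str = field(repr=False, metadata={"pii": True})


def redacted_dict(instance) -> dict:
    """Return asdict(instance) with top-level fields tagged pii=True masked."""
    pii_names = {f.name for f in fields(instance) if f.metadata.get("pii")}
    return {
        key: ("***" if key in pii_names else value)
        for key, value in asdict(instance).items()
    }


account = Account("A-1", "adriana@example.ro", "RO00EXAMPLE0000")  # dummy values
print(account)                 # iban is hidden by repr=False
print(redacted_dict(account))  # email and iban are masked via metadata
```

This helper only inspects the outermost dataclass. Nested dataclasses would need the same treatment applied recursively.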


dataclasses.asdict and dataclasses.astuple

The order must be serialised to JSON before it is published to the message queue. dataclasses.asdict recursively converts a dataclass instance — and any nested dataclasses — to a plain dictionary. dataclasses.astuple does the same to a nested tuple. Neither requires you to write a to_dict method.

Replace the bottom of the file with the following:

import json
from dataclasses import asdict, astuple

customer = Customer("C001", "Adriana Stoica", "adriana@example.ro", "premium")
item1    = LineItem("P101", "Wireless Headphones", 149.99, 1)
item2    = LineItem("P204", "USB-C Cable", 12.50, 3)
order    = Order("ORD-2024-001", customer, [item1, item2])

order_dict = asdict(order)
# Remove the auth token before publishing
order_dict.pop("payment_info", None)

print("Queue payload:")
print(json.dumps(order_dict, indent=2))

print()
item_tuple = astuple(item1)
print(f"LineItem as tuple: {item_tuple}")


Queue payload:
{
  "total": 187.49,
  "order_id": "ORD-2024-001",
  "customer": {
    "customer_id": "C001",
    "name": "Adriana Stoica",
    "email": "adriana@example.ro",
    "tier": "premium"
  },
  "items": [
    {
      "product_id": "P101",
      "name": "Wireless Headphones",
      "unit_price": 149.99,
      "quantity": 1
    },
    {
      "product_id": "P204",
      "name": "USB-C Cable",
      "unit_price": 12.5,
      "quantity": 3
    }
  ],
  "tags": [],
  "created_at": "2024-01-15T10:42:17"
}

LineItem as tuple: ('P101', 'Wireless Headphones', 149.99, 1)


asdict recursively descends into nested dataclasses, lists, tuples, and dicts, converting each dataclass it encounters. The result is a plain Python dictionary that json.dumps can handle directly.

A manually written to_dict method must be updated every time a field is added or renamed — and it usually is not. asdict derives its output from the same annotation list that __init__ and __repr__ use, so it stays correct automatically. astuple is useful when the output format is positional — for example, inserting rows into a database using a cursor that expects a tuple.
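
The database use case for astuple can be sketched with the standard library's sqlite3 module and an in-memory database. The line_items table and its column layout are assumptions made for this example, not part of the tutorial's system.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass
class LineItem:
    product_id: str
    name: str
    unit_price: float
    quantity: int


items = [
    LineItem("P101", "Wireless Headphones", 149.99, 1),
    LineItem("P204", "USB-C Cable", 12.50, 3),
]

conn = sqlite3.connect(":memory:")
# Column order must match the field declaration order of LineItem.
conn.execute(
    "CREATE TABLE line_items "
    "(product_id TEXT, name TEXT, unit_price REAL, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO line_items VALUES (?, ?, ?, ?)",
    [astuple(item) for item in items],
)
rows = conn.execute(
    "SELECT product_id, quantity FROM line_items ORDER BY product_id"
).fetchall()
conn.close()

print(rows)
```

Because astuple is positional, reordering the fields in the class silently changes which column each value lands in, so this pattern works best when the table definition is kept next to the dataclass.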

Note: asdict performs a deep copy. The returned dictionary does not share references with the original dataclass. If you need a shallow copy for performance reasons, use dataclasses.fields(instance) and build the dict manually.
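
A minimal sketch of the shallow alternative mentioned in the note, using a hypothetical Batch class and a hypothetical shallow_asdict helper:

```python
from dataclasses import asdict, dataclass, field, fields


@dataclass
class Batch:  # hypothetical class for illustration
    batch_id: str
    rows: list[int] = field(default_factory=list)


def shallow_asdict(instance) -> dict:
    """Top-level fields only; values are shared with the instance, not copied."""
    return {f.name: getattr(instance, f.name) for f in fields(instance)}


batch = Batch("B-1", [1, 2, 3])
deep = asdict(batch)
shallow = shallow_asdict(batch)

print(deep["rows"] is batch.rows)     # False: asdict built a new list
print(shallow["rows"] is batch.rows)  # True: the same list object
```

The shallow version does not convert nested dataclasses to dictionaries, so it is not a drop-in replacement when the result must be passed to json.dumps.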


Summary

Python's dataclasses module removes the mechanical boilerplate from data-carrying classes without sacrificing correctness or flexibility. The order management system built in this tutorial covered the module's core features:

  • @dataclass generates __init__, __repr__, and __eq__ from class annotations, making the class body a single source of truth for all field-related behaviour
  • Fields with immutable defaults like tier: str = "standard" work directly; mutable defaults — lists, dicts — require field(default_factory=list) to avoid shared state across instances
  • __post_init__ is the correct place for validation and derived fields; it runs after __init__ sets all fields. init=False fields without a default are set there, using object.__setattr__ in frozen classes
  • frozen=True raises FrozenInstanceError on any post-construction assignment and, together with the default eq=True, generates a value-based __hash__; fields should hold immutable, hashable values like tuple to fully honour the immutability guarantee
  • order=True generates comparison operators that compare fields in declaration order; compare=False on individual fields excludes them from both equality and ordering
  • field(repr=False) excludes a field from the generated string representation, a structural way to keep sensitive values out of logs that print the object
  • field(metadata={...}) attaches read-only annotations that fields() exposes at runtime, enabling serialisers and auditing layers to identify PII or required fields without hardcoded lists
  • dataclasses.asdict recursively serialises a dataclass and all nested dataclasses to a plain dictionary, staying correct automatically as fields are added or renamed
