Python namedtuple Tutorial: Readable, Structured Records
Introduction
Plain tuples are fast and lightweight, but they age poorly. row[4] means something when the code is written. Three months and four refactors later, it means nothing — and changing the order of fields in a CSV or a database query silently shifts every index in every caller without a single compile-time warning. The bugs that follow are among the most tedious to trace because the code looks correct.
Named tuples solve this without meaningful overhead. They are tuples — same memory layout, same immutability, essentially the same speed — but with named fields that make every access self-documenting and keep field access correct even when the column order changes, because the name, not the position, is the contract. Python ships two forms: the factory function in collections for quick definitions, and the class syntax in typing for definitions that need type annotations, default values, or methods. This tutorial builds a stock market data analyser that uses both, covering every practical feature along the way.
Background
A named tuple is a tuple subclass whose elements can be accessed by name as well as by position. candle.close and candle[4] are the same value — the name is just a stable alias for the index. Because named tuples are true tuple subclasses, they are immutable, hashable, and usable anywhere a tuple is accepted: as dictionary keys, in sets, in csv.reader loops, as the return type of a function.
Two forms exist and are interchangeable for most purposes:
- collections.namedtuple — a factory function. Concise, no imports beyond collections, no type annotations.
- typing.NamedTuple — a class-based syntax introduced in Python 3.6. Supports type hints, default values, and methods on the class body. Produces the same underlying type.
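To make the aliasing concrete, here is a minimal sketch using a hypothetical Point record (not part of the analyser built below):

```python
from collections import namedtuple

# Hypothetical two-field record, for illustration only
Point = namedtuple("Point", ["x", "y"])

p = Point(3, 4)
print(p.x == p[0])   # True — the name is an alias for index 0
print(p == (3, 4))   # True — compares equal to a plain tuple

# Immutable, therefore hashable: usable as a dict key or set member
locations = {p: "rooftop sensor"}
print(locations[Point(3, 4)])
```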
Practical Scenario
A trading analytics service ingests daily OHLCV candles — Open, High, Low, Close, Volume records — for equities. The service parses CSV exports from multiple data vendors, normalises them into a common representation, applies corporate action adjustments (stock splits, dividends), computes statistics across date ranges, and returns summary reports to portfolio managers.
The records are read-only by nature: a historical price cannot change once recorded. They need to be sortable, usable as dictionary keys for deduplication, passable to statistical functions, and serialisable to JSON for downstream consumers. A plain tuple handles none of these requirements legibly. A full class adds overhead and mutability that the domain does not need.
The Problem
The initial implementation parses each CSV row into a plain tuple and accesses fields by index.
Create a new file:
touch market_analyser.py
Run it using:
python3 market_analyser.py
import csv

with open("candles.csv", "w") as f:
    f.write(
        "date,open,high,low,close,volume\n"
        "2024-01-15,184.50,187.20,183.10,186.40,52341000\n"
        "2024-01-16,186.40,188.90,185.60,187.80,48720000\n"
        "2024-01-17,187.80,191.50,186.30,190.20,61230000\n"
        "2024-01-18,190.20,192.10,188.40,189.70,44580000\n"
        "2024-01-19,189.70,193.80,189.20,192.50,58910000\n"
    )

def load_candles(path):
    candles = []
    with open(path) as f:
        reader = csv.reader(f)
        next(reader)
        for row in reader:
            candles.append((
                row[0],
                float(row[1]),
                float(row[2]),
                float(row[3]),
                float(row[4]),
                int(row[5]),
            ))
    return candles

candles = load_candles("candles.csv")
best = max(candles, key=lambda r: r[2] - r[3])
print(f"Widest range : {best[0]} range={best[2] - best[3]:.2f}")
for c in candles:
    if c[4] > c[1]:
        print(f" {c[0]}: bullish close={c[4]} open={c[1]}")
Widest range : 2024-01-17 range=5.20
2024-01-15: bullish close=186.4 open=184.5
2024-01-16: bullish close=187.8 open=186.4
2024-01-17: bullish close=190.2 open=187.8
2024-01-19: bullish close=192.5 open=189.7
The output is correct, but r[2] - r[3] and c[4] > c[1] tell a reader nothing without looking up which index maps to which field. If the data vendor changes the column order — or if a new field is inserted between open and high — every index silently refers to a different value and the results are wrong with no error raised.
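The failure mode is easy to reproduce. In this sketch, a hypothetical vendor inserts a vwap column before close; the index-based reader keeps running and silently returns the wrong price:

```python
# Original vendor layout: date, open, high, low, close, volume
row_v1 = ["2024-01-15", "184.50", "187.20", "183.10", "186.40", "52341000"]

# Hypothetical revised layout: a vwap column inserted before close
row_v2 = ["2024-01-15", "184.50", "187.20", "183.10", "185.30", "186.40", "52341000"]

close_v1 = float(row_v1[4])  # 186.4 — correct
close_v2 = float(row_v2[4])  # 185.3 — silently reads vwap; no error is raised
print(close_v1, close_v2)
```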
collections.namedtuple: Named Fields on a Plain Tuple
Replace the load_candles function and the analysis block with the following. The CSV creation block at the top stays unchanged:
from collections import namedtuple

Candle = namedtuple("Candle", ["date", "open", "high", "low", "close", "volume"])

def load_candles(path):
    candles = []
    with open(path) as f:
        reader = csv.reader(f)
        next(reader)
        for row in reader:
            candles.append(Candle(
                date=row[0],
                open=float(row[1]),
                high=float(row[2]),
                low=float(row[3]),
                close=float(row[4]),
                volume=int(row[5]),
            ))
    return candles

candles = load_candles("candles.csv")
best = max(candles, key=lambda c: c.high - c.low)
print(f"Widest range : {best.date} range={best.high - best.low:.2f}")
for c in candles:
    if c.close > c.open:
        print(f" {c.date}: bullish close={c.close} open={c.open}")
Widest range : 2024-01-17 range=5.20
2024-01-15: bullish close=186.4 open=184.5
2024-01-16: bullish close=187.8 open=186.4
2024-01-17: bullish close=190.2 open=187.8
2024-01-19: bullish close=192.5 open=189.7
The output is identical. The code now reads as the business logic it implements: c.close > c.open is a bullish candle definition; c.high - c.low is the intraday range. Index access still works — c[4] and c.close are the same value — but there is no reason to use it.
Why this is better
Field names survive refactoring. If the source CSV gains a vwap column after volume, you update the namedtuple definition and load_candles in one place. Any code using c.close is unaffected. Any code using c[4] would silently return the wrong value. Named access makes the wrong refactoring a traceable bug instead of a silent one.
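As a sketch, suppose the vendor inserts a hypothetical vwap column before close. After updating the definition in one place, named access stays correct while the old index points at the wrong field:

```python
from collections import namedtuple

# Updated definition: the only change needed, in one place
Candle = namedtuple("Candle", ["date", "open", "high", "low", "vwap", "close", "volume"])

c = Candle("2024-01-15", 184.5, 187.2, 183.1, 185.7, 186.4, 52341000)
print(c.close)  # 186.4 — named access is position-independent
print(c[4])     # 185.7 — the old 'close' index now reads vwap
```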
Immutability and _replace()
Named tuples are immutable. Attempting to assign a field raises an AttributeError at runtime:
try:
    candles[0].close = 200.0
except AttributeError as e:
    print(f"Cannot modify: {e}")
Cannot modify: can't set attribute
When a record genuinely needs to change — adjusting prices for a 2-for-1 stock split, for example — _replace() produces a new named tuple with selected fields overwritten. The original is unchanged. Add the following below the analysis block:
def adjust_split(candle, ratio):
    return candle._replace(
        open=round(candle.open / ratio, 2),
        high=round(candle.high / ratio, 2),
        low=round(candle.low / ratio, 2),
        close=round(candle.close / ratio, 2),
        volume=candle.volume * int(ratio),
    )

original = candles[0]
adjusted = adjust_split(candles[0], ratio=2)
print(f"Original : {original.date} close={original.close} vol={original.volume}")
print(f"Adjusted : {adjusted.date} close={adjusted.close} vol={adjusted.volume}")
Cannot modify: can't set attribute
Original : 2024-01-15 close=186.4 vol=52341000
Adjusted : 2024-01-15 close=93.2 vol=104682000
Why this is better
Immutability means a Candle passed to any function is guaranteed to come back unchanged. There are no defensive copies, no copy.deepcopy calls, no wondering whether a callee mutated the record. _replace() is explicit: it creates a new record rather than silently altering an existing one, which is the semantics the domain requires — a split-adjusted price is a derived record, not a correction to history.
Built-in Helpers: _fields, _asdict(), _make()
Named tuples expose three class-level helpers that cover the most common structural operations. Add the following block to the file:
# _fields: the field names as a tuple — useful for generic code
print("Fields :", Candle._fields)
# _asdict(): convert one record to a dict — useful for JSON serialisation
record = candles[0]._asdict()
print("As dict:", dict(record))
# _make(): construct from any existing sequence — useful when parsing raw rows
raw = ["2024-01-22", 192.50, 195.30, 191.80, 194.10, 47230000]
new_candle = Candle._make(raw)
print(f"From sequence: {new_candle.date} close={new_candle.close}")
Fields : ('date', 'open', 'high', 'low', 'close', 'volume')
As dict: {'date': '2024-01-15', 'open': 184.5, 'high': 187.2, 'low': 183.1, 'close': 186.4, 'volume': 52341000}
From sequence: 2024-01-22 close=194.1
Why this is better
_asdict() makes round-tripping to JSON or a dict-based API a one-liner without writing a serialisation method. _fields allows generic code — a CSV writer, a logging formatter, a schema validator — to inspect field names at runtime without hardcoding them. _make() constructs a named tuple from any iterable, which is cleaner than unpacking a sequence into positional arguments when the sequence is already in field order.
Note: _make() expects exactly as many values as there are fields, with no defaults. If the record has optional fields with defaults, use keyword arguments in the constructor instead.
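As an example of the generic code _fields enables, here is a sketch of a CSV writer that derives its header from the field names; the io.StringIO buffer stands in for a real file:

```python
import csv
import io
from collections import namedtuple

Candle = namedtuple("Candle", ["date", "open", "high", "low", "close", "volume"])
rows = [
    Candle("2024-01-15", 184.5, 187.2, 183.1, 186.4, 52341000),
    Candle("2024-01-16", 186.4, 188.9, 185.6, 187.8, 48720000),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(Candle._fields)  # header row derived from the field names
writer.writerows(rows)           # named tuples ARE tuples — no conversion needed
print(buf.getvalue())
```

Because the header comes from _fields, adding a field to the Candle definition updates this writer automatically.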
typing.NamedTuple: Type Hints, Defaults, and Methods
The typing.NamedTuple class syntax produces the same underlying type but allows type annotations, default values, and method definitions inside the class body. Replace the Candle definition at the top of the file:
from typing import NamedTuple
class Candle(NamedTuple):
    date: str
    open: float
    high: float
    low: float
    close: float
    volume: int
    adjusted: bool = False

    def price_range(self) -> float:
        return round(self.high - self.low, 2)

    def is_bullish(self) -> bool:
        return self.close > self.open
Update the analysis block to use the new methods:
candles = load_candles("candles.csv")
for c in candles:
    trend = "bull" if c.is_bullish() else "bear"
    print(f" {c.date} range={c.price_range():.2f} {trend} adjusted={c.adjusted}")
2024-01-15 range=4.10 bull adjusted=False
2024-01-16 range=3.30 bull adjusted=False
2024-01-17 range=5.20 bull adjusted=False
2024-01-18 range=3.70 bear adjusted=False
2024-01-19 range=4.60 bull adjusted=False
Why this is better
Methods that belong to the record live on the record, not scattered in module-level functions that happen to accept a Candle. price_range() and is_bullish() are computations that depend only on a candle's own fields — they belong here. The adjusted default means existing load_candles code that does not pass the field continues to work, and the adjust_split function can set it explicitly: candle._replace(adjusted=True).
Note: Fields without defaults must come before fields with defaults, exactly as with function parameters. class Candle(NamedTuple) with adjusted: bool = False after all required fields is valid; placing it before close: float raises a TypeError.
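The adjusted flag combines naturally with _replace(): a split adjustment can mark its output as derived. A sketch, with the class-syntax Candle repeated so the snippet runs on its own:

```python
from typing import NamedTuple

class Candle(NamedTuple):
    date: str
    open: float
    high: float
    low: float
    close: float
    volume: int
    adjusted: bool = False

    def price_range(self) -> float:
        return round(self.high - self.low, 2)

c = Candle("2024-01-15", 184.5, 187.2, 183.1, 186.4, 52341000)

# A 2-for-1 split adjustment that records its own provenance
adj = c._replace(close=round(c.close / 2, 2), volume=c.volume * 2, adjusted=True)
print(c.adjusted, adj.adjusted)  # False True
print(adj.price_range())         # high and low were left untouched here
```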
Named Tuples as Function Return Values
A function that computes summary statistics over a date range has two bad options without named tuples: return a plain tuple and force the caller to remember index order, or return a dict and lose the structure guarantees. A NamedTuple return type gives both names and structure. Add the following class and function to the file:
class PriceStats(NamedTuple):
    highest_close: float
    lowest_close: float
    average_close: float
    best_day: str

def analyse(candles: list) -> PriceStats:
    closes = [c.close for c in candles]
    best = max(candles, key=lambda c: c.close)
    return PriceStats(
        highest_close=max(closes),
        lowest_close=min(closes),
        average_close=round(sum(closes) / len(closes), 2),
        best_day=best.date,
    )

stats = analyse(candles)
print(f"Highest close : {stats.highest_close}")
print(f"Lowest close : {stats.lowest_close}")
print(f"Average close : {stats.average_close}")
print(f"Best day : {stats.best_day}")
Highest close : 192.5
Lowest close : 186.4
Average close : 189.32
Best day : 2024-01-19
Why this is better
The caller receives a PriceStats object whose field names are part of the function's contract. stats.best_day is unambiguous; stats[3] requires looking up the definition. The return type is also unpackable as a plain tuple — high, low, avg, day = analyse(candles) works — so it is backwards-compatible with any code that was already unpacking a bare tuple return. PriceStats can be used as a dict key, stored in a set, or compared with == without any extra implementation.
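The unpacking compatibility can be sketched directly; PriceStats is repeated here so the snippet runs on its own:

```python
from typing import NamedTuple

class PriceStats(NamedTuple):
    highest_close: float
    lowest_close: float
    average_close: float
    best_day: str

stats = PriceStats(192.5, 186.4, 189.32, "2024-01-19")

# Old callers that unpacked a bare tuple keep working unchanged
high, low, avg, day = stats
print(high, day)

# Value semantics: an identical record deduplicates in a set
print(len({stats, PriceStats(192.5, 186.4, 189.32, "2024-01-19")}))  # 1
```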
Named Tuples vs Dataclasses
Both NamedTuple and dataclass produce classes with named fields. The choice between them is the choice between two different contracts.
Use a named tuple when:
- The record is immutable by nature — a historical price, a parsed log line, a GPS coordinate, a query result row. The type should enforce this, not rely on convention.
- The record needs to be hashable — stored in a set, used as a dict key, deduplicated across a dataset.
- The record benefits from tuple compatibility — positional unpacking, csv.writer rows, comparison by value, len().
- There is no mutable state and no need for __post_init__ validation.
Use a dataclass when:
- Fields need to be mutated after construction — a running aggregate, a config object, a request state.
- Construction requires validation or derived fields — __post_init__ can enforce invariants and compute cached properties.
- The object carries behaviour beyond simple field computations — methods that modify state, lifecycle hooks, inheritance.
The practical test: if two instances with identical field values are the same record, use a named tuple. If two instances with identical field values can represent different objects with different identity, use a dataclass.
from dataclasses import dataclass

# Named tuple: two records with the same values ARE the same record
c1 = Candle("2024-01-15", 184.5, 187.2, 183.1, 186.4, 52341000)
c2 = Candle("2024-01-15", 184.5, 187.2, 183.1, 186.4, 52341000)
print(f"Named tuple equal : {c1 == c2}")  # True — same values, same record
print(f"Named tuple hash : {hash(c1) == hash(c2)}")  # True — usable as dict key

# Dataclass: two objects with same values are equal but not identical by default
@dataclass
class CandleRecord:
    date: str
    close: float

r1 = CandleRecord("2024-01-15", 186.4)
r2 = CandleRecord("2024-01-15", 186.4)
print(f"Dataclass equal : {r1 == r2}")  # True with @dataclass eq=True (default)
try:
    print(hash(r1))
except TypeError as e:
    print(f"Dataclass hash : {e}")  # unhashable — mutable by default
Named tuple equal : True
Named tuple hash : True
Dataclass equal : True
Dataclass hash : unhashable type: 'CandleRecord'
A dataclass is mutable by default, so Python does not generate __hash__ for it — adding a mutable object to a set would break set invariants. A named tuple is always hashable because it is always immutable.
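The gap closes from the dataclass side with frozen=True, which makes instances immutable and restores the generated __hash__ — a sketch:

```python
from dataclasses import dataclass

# frozen=True forbids assignment after construction, so __hash__ is generated
@dataclass(frozen=True)
class FrozenCandle:
    date: str
    close: float

f1 = FrozenCandle("2024-01-15", 186.4)
f2 = FrozenCandle("2024-01-15", 186.4)
print(f1 == f2)              # True — value equality
print(hash(f1) == hash(f2))  # True — usable as a dict key, like a named tuple
```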
Summary
Named tuples replace anonymous index access with stable, self-documenting field names at zero runtime cost. In this tutorial we built a stock market analyser that demonstrated the full feature set:
- collections.namedtuple produces a tuple subclass with named fields in one line; all tuple operations — indexing, unpacking, comparison, hashing — continue to work unchanged
- _replace() creates a new record with selected fields overwritten, which is the correct model for immutable data that needs adjustment; the original is never mutated
- _fields exposes the field names as a tuple for generic code that needs to introspect structure at runtime
- _asdict() converts a record to a regular dict (an OrderedDict before Python 3.8) for serialisation or dict-based APIs in one call
- _make() constructs a named tuple from any existing iterable and is most useful when a sequence is already in field order with no type conversion needed
- typing.NamedTuple with class syntax adds type annotations, per-field defaults, and methods; fields with defaults must follow all fields without defaults
- Named tuples as function return types eliminate positional coupling between a function and its callers while preserving tuple unpacking compatibility
- Named tuples are the right choice when records are immutable by nature, need to be hashable, or need tuple compatibility; dataclasses are the right choice when fields are mutable, construction requires validation, or the object carries stateful behaviour