
Designing Recipe BOM Databases

Multi-unit restaurant operators and culinary managers face a persistent data-integrity problem: translating chef-authored recipe cards into machine-readable cost structures that survive across dozens of locations. When recipes live as flat spreadsheets, theoretical food cost fractures the moment a vendor changes, a portion drifts, or a sub-recipe is reused in two dishes — the result is margin leakage that nobody can trace back to a single ingredient. This guide, part of the Core Architecture & Cost Mapping Systems framework, isolates one sub-problem: how to model a recipe Bill of Materials (BOM) as a version-controlled, recursively traversable graph so that every finished menu item resolves to a deterministic, auditable cost. The concrete implementation of this schema in a specific engine is covered in the companion walkthrough on structuring recipe BOMs in PostgreSQL; here we define the data contract, the design decisions behind it, and the three-phase build that turns raw ingredient prices into a queryable cost ledger.

Concept Definition and Data Contract

A culinary BOM is not a parts list — it is a directed acyclic graph (DAG). Finished menu items sit at the roots, sub-assemblies (house-made sauces, prepped proteins, batched doughs) form intermediate nodes, and raw purchase SKUs are the leaves. An edge carries a quantity and a unit of measure; a node carries an identity and, for leaves, a purchase price. The costing engine walks this graph from leaves to roots, multiplying quantities by unit costs and dividing by yield.
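Here is a minimal sketch of that walk. It uses hypothetical node names and illustrative prices, and it assumes quantities are already in canonical base units. It applies the rule directly: a leaf costs its purchase price, and a parent costs the sum of quantity × child cost ÷ yield over its children. Memoised recursion stands in here for the topological batch roll-up described later.

```python
from decimal import Decimal
from functools import lru_cache

# Hypothetical toy graph: parent -> [(child, quantity, yield_factor)].
# Quantities are assumed to be in canonical base units (grams or each).
EDGES: dict[str, list[tuple[str, Decimal, Decimal]]] = {
    "burger": [("patty", Decimal("1"), Decimal("1")), ("bun", Decimal("1"), Decimal("1"))],
    "patty": [("ground_beef_g", Decimal("150"), Decimal("0.75"))],
}

# Illustrative leaf prices per canonical base unit (per gram, per each).
LEAF_PRICE: dict[str, Decimal] = {
    "ground_beef_g": Decimal("0.012"),
    "bun": Decimal("0.40"),
}


@lru_cache(maxsize=None)
def node_cost(node: str) -> Decimal:
    """Leaf: purchase price. Parent: sum of quantity * child cost / yield."""
    if node not in EDGES:
        return LEAF_PRICE[node]  # KeyError for an unpriced leaf, never a silent zero
    return sum(
        (qty * node_cost(child) / yld for child, qty, yld in EDGES[node]),
        Decimal("0"),
    )


print(node_cost("patty"))   # prints 2.4  (150 g * 0.012 / 0.75)
print(node_cost("burger"))  # prints 2.80 (2.4 + 0.40)
```

Dividing by a yield of 0.75 means the 150 g the patty needs costs as much as 200 g purchased, which is exactly the trim loss the yield factor encodes.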

The input contract the database must accept is deliberately narrow:

Node inputs — every recipe and ingredient has an immutable identifier (recipe_id, ingredient_id, both UUIDs) that is decoupled from vendor naming. Vendor SKUs and display names change; the graph key must not.
Edge inputs — each parent→child relationship supplies a raw_quantity (NUMERIC, never a float) and a raw_uom string that must resolve against a canonical unit table. Edges also carry effective date ranges (valid_from, valid_to) so the same recipe can hold multiple historical shapes.
Price inputs — leaf costs arrive separately from the graph, keyed by (ingredient_id, location_id, effective_date). Costs are never stored on the BOM edge itself; storing them inline destroys the audit trail and couples recipe structure to procurement volatility.
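Yield inputs — a yield_factor in the range (0, 1], supplied by the yield factor calculation frameworks that translate raw purchase weights into usable edible portions. The schema stores it on each parent→child edge rather than on the node, so the same ingredient can carry a different yield in each recipe that uses it.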
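An ingestion-time check for the edge contract might look like the sketch below. The EdgeInput type, the CANONICAL_UOM table, and the error strings are hypothetical. In a real system the unit table would be read from the canonical unit table rather than hard-coded.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from uuid import UUID

# Hypothetical canonical unit table: accepted raw_uom strings -> base unit.
CANONICAL_UOM = {"g": "g", "kg": "g", "ml": "ml", "l": "ml", "each": "each"}


@dataclass(frozen=True)
class EdgeInput:
    parent_id: UUID
    child_id: UUID
    raw_quantity: Decimal
    raw_uom: str
    yield_factor: Decimal
    valid_from: date
    valid_to: date | None = None  # None = currently active


def contract_errors(edge: EdgeInput) -> list[str]:
    """Collect every contract violation for one incoming edge (empty list = valid)."""
    errors: list[str] = []
    if not isinstance(edge.raw_quantity, Decimal):
        errors.append("raw_quantity must be Decimal, never float")
    elif edge.raw_quantity <= 0:
        errors.append("raw_quantity must be positive")
    if edge.raw_uom.lower() not in CANONICAL_UOM:
        errors.append(f"raw_uom {edge.raw_uom!r} has no canonical unit")
    if not Decimal("0") < edge.yield_factor <= Decimal("1"):
        errors.append("yield_factor must be in (0, 1]")
    if edge.parent_id == edge.child_id:
        errors.append("parent_id and child_id must differ")
    if edge.valid_to is not None and edge.valid_to <= edge.valid_from:
        errors.append("valid_to must be later than valid_from")
    return errors
```

The function returns all violations rather than stopping at the first one. This lets a rejected row be written to quarantine with its complete list of reasons.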

The output contract is a single materialized row per (location_id, recipe_id): a theoretical_cost as NUMERIC(12,4), a sync_timestamp, and a bom_version reference. That row is what downstream consumers read — the sales-side reconciliation described in the POS taxonomy mapping layer joins against exactly this table to compute theoretical-versus-actual variance. Any consumer should be able to read a cost without ever traversing the graph itself.
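The output contract keeps consumers graph-free. Here is a hedged sketch that assumes hypothetical units_sold and actual_cost inputs from the sales and inventory side. With those inputs, a theoretical-versus-actual variance needs nothing more than lookups against the materialized rows.

```python
from decimal import Decimal

# Ledger rows keyed by (location_id, recipe_id); strings stand in for UUIDs here.
CostKey = tuple[str, str]


def theoretical_vs_actual(
    ledger: dict[CostKey, Decimal],
    units_sold: dict[CostKey, int],
    actual_cost: Decimal,
) -> tuple[Decimal, Decimal]:
    """Return (theoretical_total, actual_minus_theoretical) for one period.

    The consumer only looks up materialized rows; it never walks the BOM graph.
    """
    theoretical = sum(
        (ledger[key] * qty for key, qty in units_sold.items()), Decimal("0")
    )
    return theoretical, actual_cost - theoretical


ledger = {("loc-1", "burger"): Decimal("2.8000")}
theoretical, variance = theoretical_vs_actual(
    ledger, {("loc-1", "burger"): 100}, Decimal("300")
)
```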

The BOM is a directed acyclic graph: quantity/UOM live on the edges, purchase price lives on the leaves, and cost is resolved leaf-first up to the root. Olive oil is a shared leaf feeding two sub-recipes — the reuse case a flat spreadsheet cannot model without double-entry.
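A quantity explosion makes the shared-leaf case concrete. It walks from the dish to every leaf, multiplies quantities along each path, and sums the result per leaf. The recipe, quantities, and names are hypothetical. Yield is omitted so the quantity arithmetic stays visible.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical dish: olive oil is one shared leaf under two sub-recipes.
# parent -> [(child, quantity per one unit of parent, in base units)]
EDGES: dict[str, list[tuple[str, Decimal]]] = {
    "pasta_dish": [("pesto", Decimal("1")), ("garlic_confit", Decimal("1"))],
    "pesto": [("olive_oil_ml", Decimal("30")), ("basil_g", Decimal("15"))],
    "garlic_confit": [("olive_oil_ml", Decimal("10")), ("garlic_g", Decimal("12"))],
}


def explode(node: str, multiplier: Decimal = Decimal("1")) -> dict[str, Decimal]:
    """Total leaf quantity for `multiplier` units of `node`, summed over all paths."""
    totals: defaultdict[str, Decimal] = defaultdict(Decimal)
    for child, qty in EDGES.get(node, []):
        extended = multiplier * qty
        if child in EDGES:  # sub-recipe: recurse with the extended quantity
            for leaf, amount in explode(child, extended).items():
                totals[leaf] += amount
        else:  # leaf: accumulate, so a shared leaf is summed across parents
            totals[child] += extended
    return dict(totals)


usage = explode("pasta_dish")  # olive_oil_ml: 40 (30 via pesto + 10 via garlic_confit)
```

A flat spreadsheet would need the olive oil typed in twice, once per sub-recipe. In the graph it is one node with two incoming edges.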

Schema constraints that hold the contract

Three constraints are load-bearing and belong in the database, not the application:

Acyclicity. A sub-recipe must never (even transitively) contain itself. Without a guard, the recursive walk loops forever. Enforce a depth ceiling in the recursive query and validate on insert.
UOM canonicalization. Every raw_uom maps to a base unit (grams for mass, millilitres for volume, each for count). Volumetric-to-mass conversions require a per-ingredient density, so the ingredient master carries a density_g_per_ml. This is the same unit-normalization discipline enforced during CSV bulk import automation at ingestion time.
Temporal integrity. An edge with valid_to IS NULL is current; a closed range is historical. The graph is never destructively updated — a chef’s change closes the old edge and opens a new one, preserving period-over-period comparability.
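For the UOM canonicalization rule, the sketch below resolves a raw quantity into its base unit, including the density-based volume-to-mass step. The UOM_TABLE contents and the to_base helper are assumptions for illustration. The 0.91 g/ml figure in the test is an illustrative olive-oil density, not a reference value.

```python
from __future__ import annotations

from decimal import Decimal

# Hypothetical conversion table: raw_uom -> (base unit, multiplier into that base).
UOM_TABLE: dict[str, tuple[str, Decimal]] = {
    "g": ("g", Decimal("1")),
    "kg": ("g", Decimal("1000")),
    "ml": ("ml", Decimal("1")),
    "l": ("ml", Decimal("1000")),
    "each": ("each", Decimal("1")),
}


def to_base(
    quantity: Decimal,
    raw_uom: str,
    target_base: str,
    density_g_per_ml: Decimal | None = None,
) -> Decimal:
    """Convert `quantity raw_uom` into `target_base` ("g", "ml", or "each")."""
    try:
        base, factor = UOM_TABLE[raw_uom.lower()]
    except KeyError:
        raise ValueError(f"unresolvable unit {raw_uom!r}") from None
    amount = quantity * factor
    if base == target_base:
        return amount
    # Crossing volume <-> mass needs the ingredient's density; count never converts.
    if {base, target_base} == {"ml", "g"}:
        if density_g_per_ml is None:
            raise ValueError("volume/mass conversion needs density_g_per_ml")
        return amount * density_g_per_ml if base == "ml" else amount / density_g_per_ml
    raise ValueError(f"cannot convert {base} to {target_base}")
```

An unknown unit raises an error instead of guessing. That gives the "unresolvable units" failure class a concrete place to be caught and quarantined.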

Architecture Decision Rationale

The central decision is where recursion lives: in the database via recursive common table expressions (CTEs), or in the application via a graph library. This BOM design uses both, deliberately, at different stages — and it is worth being explicit about why.

Recursive CTE in PostgreSQL is the right tool for read-time expansion: “show me the full ingredient explosion of this dish right now.” It runs where the data lives, avoids shuttling the whole edge table into application memory, and composes naturally with the temporal valid_to filter. Its weakness is cost arithmetic: doing yield-adjusted, location-multiplied NUMERIC math inside a recursive CTE is awkward to read and hard to unit-test.

Application-layer DAG traversal in Python is the right tool for the batch cost roll-up: load the current edge set once, topologically sort it, and resolve costs leaf-first with ordinary, testable Decimal code. graphlib.TopologicalSorter from the standard library guarantees that every child node is emitted before any parent that depends on it; it does not promise any particular order among nodes that do not depend on each other, and the roll-up does not need one. This is where the nightly sync belongs, and it is why the roll-up runs as a batch job rather than an on-read computation.
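The following is a small check of the ordering property the roll-up depends on. It uses the same parent → children mapping shape as the roll-up code, with hypothetical node names.

```python
from graphlib import TopologicalSorter

# parent -> set of children, the same shape the roll-up builds. TopologicalSorter
# treats each value set as that node's predecessors, so children come out first.
bom: dict[str, set[str]] = {
    "pasta_dish": {"pesto", "garlic_confit"},
    "pesto": {"olive_oil", "basil"},
    "garlic_confit": {"olive_oil", "garlic"},
}

order = list(TopologicalSorter(bom).static_order())
position = {node: i for i, node in enumerate(order)}

# Guaranteed: every child precedes each parent that uses it.
# Not guaranteed: the relative order of independent siblings.
assert all(position[child] < position[parent]
           for parent, children in bom.items() for child in children)
print(order[-1])  # pasta_dish: the only root, so it always resolves last here
```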

We reject two alternatives outright:

Storing computed costs on the BOM edge. It looks faster but silently corrupts history: a price change either rewrites the past or leaves stale numbers. Costs are always derived, never stored on structure.
On-read cost computation for dashboards. Executive dashboards read thousands of recipes; recomputing the graph per request is both slow and non-deterministic under concurrent price edits. A materialized cost table, refreshed by an idempotent batch, gives every reader the same number at the same version.

The roll-up itself is a high-volume job across every location, so it is dispatched through the async batch processing workflow rather than run inline with a user request.

Phase 1 Implementation — Schema and Graph Setup

The schema separates immutable structure (edges), mutable pricing, and derived output. Note the strict types: money and quantities are NUMERIC, identifiers are UUIDs, and the yield constraint lives in the table definition.

-- Structural edges: parent -> child, temporally versioned, cost-free.
CREATE TABLE recipe_bom_edges (
    edge_id       BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    parent_id     UUID NOT NULL,
    child_id      UUID NOT NULL,
    raw_quantity  NUMERIC(12,5) NOT NULL CHECK (raw_quantity > 0),
    raw_uom       VARCHAR(12) NOT NULL,
    yield_factor  NUMERIC(5,4) NOT NULL DEFAULT 1.0000
                    CHECK (yield_factor > 0 AND yield_factor <= 1.0000),
    valid_from    DATE NOT NULL DEFAULT CURRENT_DATE,
    valid_to      DATE,                       -- NULL = currently active
    UNIQUE (parent_id, child_id, valid_from),
    CHECK (parent_id <> child_id)             -- reject self-loops at the edge
);

-- Location-scoped leaf prices, decoupled from structure.
CREATE TABLE purchase_prices (
    ingredient_id UUID NOT NULL,
    location_id   UUID NOT NULL,
    unit_cost     NUMERIC(12,4) NOT NULL CHECK (unit_cost >= 0),
    price_uom     VARCHAR(12) NOT NULL,
    effective_date DATE NOT NULL DEFAULT CURRENT_DATE,
    PRIMARY KEY (ingredient_id, location_id, effective_date)
);

-- Derived output the whole business reads from.
CREATE TABLE mv_location_recipe_costs (
    location_id      UUID NOT NULL,
    recipe_id        UUID NOT NULL,
    theoretical_cost NUMERIC(12,4) NOT NULL,
    bom_version      BIGINT NOT NULL,
    sync_timestamp   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (location_id, recipe_id)
);
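The valid_from / valid_to pair on recipe_bom_edges implements the temporal-integrity rule. A chef's change is a close-and-open, not an overwrite: set valid_to on the old row and insert a new row in the same transaction. Any date can then be queried for the graph that was in force. The sketch below is a minimal in-memory version, with a hypothetical VersionedEdge type and string ids standing in for UUIDs.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class VersionedEdge:
    parent_id: str
    child_id: str
    raw_quantity: Decimal
    valid_from: date
    valid_to: date | None = None  # None = currently active; the end date is exclusive


def active_on(edges: list[VersionedEdge], as_of: date) -> list[VersionedEdge]:
    """Edges in force on `as_of`: valid_from <= as_of < valid_to (open if None)."""
    return [
        e for e in edges
        if e.valid_from <= as_of and (e.valid_to is None or e.valid_to > as_of)
    ]


def supersede(
    edges: list[VersionedEdge], old: VersionedEdge, new_quantity: Decimal, on: date
) -> list[VersionedEdge]:
    """Close `old` as of `on` and open its replacement; the old row survives."""
    closed = replace(old, valid_to=on)
    opened = replace(old, raw_quantity=new_quantity, valid_from=on, valid_to=None)
    return [closed if e == old else e for e in edges] + [opened]
```

Because the old row is kept with a closed range, a period-over-period report can still reconstruct the March recipe after the June change.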

Read-time explosion of a single dish uses a recursive CTE with a depth guard to backstop the acyclicity constraint:

WITH RECURSIVE bom_tree AS (
    -- Anchor: the target menu item (root of the explosion)
    SELECT parent_id, child_id, raw_quantity, raw_uom, yield_factor, 1 AS depth
    FROM recipe_bom_edges
    WHERE parent_id = :target_recipe_id
      AND valid_from <= CURRENT_DATE                     -- ignore future-dated edges (they would double-count)
      AND (valid_to IS NULL OR valid_to > CURRENT_DATE)

    UNION ALL

    -- Recursive step: expand each child downward toward the leaves
    SELECT e.parent_id, e.child_id, e.raw_quantity, e.raw_uom, e.yield_factor,
           bt.depth + 1
    FROM recipe_bom_edges e
    JOIN bom_tree bt ON e.parent_id = bt.child_id
    WHERE e.valid_from <= CURRENT_DATE
      AND (e.valid_to IS NULL OR e.valid_to > CURRENT_DATE)
      AND bt.depth < 20          -- hard ceiling: catch accidental cycles
)
SELECT parent_id, child_id, raw_quantity, raw_uom, yield_factor, depth
FROM bom_tree;
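The depth ceiling only backstops the acyclicity rule. The primary guard is to reject an edge at insert time if it would close a loop. Adding parent → child creates a cycle exactly when parent is already reachable from child, so one downward walk from child answers the question. The function name and the string-keyed adjacency map are hypothetical, and the same check could also be implemented as a database trigger.

```python
def would_create_cycle(children_of: dict[str, set[str]], parent: str, child: str) -> bool:
    """True if adding the edge parent -> child would close a cycle.

    A cycle appears exactly when `parent` is already reachable from `child`,
    so walk downward from `child` (iterative DFS) looking for `parent`.
    """
    if parent == child:
        return True
    stack, seen = [child], {child}
    while stack:
        node = stack.pop()
        for nxt in children_of.get(node, ()):
            if nxt == parent:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


graph = {"house_burger": {"special_sauce"}, "special_sauce": {"aioli"}}
print(would_create_cycle(graph, "aioli", "house_burger"))  # True: burger already reaches aioli
print(would_create_cycle(graph, "house_burger", "aioli"))  # False: aioli reaches nothing
```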

To load the graph into Python for the batch roll-up, model the edge as a strictly typed structure rather than a loose tuple:

from __future__ import annotations
from dataclasses import dataclass
from decimal import Decimal
from uuid import UUID


@dataclass(frozen=True, slots=True)
class BomEdge:
    parent_id: UUID
    child_id: UUID
    quantity: Decimal           # canonical base-unit quantity
    yield_factor: Decimal       # (0, 1]


@dataclass(frozen=True, slots=True)
class LeafPrice:
    ingredient_id: UUID
    location_id: UUID
    unit_cost: Decimal          # cost per canonical base unit

Phase 2 Implementation — Validation and Error Routing

A roll-up is only as trustworthy as the graph it consumes. Before any arithmetic runs, the loaded edge set is validated, and anything that fails is routed to a quarantine table instead of silently producing a wrong cost. The three failure classes that matter are cycles, missing leaf prices, and unresolvable units.

import logging
from graphlib import TopologicalSorter, CycleError

logger = logging.getLogger("bom.validate")


@dataclass(frozen=True, slots=True)
class ValidationResult:
    graph_ok: bool
    priced_leaves: set[UUID]
    missing_prices: set[UUID]


def validate_graph(
    edges: list[BomEdge],
    price_map: dict[UUID, Decimal],
) -> ValidationResult:
    """Reject cyclic graphs and flag leaves with no resolvable price."""
    adjacency: dict[UUID, set[UUID]] = {}
    all_children: set[UUID] = set()
    for e in edges:
        adjacency.setdefault(e.parent_id, set()).add(e.child_id)
        adjacency.setdefault(e.child_id, set())   # ensure leaves appear
        all_children.add(e.child_id)

    # A cycle makes topological ordering impossible — fail loudly, do not sync.
    try:
        TopologicalSorter(adjacency).prepare()
    except CycleError as exc:
        logger.error("bom_cycle_detected", extra={"nodes": str(exc.args[1])})
        return ValidationResult(False, set(), set())

    leaves = {n for n, kids in adjacency.items() if not kids}
    missing = {leaf for leaf in leaves if leaf not in price_map}
    if missing:
        logger.warning(
            "bom_unpriced_leaves",
            extra={"count": len(missing), "sample": [str(x) for x in list(missing)[:5]]},
        )
    return ValidationResult(True, leaves - missing, missing)

The rule is deliberate: a cycle blocks the entire sync (a corrupt structure must never produce numbers), while missing prices quarantine only the affected roots. Structured logs carry correlation-friendly keys (count, sample, node ids) so an operator can trace a quarantined recipe without grepping stack traces. Unpriced leaves are written to a bom_quarantine table with a reason code, and the roots that depend on them are excluded from the materialized refresh rather than costed at zero — a zero-cost dish is the single most dangerous silent corruption in food-cost analytics because it inflates apparent margin.
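Excluding "the roots that depend on them" means walking upward from each unpriced leaf to every ancestor. Here is a minimal sketch, assuming edges arrive as (parent_id, child_id) pairs and using a hypothetical function name.

```python
from collections import deque
from uuid import UUID


def dependents_of_unpriced(
    edges: list[tuple[UUID, UUID]], missing_prices: set[UUID]
) -> set[UUID]:
    """Every ancestor (sub-recipe or root) of an unpriced leaf.

    `edges` are (parent_id, child_id) pairs; the walk runs upward (BFS).
    """
    parents_of: dict[UUID, set[UUID]] = {}
    for parent, child in edges:
        parents_of.setdefault(child, set()).add(parent)

    affected: set[UUID] = set()
    queue = deque(missing_prices)
    while queue:
        node = queue.popleft()
        for parent in parents_of.get(node, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected
```

Dishes that share only priced leaves with a quarantined dish are left untouched. This keeps one bad price feed from blocking the whole location.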

Phase 3 Implementation — Roll-Up and Materialized Handoff

With a validated, acyclic graph, the roll-up resolves costs leaf-first and writes one row per recipe. All monetary math uses Decimal to avoid IEEE-754 drift, and the write is a single atomic transaction so readers never see a half-refreshed table.

from decimal import Decimal, ROUND_HALF_UP
from sqlalchemy import text
from sqlalchemy.orm import Session

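# Quantum matching NUMERIC(12,4): four decimal places, i.e. hundredths of a cent.
COST_QUANTUM = Decimal("0.0001")


def roll_up_costs(
    edges: list[BomEdge],
    price_map: dict[UUID, Decimal],
    location_multiplier: Decimal = Decimal("1"),
) -> dict[UUID, Decimal]:
    """Bottom-up theoretical cost per recipe node, yield-adjusted, location-scaled.

    An unpriced leaf is never costed at zero: it and every node that depends on
    it are left out of the result, so the caller can quarantine them instead.
    """
    adjacency: dict[UUID, set[UUID]] = {}
    edge_index: dict[tuple[UUID, UUID], BomEdge] = {}
    for e in edges:
        adjacency.setdefault(e.parent_id, set()).add(e.child_id)
        adjacency.setdefault(e.child_id, set())
        edge_index[(e.parent_id, e.child_id)] = e

    # TopologicalSorter treats each value set as the node's predecessors,
    # so children (ultimately the leaves) are handed out before their parents.
    ts = TopologicalSorter(adjacency)
    ts.prepare()
    cost: dict[UUID, Decimal] = {}

    while ts.is_active():
        for node in ts.get_ready():
            children = adjacency[node]
            if not children:
                # Leaf: purchase price scaled by the location's price index.
                if node in price_map:
                    cost[node] = price_map[node] * location_multiplier
            elif all(child in cost for child in children):
                # Sub-recipe: sum child costs weighted by qty, divided by yield.
                subtotal = Decimal("0")
                for child in children:
                    e = edge_index[(node, child)]
                    subtotal += cost[child] * e.quantity / e.yield_factor
                cost[node] = subtotal
            # Otherwise a descendant is unpriced: leave this node unresolved.
            ts.done(node)

    # One row per recipe: leaves are ingredients, not rows in the cost ledger.
    return {
        n: c.quantize(COST_QUANTUM, rounding=ROUND_HALF_UP)
        for n, c in cost.items()
        if adjacency[n]
    }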


def refresh_materialized_costs(
    session: Session, location_id: UUID, bom_version: int, costs: dict[UUID, Decimal]
) -> None:
    """Idempotent upsert of the location's cost ledger in one transaction."""
    upsert = text("""
        INSERT INTO mv_location_recipe_costs
            (location_id, recipe_id, theoretical_cost, bom_version, sync_timestamp)
        VALUES (:loc, :recipe, :cost, :ver, NOW())
        ON CONFLICT (location_id, recipe_id) DO UPDATE
            SET theoretical_cost = EXCLUDED.theoretical_cost,
                bom_version      = EXCLUDED.bom_version,
                sync_timestamp   = EXCLUDED.sync_timestamp
    """)
    with session.begin():
        for recipe_id, amount in costs.items():
            session.execute(upsert, {
                "loc": str(location_id), "recipe": str(recipe_id),
                "cost": amount, "ver": bom_version,
            })

The ON CONFLICT ... DO UPDATE upsert keyed on (location_id, recipe_id) is what makes the refresh re-runnable: a retried job produces the identical table state, never duplicate rows. Once written, this ledger is the join target for the sales-side reconciliation and for the variance mapping methodologies that compare theoretical usage against actual inventory withdrawals.

The batch roll-up as a gated pipeline. A cycle is fatal and stops the whole run; a missing leaf price quarantines only the roots that depend on it. Everything that survives validation is sorted, costed in Decimal, and written to the materialized ledger in one atomic upsert that downstream variance analysis reads.
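Re-runnability of the ON CONFLICT upsert can be demonstrated without a PostgreSQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in. SQLite 3.24+ accepts the same ON CONFLICT ... DO UPDATE shape and the `excluded` pseudo-table. The table is a simplified copy of mv_location_recipe_costs, with the cost stored as text because SQLite has no NUMERIC(12,4).

```python
import sqlite3
from decimal import Decimal

DDL = """
CREATE TABLE mv_location_recipe_costs (
    location_id      TEXT NOT NULL,
    recipe_id        TEXT NOT NULL,
    theoretical_cost TEXT NOT NULL,  -- Decimal kept as text; SQLite lacks NUMERIC(12,4)
    bom_version      INTEGER NOT NULL,
    PRIMARY KEY (location_id, recipe_id)
)
"""

UPSERT = """
INSERT INTO mv_location_recipe_costs
    (location_id, recipe_id, theoretical_cost, bom_version)
VALUES (?, ?, ?, ?)
ON CONFLICT (location_id, recipe_id) DO UPDATE SET
    theoretical_cost = excluded.theoretical_cost,
    bom_version      = excluded.bom_version
"""


def refresh(conn: sqlite3.Connection, location_id: str, version: int,
            costs: dict[str, Decimal]) -> None:
    """Upsert one location's costs in a single transaction."""
    with conn:  # commits on success, rolls back on any exception
        conn.executemany(
            UPSERT, [(location_id, rid, str(c), version) for rid, c in costs.items()]
        )


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
costs = {"burger": Decimal("2.8000"), "pesto_pasta": Decimal("3.1250")}
refresh(conn, "loc-1", 7, costs)
refresh(conn, "loc-1", 7, costs)  # simulated retry of the same job
rows = conn.execute(
    "SELECT recipe_id, theoretical_cost, bom_version "
    "FROM mv_location_recipe_costs ORDER BY recipe_id"
).fetchall()
print(rows)  # two rows, not four: the retry converged on the same state
```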

Production Hardening

Moving from a working roll-up to a dependable nightly job comes down to a handful of disciplines:

Idempotency keys. Stamp each run with a bom_version (a monotonic snapshot id of the edge set). Writing the version alongside the cost lets a reader detect stale numbers and lets a retried job overwrite exactly the rows it owns.
Memory bounds. For a portfolio with millions of edges, do not hold every location’s graph simultaneously. The structure is shared across locations; only price_map and location_multiplier differ. Load the edge graph once, then iterate locations, swapping only the price inputs — this is the same isolation the multi-location cost center architecture relies on to keep regional procurement variance from fracturing the master recipe tree.
Deduplication. Guard the edge table with the UNIQUE (parent_id, child_id, valid_from) constraint so a re-imported recipe cannot create a duplicate parallel edge that double-counts an ingredient.
Unit normalization hooks. Convert raw_uom to the canonical base unit at ingestion, not at roll-up. By the time an edge reaches the graph, its quantity is already in grams or millilitres; the roll-up never touches unit conversion, which keeps its arithmetic auditable. Portion-facing conversions belong upstream, alongside portion size standardization.
Variance gating. Before committing the refresh, diff the new costs against the previous version and hold any recipe whose cost moved more than a configured threshold (for example, > 5%) for culinary review. This stops a bad price feed from silently propagating a margin shock to executive dashboards.
RBAC boundaries. Culinary managers get read-only access to versioned BOMs; procurement holds write access only to purchase_prices and yield_factors; the roll-up runs as a service account with EXECUTE on the sync function alone. Financial calculations stay isolated from manual UI edits.
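The variance gate is a straightforward diff. Here is one possible sketch, assuming costs keyed by recipe id and a hypothetical gate_by_variance helper. Holding recipes that have no previous cost is a policy choice made here for illustration.

```python
from decimal import Decimal

# Hypothetical default matching the "> 5%" example in the variance-gating item.
DEFAULT_THRESHOLD = Decimal("0.05")


def gate_by_variance(
    previous: dict[str, Decimal],
    current: dict[str, Decimal],
    threshold: Decimal = DEFAULT_THRESHOLD,
) -> tuple[dict[str, Decimal], dict[str, Decimal]]:
    """Split new costs into (publish, hold_for_review) by relative movement."""
    publish: dict[str, Decimal] = {}
    hold: dict[str, Decimal] = {}
    for recipe_id, new_cost in current.items():
        old_cost = previous.get(recipe_id)
        if old_cost is None or old_cost == 0:
            # Assumption: no usable baseline means a human reviews it first.
            hold[recipe_id] = new_cost
        elif abs(new_cost - old_cost) / old_cost > threshold:
            hold[recipe_id] = new_cost
        else:
            publish[recipe_id] = new_cost
    return publish, hold
```

Only the `publish` half would be passed to the materialized refresh. Held recipes keep their previous row and bom_version until culinary review clears them.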

Failure Modes and Troubleshooting

Symptom	Likely cause	Detection / fix
Roll-up never terminates	Cyclic sub-recipe (A contains B contains A)	`TopologicalSorter.prepare()` raises `CycleError`; the depth guard in the recursive CTE also caps traversal. Reject on insert.
A dish shows near-zero cost	Missing leaf price costed as `0`	Quarantine unpriced leaves; never default to zero. Alert on any root touching a quarantined leaf.
Costs drift by fractions of a cent	Float arithmetic instead of `Decimal`/`NUMERIC`	Use `Decimal` end-to-end and `quantize` once at the boundary; store as `NUMERIC(12,4)`.
Theoretical margin overstated everywhere	`yield_factor` defaulting to `1.0` when trim loss is real	Source yields from the yield frameworks; validate `0 < yield_factor <= 1`.
Same ingredient counted twice in one dish	Duplicate parallel edges	Enforce the `UNIQUE (parent_id, child_id, valid_from)` constraint and dedupe on import.
Historical margin report changes retroactively	Price or structure overwritten in place	Never mutate edges or store cost on structure; close and re-open temporally versioned rows.

The through-line of every failure above is the same: cost must be derived, versioned, and validated before it is written — never stored on structure, never defaulted to zero, never computed in floating point. Get those three right and the BOM becomes a dependable foundation for automated menu engineering, feeding clean numbers into variance analysis and waste routing alike.