
Structure Recipe BOMs in PostgreSQL

This page walks a food-tech developer or data engineer through the exact PostgreSQL tables, recursive queries, and Python roll-up code needed to turn nested chef recipes into a deterministic, audit-ready cost model. It is the hands-on implementation companion to the broader schema design principles covered in Designing Recipe BOM Databases; read that first for the architectural rationale, then follow the numbered steps here to stand up a working Bill of Materials (BOM) schema you can query tonight.

Prerequisites and Data Contract

Before running any step below, confirm the following environment and structural assumptions. Every step is written against them and will silently misbehave if they drift.

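PostgreSQL 13+: required for the built-in gen_random_uuid() used in the core catalog (earlier versions need the pgcrypto extension). The path-array cycle guard in Step 3 runs on any supported version; the declarative CYCLE clause of WITH RECURSIVE needs 14+ and is not used here.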
Python 3.11+, pandas 2.x, SQLAlchemy 2.x, and psycopg2-binary 2.9+.
pg_cron (optional) for scheduled materialized-view refreshes.
All monetary and quantity arithmetic uses PostgreSQL NUMERIC or Python’s decimal module — never binary floats.
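A quick illustration of the drift that last rule guards against: ten line costs of 0.10 summed as binary floats fall just short of 1.00, while the same values as decimal.Decimal (constructed from strings, not floats) sum exactly.

```python
from decimal import Decimal

# Ten identical line costs of 0.10, summed two ways.
float_total = sum([0.10] * 10)
decimal_total = sum([Decimal("0.10")] * 10, Decimal("0"))

print(float_total)    # binary float: not exactly 1.0
print(decimal_total)  # Decimal: exactly 1.00
```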

The data contract is four tables with a strict separation between immutable master data (recipes, ingredients), the hierarchy edges (recipe_bom_lines), and transactional pricing (pricing). Costs are never stored on the BOM edge; they are resolved on demand so the hierarchy stays a pure structural record. This same decoupling is what lets one BOM resolve against many regional contracts under a multi-location cost center architecture.

CREATE TABLE recipes (
    recipe_id    UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    recipe_name  VARCHAR(150) NOT NULL,
    portion_size DECIMAL(10,3) NOT NULL,
    portion_uom  VARCHAR(10) NOT NULL,
    is_active    BOOLEAN DEFAULT TRUE,
    created_at   TIMESTAMPTZ DEFAULT NOW(),
    updated_at   TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE ingredients (
    ingredient_id   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    sku             VARCHAR(50) UNIQUE,
    description     VARCHAR(200),
    base_uom        VARCHAR(10) NOT NULL,
    density_g_per_ml DECIMAL(6,3)  -- required for volume-to-weight conversion
);

CREATE TABLE pricing (
    ingredient_id UUID REFERENCES ingredients(ingredient_id),
    location_id   VARCHAR(20) NOT NULL,
    cost_per_unit NUMERIC(12,4) NOT NULL,   -- price per `uom`
    uom           VARCHAR(10) NOT NULL,
    valid_from    DATE NOT NULL,
    valid_to      DATE,
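    PRIMARY KEY (ingredient_id, location_id, valid_from)
);

-- At most one open (valid_to IS NULL) price per ingredient and location,
-- so a current-price join can never double-count a BOM line.
CREATE UNIQUE INDEX pricing_one_open_row
    ON pricing (ingredient_id, location_id)
    WHERE valid_to IS NULL;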
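Because prices are resolved on demand rather than stored on the edge, each roll-up has to pick the one pricing row in force for a location and date. A minimal in-memory sketch of that lookup follows; PriceRow and resolve_price are hypothetical names, and the sketch assumes valid_to is an exclusive end date with None (NULL) meaning the price is still open, which the schema itself does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class PriceRow:
    """One pricing row (hypothetical in-memory mirror of the pricing table)."""
    ingredient_id: str
    location_id: str
    cost_per_unit: Decimal
    uom: str
    valid_from: date
    valid_to: date | None  # None = still open; assumed exclusive end date


def resolve_price(rows: list[PriceRow], ingredient_id: str,
                  location_id: str, on: date) -> PriceRow:
    """Return the single row in force for this ingredient and location on `on`."""
    matches = [
        r for r in rows
        if r.ingredient_id == ingredient_id
        and r.location_id == location_id
        and r.valid_from <= on
        and (r.valid_to is None or on < r.valid_to)
    ]
    if len(matches) != 1:
        # 0 rows: no contract price; 2+ rows: overlapping validity windows
        raise LookupError(f"expected 1 price row, found {len(matches)}")
    return matches[0]


# Illustrative rows: LOC_01 changed price on 1 June, LOC_02 never did.
rows = [
    PriceRow("potato", "LOC_01", Decimal("0.9500"), "kg", date(2024, 1, 1), date(2024, 6, 1)),
    PriceRow("potato", "LOC_01", Decimal("1.1000"), "kg", date(2024, 6, 1), None),
    PriceRow("potato", "LOC_02", Decimal("1.0500"), "kg", date(2024, 1, 1), None),
]
print(resolve_price(rows, "potato", "LOC_01", date(2024, 3, 15)).cost_per_unit)
```

The same BOM resolves against LOC_01 or LOC_02 simply by changing the location argument, which is the decoupling described above.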

Step-by-Step Implementation

Step 1 — Model the hierarchy edge with guardrails

The recipe_bom_lines table maps each parent recipe to a child that is either a raw INGREDIENT or a nested SUBRECIPE. Two CHECK constraints do the heavy lifting: child_type is restricted to the known enumeration, and yield_factor is forced into the (0.001, 1.0] range so a mistyped zero can never reach a division later.

CREATE TABLE recipe_bom_lines (
    bom_line_id      BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    parent_recipe_id UUID NOT NULL REFERENCES recipes(recipe_id) ON DELETE CASCADE,
    child_id         UUID NOT NULL,  -- ingredients or recipes row per child_type; one FK cannot target both
    child_type       VARCHAR(10) NOT NULL CHECK (child_type IN ('INGREDIENT', 'SUBRECIPE')),
    raw_quantity     DECIMAL(12,5) NOT NULL,
    raw_uom          VARCHAR(10) NOT NULL,
    -- NOT NULL matters here: a CHECK constraint passes when its input is NULL
    yield_factor     DECIMAL(5,4) NOT NULL DEFAULT 1.0000,
    CONSTRAINT valid_yield CHECK (yield_factor > 0.001 AND yield_factor <= 1.0000),
    CONSTRAINT valid_qty   CHECK (raw_quantity >= 0),
    UNIQUE (parent_recipe_id, child_id, child_type)
);

The yield_factor is stored as a decimal fraction (0.85 = 85% usable product). Deriving that fraction — trim loss, evaporation, cooking shrink — belongs to the yield factor calculation frameworks that own the produce-by-produce logic; the BOM only enforces that whatever lands here is a sane fraction.
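Yield values often arrive from spreadsheets as percentages. A small ingestion-side sketch can normalize them to the 4-place fraction the column stores and reject anything the valid_yield CHECK would refuse before the INSERT is attempted. The yield_fraction name and the "85%" versus "0.85" input convention are assumptions.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP

MIN_YIELD = Decimal("0.001")     # exclusive lower bound, as in valid_yield
MAX_YIELD = Decimal("1.0000")    # inclusive upper bound
FOUR_PLACES = Decimal("0.0001")  # yield_factor is DECIMAL(5,4)


def yield_fraction(raw: str) -> Decimal:
    """Parse '85%' or '0.85' into a 4-place fraction, or raise ValueError."""
    text = raw.strip()
    try:
        if text.endswith("%"):
            value = Decimal(text[:-1].strip()) / 100
        else:
            value = Decimal(text)
    except InvalidOperation:
        raise ValueError(f"not a number: {raw!r}") from None
    if not value.is_finite():  # reject NaN / Infinity before comparing
        raise ValueError(f"not a finite number: {raw!r}")
    value = value.quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)
    if not (MIN_YIELD < value <= MAX_YIELD):
        raise ValueError(f"yield {value} outside (0.001, 1.0]")
    return value


print(yield_fraction("85%"), yield_fraction("0.82"))
```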

Step 2 — Insert master data and wire the edges

Populate recipes and ingredients first, then attach BOM lines. A sub-recipe (a house sauce, a batched dough) is simply a recipes row referenced by another recipe’s line with child_type = 'SUBRECIPE'.

-- One finished dish that consumes a raw ingredient and a house sub-recipe
INSERT INTO recipe_bom_lines
    (parent_recipe_id, child_id, child_type, raw_quantity, raw_uom, yield_factor)
VALUES
    ('11111111-1111-1111-1111-111111111111',  -- Roasted Potato Plate
     '22222222-2222-2222-2222-222222222222',  -- raw potatoes (ingredient)
     'INGREDIENT', 1000, 'g', 0.8200),
    ('11111111-1111-1111-1111-111111111111',
     '33333333-3333-3333-3333-333333333333',  -- Garlic Aioli (sub-recipe)
     'SUBRECIPE', 60, 'g', 1.0000);
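Because child_id points at ingredients or at recipes depending on child_type, no single foreign key can enforce it. If lines are staged in Python before the INSERT, a check along these lines catches dangling references; check_bom_lines and the id sets passed to it are hypothetical, and in practice the sets would be loaded from the two master tables.

```python
from __future__ import annotations

from collections.abc import Iterable


def check_bom_lines(
    lines: Iterable[tuple[str, str, str]],  # (parent_recipe_id, child_id, child_type)
    ingredient_ids: set[str],
    recipe_ids: set[str],
) -> list[str]:
    """Return a list of problems; an empty list means every edge resolves."""
    problems: list[str] = []
    for parent_id, child_id, child_type in lines:
        if parent_id not in recipe_ids:
            problems.append(f"unknown parent recipe {parent_id}")
        if child_type == "INGREDIENT":
            target = ingredient_ids
        elif child_type == "SUBRECIPE":
            target = recipe_ids
        else:
            problems.append(f"bad child_type {child_type!r} on {child_id}")
            continue
        if child_id not in target:
            problems.append(f"{child_type} {child_id} not found")
        if child_type == "SUBRECIPE" and child_id == parent_id:
            problems.append(f"recipe {parent_id} lists itself")
    return problems


print(check_bom_lines([("plate", "potato", "INGREDIENT"), ("plate", "aioli", "SUBRECIPE")],
                      ingredient_ids={"potato"}, recipe_ids={"plate", "aioli"}))
```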

Step 3 — Resolve nested sub-recipes with a recursive CTE

Culinary prep is a directed acyclic graph. A recursive CTE expands every SUBRECIPE down to leaf INGREDIENT rows in one deterministic pass — no application-layer recursion. The path array is the cycle guard: if a child id already appears in the path, the branch is dropped instead of looping forever.

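A sub-recipe's own lines describe one full batch, so each child quantity is scaled by the share of that batch the parent consumes: the parent line's quantity divided by the sub-recipe's recipes.portion_size. This assumes portion_size records one batch's output in the same unit the parent line uses; 60 g drawn from a 500 g aioli batch scales every aioli ingredient by 0.12. The path is seeded with the target recipe itself, so an edge pointing back at it is also caught. The anchor casts raw_quantity to plain NUMERIC because PostgreSQL rejects a recursive CTE whose column types (including precision and scale) differ between the anchor and the recursive term.

WITH RECURSIVE bom_tree AS (
    -- Anchor: the target recipe's direct lines (substitute a real recipe_id).
    -- raw_quantity is cast to plain NUMERIC to match the recursive term's type.
    SELECT parent_recipe_id, child_id, child_type,
           raw_quantity::numeric AS raw_quantity, raw_uom, yield_factor,
           1 AS depth,
           ARRAY[parent_recipe_id::text, child_id::text] AS path
    FROM recipe_bom_lines
    WHERE parent_recipe_id = 'TARGET_RECIPE_UUID'::UUID

    UNION ALL

    -- Recursive step: expand sub-recipes, scaling each child line by the
    -- share of the sub-recipe batch the parent consumes (qty / portion_size)
    SELECT b.parent_recipe_id, b.child_id, b.child_type,
           b.raw_quantity * bt.raw_quantity / r.portion_size,
           b.raw_uom, b.yield_factor,
           bt.depth + 1, bt.path || b.child_id::text
    FROM bom_tree bt
    JOIN recipes r          ON r.recipe_id = bt.child_id
    JOIN recipe_bom_lines b ON b.parent_recipe_id = bt.child_id
    WHERE bt.child_type = 'SUBRECIPE'
      AND b.child_id::text <> ALL(bt.path)   -- cycle guard
)
SELECT child_id AS ingredient_id,
       raw_uom,
       SUM(raw_quantity) AS total_raw_qty,
       MAX(depth) AS max_nesting_level
FROM bom_tree
WHERE child_type = 'INGREDIENT'
GROUP BY child_id, raw_uom;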
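To unit-test expansion logic away from the database, the recursion can be prototyped in plain Python. This sketch uses a path-based cycle guard like the recursive CTE and, as an assumption, scales each sub-recipe's lines by the parent quantity divided by that sub-recipe's portion_size (taken as its batch output, in the unit the parent line uses). The expand_bom name and the dict shapes are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from decimal import Decimal

# Hypothetical shape: lines[recipe_id] -> [(child_id, child_type, raw_quantity, raw_uom)]
Line = tuple[str, str, Decimal, str]


def expand_bom(root: str, lines: dict[str, list[Line]],
               portion_size: dict[str, Decimal]) -> dict[tuple[str, str], Decimal]:
    """Total leaf INGREDIENT quantity per (ingredient_id, raw_uom) for one recipe."""
    totals: dict[tuple[str, str], Decimal] = defaultdict(Decimal)

    def walk(recipe_id: str, scale: Decimal, path: tuple[str, ...]) -> None:
        for child_id, child_type, qty, uom in lines.get(recipe_id, []):
            if child_id in path:  # cycle guard: drop an edge back to an ancestor
                continue
            scaled = qty * scale
            if child_type == "INGREDIENT":
                totals[(child_id, uom)] += scaled
            else:  # SUBRECIPE: recurse, scaled by the share of its batch consumed
                walk(child_id, scaled / portion_size[child_id], path + (child_id,))

    walk(root, Decimal("1"), (root,))
    return dict(totals)


# Roasted Potato Plate uses 60 g of a 500 g aioli batch; the aioli's last
# line points back at the plate on purpose to exercise the cycle guard.
lines = {
    "plate": [("potato", "INGREDIENT", Decimal("1000"), "g"),
              ("aioli", "SUBRECIPE", Decimal("60"), "g")],
    "aioli": [("oil", "INGREDIENT", Decimal("450"), "g"),
              ("yolk", "INGREDIENT", Decimal("60"), "g"),
              ("plate", "SUBRECIPE", Decimal("10"), "g")],
}
portion_size = {"plate": Decimal("1"), "aioli": Decimal("500")}
result = expand_bom("plate", lines, portion_size)
print(result)
```

Recursion in Python is fine for a test harness; production expansion should stay in the single SQL pass.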

Step 4 — Normalize every unit to a single base weight

Unit conversion is the largest source of cost drift across locations. Collapse every volumetric measure to grams before any price multiplication, using each ingredient's density_g_per_ml. Count units (each, dozen) need a per-item weight that this schema does not store, so treat them as unconvertible until you add one. Keep the conversion factors in a lookup rather than hard-coding them inline so regional aliases stay versioned.

-- ml-per-uom lookup (US customary volumes); extend per regional alias as needed
CREATE TABLE uom_conversions (
    raw_uom   VARCHAR(10) PRIMARY KEY,
    ml_per_uom NUMERIC(10,5)   -- NULL for mass units, handled separately
);
INSERT INTO uom_conversions VALUES
    ('ml', 1), ('fl_oz', 29.57353), ('cup', 236.58824), ('tbsp', 14.78676);

-- normalized grams = volume_ml * density_g_per_ml
--   (mass units bypass this and convert straight to grams)
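For ingestion-time checks outside the database, the same rule can be sketched in Python: mass units convert directly, volume units go through the ml lookup and the ingredient's density, and anything else raises instead of guessing. ML_PER_UOM copies the uom_conversions rows above (US customary sizes); to_grams is a hypothetical helper.

```python
from __future__ import annotations

from decimal import Decimal

GRAMS_PER_MASS_UOM = {"g": Decimal("1"), "kg": Decimal("1000")}
ML_PER_UOM = {  # same values as the uom_conversions rows
    "ml": Decimal("1"),
    "fl_oz": Decimal("29.57353"),
    "cup": Decimal("236.58824"),
    "tbsp": Decimal("14.78676"),
}


def to_grams(qty: Decimal, uom: str, density_g_per_ml: Decimal | None) -> Decimal:
    """Convert a BOM quantity to grams, or raise ValueError if it cannot be done."""
    if uom in GRAMS_PER_MASS_UOM:        # mass units bypass density entirely
        return qty * GRAMS_PER_MASS_UOM[uom]
    if uom in ML_PER_UOM:                # volume -> ml -> grams via density
        if density_g_per_ml is None:
            raise ValueError(f"{uom} needs density_g_per_ml")
        return qty * ML_PER_UOM[uom] * density_g_per_ml
    raise ValueError(f"no conversion for unit {uom!r}")  # e.g. count units


print(to_grams(Decimal("2"), "cup", Decimal("0.920")))
```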

Cost is driven by the input quantity, not the plated output. A dish calling for 1000 g of raw potato at a 0.82 yield must carry 1000 g as the cost driver even though only 820 g reaches the plate; that distinction keeps theoretical numbers honest when they feed variance mapping methodologies downstream.
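A worked version of the potato example, assuming a hypothetical price of 1.20 per kg: cost comes from the full 1000 g input, and yield enters only when that cost is divided by the plated weight to get cost per usable kilogram.

```python
from decimal import Decimal, ROUND_HALF_UP

raw_g = Decimal("1000")          # input quantity on the BOM line
yield_factor = Decimal("0.82")   # 82% reaches the plate
price_per_kg = Decimal("1.20")   # hypothetical contract price

line_cost = raw_g / 1000 * price_per_kg   # cost driven by the input weight
plated_g = raw_g * yield_factor           # usable product on the plate
cost_per_plated_kg = (line_cost / (plated_g / 1000)).quantize(
    Decimal("0.0001"), rounding=ROUND_HALF_UP
)

print(line_cost, plated_g, cost_per_plated_kg)
```

Costing the 820 g plated weight instead would report 0.984 rather than 1.20, which is exactly the understatement the rule above prevents.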

Step 5 — Roll up cost in Python with Decimal precision

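This step runs the Step 3 expansion and the Step 4 gram conversion inside PostgreSQL, where every quantity stays NUMERIC, then reads the leaf rows with coerce_float=False so pandas returns decimal.Decimal values rather than float64. Line costs are computed in Decimal, so no binary-float error accumulates across a deep tree. Cost is driven by raw_quantity as entered (the 1000 g potato rule from Step 4); yield_factor does not rescale it. Two assumptions are built in: recipes.portion_size holds one batch's output for each sub-recipe, in the unit its parent lines use, and prices are quoted per g or per kg, with anything else rejected.

from __future__ import annotations

from decimal import Decimal, ROUND_HALF_UP

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@host/dbname")

# Expand sub-recipes to leaf ingredients, convert each leaf to grams in NUMERIC,
# then attach the open (valid_to IS NULL) price for the requested location.
QUERY = text("""
    WITH RECURSIVE bom_tree AS (
        SELECT child_id, child_type, raw_quantity::numeric AS qty, raw_uom,
               ARRAY[parent_recipe_id::text, child_id::text] AS path
        FROM recipe_bom_lines
        WHERE parent_recipe_id = CAST(:recipe_id AS uuid)

        UNION ALL

        -- Scale a sub-recipe's lines by the share of its batch the parent uses
        SELECT b.child_id, b.child_type,
               b.raw_quantity * bt.qty / r.portion_size,
               b.raw_uom, bt.path || b.child_id::text
        FROM bom_tree bt
        JOIN recipes r          ON r.recipe_id = bt.child_id
        JOIN recipe_bom_lines b ON b.parent_recipe_id = bt.child_id
        WHERE bt.child_type = 'SUBRECIPE'
          AND b.child_id::text <> ALL(bt.path)   -- cycle guard
    ),
    leaves AS (
        SELECT child_id, raw_uom, SUM(qty) AS qty
        FROM bom_tree
        WHERE child_type = 'INGREDIENT'
        GROUP BY child_id, raw_uom
    )
    SELECT l.child_id, l.raw_uom,
           CASE l.raw_uom
               WHEN 'g'  THEN l.qty
               WHEN 'kg' THEN l.qty * 1000
               ELSE l.qty * c.ml_per_uom * i.density_g_per_ml  -- NULL if unconvertible
           END AS grams,
           p.cost_per_unit, p.uom AS pricing_uom
    FROM leaves l
    JOIN ingredients i ON i.ingredient_id = l.child_id
    LEFT JOIN uom_conversions c ON c.raw_uom = l.raw_uom
    LEFT JOIN pricing p ON p.ingredient_id = l.child_id
         AND p.location_id = :loc
         AND p.valid_to IS NULL
""")

# Grams per pricing unit; prices quoted in any other unit are rejected.
GRAMS_PER_PRICING_UOM = {"g": Decimal("1"), "kg": Decimal("1000")}


def roll_up_cost(recipe_id: str, location_id: str) -> Decimal:
    # coerce_float=False keeps NUMERIC columns as Decimal instead of float64.
    df: pd.DataFrame = pd.read_sql(
        QUERY,
        engine,
        params={"recipe_id": recipe_id, "loc": location_id},
        coerce_float=False,
    )

    if df["grams"].isna().any():
        raise ValueError("Unconvertible unit or missing density in BOM lines")
    if df["cost_per_unit"].isna().any():
        raise ValueError(f"No open price at location {location_id!r}")
    if not df["pricing_uom"].isin(list(GRAMS_PER_PRICING_UOM)).all():
        raise ValueError("Prices must be quoted per 'g' or 'kg'")

    # Decimal end to end: grams -> pricing units -> money.
    line_costs = [
        grams / GRAMS_PER_PRICING_UOM[uom] * cpu
        for grams, cpu, uom in zip(
            df["grams"], df["cost_per_unit"], df["pricing_uom"]
        )
    ]
    return sum(line_costs, Decimal("0")).quantize(
        Decimal("0.0001"), rounding=ROUND_HALF_UP
    )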

Step 6 — Materialize the roll-up for read-heavy dashboards

Re-running the recursive CTE on every dashboard load repeats the same expansion for every viewer and adds load at exactly the hours the database is busiest. Materialize the leaf resolution and refresh it on a schedule with pg_cron; menu-engineering reports then read a flat table.

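CREATE MATERIALIZED VIEW mv_recipe_leaf_cost AS
WITH RECURSIVE bom_tree AS (   -- Step 3 expansion, anchored on every active recipe
    SELECT b.parent_recipe_id AS root_id, b.child_id, b.child_type, b.raw_uom,
           b.raw_quantity::numeric AS qty, ARRAY[b.parent_recipe_id::text, b.child_id::text] AS path
    FROM recipe_bom_lines b JOIN recipes r ON r.recipe_id = b.parent_recipe_id AND r.is_active
    UNION ALL
    SELECT bt.root_id, b.child_id, b.child_type, b.raw_uom,
           b.raw_quantity * bt.qty / r.portion_size,  -- share of batch; portion_size = batch output
           bt.path || b.child_id::text
    FROM bom_tree bt JOIN recipes r ON r.recipe_id = bt.child_id
    JOIN recipe_bom_lines b ON b.parent_recipe_id = bt.child_id
    WHERE bt.child_type = 'SUBRECIPE' AND b.child_id::text <> ALL(bt.path)  -- cycle guard
)
SELECT root_id AS parent_recipe_id, child_id, raw_uom, SUM(qty) AS total_raw_qty
FROM bom_tree WHERE child_type = 'INGREDIENT'
GROUP BY root_id, child_id, raw_uom
WITH DATA;

CREATE UNIQUE INDEX ON mv_recipe_leaf_cost (parent_recipe_id, child_id, raw_uom);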

-- Refresh nightly; CONCURRENTLY needs the unique index above.
SELECT cron.schedule('refresh-bom', '0 3 * * *',
    $$REFRESH MATERIALIZED VIEW CONCURRENTLY mv_recipe_leaf_cost$$);

Verification and Validation

Confirm each layer before trusting a cost number:

Structure resolves to leaves. The Step 3 query should return only INGREDIENT rows, with a max_nesting_level matching your deepest sub-recipe. A max_nesting_level of 1 for a recipe you know contains sub-recipes means the recursive JOIN is not expanding them.
Cycles terminate. Inside a transaction you roll back afterwards, insert a deliberate circular reference and confirm the query still terminates and simply omits the looped branch.
Money path is deterministic. Assert the roll-up against a hand-computed value:

cost = roll_up_cost("TARGET_RECIPE_UUID", "LOC_01")
assert cost == Decimal("4.7310"), f"unexpected roll-up: {cost}"

Materialized view is fresh. Confirm the latest refresh-bom run succeeded (pg_cron records job runs in cron.job_run_details), then check that SELECT count(*) FROM mv_recipe_leaf_cost matches a live run of the view's query across active recipes.

Gotchas and Edge Cases

IEEE-754 drift. Summing float line costs across a deep tree accumulates sub-cent error that surfaces at month-end reconciliation. Keep the money path on Decimal/NUMERIC end to end.
yield_factor = 0 divide-by-zero. The valid_yield CHECK blocks it at write time; never relax that constraint to “temporarily” load data.
Missing density on a volumetric line. cup or fl_oz with a NULL density_g_per_ml cannot be converted to grams. Step 5 raises rather than shipping a silent zero; keep it that way.
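Regional unit aliases. floz, fl oz, and fl_oz are spellings of one unit, but a US cup (about 236.6 ml) and a metric cup (250 ml) are genuinely different measures, as are US and imperial fluid ounces. Canonicalize aliases into uom_conversions before ingestion; do not pattern-match them in application code.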
Fractional quantity strings. Chef cards carry "1 1/2" or "½". Parse to Decimal at the ingestion boundary; never let those reach raw_quantity as text.
Yield applied to output. Applying yield to the plated portion instead of the procurement input systematically understates cost. Always drive cost from the raw input weight.
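Mutating history. Never overwrite a BOM line or a price in place. recipe_bom_lines has no is_active flag of its own, so retire a recipe version by setting recipes.is_active = FALSE (deleting the recipe would cascade to its lines), and supersede a price by closing its valid_to and inserting a new row with a fresh valid_from. Both preserve the trail that menu-engineering analysis depends on.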
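A minimal sketch of the ingestion-boundary parsing recommended for fractional quantity strings, assuming chef cards use a whole or decimal number, a simple a/b fraction, a mixed form such as "1 1/2", or a vulgar-fraction glyph such as "½" (optionally after a whole number, as in "1½"). The parse_quantity name and the glyph table are hypothetical; values are combined as exact Fractions and only rounded once, to the 5 places raw_quantity stores.

```python
from __future__ import annotations

import re
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction

# Vulgar-fraction glyphs commonly seen on chef cards (an illustrative subset).
GLYPHS = {
    "¼": Fraction(1, 4), "½": Fraction(1, 2), "¾": Fraction(3, 4),
    "⅓": Fraction(1, 3), "⅔": Fraction(2, 3), "⅛": Fraction(1, 8),
}
MIXED = re.compile(r"(?:(\d+)\s+)?(\d+)\s*/\s*(\d+)")  # "3/4" or "1 1/2"
SCALE = Decimal("0.00001")  # raw_quantity is DECIMAL(12,5)


def parse_quantity(raw: str) -> Decimal:
    """Parse a chef-card quantity into a Decimal, or raise ValueError."""
    text = raw.strip()
    glyph = text[-1:]
    if glyph in GLYPHS:  # "½" or "1½"
        whole = text[:-1].strip()
        total = GLYPHS[glyph] + (int(whole) if whole else 0)
    elif m := MIXED.fullmatch(text):
        whole, num, den = m.groups()
        if int(den) == 0:
            raise ValueError(f"zero denominator in {raw!r}")
        total = Fraction(int(num), int(den)) + int(whole or 0)
    else:
        total = Fraction(text)  # "2", "2.5"; raises ValueError otherwise
    exact = Decimal(total.numerator) / Decimal(total.denominator)
    return exact.quantize(SCALE, rounding=ROUND_HALF_UP)


print(parse_quantity("1 1/2"), parse_quantity("½"), parse_quantity("⅓"))
```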

Designing Recipe BOM Databases — the parent guide to this page’s schema and roll-up patterns.
Yield Factor Calculation Frameworks — how the yield_factor values in Step 1 are derived.
Calculating Trim and Yield Factors for Produce — worked produce examples.
Mapping POS Taxonomies to Ingredients — connecting resolved BOMs back to sales data.
Variance Mapping Methodologies — consuming these theoretical costs downstream.
Core Architecture & Cost Mapping Systems — the wider system this BOM schema anchors.