Why convert weights with the decimal module instead of floats?

Multiplying raw weights by float conversion factors and summing across thousands of daily events lets IEEE-754 error accumulate into daily aggregates. Converting through Decimal and quantizing once before casting to float stores each gram value as a correctly rounded two-place figure, and any further rounding is deferred to the report layer.

Why use IQR bands instead of a fixed percentage tolerance?

A static tolerance that fits pre-portioned proteins is far too tight for volatile produce or high-cost garnishes. An interquartile-range fence derived per SKU-location group adapts the acceptable band to each item's own dispensing distribution and is robust to the fat tails real data carries.

How is a divide-by-zero from a zero engineered weight prevented?

A spec row with a zero or null engineered weight is treated as a defect and routed to the NO_SPEC state alongside unmatched rows, so the variance ratio is never computed against a zero target. A zero-target portion is a data error to review, not a value to divide by.

What happens during a network outage that leaves telemetry gaps?

Gaps are not imputed with rolling means, which would mask the variance being measured. The pipeline falls back to prep-log yields tagged MANUAL_OVERRIDE to preserve the audit trail, and the missing window is reconciled later through CSV bulk import rather than by fabricating dispensing events.

Why separate under-portioning and over-portioning in the output?

They route to different owners and actions. Under-portioning signals reach kitchen supervisors for immediate correction, while over-portioning trends indicate margin leakage that feeds the weekly menu-engineering review and threshold-tuning logic, so collapsing them into one bucket loses the actionable distinction.

Standardizing Portion Sizes Across Locations

This page shows a food-tech developer or multi-unit culinary manager how to build a deterministic Python pipeline that reconciles scale-measured portion weights against engineered recipe specs across every location. The specific task is turning noisy dispensing telemetry into per-SKU compliance flags without letting unit drift or truncation corrupt the numbers. The page sits under Portion Size Standardization and narrows that module’s general standardization discipline to the concrete extract-normalize-flag loop that decides whether execution variance can be isolated cleanly.

When portion execution diverges from engineered specifications, the gap compounds silently: a plate that runs eight grams heavy on a high-cost protein multiplies across thousands of covers and quietly erodes margin. Standardizing that execution is what makes the rest of the theoretical vs actual food cost calculation architecture trustworthy — you cannot attribute a variance to portioning drift if the portioning signal itself is not on a common, validated scale first.

Prerequisites and Data Contract

Pin these versions and provision the two input tables before the steps apply. The pipeline is only deterministic if the engineered spec it compares against is itself canonical.

Runtime: Python 3.11+, pandas==2.2.*, numpy==1.26.*, pydantic==2.7.*. Unit conversion runs through the standard-library decimal module; weights are cast to float only after an explicit quantize.
Environment: read access to the portion-telemetry store (IoT scale exports or POS-weighted line items) and to the recipe spec table. Engineered weights are assumed already expressed in base units via the yield factor calculation frameworks that translate raw purchase weights into edible portions.
Reconciliation: vendor and menu SKUs arriving on telemetry rows must already be resolved against your POS taxonomy mapping; this pipeline governs weights, not item identity.

The telemetry input contract — one row per dispensing event:

Field	Type	Meaning
`location_id`	text	Site key the event belongs to
`sku_id`	text	Menu item / component being portioned
`timestamp`	timestamptz	When the portion was dispensed
`raw_value`	numeric	Measured quantity, in the row’s own unit
`unit`	text	Unit of `raw_value` (`g`, `oz`, `ml_oil`, `scoop_4oz`, …)
`scale_id`	text	Source scale, for calibration correction

The engineered spec contract — one row per SKU-location:

Field	Type	Meaning
`location_id`	text	Site the spec applies to
`sku_id`	text	Menu item / component
`engineered_weight_g`	numeric	Target portion weight in base grams
`category`	text	Ingredient class (protein, garnish, produce) for alert routing

The output guarantee: every telemetry row leaves the pipeline either tagged with a compliance_status and a signed variance_ratio against a validated band, or quarantined with a reason code. Nothing is silently coerced to grams, zero-filled, or dropped.

Step-by-Step Implementation

Each step is a self-contained block. Compose them in order inside one batch worker, partitioned by location and day.

Step 1 — Canonicalize every unit to base grams with decimal precision

Culinary teams specify grams; scales report ounces; volume components need explicit density coefficients. Resolve everything to grams through a fixed conversion matrix, and anchor the arithmetic to Decimal so rounding never accumulates across thousands of daily events. This is the same unit canonicalization discipline the cost architecture depends on — done once, at the boundary.

import pandas as pd
from decimal import Decimal, ROUND_HALF_UP

# Deterministic conversion matrix (legacy unit -> grams)
UNIT_CONVERSION: dict[str, Decimal] = {
    "g": Decimal("1.0"),
    "oz": Decimal("28.3495"),
    "lb": Decimal("453.592"),
    "ml_water": Decimal("1.0"),   # density-specific
    "ml_oil": Decimal("0.92"),    # density-specific
    "scoop_4oz": Decimal("113.398"),
}

def harmonize_units(df: pd.DataFrame) -> pd.DataFrame:
    """Vectorized unit normalization with a strict rejection gate."""
    df = df.copy()
    df["multiplier"] = df["unit"].map(UNIT_CONVERSION)

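    # NaN fails every comparison, so check for missing values explicitly
    invalid = df["multiplier"].isna() | df["raw_value"].isna() | (df["raw_value"] <= 0)
    if invalid.any():
        raise ValueError(
            f"Unknown unit, missing value, or non-positive value at rows: {df.index[invalid].tolist()}"
        )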

    df["weight_g"] = df["raw_value"].astype(str).map(Decimal).mul(df["multiplier"]).map(
        lambda x: float(x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
    )
    return df.drop(columns=["multiplier"])
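
The gate above fails the whole batch on the first bad row. Where the output guarantee calls for quarantining rows with a reason code instead, a row-level split is one option. The sketch below is an assumption rather than part of the pipeline: the function name `split_quarantine`, the reason codes `UNKNOWN_UNIT` and `INVALID_VALUE`, and the trimmed unit set are all hypothetical.

```python
import pandas as pd

# Trimmed, illustrative unit set (assumption; the real matrix is UNIT_CONVERSION)
KNOWN_UNITS = {"g", "oz", "lb", "ml_water", "ml_oil", "scoop_4oz"}

def split_quarantine(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (clean, quarantined); quarantined rows carry a reason_code column."""
    df = df.copy()
    df["reason_code"] = None
    bad_unit = ~df["unit"].isin(KNOWN_UNITS)
    # NaN fails the > 0 comparison, so missing values are caught here too
    bad_value = ~(df["raw_value"] > 0)
    df.loc[bad_value, "reason_code"] = "INVALID_VALUE"
    df.loc[bad_unit, "reason_code"] = "UNKNOWN_UNIT"  # unit problems take precedence
    bad = bad_unit | bad_value
    return df[~bad].drop(columns=["reason_code"]), df[bad]

events = pd.DataFrame({
    "raw_value": [4.0, -1.0, 2.5],
    "unit": ["oz", "g", "cup"],
})
clean, quarantined = split_quarantine(events)
```

Only `clean` would then go on to unit conversion, while `quarantined` is kept for review so that no row disappears without a recorded reason.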

Step 2 — Correct scale calibration drift before anything else

IoT scales lose zero-point accuracy through thermal expansion and mechanical wear. Apply the day’s measured offset as a linear correction on ingestion, not as a post-hoc fudge on aggregates — a drifting scale otherwise masquerades as a real portioning trend.

def apply_calibration_offset(
    df: pd.DataFrame, offsets: pd.DataFrame
) -> pd.DataFrame:
    """offsets: columns [scale_id, event_date, offset_g] from the daily test-weight run."""
    df = df.copy()
    df["event_date"] = pd.to_datetime(df["timestamp"]).dt.date
    df = df.merge(offsets, on=["scale_id", "event_date"], how="left")
    df["offset_g"] = df["offset_g"].fillna(0.0)          # no reading -> no correction
    df["weight_g"] = (df["weight_g"] - df["offset_g"]).clip(lower=0.0)
    return df.drop(columns=["offset_g"])
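
The `offsets` table is described only by its columns, so here is one way it might be produced. This is a minimal sketch under stated assumptions: the test-run layout (`measured_g`, `reference_g`), the function name `build_offsets`, and the choice of the median as the daily summary are all hypothetical.

```python
import pandas as pd

def build_offsets(test_runs: pd.DataFrame) -> pd.DataFrame:
    """Collapse test-weight readings into one offset_g per scale and day."""
    runs = test_runs.copy()
    # Same date derivation as the telemetry side, so the merge keys line up
    runs["event_date"] = pd.to_datetime(runs["timestamp"]).dt.date
    runs["offset_g"] = runs["measured_g"] - runs["reference_g"]
    return (
        runs.groupby(["scale_id", "event_date"], as_index=False)["offset_g"]
        .median()
    )

test_runs = pd.DataFrame({
    "scale_id": ["S1", "S1", "S1", "S2"],
    "timestamp": ["2024-06-01T05:00:00Z", "2024-06-01T05:01:00Z",
                  "2024-06-01T05:02:00Z", "2024-06-01T05:00:00Z"],
    "measured_g": [500.4, 500.6, 500.5, 499.0],
    "reference_g": [500.0, 500.0, 500.0, 500.0],
})
offsets = build_offsets(test_runs)
```

Reducing to exactly one row per `(scale_id, event_date)` matters for the left join above: duplicate offset rows for the same key would duplicate telemetry events.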

Step 3 — Join telemetry to the engineered spec

Attach each event’s target weight and category with a vectorized left join, keyed on the SKU-location pair. Rows with no matching spec, or with a zero or otherwise non-positive target, are routed out rather than compared against a phantom target.

def attach_spec(telemetry: pd.DataFrame, spec: pd.DataFrame) -> pd.DataFrame:
    merged = telemetry.merge(
        spec[["location_id", "sku_id", "engineered_weight_g", "category"]],
        on=["location_id", "sku_id"],
        how="left",
    )
    # Missing and non-positive targets are both spec defects: tag them NO_SPEC
    target = merged["engineered_weight_g"]
    bad_spec = target.isna() | (target <= 0)
    merged["compliance_status"] = None
    merged.loc[bad_spec, "compliance_status"] = "NO_SPEC"
    return merged
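
A left join also silently duplicates telemetry rows if the spec table holds two rows for the same SKU-location pair. pandas can guard against that with the `validate` argument of `merge`, and `indicator=True` labels rows that found no spec. The frames below are made-up illustration data, and this is a sketch of the guard rather than a change to `attach_spec`.

```python
import pandas as pd

telemetry = pd.DataFrame({
    "location_id": ["L1", "L1"],
    "sku_id": ["A", "B"],
    "weight_g": [120.0, 45.0],
})
# Deliberately broken spec: SKU A appears twice for the same location
spec = pd.DataFrame({
    "location_id": ["L1", "L1"],
    "sku_id": ["A", "A"],
    "engineered_weight_g": [115.0, 118.0],
})

try:
    telemetry.merge(spec, on=["location_id", "sku_id"], how="left",
                    validate="many_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True

# With a de-duplicated spec, indicator=True marks rows that found no target
clean_spec = spec.drop_duplicates(["location_id", "sku_id"])
merged = telemetry.merge(clean_spec, on=["location_id", "sku_id"], how="left",
                         validate="many_to_one", indicator=True)
no_spec = merged["_merge"] == "left_only"
```

In practice a duplicated spec row should be fixed at the source; `drop_duplicates` is used here only to show the passing case.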

Step 4 — Compute per-SKU dynamic tolerance bands

A static ±5% tolerance is fine for pre-portioned proteins and catastrophic for volatile produce or high-cost garnishes. Derive the acceptable band per (location_id, sku_id) from the group’s own distribution using an interquartile-range fence, which is robust to the fat tails real dispensing data carries.

def compute_tolerance_bands(df: pd.DataFrame, min_events: int = 5) -> pd.DataFrame:
    df = df.sort_values(["location_id", "sku_id", "timestamp"]).copy()
    grp = df.groupby(["location_id", "sku_id"])["weight_g"]

    q1 = grp.transform(lambda s: s.quantile(0.25))
    q3 = grp.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1

    df["lower_bound"] = q1 - 1.5 * iqr
    df["upper_bound"] = q3 + 1.5 * iqr
    df["group_n"] = grp.transform("size")
    df.loc[df["group_n"] < min_events, ["lower_bound", "upper_bound"]] = pd.NA
    return df
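
To make the fence arithmetic concrete, the sketch below works through a single group by hand. The six weights are invented for illustration, and the quartiles use pandas' default linear interpolation, which is what `quantile` applies in the function above.

```python
import pandas as pd

weights = pd.Series([100.0, 102.0, 104.0, 106.0, 108.0, 150.0])

q1 = weights.quantile(0.25)   # 102.5
q3 = weights.quantile(0.75)   # 107.5
iqr = q3 - q1                 # 5.0

lower_bound = q1 - 1.5 * iqr  # 95.0
upper_bound = q3 + 1.5 * iqr  # 115.0

# Events outside the fence: only the 150 g portion
outside = weights[(weights < lower_bound) | (weights > upper_bound)]
```

The fence is centred on what the group actually dispenses, not on the engineered target, so it is the signed variance ratio in the next step that relates each event back to the spec.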

Step 5 — Tag compliance and the signed variance ratio

Compute the variance ratio V = (weight_g - engineered_weight_g) / engineered_weight_g and classify each event against its band with a single vectorized np.select. Under-portioning and over-portioning are distinct signals routed to different owners downstream.

import numpy as np

def tag_compliance(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
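    # Non-positive targets are spec defects; mask them so the ratio is NaN, never inf
    target = df["engineered_weight_g"].where(df["engineered_weight_g"] > 0)
    df["variance_ratio"] = (df["weight_g"] - target) / target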
    conditions = [
        df["lower_bound"].isna(),
        df["weight_g"] < df["lower_bound"],
        df["weight_g"] > df["upper_bound"],
    ]
    choices = ["INSUFFICIENT_DATA", "UNDER_PORTION", "OVER_PORTION"]
    df["compliance_status"] = df.get("compliance_status")  # preserve NO_SPEC
    unresolved = df["compliance_status"].isna()
    df.loc[unresolved, "compliance_status"] = np.select(
        [c[unresolved] for c in conditions], choices, default="COMPLIANT"
    )
    return df
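
The ordering of the conditions matters because `np.select` takes the first match for each element. The small sketch below, on invented numbers with a 120 g target, shows the ratio and the priority order together: a missing band wins over any comparison, and anything that matches nothing falls through to the default.

```python
import numpy as np

weight_g = np.array([128.0, 90.0, 121.0, 119.0])
engineered_weight_g = 120.0
lower_bound = np.array([110.0, 110.0, np.nan, 110.0])
upper_bound = np.array([125.0, 125.0, np.nan, 125.0])

# Signed ratio: positive means heavy, negative means light
variance_ratio = (weight_g - engineered_weight_g) / engineered_weight_g

status = np.select(
    [np.isnan(lower_bound), weight_g < lower_bound, weight_g > upper_bound],
    ["INSUFFICIENT_DATA", "UNDER_PORTION", "OVER_PORTION"],
    default="COMPLIANT",
)
```

The first event is eight grams heavy on a 120 g target, a ratio of about 0.067, and lands in `OVER_PORTION`; the third has no band and is `INSUFFICIENT_DATA` regardless of its weight.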

Step 6 — Aggregate failures by shift and category for routing

Do not flood managers with raw event logs. Roll compliance failures up by shift and ingredient category so grouped UNDER_PORTION events reach kitchen supervisors immediately, while OVER_PORTION trends feed the weekly menu-engineering review and the threshold-tuning logic for alerts.

def summarize_for_alerts(df: pd.DataFrame) -> pd.DataFrame:
    fails = df[df["compliance_status"].isin(["UNDER_PORTION", "OVER_PORTION"])].copy()
    fails["shift"] = pd.to_datetime(fails["timestamp"]).dt.hour // 8  # 3 shifts/day
    return (
        fails.groupby(["location_id", "category", "shift", "compliance_status"])
        .agg(events=("variance_ratio", "size"),
             mean_variance=("variance_ratio", "mean"))
        .reset_index()
    )
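
The `hour // 8` bucket follows whatever time zone the timestamps are expressed in. If the `timestamptz` values arrive in UTC, shift boundaries land on UTC hours, not kitchen hours. The sketch below shows the difference for one site; the zone `America/Chicago` and the column names `shift_utc` and `shift_local` are assumptions for illustration.

```python
import pandas as pd

events = pd.DataFrame({
    "timestamp": ["2024-06-01T02:30:00Z", "2024-06-01T15:00:00Z"],
})
utc = pd.to_datetime(events["timestamp"], utc=True)

# Assumption: this site operates on US Central time
local = utc.dt.tz_convert("America/Chicago")

events["shift_utc"] = utc.dt.hour // 8
events["shift_local"] = local.dt.hour // 8
```

The 02:30 UTC event is 21:30 the previous evening in Chicago, so it moves from bucket 0 to bucket 2 once local time is used. A multi-site deployment would need one zone per `location_id`.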

Verification and Validation

Confirm the pipeline behaves before you trust it to gate a variance report.

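Conservation of rows. Every input row lands in exactly one terminal state, and neither join may add or drop rows. Here telemetry is the raw input frame and tagged is the output of tag_compliance:

assert len(tagged) == len(telemetry), "a join duplicated or dropped rows"
assert tagged["compliance_status"].notna().all(), "untagged rows leaked"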

No silent float drift. Convert a known ounce weight and assert the canonical gram value is correct to two decimal places:

assert harmonize_units(
    pd.DataFrame({"raw_value": [4.0], "unit": ["oz"]})
)["weight_g"].iloc[0] == 113.40

Bands only where data supports them. Any group under min_events must be tagged INSUFFICIENT_DATA, never COMPLIANT by default — assert no COMPLIANT row has a null lower_bound.
Directional split is real. Feed a synthetic SKU with three deliberately heavy events and confirm they surface as OVER_PORTION in summarize_for_alerts, so under- and over-portioning never collapse into one bucket.

A healthy run ends with zero untagged rows, a correct two-place ounce conversion, and a shift summary whose failure counts reconcile against the raw tagged frame.
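
The reconciliation of summary counts against the tagged frame can be checked mechanically. The sketch below uses a small invented `tagged` frame and repeats the Step 6 grouping inline so that it runs on its own.

```python
import pandas as pd

tagged = pd.DataFrame({
    "location_id": ["L1"] * 5,
    "category": ["protein", "protein", "garnish", "garnish", "protein"],
    "timestamp": pd.to_datetime([
        "2024-06-01 07:10", "2024-06-01 07:40", "2024-06-01 12:05",
        "2024-06-01 18:30", "2024-06-01 19:00",
    ]),
    "compliance_status": ["OVER_PORTION", "COMPLIANT", "UNDER_PORTION",
                          "OVER_PORTION", "NO_SPEC"],
    "variance_ratio": [0.08, 0.01, -0.12, 0.05, float("nan")],
})

failure_states = ["UNDER_PORTION", "OVER_PORTION"]
fails = tagged[tagged["compliance_status"].isin(failure_states)].copy()
fails["shift"] = fails["timestamp"].dt.hour // 8
summary = (
    fails.groupby(["location_id", "category", "shift", "compliance_status"])
    .agg(events=("variance_ratio", "size"))
    .reset_index()
)

# Per-status totals in the summary must equal the counts in the raw tagged frame
raw_counts = tagged["compliance_status"].value_counts()
summary_counts = summary.groupby("compliance_status")["events"].sum()
for state in failure_states:
    assert summary_counts.get(state, 0) == raw_counts.get(state, 0)
```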

Gotchas and Edge Cases

IEEE-754 drift accumulating across thousands of events

Multiplying raw weights by float conversion factors and summing them lets binary-float error creep into daily aggregates. Step 1 converts through Decimal and casts to float only after an explicit quantize, so each stored gram value is a correctly rounded two-place figure. That quantize is the only rounding before the report layer: do not round weights again in the intermediate steps.
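
The effect is easy to reproduce with the standard library alone. The sketch below accumulates ten 0.1 g increments in both representations, then repeats the Step 1 conversion for a 4 oz reading; the explicit loop is deliberate, since it mirrors naive running accumulation.

```python
from decimal import Decimal, ROUND_HALF_UP

# Naive running total in binary floats
float_total = 0.0
for _ in range(10):
    float_total += 0.1

# The same accumulation in Decimal
decimal_total = Decimal("0")
for _ in range(10):
    decimal_total += Decimal("0.1")

# Step 1 arithmetic for a 4 oz reading: multiply, then quantize once
grams = (Decimal("4.0") * Decimal("28.3495")).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)
```

Here `float_total` ends up just short of 1.0, `decimal_total` is exactly 1.0, and `grams` is `Decimal("113.40")`.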

Fractional-yield truncation clustering weights at whole numbers

POS systems often truncate decimal weights to integers for receipt printing, which creates artificial spikes at whole grams and skews the IQR band. Detect it with a modulo test (weight_g % 1 == 0 for an implausible share of rows, run on raw_value instead when the source unit is not grams or a calibration offset has already been applied) and back-calculate the true weight probabilistically before Step 4, or quarantine the affected scale rather than let the truncation narrow the tolerance fence.
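
The detection half of that advice can be sketched as a per-scale share of whole-number readings. The function name `whole_number_share` and the 0.9 cut-off are hypothetical; a sensible threshold depends on how often a healthy scale lands on a whole number by chance. The back-calculation step is not sketched here.

```python
import pandas as pd

def whole_number_share(df: pd.DataFrame, value_col: str = "raw_value") -> pd.Series:
    """Share of events per scale whose reported value has no fractional part."""
    is_whole = df[value_col] % 1 == 0
    return is_whole.groupby(df["scale_id"]).mean()

events = pd.DataFrame({
    "scale_id": ["S1"] * 4 + ["S2"] * 4,
    "raw_value": [112.0, 115.0, 118.0, 113.0, 112.4, 115.7, 118.0, 113.2],
})
share = whole_number_share(events)

# Hypothetical cut-off; tune it against scales known to report decimals
SUSPECT_SHARE = 0.9
suspect_scales = share[share > SUSPECT_SHARE].index.tolist()
```

In this invented sample every S1 reading is a whole number while S2 shows one in four, so only S1 is flagged for review.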

engineered_weight_g of zero causing a divide-by-zero

A spec row that arrives with engineered_weight_g = 0 makes the variance ratio in Step 5 explode to inf. Treat a zero or null engineered weight as a spec defect — filter it into NO_SPEC alongside unmatched rows in Step 3 — because a zero-target portion is a data error to review, not a value to divide by.

Regional unit aliases and locale decimal separators

The same scoop or measure ships under different labels per region, and some exports use 1,50 for one and a half. An unmapped alias hits the rejection gate in Step 1 rather than being silently coerced; extend UNIT_CONVERSION with the regional key instead of loosening the gate, and normalize decimal separators upstream so raw_value is already a clean numeric.
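
Both halves of that advice can be sketched with the standard library. The regional key `unze` and the helper `normalize_raw_value` are hypothetical, and the parser assumes the export never uses thousands separators; a string containing both a comma and a point is rejected rather than guessed at.

```python
from decimal import Decimal

# Trimmed copy of the conversion matrix for illustration
UNIT_CONVERSION = {"g": Decimal("1.0"), "oz": Decimal("28.3495")}

# Extend the matrix with a (hypothetical) regional label for the same ounce
UNIT_CONVERSION["unze"] = UNIT_CONVERSION["oz"]

def normalize_raw_value(text: str) -> Decimal:
    """Parse '1,50' or '1.50' as Decimal('1.50'); assumes no thousands separators."""
    cleaned = text.strip()
    if "," in cleaned and "." in cleaned:
        raise ValueError(f"ambiguous separators in {text!r}")
    return Decimal(cleaned.replace(",", "."))

value = normalize_raw_value("1,50")
weight_g = value * UNIT_CONVERSION["unze"]

try:
    normalize_raw_value("1.234,50")
    rejected = False
except ValueError:
    rejected = True
```

Unmapped labels still fail the Step 1 gate, since nothing here lower-cases or fuzzy-matches unit names.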

Missing telemetry windows during network outages

Offline periods leave gaps. Do not impute them with rolling means — that masks the very variance you are measuring. Fall back to prep-log yields tagged MANUAL_OVERRIDE so the audit trail stays intact, and reconcile the gap later via CSV bulk import automation rather than fabricating dispensing events.

Up: Portion Size Standardization — the parent module this standardization loop implements.
Theoretical vs Actual Food Cost Calculation — the full variance architecture this feeds.
Variance Mapping Methodologies — separating portioning drift from prep loss and supplier quality once weights are clean.
Yield Factor Calculation Frameworks — how engineered portion weights are derived before this comparison runs.
Multi-Location Cost Center Architecture — isolating per-site tolerance bands within the wider cost estate.

For deeper reference, consult the official Python decimal documentation on rounding contexts and the pandas documentation on vectorized groupby transforms.