Covenant

Optimizing the Proxy, Missing the Router: A Noise-Calibrated Negative Result for In-Loop Differentiable RUDY in GPU Global Placement

A controlled investigation of the proxy-to-router coupling failure in differentiable routability-driven global placement. We add an in-loop RUDY penalty to a DREAMPlace-class GPU placer, search its functional form across five variants, and measure both (a) the in-loop proxy response and (b) NCTUgr-routed ground truth on two ISPD2011 designs. Every form demonstrably optimizes its own proxy (−24% to −51%, 5/5 seeds) while routed overflow stays at noise; a calibration arm with cell inflation proves the headroom was real (−8.3% routed overflow at +0.67% wirelength on superblue12, −17.3% at +0.66% on superblue15). The mechanism, not the budget, is the binding constraint.

Paper: paper/paper.pdf — targeted for ISPD '27. Every number in this README and the paper regenerates from the committed per-cell TSVs under results/ via scripts/regen_readme_numbers.py and paper/make_figs.py.


superblue15 (seed 42), stock DREAMPlace baseline vs. throttled inflation (r=1.03). Top row: cell placements — visually indistinguishable, matched wirelength (paired multi-seed: +0.66% HPWL). Bottom row: NCTUgr post-route overflow on those same placements — visibly relaxed congestion. The headline (Finding 4): −17.3% routed 2D overflow [−19.1%, −15.5%], 5/5 seed wins, inside a 1% wirelength budget.

Key results

Every "baseline" below is stock DREAMPlace (verified — the bench baseline is left at upstream defaults so paired ratios are interpretable). All CIs are paired-t at n=5 seeds unless stated.

| Covenant configuration | design class | result vs. stock DREAMPlace | finding |
| --- | --- | --- | --- |
| --profile audited (λ guard removed) | ISPD2015 std-cell | −3.2% HPWL at matched overflow [−5.5%, −0.9%], 12/12 wins | #1 |
| inflate (rate 1.03, throttled) | ISPD2011 superblue12 | −8.3% routed 2D overflow [−10.2%, −6.4%], 5/5 wins, +0.67% HPWL (in budget) | #3 |
| inflate (rate 1.03, throttled) | ISPD2011 superblue15 | −17.3% routed 2D overflow [−19.1%, −15.5%], 5/5 wins, +0.66% HPWL (in budget) | #4 |

A combined --profile audited + inflate r=1.03 configuration was also tested on both routability designs and does not compose additively — the two findings address different placement modes (HPWL vs. routability) and each is strongest in isolation. See NOTES.md (2026-06-09) for the data; the headline above reports the strongest per-mode configuration.

Finding 1 — DREAMPlace's λ feedback guard hurts standard-cell designs

Removing the HPWL-feedback guard on the density-weight update improves wirelength at matched overflow on a 4-design × 3-seed paired protocol:

| comparison | mean HPWL ratio | 95% CI | pairs | sign test |
| --- | --- | --- | --- | --- |
| λ guard removed vs. stock | 0.968 | [0.945, 0.991] | 12/12 wins | p = 0.0005 |

Shipped as covenant place --profile audited; the bench baseline remains stock so ratios stay meaningful. Full audit: AUDIT.md.
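The sign-test figure in the table can be checked by hand. The sketch below is a plain exact binomial sign test, not code from this repository; the function name `sign_test_two_sided` is hypothetical, and the assumption that the quoted p-value is two-sided is inferred from the fact that the two-sided form reproduces it.

```python
from math import comb


def sign_test_two_sided(wins: int, n: int) -> float:
    """Exact two-sided sign test p-value under H0: P(win) = 0.5 (ties excluded)."""
    # Use the larger of wins/losses so the tail is always the extreme side.
    k = max(wins, n - wins)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Doubling the one-sided tail can exceed 1 for balanced outcomes; cap it.
    return min(1.0, 2.0 * tail)


# 12 wins out of 12 pairs, as in the table above: 2 * (1/2)^12.
print(f"p = {sign_test_two_sided(12, 12):.4f}")  # p = 0.0005
```

With n=5 seeds the same test cannot go below p = 0.0625 even at 5/5 wins, which is one reason a win count alone is a weak gate at that sample size and the paired-t interval matters.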

Finding 2 — A differentiable congestion proxy can be perfectly optimized and perfectly useless

An in-loop differentiable RUDY penalty added to a DREAMPlace-class placer does not move router-measured overflow vs. stock DREAMPlace on ISPD2011-class designs. Five functional forms, paired 5-seed protocol on superblue12, NCTUgr ground truth:

| arm | own-proxy response | routed 2D overflow (95% CI) | HPWL cost |
| --- | --- | --- | --- |
| capacity-thresholded, w0.1 | −24% (5/5) | 0.988 [0.968, 1.008] | +0.01% |
| capacity-thresholded, w0.3 | −41% (5/5) | 1.000 [0.979, 1.021] | +0.11% |
| widened visibility, EFF 0.45 | −36% (5/5) | 0.993 [0.968, 1.018] | +0.33% |
| relative hotspot, q98 | −51% (5/5) | 1.001 [0.972, 1.029] | +0.18% |

Every form demonstrably optimizes its own proxy; the strongest form measurably flattens the RUDY utilization map itself (max utilization −5.4%, 5/5). Routed overflow nevertheless stays at noise across the entire search. The failure localizes to the proxy→router coupling, not the gradient machinery, weight, or threshold.
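For readers unfamiliar with the proxy, the following is a minimal sketch of a RUDY demand map and a capacity-thresholded penalty on it. It uses the textbook RUDY definition (each net spreads a wire density of (w + h) / (w · h) uniformly over its bounding box) and is the plain forward computation only, not the differentiable in-loop version and not this repository's implementation. The function names, the one-bin clamp for degenerate boxes, and the squared-excess penalty form are all assumptions made for illustration.

```python
def rudy_map(nets, nx, ny, bin_w, bin_h):
    """Return an ny-by-nx grid of RUDY demand.

    nets: list of nets, each a list of (x, y) pin coordinates.
    """
    grid = [[0.0] * nx for _ in range(ny)]
    for pins in nets:
        xs = [p[0] for p in pins]
        ys = [p[1] for p in pins]
        x0, y0 = min(xs), min(ys)
        # Assumption: clamp degenerate boxes to one bin so the density stays finite.
        w = max(max(xs) - x0, bin_w)
        h = max(max(ys) - y0, bin_h)
        x1, y1 = x0 + w, y0 + h
        density = (w + h) / (w * h)  # wirelength estimate per unit box area
        for j in range(ny):
            oy = min(y1, (j + 1) * bin_h) - max(y0, j * bin_h)
            if oy <= 0:
                continue
            for i in range(nx):
                ox = min(x1, (i + 1) * bin_w) - max(x0, i * bin_w)
                if ox > 0:
                    # Weight the net's density by the fraction of the bin it covers.
                    grid[j][i] += density * ox * oy / (bin_w * bin_h)
    return grid


def thresholded_penalty(grid, capacity):
    """Hypothetical penalty: squared demand in excess of a per-bin capacity."""
    return sum(max(0.0, u - capacity) ** 2 for row in grid for u in row)


demand = rudy_map([[(0.0, 0.0), (2.0, 2.0)]], nx=2, ny=2, bin_w=1.0, bin_h=1.0)
print(demand)                            # [[1.0, 1.0], [1.0, 1.0]]
print(thresholded_penalty(demand, 0.5))  # 1.0
```

A penalty of this shape depends only on net bounding boxes, so a placer can lower it by reshaping boxes without changing where a router actually needs tracks. That is one way a proxy can fall while routed overflow does not.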

Coupling failure: RUDY map flattens, router overflow does not

Mean of n=5 seed pairs on superblue12. Top row: in-loop RUDY utilization map (left: baseline; right: hotspot arm) — the proxy is provably moving (mean max Δ −4.4%). Bottom row: NCTUgr post-route overflow map — essentially unchanged (mean Δ +0.1%). Same placements, two ground truths, opposite stories.

Finding 3 — The headroom was real, and reachable inside the budget

A calibration arm with the known-working mechanism (cell inflation, the stock DREAMPlace routability-mode adjustment) proves the routed-overflow headroom exists and prices it on a dose–response curve. Multi-seed, paired vs. stock DREAMPlace, NCTUgr ground truth:

| mechanism | routed 2D overflow (95% CI) | HPWL cost | within 1% WL budget |
| --- | --- | --- | --- |
| stock cell inflation (rate 2.0) | 0.630 [0.611, 0.649], 5/5 | +15.1% | no |
| throttled, rate 1.2 | 0.721 [0.712, 0.729], 5/5 | +10.2% | no |
| throttled, rate 1.05 | 0.889 [0.879, 0.898], 5/5 | +1.62% | no |
| throttled, rate 1.03 | 0.917 [0.898, 0.936], 5/5 | +0.67% | yes |

At the wirelength price point where every differentiable form captured nothing, throttled inflation captures −8.3% routed overflow (ratio 0.917). The 1% wirelength budget was never the binding constraint; the mechanism was.
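The budget filter applied to the dose–response table can be written out in a few lines. This is only a sketch over the four rows reported above, with the values typed in by hand; the tuple layout and the names `DOSE_RESPONSE` and `best_in_budget` are hypothetical and do not reflect the schema of the TSVs under results/.

```python
# (mechanism, routed 2D overflow ratio vs. stock, HPWL cost in percent),
# copied from the dose-response table above.
DOSE_RESPONSE = [
    ("stock cell inflation (rate 2.0)", 0.630, 15.1),
    ("throttled, rate 1.2", 0.721, 10.2),
    ("throttled, rate 1.05", 0.889, 1.62),
    ("throttled, rate 1.03", 0.917, 0.67),
]


def best_in_budget(rows, budget_pct=1.0):
    """Return the row with the lowest overflow ratio whose HPWL cost fits the budget."""
    eligible = [row for row in rows if row[2] <= budget_pct]
    if not eligible:
        return None
    return min(eligible, key=lambda row: row[1])


print(best_in_budget(DOSE_RESPONSE))
# ('throttled, rate 1.03', 0.917, 0.67)
```

Loosening `budget_pct` to 2.0 would select the rate 1.05 row instead, which shows how steeply the curve trades wirelength for overflow near the budget line.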

Operating space: differentiable arms vs inflation dose-response

Routed-overflow reduction vs. wirelength cost. The five differentiable arms cluster at zero left of the 1% WL budget (dashed); the inflation dose–response curve crosses the budget at −8.3% (rate 1.03) and reaches −37% at the stock rate of 2.0. The superblue15 r=1.03 point (purple diamond) sits well above superblue12's, a stronger result on the second design.

Finding 4 — Both predictions transfer to a second design

Generality campaign on superblue15 (1.12M nodes, 294k fixed terminals — a sharply different macro profile from superblue12's 15k). Both arms had predictions registered before execution. Both held:

| arm | routed 2D overflow ratio (95% CI) | HPWL cost | wins |
| --- | --- | --- | --- |
| rudy_hotspot (strongest differentiable form) | 0.989 [0.956, 1.021] | +0.01% | 4/5 |
| inflate_r103 (budget-compliant mechanism) | 0.827 [0.809, 0.845] | +0.66% | 5/5 |

The differentiable RUDY null transfers cleanly: the proxy→router coupling failure is not a superblue12 quirk. Throttled inflation transfers and strengthens on superblue15: −17.3% routed 2D overflow (vs. −8.3% on superblue12) at essentially the same wirelength cost (+0.66% vs. +0.67%). The 1% budget still does not bind.
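The tables report paired ratios while the prose quotes percent changes. The conversion is (ratio − 1) × 100, applied to the mean and to both CI endpoints. The helper below is a small illustrative sketch (the names `pct_change` and `fmt` are hypothetical), shown on the superblue15 inflation row and the Finding 1 HPWL row.

```python
def pct_change(ratio: float) -> float:
    """Convert a candidate/baseline ratio to a percent change, to one decimal."""
    return round((ratio - 1.0) * 100.0, 1)


def fmt(mean: float, lo: float, hi: float) -> str:
    """Format a ratio and its CI as 'x% [lo%, hi%]'."""
    return f"{pct_change(mean):+.1f}% [{pct_change(lo):+.1f}%, {pct_change(hi):+.1f}%]"


print(fmt(0.827, 0.809, 0.845))  # -17.3% [-19.1%, -15.5%]
print(fmt(0.968, 0.945, 0.991))  # -3.2% [-5.5%, -0.9%]
```

A ratio CI that stays below 1.0 maps to a percent CI that stays below zero, so either form can be used to read off whether a result clears the noise gate.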

Generality across two ISPD2011 designs

Evaluation discipline

The protocol is descended from a prior noise-calibrated study of LLM-evolved placement schedules and extended here with router ground truth and in-loop liveness gates:

  • Paired multi-seed. Every comparison is paired at identical (design, seed) with a 95% paired-t CI and an exact sign test; sub-noise deltas are flagged "not a claim" mechanically. No single-seed numbers are quoted anywhere.
  • Measured noise floor. Same-seed re-runs of the identical configuration on NVIDIA GB10 differ by σ ≈ 1% normalized HPWL (GPU atomic nondeterminism). The floor is measured, not assumed, and sets the claimability line.
  • Liveness gates. A null is interpretable only if the mechanism provably fired. Every arm logs its activation iteration and calibrated weight in-loop, and the final placement is re-scored offline against the arm's own metric. Every reported null has a documented −24% to −51% own-proxy response.
  • Calibration arms. Negative results ship with a known-working mechanism on the same seeds, separating "weak method" from "immovable metric".
  • Cost gates. Routed wins are claimable only inside a 1% HPWL budget with per-pair density-overflow parity (max |Δoverflow| ≤ 0.005), a reward-hacking check.
  • Audited control loop. Every inherited heuristic in the placer carries a paired ablation in AUDIT.md; defaults are set by evidence, not upstream inertia.
  • Frozen evaluator. Harness files are never modified during experiments.
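The paired-CI and cost-gate rules above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the harness itself: it assumes each pair is summarized as a candidate/baseline ratio, treats "sub-noise" as a 95% CI that includes 1.0, and hard-codes the two-sided Student-t critical values for the two sample sizes used here (n=5 and n=12). The names `paired_t_ci` and `claimable` are hypothetical, and the seed ratios are made-up example data, not measurements from this repository.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
T_CRIT_95 = {4: 2.776, 11: 2.201}


def paired_t_ci(ratios):
    """Mean and 95% paired-t interval over per-(design, seed) ratios."""
    n = len(ratios)
    m = mean(ratios)
    half_width = T_CRIT_95[n - 1] * stdev(ratios) / sqrt(n)
    return m, m - half_width, m + half_width


def claimable(ratios, hpwl_cost_pct, max_abs_doverflow,
              budget_pct=1.0, parity_tol=0.005):
    """A win is claimable only if it clears noise, budget, and overflow parity."""
    _, _, hi = paired_t_ci(ratios)
    clears_noise = hi < 1.0                       # CI excludes "no change"
    in_budget = hpwl_cost_pct <= budget_pct       # 1% HPWL budget
    parity = max_abs_doverflow <= parity_tol      # density-overflow parity
    return clears_noise and in_budget and parity


# Illustrative ratios for five seeds (invented for this example).
example = [0.90, 0.92, 0.91, 0.93, 0.92]
m, lo, hi = paired_t_ci(example)
print(f"{m:.3f} [{lo:.3f}, {hi:.3f}]")
print(claimable(example, hpwl_cost_pct=0.67, max_abs_doverflow=0.003))
```

Keeping the three gates as separate booleans makes it easy to log which one rejected a candidate, which matters when a result fails on budget rather than on noise.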

Reproducibility

Every number in this README and in paper/paper.tex regenerates from committed per-cell measurements:

python scripts/regen_readme_numbers.py        # M1: full audit campaign (GPU required)
python scripts/regen_readme_numbers.py --m2   # M2: statistics from results/*.tsv (no GPU)
python paper/make_figs.py                     # all paper figures

Numbers without a CI do not appear in this README or in the paper.

Quick start

pip install -e .                              # console script: covenant
covenant place benchmarks/fft_1 --stub        # no GPU needed (stub mode)
covenant bench --candidate my_variant.py \
    --benchmarks fft_1 fft_2 --seeds 42 43 44 45 46   # paired multi-seed
covenant audit                                # control-loop ablation matrix → AUDIT.md

GPU runs require the DREAMPlace submodule to be built first; see docs/SPARK_SETUP.md (GB10 / aarch64 / CUDA 13) and docs/RUNNING.md (sanity gates; run them before trusting any number).

Repository layout

| path | contents |
| --- | --- |
| covenant/ | placer overlay: harness, hooks, bench/stats, audit, routability |
| scripts/ | campaign drivers, figure/animation tooling, number regeneration |
| results/ | committed per-cell TSVs: the provenance for every claim |
| paper/ | the M2 paper (paper.tex); make_figs.py regenerates every figure |
| graphs/ | placement animations (scripts/make_comparison_gif.py) |
| NOTES.md | dated lab notebook with the claims ledger |
| AUDIT.md | control-loop ablation matrix (generated) |
| prd.md | product definition and roadmap |

License

Covenant overlay code: BSD-3-Clause. Vendored DREAMPlace: BSD-3 (fork on the covenant-hooks branch). A transitive-dependency licensing audit (docs/LICENSING.md) gates the first public release.

About

Contract-driven GPU analytical placer (DREAMPlace fork overlay) with a noise-calibrated evaluation protocol, paired multi-seed CIs, liveness gates, calibration arms.
