Optimizing the Proxy, Missing the Router: A Noise-Calibrated Negative Result for In-Loop Differentiable RUDY in GPU Global Placement
A controlled investigation of the proxy-to-router coupling failure in
differentiable routability-driven global placement. We add an in-loop
RUDY penalty to a DREAMPlace-class GPU placer, search its functional
form across five variants, and measure both (a) the in-loop proxy
response and (b) NCTUgr-routed ground truth on two ISPD2011 designs.
Every form demonstrably optimizes its own proxy (−24% to −51%, 5/5
seeds) while routed overflow stays at noise; a calibration arm with
cell inflation proves the headroom was real (−8.3% routed overflow at
+0.67% wirelength on superblue12, −17.3% at +0.66% on
superblue15). The mechanism — not the budget — is the binding
constraint.
Paper: paper/paper.pdf, targeted for ISPD '27. Every number in this README and the paper regenerates from the committed per-cell TSVs under results/ via scripts/regen_readme_numbers.py and paper/make_figs.py.
superblue15 (seed 42), stock DREAMPlace baseline vs. throttled
inflation (r=1.03). Top row: cell placements — visually
indistinguishable, matched wirelength (paired multi-seed: +0.66%
HPWL). Bottom row: NCTUgr post-route overflow on those same
placements — visibly relaxed congestion. The headline (Finding 4):
−17.3% routed 2D overflow [−19.1%, −15.5%], 5/5 seed wins, inside a
1% wirelength budget.
Every "baseline" below is stock DREAMPlace (verified — the bench baseline is left at upstream defaults so paired ratios are interpretable). All CIs are paired-t at n=5 seeds unless stated.
| Covenant configuration | design class | result vs. stock DREAMPlace | finding |
|---|---|---|---|
| --profile audited (λ guard removed) | ISPD2015 std-cell | −3.2% HPWL at matched overflow [−5.5%, −0.9%], 12/12 wins | #1 |
| inflate (rate 1.03, throttled) | ISPD2011 superblue12 | −8.3% routed 2D overflow [−10.2%, −6.4%], 5/5 wins, +0.67% HPWL (in budget) | #3 |
| inflate (rate 1.03, throttled) | ISPD2011 superblue15 | −17.3% routed 2D overflow [−19.1%, −15.5%], 5/5 wins, +0.66% HPWL (in budget) | #4 |
A combined --profile audited + inflate r=1.03 configuration was
also tested on both routability designs and does not compose
additively — the two findings address different placement modes (HPWL
vs. routability) and each is strongest in isolation. See
NOTES.md (2026-06-09) for the data; the headline above
reports the strongest per-mode configuration.
Removing the HPWL-feedback guard on the density-weight update improves wirelength at matched overflow on a 4-design × 3-seed paired protocol:
| comparison | mean HPWL ratio | 95% CI | pairs | sign test |
|---|---|---|---|---|
| λ guard removed vs. stock | 0.968 | [0.945, 0.991] | 12/12 wins | p = 0.0005 |
Shipped as covenant place --profile audited; the bench baseline
remains stock so ratios stay meaningful. Full audit: AUDIT.md.
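The sign-test column can be checked by hand. The sketch below assumes a two-sided exact sign test with ties excluded, which is the reading that reproduces the reported p-value; `sign_test_p` is a hypothetical helper name, not a function from this repository.

```python
from math import comb


def sign_test_p(wins: int, n: int) -> float:
    """Two-sided exact sign test p-value under H0: P(win) = 0.5.

    Assumes the n pairs contain no ties (illustrative helper only).
    """
    k = max(wins, n - wins)                      # size of the more extreme tail
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)                  # double the one-sided tail, cap at 1


# 12 wins out of 12 pairs: 2 / 2**12 = 0.000488...
print(f"{sign_test_p(12, 12):.4f}")  # prints 0.0005
```

With 12 wins in 12 pairs only one outcome sits in each tail, so the p-value is 2/4096, which rounds to the 0.0005 shown in the table.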
An in-loop differentiable RUDY penalty added to a DREAMPlace-class
placer does not move router-measured overflow vs. stock
DREAMPlace on ISPD2011-class designs. Five functional forms, paired
5-seed protocol on superblue12, NCTUgr ground truth:
| arm | own-proxy response | routed 2D overflow (95% CI) | HPWL cost |
|---|---|---|---|
| capacity-thresholded, w0.1 | −24% (5/5) | 0.988 [0.968, 1.008] | +0.01% |
| capacity-thresholded, w0.3 | −41% (5/5) | 1.000 [0.979, 1.021] | +0.11% |
| widened visibility, EFF 0.45 | −36% (5/5) | 0.993 [0.968, 1.018] | +0.33% |
| relative hotspot, q98 | −51% (5/5) | 1.001 [0.972, 1.029] | +0.18% |
Every form demonstrably optimizes its own proxy; the strongest form measurably flattens the RUDY utilization map itself (max utilization −5.4%, 5/5). Routed overflow nevertheless stays at noise across the entire search. The failure localizes to the proxy→router coupling, not the gradient machinery, weight, or threshold.
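For readers unfamiliar with the proxy, here is a minimal pure-Python sketch of a textbook RUDY map together with one possible capacity-thresholded penalty. It is an illustration under stated assumptions, not this repository's in-loop implementation: the function names, the unit wire width, the padding of degenerate bounding boxes, and the squared-hinge penalty form are all hypothetical, and the sketch is not differentiable.

```python
def overlap(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def rudy_map(nets, grid_w, grid_h, bin_size=1.0):
    """Accumulate a RUDY-style demand map.

    nets: list of nets, each a list of (x, y) pin positions.
    Each net spreads its half-perimeter wirelength (w + h) uniformly
    over its bounding box; every bin receives its overlapped share.
    """
    grid = [[0.0] * grid_w for _ in range(grid_h)]
    bin_area = bin_size * bin_size
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        x0, y0 = min(xs), min(ys)
        # Assumption: pad degenerate boxes to one bin to avoid division by zero.
        w = max(max(xs) - x0, bin_size)
        h = max(max(ys) - y0, bin_size)
        density = (w + h) / (w * h)              # wirelength per unit area
        for by in range(grid_h):
            oy = overlap(by * bin_size, (by + 1) * bin_size, y0, y0 + h)
            for bx in range(grid_w):
                ox = overlap(bx * bin_size, (bx + 1) * bin_size, x0, x0 + w)
                grid[by][bx] += density * ox * oy / bin_area
    return grid


def thresholded_penalty(grid, capacity):
    """Hypothetical penalty: squared excess of utilization over capacity."""
    return sum(max(0.0, u - capacity) ** 2 for row in grid for u in row)


# One two-pin net spanning a 2x2 box on a 4x4 grid.
demo = rudy_map([[(0.0, 0.0), (2.0, 2.0)]], grid_w=4, grid_h=4)
print(demo[0][0], demo[3][3])                    # covered bin vs. empty bin
print(thresholded_penalty(demo, capacity=0.5))
```

The map is built purely from net bounding boxes, so a placer can lower it without changing what a router actually does with those nets; that gap is the coupling this finding measures.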
Mean of n=5 seed pairs on superblue12. Top row: in-loop RUDY
utilization map (left: baseline; right: hotspot arm) — the proxy is
provably moving (mean max Δ −4.4%). Bottom row: NCTUgr post-route
overflow map — essentially unchanged (mean Δ +0.1%). Same placements,
two ground truths, opposite stories.
A calibration arm with the known-working mechanism (cell inflation, the stock DREAMPlace routability-mode adjustment) proves the routed-overflow headroom exists and prices it on a dose–response curve. Multi-seed, paired vs. stock DREAMPlace, NCTUgr ground truth:
| mechanism | routed 2D overflow (95% CI) | HPWL cost | within 1% WL budget |
|---|---|---|---|
| stock cell inflation (rate 2.0) | 0.630 [0.611, 0.649], 5/5 | +15.1% | no |
| throttled, rate 1.2 | 0.721 [0.712, 0.729], 5/5 | +10.2% | no |
| throttled, rate 1.05 | 0.889 [0.879, 0.898], 5/5 | +1.62% | no |
| throttled, rate 1.03 | 0.917 [0.898, 0.936], 5/5 | +0.67% | yes |
At the wirelength price point where every differentiable form captured nothing, throttled inflation captures −8.3% routed overflow (ratio 0.917). The 1% wirelength budget was never the binding constraint; the mechanism was.
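The ratios in the table map to percent changes by subtracting 1. The sketch below does that arithmetic for each row and applies the 1% wirelength budget; the row values are copied from the table, while `ratio_to_pct` and `WL_BUDGET_PCT` are illustrative names rather than repository identifiers.

```python
WL_BUDGET_PCT = 1.0  # wirelength budget, in percent HPWL increase


def ratio_to_pct(ratio: float) -> float:
    """Convert a paired ratio vs. baseline to a percent change (1 decimal)."""
    return round((ratio - 1.0) * 100.0, 1)


# (mean ratio, CI low, CI high, HPWL cost in %), copied from the table above.
rows = {
    "stock inflation, rate 2.0": (0.630, 0.611, 0.649, 15.1),
    "throttled, rate 1.2": (0.721, 0.712, 0.729, 10.2),
    "throttled, rate 1.05": (0.889, 0.879, 0.898, 1.62),
    "throttled, rate 1.03": (0.917, 0.898, 0.936, 0.67),
}

for name, (mean, lo, hi, hpwl_cost) in rows.items():
    in_budget = hpwl_cost <= WL_BUDGET_PCT
    print(
        f"{name}: {ratio_to_pct(mean)}% "
        f"[{ratio_to_pct(lo)}%, {ratio_to_pct(hi)}%], "
        f"HPWL +{hpwl_cost}%, in budget: {in_budget}"
    )
```

Only the rate 1.03 row passes the budget check, matching the last column of the table.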
Routed-overflow reduction vs. wirelength cost. The five differentiable
arms cluster at zero left of the 1% WL budget (dashed); the inflation
dose–response curve crosses the budget at −8.3% (rate 1.03) and reaches
−37% at stock. The superblue15 r1.03 point (purple diamond) sits
well above superblue12's — a stronger result on the second design.
Generality campaign on superblue15 (1.12M nodes, 294k fixed terminals
— a sharply different macro profile from superblue12's 15k). Both
arms had predictions registered before execution. Both held:
| arm | routed 2D overflow ratio (95% CI) | HPWL cost | wins |
|---|---|---|---|
| rudy_hotspot (strongest differentiable form) | 0.989 [0.956, 1.021] | +0.01% | 4/5 |
| inflate_r103 (budget-compliant mechanism) | 0.827 [0.809, 0.845] | +0.66% | 5/5 |
The differentiable RUDY null transfers cleanly: the
proxy→router coupling failure reproduces on a second design with a
sharply different macro profile, so it is not a superblue12 quirk.
Throttled inflation transfers and strengthens on
superblue15: −17.3% routed 2D overflow (vs. −8.3% on
superblue12) at essentially the same wirelength cost (+0.66% vs.
+0.67%). The 1% budget still does not bind.
The protocol is descended from a prior noise-calibrated study of LLM-evolved placement schedules and extended here with router ground truth and in-loop liveness gates:
- Paired multi-seed. Every comparison is paired at identical (design, seed) with a 95% paired-t CI and an exact sign test; sub-noise deltas are flagged "not a claim" mechanically. No single-seed numbers are quoted anywhere.
- Measured noise floor. Same-seed re-runs of the identical configuration on NVIDIA GB10 differ by σ ≈ 1% normalized HPWL (GPU atomic nondeterminism). The floor is measured, not assumed, and sets the claimability line.
- Liveness gates. A null is interpretable only if the mechanism provably fired. Every arm logs its activation iteration and calibrated weight in-loop, and the final placement is re-scored offline against the arm's own metric. Every reported null has a documented −24% to −51% own-proxy response.
- Calibration arms. Negative results ship with a known-working mechanism on the same seeds, separating "weak method" from "immovable metric".
- Cost gates. Routed wins are claimable only inside a 1% HPWL budget with per-pair density-overflow parity (max |Δoverflow| ≤ 0.005), a reward-hacking check.
- Audited control loop. Every inherited heuristic in the placer carries a paired ablation in AUDIT.md; defaults are set by evidence, not upstream inertia.
- Frozen evaluator. Harness files are never modified during experiments.
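The paired-CI and cost-gate rules above can be sketched as follows. This is a minimal illustration, not the harness code: the function names are hypothetical, the claim rule (CI upper bound below 1.0) is an assumed reading of "sub-noise deltas are flagged", the critical values are standard two-sided 95% Student-t table entries, and the sample data is synthetic.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student-t critical values, keyed by degrees of freedom.
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 11: 2.201}


def paired_t_ci(ratios):
    """Mean and 95% paired-t CI of per-(design, seed) candidate/baseline ratios."""
    n = len(ratios)
    m = mean(ratios)
    half = T_CRIT_95[n - 1] * stdev(ratios) / sqrt(n)
    return m, m - half, m + half


def claimable(of_ratios, hpwl_ratios, density_of_deltas,
              wl_budget=0.01, parity_tol=0.005):
    """Apply the gates: CI excludes 1.0, HPWL in budget, overflow parity."""
    _, _, hi = paired_t_ci(of_ratios)
    beats_noise = hi < 1.0                       # assumed claim rule
    in_budget = mean(hpwl_ratios) - 1.0 <= wl_budget
    parity = max(abs(d) for d in density_of_deltas) <= parity_tol
    return beats_noise and in_budget and parity


# Synthetic 5-seed example (NOT measured data from this repository).
routed = [0.90, 0.92, 0.91, 0.93, 0.925]
hpwl = [1.006, 1.007, 1.005, 1.008, 1.007]
deltas = [0.001, -0.002, 0.003, 0.000, -0.001]
print(paired_t_ci(routed))
print(claimable(routed, hpwl, deltas))

# A null arm whose ratios straddle 1.0 is rejected by the same gate.
null_arm = [0.99, 1.01, 0.98, 1.02, 1.00]
print(claimable(null_arm, hpwl, deltas))
```

Keeping the gate a pure function of per-pair ratios means the same check applies unchanged to both the differentiable arms and the calibration arm.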
Every number in this README and in paper/paper.tex
regenerates from committed per-cell measurements:

    python scripts/regen_readme_numbers.py        # M1: full audit campaign (GPU required)
    python scripts/regen_readme_numbers.py --m2   # M2: statistics from results/*.tsv (no GPU)
    python paper/make_figs.py                     # all paper figures

Numbers without a CI do not appear in this README or in the paper.

    pip install -e .                              # console script: covenant
    covenant place benchmarks/fft_1 --stub        # no GPU needed (stub mode)
    covenant bench --candidate my_variant.py \
        --benchmarks fft_1 fft_2 --seeds 42 43 44 45 46   # paired multi-seed
    covenant audit                                # control-loop ablation matrix → AUDIT.md

GPU runs require the DREAMPlace submodule to be built; see
docs/SPARK_SETUP.md (GB10 / aarch64 / CUDA 13)
and docs/RUNNING.md (sanity gates; run them before
trusting any number).
| path | contents |
|---|---|
| covenant/ | placer overlay: harness, hooks, bench/stats, audit, routability |
| scripts/ | campaign drivers, figure/animation tooling, number regeneration |
| results/ | committed per-cell TSVs, the provenance for every claim |
| paper/ | the M2 paper (paper.tex); make_figs.py regenerates every figure |
| graphs/ | placement animations (scripts/make_comparison_gif.py) |
| NOTES.md | dated lab notebook with the claims ledger |
| AUDIT.md | control-loop ablation matrix (generated) |
| prd.md | product definition and roadmap |
Covenant overlay code: BSD-3-Clause. Vendored DREAMPlace: BSD-3 (fork
on the covenant-hooks branch). A transitive-dependency licensing
audit (docs/LICENSING.md) gates the first public
release.



