Covenant

Optimizing the Proxy, Missing the Router: A Noise-Calibrated Negative Result for In-Loop Differentiable RUDY in GPU Global Placement

A controlled investigation of the proxy-to-router coupling failure in differentiable routability-driven global placement. We add an in-loop RUDY penalty to a DREAMPlace-class GPU placer, search its functional form across five variants, and measure both (a) the in-loop proxy response and (b) NCTUgr-routed ground truth on two ISPD2011 designs. Every form demonstrably optimizes its own proxy (−24% to −51%, 5/5 seeds) while routed overflow stays within noise; a calibration arm with cell inflation shows the headroom was real (−8.3% routed overflow at +0.67% wirelength on superblue12, −17.3% at +0.66% on superblue15). The mechanism, not the budget, is the binding constraint.

Paper: paper/paper.pdf — targeted for ISPD '27. Every number in this README and the paper regenerates from the committed per-cell TSVs under results/ via scripts/regen_readme_numbers.py and paper/make_figs.py.

superblue15 (seed 42), stock DREAMPlace baseline vs. throttled inflation (r=1.03). Top row: cell placements — visually indistinguishable, matched wirelength (paired multi-seed: +0.66% HPWL). Bottom row: NCTUgr post-route overflow on those same placements — visibly relaxed congestion. The headline (Finding 4): −17.3% routed 2D overflow [−19.1%, −15.5%], 5/5 seed wins, inside a 1% wirelength budget.

Key results

Every "baseline" below is stock DREAMPlace (verified — the bench baseline is left at upstream defaults so paired ratios are interpretable). All CIs are paired-t at n=5 seeds unless stated.

| Covenant configuration | design class | result vs. stock DREAMPlace | finding |
| --- | --- | --- | --- |
| `--profile audited` (λ guard removed) | ISPD2015 std-cell | −3.2% HPWL at matched overflow [−5.5%, −0.9%], 12/12 wins | #1 |
| `inflate` (rate 1.03, throttled) | ISPD2011 superblue12 | −8.3% routed 2D overflow [−10.2%, −6.4%], 5/5 wins, +0.67% HPWL (in budget) | #3 |
| `inflate` (rate 1.03, throttled) | ISPD2011 superblue15 | −17.3% routed 2D overflow [−19.1%, −15.5%], 5/5 wins, +0.66% HPWL (in budget) | #4 |

A combined --profile audited + inflate r=1.03 configuration was also tested on both routability designs and does not compose additively — the two findings address different placement modes (HPWL vs. routability) and each is strongest in isolation. See NOTES.md (2026-06-09) for the data; the headline above reports the strongest per-mode configuration.

Finding 1 — DREAMPlace's λ feedback guard hurts standard-cell designs

Removing the HPWL-feedback guard on the density-weight update improves wirelength at matched overflow on a 4-design × 3-seed paired protocol:

| comparison | mean HPWL ratio | 95% CI | pairs | sign test |
| --- | --- | --- | --- | --- |
| λ guard removed vs. stock | 0.968 | [0.945, 0.991] | 12/12 wins | p = 0.0005 |

Shipped as covenant place --profile audited; the bench baseline remains stock so ratios stay meaningful. Full audit: AUDIT.md.
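The sign-test figure in the table can be checked by hand. The sketch below assumes the test is the exact two-sided binomial sign test with ties excluded; the README does not state the sidedness, so that is an assumption. `sign_test_p` is an illustrative helper, not part of the covenant package.

```python
from math import comb


def sign_test_p(wins: int, n: int) -> float:
    """Exact two-sided sign-test p-value under H0: P(win) = 0.5 (ties excluded)."""
    k = max(wins, n - wins)
    # One-sided tail: probability of k or more wins in n fair coin flips.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the tail for a two-sided test and cap at 1.
    return min(1.0, 2 * tail)


p = sign_test_p(wins=12, n=12)
print(f"{p:.4f}")  # 0.0005
```

With 12 wins out of 12 pairs the two-sided value is 2 / 4096, which rounds to the 0.0005 reported above.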

Finding 2 — A differentiable congestion proxy can be perfectly optimized and perfectly useless

An in-loop differentiable RUDY penalty added to a DREAMPlace-class placer does not move router-measured overflow vs. stock DREAMPlace on ISPD2011-class designs. Five functional forms, paired 5-seed protocol on superblue12, NCTUgr ground truth:

| arm | own-proxy response | routed 2D overflow ratio (95% CI) | HPWL cost |
| --- | --- | --- | --- |
| capacity-thresholded, w0.1 | −24% (5/5) | 0.988 [0.968, 1.008] | +0.01% |
| capacity-thresholded, w0.3 | −41% (5/5) | 1.000 [0.979, 1.021] | +0.11% |
| widened visibility, EFF 0.45 | −36% (5/5) | 0.993 [0.968, 1.018] | +0.33% |
| relative hotspot, q98 | −51% (5/5) | 1.001 [0.972, 1.029] | +0.18% |

Every form demonstrably optimizes its own proxy; the strongest form measurably flattens the RUDY utilization map itself (max utilization −5.4%, 5/5). Routed overflow nevertheless stays at noise across the entire search. The failure localizes to the proxy→router coupling, not the gradient machinery, weight, or threshold.
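For readers unfamiliar with the proxy, here is a minimal sketch of the classic (non-differentiable) RUDY estimate as it is commonly defined: each net spreads its half-perimeter wirelength uniformly over its bounding box. This is not Covenant's in-loop penalty; the function name `rudy_map`, the `min_extent` clamp, and the unit wire pitch are assumptions made for illustration, and the five differentiable forms studied here are not reproduced.

```python
def rudy_map(nets, nx, ny, bin_w, bin_h, min_extent=1e-3):
    """Accumulate classic RUDY demand on an nx-by-ny bin grid.

    nets: iterable of (xl, yl, xh, yh) net bounding boxes, grid origin at (0, 0).
    Each net contributes density (w + h) / (w * h) over its bounding box
    (wire pitch taken as 1), weighted by its overlap with each bin.
    """
    grid = [[0.0] * ny for _ in range(nx)]
    bin_area = bin_w * bin_h
    for xl, yl, xh, yh in nets:
        # Clamp degenerate (zero-width or zero-height) boxes so density stays finite.
        w = max(xh - xl, min_extent)
        h = max(yh - yl, min_extent)
        xh, yh = xl + w, yl + h
        density = (w + h) / (w * h)
        ix0, ix1 = max(0, int(xl // bin_w)), min(nx - 1, int(xh // bin_w))
        iy0, iy1 = max(0, int(yl // bin_h)), min(ny - 1, int(yh // bin_h))
        for ix in range(ix0, ix1 + 1):
            # Overlap of the net box with this bin column along x.
            ox = min(xh, (ix + 1) * bin_w) - max(xl, ix * bin_w)
            for iy in range(iy0, iy1 + 1):
                # Overlap of the net box with this bin along y.
                oy = min(yh, (iy + 1) * bin_h) - max(yl, iy * bin_h)
                if ox > 0 and oy > 0:
                    grid[ix][iy] += density * ox * oy / bin_area
    return grid


# Toy input (not from the benchmarks): one net spanning a 2x2 grid of unit bins.
demand = rudy_map([(0.0, 0.0, 2.0, 2.0)], nx=2, ny=2, bin_w=1.0, bin_h=1.0)
print(demand)  # [[1.0, 1.0], [1.0, 1.0]]
```

An in-loop version would need smooth surrogates for the bounding-box extents so that gradients reach cell positions; which surrogates Covenant uses is not stated in this README.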

Mean of n=5 seed pairs on superblue12. Top row: in-loop RUDY utilization map (left: baseline; right: hotspot arm) — the proxy is provably moving (mean max Δ −4.4%). Bottom row: NCTUgr post-route overflow map — essentially unchanged (mean Δ +0.1%). Same placements, two ground truths, opposite stories.

Finding 3 — The headroom was real, and reachable inside the budget

A calibration arm with the known-working mechanism (cell inflation, the stock DREAMPlace routability-mode adjustment) proves the routed-overflow headroom exists and prices it on a dose–response curve. Multi-seed, paired vs. stock DREAMPlace, NCTUgr ground truth:

| mechanism | routed 2D overflow ratio (95% CI), wins | HPWL cost | within 1% WL budget |
| --- | --- | --- | --- |
| stock cell inflation (rate 2.0) | 0.630 [0.611, 0.649], 5/5 | +15.1% | no |
| throttled, rate 1.2 | 0.721 [0.712, 0.729], 5/5 | +10.2% | no |
| throttled, rate 1.05 | 0.889 [0.879, 0.898], 5/5 | +1.62% | no |
| throttled, rate 1.03 | 0.917 [0.898, 0.936], 5/5 | +0.67% | yes |

At the wirelength price point where every differentiable form captured nothing, throttled inflation (rate 1.03) cuts routed overflow by 8.3%. The 1% wirelength budget was never the binding constraint; the mechanism was.
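The budget selection can be reproduced from the table values alone. The sketch below is a plain-Python reading of the dose-response rows, not the repository's own tooling; the names `ARMS`, `WL_BUDGET_PCT`, and `overflow_change_pct` are illustrative.

```python
# (mechanism, routed-overflow ratio vs. stock, HPWL cost in percent), copied from the table.
ARMS = [
    ("stock inflation, rate 2.0", 0.630, 15.1),
    ("throttled, rate 1.2", 0.721, 10.2),
    ("throttled, rate 1.05", 0.889, 1.62),
    ("throttled, rate 1.03", 0.917, 0.67),
]
WL_BUDGET_PCT = 1.0  # the 1% wirelength budget


def overflow_change_pct(ratio: float) -> float:
    """Convert a paired overflow ratio into a percent change (negative = reduction)."""
    return round((ratio - 1.0) * 100.0, 1)


# Keep only arms inside the budget, then pick the largest overflow reduction.
in_budget = [
    (name, overflow_change_pct(ratio), cost)
    for name, ratio, cost in ARMS
    if cost <= WL_BUDGET_PCT
]
best = min(in_budget, key=lambda row: row[1])
print(best)  # ('throttled, rate 1.03', -8.3, 0.67)
```

Only the rate 1.03 row passes the budget filter, so the selection is trivial here; the same filter becomes useful once more throttling rates are added to the curve.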

Routed-overflow reduction vs. wirelength cost. The five differentiable arms cluster at zero left of the 1% WL budget (dashed); the inflation dose–response curve crosses the budget at −8.3% (rate 1.03) and reaches −37% at stock (rate 2.0). The superblue15 r=1.03 point (purple diamond) sits well above superblue12's, a stronger result on the second design.

Finding 4 — Both predictions transfer to a second design

Generality campaign on superblue15 (1.12M nodes, 294k fixed terminals — a sharply different macro profile from superblue12's 15k). Both arms had predictions registered before execution. Both held:

| arm | routed 2D overflow ratio (95% CI) | HPWL cost | wins |
| --- | --- | --- | --- |
| `rudy_hotspot` (strongest differentiable form) | 0.989 [0.956, 1.021] | +0.01% | 4/5 |
| `inflate_r103` (budget-compliant mechanism) | 0.827 [0.809, 0.845] | +0.66% | 5/5 |

The differentiable RUDY null transfers cleanly: the proxy→router coupling failure persists on a second design with a sharply different macro profile, so it is not a superblue12 quirk. Throttled inflation transfers and strengthens on superblue15: −17.3% routed 2D overflow (vs. −8.3% on superblue12) at essentially the same wirelength cost (+0.66% vs. +0.67%). The 1% budget still does not bind.

Evaluation discipline

The protocol is descended from a prior noise-calibrated study of LLM-evolved placement schedules and extended here with router ground truth and in-loop liveness gates:

Paired multi-seed. Every comparison is paired at identical (design, seed) with a 95% paired-t CI and an exact sign test; sub-noise deltas are flagged "not a claim" mechanically. No single-seed numbers are quoted anywhere.
Measured noise floor. Same-seed re-runs of the identical configuration on NVIDIA GB10 differ by σ ≈ 1% normalized HPWL (GPU atomic nondeterminism). The floor is measured, not assumed, and sets the claimability line.
Liveness gates. A null is interpretable only if the mechanism provably fired. Every arm logs its activation iteration and calibrated weight in-loop, and the final placement is re-scored offline against the arm's own metric. Every reported null has a documented −24% to −51% own-proxy response.
Calibration arms. Negative results ship with a known-working mechanism on the same seeds, separating "weak method" from "immovable metric".
Cost gates. Routed wins are claimable only inside a 1% HPWL budget with per-pair density-overflow parity (max |Δoverflow| ≤ 0.005), a reward-hacking check.
Audited control loop. Every inherited heuristic in the placer carries a paired ablation in AUDIT.md; defaults are set by evidence, not upstream inertia.
Frozen evaluator. Harness files are never modified during experiments.
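The paired-CI and cost-gate rules above can be sketched in a few lines of standard-library Python. This is a hedged illustration, not the harness code: it assumes the CI is taken over per-seed ratios (the README does not say whether ratios, log-ratios, or differences are used), it reads "not a claim" as "the 95% CI contains 1.0", and the names `paired_ratio_ci` and `verdict` are hypothetical. The input numbers are synthetic.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_975_DF4 = 2.776  # two-sided 95% Student-t critical value for n=5 pairs (df=4)


def paired_ratio_ci(candidate, baseline, t_crit=T_CRIT_975_DF4):
    """Mean per-seed ratio candidate/baseline with a paired-t 95% CI."""
    ratios = [c / b for c, b in zip(candidate, baseline)]
    m = mean(ratios)
    half = t_crit * stdev(ratios) / sqrt(len(ratios))
    return m, m - half, m + half


def verdict(ci_lo, ci_hi, hpwl_cost_pct, max_abs_density_delta,
            wl_budget_pct=1.0, density_tol=0.005):
    """Apply the noise check first, then the cost gates (lower ratio is better)."""
    if ci_lo <= 1.0 <= ci_hi:
        return "not a claim"  # CI contains 1.0: delta is inside the noise
    if ci_lo > 1.0:
        return "regression"  # significantly worse than baseline
    if hpwl_cost_pct > wl_budget_pct or max_abs_density_delta > density_tol:
        return "outside cost gates"
    return "claimable"


# Synthetic per-seed routed-overflow values, for illustration only.
baseline = [100.0, 102.0, 98.0, 101.0, 99.0]
candidate = [92.0, 93.5, 90.0, 93.0, 91.0]
m, lo, hi = paired_ratio_ci(candidate, baseline)
print(f"{m:.3f} [{lo:.3f}, {hi:.3f}]")
print(verdict(lo, hi, hpwl_cost_pct=0.7, max_abs_density_delta=0.002))  # claimable
```

Applied to the reported intervals, the same rule labels the capacity-thresholded w0.1 arm (0.988 [0.968, 1.008]) "not a claim" and stock inflation at rate 2.0 (+15.1% HPWL) "outside cost gates".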

Reproducibility

Every number in this README and in paper/paper.tex regenerates from committed per-cell measurements:

```bash
python scripts/regen_readme_numbers.py        # M1: full audit campaign (GPU required)
python scripts/regen_readme_numbers.py --m2   # M2: statistics from results/*.tsv (no GPU)
python paper/make_figs.py                     # all paper figures
```

Numbers without a CI do not appear in this README or in the paper.

Quick start

```bash
pip install -e .                              # console script: covenant
covenant place benchmarks/fft_1 --stub        # no GPU needed (stub mode)
covenant bench --candidate my_variant.py \
    --benchmarks fft_1 fft_2 --seeds 42 43 44 45 46   # paired multi-seed
covenant audit                                # control-loop ablation matrix → AUDIT.md
```

GPU runs require the DREAMPlace submodule to be built; see docs/SPARK_SETUP.md (GB10 / aarch64 / CUDA 13) and docs/RUNNING.md (sanity gates; run them before trusting any number).

Repository layout

| path | contents |
| --- | --- |
| `covenant/` | placer overlay: harness, hooks, bench/stats, audit, routability |
| `scripts/` | campaign drivers, figure/animation tooling, number regeneration |
| `results/` | committed per-cell TSVs, the provenance for every claim |
| `paper/` | the M2 paper (`paper.tex`); `make_figs.py` regenerates every figure |
| `graphs/` | placement animations (`scripts/make_comparison_gif.py`) |
| `NOTES.md` | dated lab notebook with the claims ledger |
| `AUDIT.md` | control-loop ablation matrix (generated) |
| `prd.md` | product definition and roadmap |

License

Covenant overlay code: BSD-3-Clause. Vendored DREAMPlace: BSD-3 (fork on the covenant-hooks branch). A transitive-dependency licensing audit (docs/LICENSING.md) gates the first public release.
