Smooth-L0 (Geman-McClure) importance-minimality loss by danbraunai-goodfire · Pull Request #852 · goodfire-ai/param-decomp

danbraunai-goodfire · 2026-06-16T20:38:23Z

What

Adds SmoothL0ImportanceMinimalityLoss — a bounded smooth-L0 CI-sparsity penalty
φ(c) = c²/(c²+γ²) (Geman–McClure) as an alternative to the existing L_p
ImportanceMinimalityLoss. The L_p penalty's gradient p·c^(p-1) blows up as c→0 for
p<1 (an infinite cliff at the accumulation point where most components sit). Smooth-L0 is:

- flat at 0 (φ'(0)=0), so no cliff,
- saturating to 1 (so the per-component sum ≈ the active-component count),
- bounded in gradient (peak 3√3/(8γ) ≈ 0.65/γ at c = γ/√3; 0.5/γ at the threshold c = γ).
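
The three properties follow directly from φ(c) = c²/(c²+γ²) and can be checked numerically. The sketch below is illustrative only and is not the code in this PR: the function names are hypothetical, and it works on plain floats rather than CI tensors. For this φ, the derivative is φ'(c) = 2cγ²/(c²+γ²)², which peaks at c = γ/√3 with value 3√3/(8γ) ≈ 0.65/γ and equals 0.5/γ at c = γ.

```python
import math


def smooth_l0(c: float, gamma: float) -> float:
    """Geman-McClure penalty phi(c) = c^2 / (c^2 + gamma^2)."""
    return c * c / (c * c + gamma * gamma)


def smooth_l0_grad(c: float, gamma: float) -> float:
    """Analytic derivative phi'(c) = 2 c gamma^2 / (c^2 + gamma^2)^2."""
    return 2.0 * c * gamma**2 / (c * c + gamma**2) ** 2


def lp_grad(c: float, p: float) -> float:
    """Derivative of c^p, i.e. p * c^(p-1); unbounded as c -> 0 when p < 1."""
    return p * c ** (p - 1.0)


gamma, p = 0.1, 0.5  # illustrative values, not taken from the PR configs
for c in (1e-6, 1e-3, gamma / math.sqrt(3), gamma, 1.0):
    print(
        f"c={c:.2e}  phi={smooth_l0(c, gamma):.4f}  "
        f"dphi/dc={smooth_l0_grad(c, gamma):.4f}  dLp/dc={lp_grad(c, p):.4f}"
    )

# Largest gradient the smooth-L0 penalty can produce for this gamma.
peak = smooth_l0_grad(gamma / math.sqrt(3), gamma)
```

Near c = 0 the smooth-L0 gradient goes to zero while the L_p gradient grows without bound, which is the cliff the description refers to.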

γ anneals like p. The implementation is self-contained — it does not modify
importance_minimality.py.
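
As a rough picture of what annealing γ could look like, here is a minimal sketch that assumes a linear schedule between two fractions of training. The schedule shape, the function name, and the `start_frac`/`end_frac` parameters are assumptions for illustration; the actual config fields (gamma, gamma_final, anneal fracs) and their semantics are defined in the PR's loss module.

```python
def annealed_gamma(
    step: int,
    total_steps: int,
    gamma: float,
    gamma_final: float,
    start_frac: float = 0.0,
    end_frac: float = 1.0,
) -> float:
    """Linearly interpolate gamma -> gamma_final over [start_frac, end_frac].

    Hypothetical helper: assumes a linear anneal, which the PR text does not specify.
    """
    if not 0.0 <= start_frac < end_frac <= 1.0:
        raise ValueError("need 0 <= start_frac < end_frac <= 1")
    progress = step / total_steps
    if progress <= start_frac:
        return gamma  # before the anneal window: initial threshold
    if progress >= end_frac:
        return gamma_final  # after the anneal window: final threshold
    t = (progress - start_frac) / (end_frac - start_frac)
    return gamma + t * (gamma_final - gamma)


# Example: tighten the threshold from 0.5 to 0.05 over the first 80% of training.
schedule = [annealed_gamma(s, 10_000, 0.5, 0.05, end_frac=0.8) for s in (0, 4_000, 8_000, 10_000)]
```

Because the peak gradient scales as 1/γ, shrinking γ sharpens the penalty gradually while keeping the gradient finite at every step.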

Changes

- param_decomp/metrics/smooth_l0_importance_minimality.py: loss + config (gamma, beta,
  gamma_final, anneal fracs), mirroring the L_p structure (entropy term, DDP world_size
  handling, un-namespaced {name}/{name}_no_beta compute keys).
- Registered loss-side (configs.py AnyLossMetricConfig, dispatch.py) and both
  imp-min metrics eval-side (param_decomp_lab/eval_metrics/__init__.py) so a run driven by
  one logs the other's sparsity proxy as an eval-only metric.
- param_decomp/tests/metrics/test_smooth_l0_importance_minimality_loss.py: 15 tests, including
  the defining flat-finite-gradient-at-0 and bounded-gradient-peaks-at-γ checks.
- Two cross-logging comparison configs off pile_llama_simple_mlp-4L.yaml:
  ..._impmin_lp.yaml (L_p control + smooth-L0 eval probe) and ..._impmin_smoothl0.yaml
  (smooth-L0 driver + L_p eval probe).
- docs/smooth_l0_importance_minimality.md: short note covering the configs, run commands, the
  metrics to compare, and a local-data fallback for clusters where HF streaming is down.
- param_decomp/metrics/CLAUDE.md: variants note.

How to compare

pd-lm param_decomp_lab/experiments/lm/pile_llama_simple_mlp-4L_impmin_smoothl0.yaml --dp 8 --group smoothl0-vs-lp
pd-lm param_decomp_lab/experiments/lm/pile_llama_simple_mlp-4L_impmin_lp.yaml       --dp 8 --group smoothl0-vs-lp

Compare at 5k/10k steps: eval/.../CI_L0 total (sparsity) against eval/ce_kl/kl_ci_masked and
eval/loss/PGDReconLoss (faithfulness). batch_size is global, so --dp 8 and --dp 16 follow the
same trajectory. Sweep coeff on each to trace the full sparsity/faithfulness frontier (smooth-L0's
loss scale ≈ active-count, so its coeff is not 1:1 with L_p's).
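
To see why the two coefficients are not interchangeable, the sketch below compares the summed penalties on a made-up CI vector with 3 active components and 97 near-zero ones. The values, γ, and p are illustrative assumptions, not numbers from the runs, and the sums omit the entropy term and any batch averaging.

```python
def smooth_l0_sum(ci: list[float], gamma: float) -> float:
    """Sum of phi(c) = c^2 / (c^2 + gamma^2) over components."""
    return sum(c * c / (c * c + gamma * gamma) for c in ci)


def lp_sum(ci: list[float], p: float) -> float:
    """Sum of c^p over components (the L_p sparsity proxy)."""
    return sum(c**p for c in ci)


# Hypothetical causal-importance values: 3 clearly active, 97 almost off.
ci_values = [0.9, 0.8, 0.7] + [1e-3] * 97

smooth_total = smooth_l0_sum(ci_values, gamma=0.05)
lp_total = lp_sum(ci_values, p=0.5)
print(f"smooth-L0 sum: {smooth_total:.3f}   L_p sum: {lp_total:.3f}")
```

On this input the smooth-L0 sum lands close to the active count of 3, whereas the L_p sum is noticeably larger because the many near-zero components still contribute c^p each. The same coeff therefore weights the two penalties differently.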

Prior validation (sibling `feature/jax` branch, same loss math)

A 10-run sweep found that smooth-L0 dominates the L_p sparsity/faithfulness trade-off at 5k/10k
steps: ~0.1–0.15 lower KL at matched CI_L0 and lower 20-step adversarial PGD recon. This includes
a fast-anneal cliff-regime pair (γ→0.05 / p→0.4 by step 8k), with training remaining stable.
Re-running this on main is the purpose of this PR.

Caveat: only early training (5k/10k) was tested; the long-horizon collapse-robustness story is
not yet evaluated.

Test

pytest param_decomp/tests/metrics/test_smooth_l0_importance_minimality_loss.py (15 passed);
basedpyright + ruff clean (pre-commit).

…ality loss Add SmoothL0ImportanceMinimalityLoss, an alternative CI-sparsity penalty φ(c)=c²/(c²+γ²) to the L_p ImportanceMinimalityLoss: flat gradient at 0, bounded (~0.65/γ near c≈γ), so no gradient cliff as the threshold tightens (L_p's p·c^(p-1) blows up as c→0 for p<1). γ anneals like p. Self-contained (no edits to importance_minimality.py). Register it loss-side (configs.py, dispatch.py) and both imp-min metrics eval-side (eval_metrics) so a run driven by one logs the other's sparsity proxy at eval. Add unit tests, two cross-logging comparison configs (L_p control + smooth-L0), and a short docs/ note with run commands. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

danbraunai-goodfire wants to merge 1 commit into main from feature/smooth-l0-importance-minimality
