Testing the Entropic Collapse / Circuit-Formation Hypothesis in Transformers
Companion code for the technical report "Entropic Collapse and Circuit Formation in Transformers: A Polymer Physics Analogy for Generalisation Under Free-Energy Minimisation" (O'Meara, 2026).
The CircuitCollapse package implements experiments that test four predictions (P1–P4) of the circuit-collapse hypothesis, together with its free-energy-gap criterion (ΔF). The hypothesis holds that SGD training drives transformers toward a free-energy minimum by concentrating computation into sparse, reusable circuits while releasing the remaining weight space to a high-entropy "weight solvent", in direct analogy to the hydrophobic collapse of polymers.
| ID | Prediction |
|---|---|
| P1 | Total weight entropy rises at the grokking transition |
| P2 | Larger models grok more easily (lower ΔF barrier) |
| P3 | Circuit sparsity ↑ ⟺ solvent entropy ↑ as weight decay increases |
| P4 | Superposition density increases post-collapse |
| ΔF | Free-energy gap ΔF(t) = ΔL − T·ΔH turns negative just before grokking |
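P3 needs a scalar measure of circuit sparsity. A common choice is the Gini coefficient of absolute weight magnitudes: 0 for perfectly uniform weights, approaching 1 when a few weights carry all the mass. The sketch below assumes that definition; it is illustrative only and is not necessarily how the package computes its own gini metric.

```python
import numpy as np


def gini(weights: np.ndarray) -> float:
    """Gini coefficient of |weights| (illustrative sketch, assumed definition).

    Returns 0.0 for perfectly uniform magnitudes and approaches 1.0 as the
    mass concentrates in a few entries.
    """
    x = np.sort(np.abs(np.asarray(weights, dtype=float).ravel()))  # ascending order
    n = x.size
    if n == 0 or x.sum() == 0.0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Closed form for sorted data: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    return float(2.0 * np.sum(ranks * x) / (n * x.sum()) - (n + 1.0) / n)


# Dense weights score 0; 10 non-zero entries out of 1000 score 0.99.
dense = np.ones(1000)
sparse = np.zeros(1000)
sparse[:10] = 1.0
print(gini(dense), gini(sparse))
```

Sorting once makes this O(n log n) in the number of weights, which is cheap enough to evaluate at every logging step for the small models used in P1, P3 and P4.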
The package also integrates with the MIB circuit localisation track to measure entropy decompositions on circuits discovered in pretrained models (GPT-2, Qwen-2.5, Gemma-2).
# 1. Clone with MIB submodule
git clone --recurse-submodules https://github.com/maomlab/circuit_collapse.git
cd circuit_collapse
# 2. Create conda environment
conda create -n circuit_collapse python=3.10 -y
conda activate circuit_collapse
# 3. Install core package
pip install -e .
# 4. Install MIB integration extras (requires EAP-IG)
pip install -e ".[mib]"
# or manually:
git submodule update --init --recursive
pip install -e EAP-IG/
pip install tabulate
# 5. (Optional) dev tools
pip install -e ".[dev]"

| Experiment | Min GPU | Recommended |
|---|---|---|
| P1, P3, P4 (d=128) | RTX 6000 (24 GB) | Any |
| P2 (d=512) | A40 (48 GB) | A40 |
| MIB GPT-2/Qwen | RTX 6000 | A40 |
| MIB Gemma-2 | A40 | A40 |
| MIB Llama-3 | 2× A40 | A100 |
# P1: entropy rise at grokking (tiny model, ~5 min on CPU)
python -m circuit_collapse.scripts.run_experiment \
--experiment p1 \
--output-dir intermediate/p1 \
--p 97 --d-model 128 --n-layers 1 \
--lr 1e-3 --weight-decay 1.0 \
--n-steps 50000 --device cuda
# Temperature sweep (all temperatures, one job)
python -m circuit_collapse.scripts.run_experiment \
--experiment temperature_sweep \
--output-dir results/tsweep

On a SLURM cluster:

mkdir -p logs
# P1: 4 seeds in parallel
sbatch slurm/p1_entropy_rise.sh
# Temperature sweep: 7 temperatures in parallel (SLURM array)
sbatch slurm/temperature_sweep.sh
# P2, P3, P4: array over experiment type
sbatch slurm/p2_p3_p4.sh

First run MIB attribution (from the MIB repo):
python run_attribution.py \
--models gpt2 qwen2.5 \
--tasks ioi arithmetic_addition \
--method EAP-IG-inputs \
--level edge \
--ablation patching

Then run the CircuitCollapse entropy analysis on the discovered circuits:
python -m circuit_collapse.scripts.run_experiment \
--experiment mib_entropy \
--model-name gpt2 \
--task ioi \
--circuit-path circuits/EAP-IG-inputs_patching_edge/ioi_gpt2/importances.json \
--temperature 1e-4 \
--output-dir results/mib

Repository layout:

circuit_collapse/
├── circuit_collapse/
│ ├── __init__.py
│ ├── entropy.py # Six entropy estimators + EntropyMonitor
│ ├── training.py # GrokTrainer, GrokConfig, modular-arithmetic dataset
│ ├── circuits.py # Circuit discovery, masking, solvent decomposition
│ ├── experiments.py # High-level runners for P1–P4
│ ├── analysis.py # Plotting, statistics, summary tables
│ ├── mib.py # MIB circuit evaluation + entropy augmentation
│ └── scripts/
│ └── run_experiment.py # CLI entry point
├── tests/
│ ├── conftest.py
│ ├── test_entropy.py # 30 unit tests for all estimators
│ └── test_training.py # Training, dataset, circuits tests
├── slurm/
│ ├── p1_entropy_rise.sh
│ ├── temperature_sweep.sh
│ └── p2_p3_p4.sh
├── configs/
│ └── experiments.yaml # Canonical hyperparameters for all experiments
└── pyproject.toml
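training.py builds the modular-arithmetic dataset used by GrokTrainer (the P1 command sets --p 97). As a rough sketch, assuming the task is modular addition with a random train/test split, the full dataset can be generated as below. The function name, the 30% train fraction and the seed are hypothetical choices; the package's own dataset may use a different operation, token layout or split.

```python
import numpy as np


def modular_addition_split(p: int = 97, train_frac: float = 0.3, seed: int = 0):
    """All pairs (a, b) with label (a + b) mod p, randomly split into train/test.

    Hypothetical helper for illustration; not the package's training.py API.
    """
    a, b = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
    inputs = np.stack([a.ravel(), b.ravel()], axis=1)  # shape (p*p, 2)
    labels = (inputs[:, 0] + inputs[:, 1]) % p          # shape (p*p,)
    perm = np.random.default_rng(seed).permutation(len(labels))
    n_train = int(train_frac * len(labels))
    train_idx, test_idx = perm[:n_train], perm[n_train:]
    return (inputs[train_idx], labels[train_idx]), (inputs[test_idx], labels[test_idx])


(x_train, y_train), (x_test, y_test) = modular_addition_split(p=97)
print(x_train.shape, x_test.shape)  # p*p = 9409 pairs in total
```

Enumerating every pair keeps the task finite and fully known, which is what makes the memorisation-then-generalisation (grokking) transition on the held-out pairs easy to observe.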
| Estimator | Class | Time complexity | Memory | Circuit-decomposable |
|---|---|---|---|---|
| Diagonal Gaussian | DiagonalGaussianEstimator | O(d) | O(d) | ✓ exact |
| SWAG low-rank+diag | SWAGEstimator | O(dK) | O(dK) | ✓ approx |
| Spectral / eRank | SpectralEntropyEstimator | O(mn·r) | O(mn) | ✓ per-layer |
| KFAC Laplace | KFACLaplaceEstimator | O(d·s) | O(m²+n²) | ✓ per-layer |
| Full Laplace | FullLaplaceEstimator | O(d³) | O(d²) | ✓ (toy only) |
| KDE / KNIFE | KDEEstimator | O(dN²) | O(dN) | ✓ (low-dim only) |
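The "✓ exact" entry for the diagonal Gaussian follows from additivity: with a diagonal covariance the differential entropy is ½ Σᵢ log(2πe σᵢ²), a sum of per-parameter terms, so splitting the parameters into circuit and solvent sets splits the entropy exactly. The NumPy sketch below illustrates this, estimating σᵢ² from weight snapshots, and adds a standard effective rank (the exponential of the Shannon entropy of the normalised singular values). These are hypothetical helpers, not the package's estimator classes, and SpectralEntropyEstimator may define eRank differently.

```python
import numpy as np


def diagonal_gaussian_entropy(variances: np.ndarray) -> float:
    """Differential entropy (nats) of N(mu, diag(variances)): 0.5 * sum(log(2*pi*e*var))."""
    v = np.asarray(variances, dtype=float)
    return float(0.5 * np.sum(np.log(2.0 * np.pi * np.e * v)))


def decompose_diagonal(variances, circuit_mask) -> dict:
    """Split the diagonal entropy into circuit and solvent parts.

    Because the covariance is diagonal, H_circuit + H_solvent == H_total exactly.
    """
    v = np.asarray(variances, dtype=float)
    m = np.asarray(circuit_mask, dtype=bool)
    return {
        "H_circuit": diagonal_gaussian_entropy(v[m]),
        "H_solvent": diagonal_gaussian_entropy(v[~m]),
        "H_total": diagonal_gaussian_entropy(v),
    }


def effective_rank(W: np.ndarray) -> float:
    """exp(Shannon entropy) of the normalised singular values of W."""
    s = np.linalg.svd(np.asarray(W, dtype=float), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # convention: 0 * log(0) = 0
    return float(np.exp(-np.sum(p * np.log(p))))


# Usage with synthetic weight snapshots (rows = SGD iterates, columns = parameters).
rng = np.random.default_rng(0)
snapshots = rng.normal(size=(200, 1000)) * np.linspace(0.01, 1.0, 1000)
mask = np.zeros(1000, dtype=bool)
mask[:100] = True  # hypothetical "circuit" parameters
parts = decompose_diagonal(snapshots.var(axis=0), mask)
print(parts)
print(effective_rank(np.eye(8)))  # ≈ 8.0: all singular values equal
```

The same additivity does not hold for SWAG or KFAC, whose covariances couple parameters, which is why the table marks those decompositions as approximate or per-layer.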
Use EntropyMonitor to run the Diagonal, SWAG and Spectral estimators side by side during training:
from circuit_collapse.entropy import EntropyMonitor
monitor = EntropyMonitor(model, swag_rank=20, spectral_interval=500)
# In training loop:
monitor.update(model)
# Get snapshot:
snap = monitor.snapshot()
# → {'H_diagonal': ..., 'H_swag': ..., 'gini': ..., 'effective_ranks': {...}}
# Circuit/solvent decomposition:
decomp = monitor.decompose(flat_bool_mask, circuit_layer_names=["blocks.0.attn.W_Q"])
# → {'H_circuit_diagonal': ..., 'H_solvent_diagonal': ..., ...}

The circuit-collapse hypothesis predicts:
ΔF(t) = ΔL(t) − T·ΔH(t) turns negative just before grokking,
where T = η/B is the effective SGD temperature (η: learning rate, B: batch size). CircuitCollapse records ΔF at every
evaluation step (record.free_energy_gap) and, in the temperature-sweep results, reports the step at which it first
turns negative (sign_change_step).
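Given per-evaluation-step series for ΔL and ΔH, the gap and its first negative step can be computed as in this sketch. The helper names are hypothetical, ΔL and ΔH are taken as given inputs (their reference point is not restated here), and "first turns negative" is interpreted as the first evaluation step with ΔF < 0; the package's sign_change_step may use a different rule.

```python
import numpy as np


def free_energy_gap(delta_loss, delta_entropy, lr: float, batch_size: int) -> np.ndarray:
    """ΔF(t) = ΔL(t) - T * ΔH(t), with effective SGD temperature T = lr / batch_size."""
    T = lr / batch_size
    return np.asarray(delta_loss, dtype=float) - T * np.asarray(delta_entropy, dtype=float)


def first_sign_change_step(steps, delta_f):
    """First evaluation step at which ΔF < 0, or None if it never turns negative."""
    negative = np.flatnonzero(np.asarray(delta_f) < 0)
    return None if negative.size == 0 else int(np.asarray(steps)[negative[0]])


# Synthetic numbers for illustration only (not experimental results).
steps = np.array([0, 1000, 2000, 3000])
dL = np.array([0.5, 0.2, 0.1, 0.05])
dH = np.array([1e3, 5e3, 1e4, 2e4])
dF = free_energy_gap(dL, dH, lr=1e-3, batch_size=64)  # T = 1.5625e-5
print(first_sign_change_step(steps, dF))  # → 2000
```

Because T scales as η/B, raising the learning rate or shrinking the batch increases the weight of the entropy term, which is the knob the temperature sweep varies.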
# Fast tests only (skip slow, GPU and MIB tests):
pytest tests/ -v -m "not slow and not gpu and not mib"
# Full suite (requires GPU + MIB):
pytest tests/ -v
# With coverage:
pytest tests/ --cov=circuit_collapse --cov-report=html

If you use this code, please cite:
@article{omeara2026,
title = {Entropic Collapse and Circuit Formation in Transformers:
A Polymer Physics Analogy for Generalisation Under
Free-Energy Minimisation},
author = {Matthew J O'Meara},
year = {2026},
journal = {Technical Report},
}

Also cite the MIB benchmark if you use the MIB integration:
@article{mib-2025,
title = {{MIB}: A Mechanistic Interpretability Benchmark},
author = {Aaron Mueller and Atticus Geiger and Sarah Wiegreffe and others},
year = {2025},
journal = {CoRR},
volume = {arXiv:2504.13151},
}

License: Apache 2.0 (see LICENSE).