Circuit Collapse

Testing the Entropic Collapse / Circuit-Formation Hypothesis in Transformers

"Entropic Collapse and Circuit Formation in Transformers: A Polymer Physics Analogy for Generalisation Under Free-Energy Minimisation"

Overview

The CircuitCollapse package implements a suite of experiments that test four predictions of the circuit-collapse hypothesis. The hypothesis holds that SGD training drives transformers toward a free-energy minimum by concentrating computational logic into sparse, reusable circuits while releasing the remaining weight space to a high-entropy "weight solvent", in direct analogy to the hydrophobic collapse of polymers. The table below lists predictions P1–P4 together with the free-energy gap ΔF that the package tracks.

Hypothesis	Prediction
P1	Total weight entropy rises at the grokking transition
P2	Larger models grok more easily (lower ΔF barrier)
P3	Circuit sparsity ↑ ⟺ solvent entropy ↑ as weight decay increases
P4	Superposition density increases post-collapse
ΔF	Free-energy gap ΔF(t) = ΔL − T·ΔH turns negative just before grokking

The package also integrates with the MIB circuit localisation track to measure entropy decompositions on circuits discovered in pretrained models (GPT-2, Qwen-2.5, Gemma-2).

Installation

# 1. Clone with MIB submodule
git clone --recurse-submodules https://github.com/maomlab/circuit_collapse.git
cd circuit_collapse

# 2. Create conda environment
conda create -n circuit_collapse python=3.10 -y
conda activate circuit_collapse

# 3. Install core package
pip install -e .

# 4. Install MIB integration extras (requires EAP-IG)
pip install -e ".[mib]"
# or manually:
git submodule update --init --recursive
pip install -e EAP-IG/
pip install tabulate

# 5. (Optional) dev tools
pip install -e ".[dev]"

Hardware Requirements

Experiment	Min GPU	Recommended
P1, P3, P4 (d=128)	RTX 6000 (24 GB)	Any
P2 (d=512)	A40 (48 GB)	A40
MIB GPT-2/Qwen	RTX 6000	A40
MIB Gemma-2	A40	A40
MIB Llama-3	2× A40	A100

Quick Start

Run a single experiment (local)

# P1: entropy rise at grokking (tiny model; the ~5 min estimate is for CPU, the command below targets a GPU via --device)
python -m scripts.run_experiment \
    --experiment p1 \
    --output-dir intermediate/p1 \
    --p 97 --d-model 128 --n-layers 1 \
    --lr 1e-3 --weight-decay 1.0 \
    --n-steps 50000 --device cuda

# Temperature sweep (all temperatures, one job)
python -m scripts.run_experiment \
    --experiment temperature_sweep \
    --output-dir results/tsweep
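For orientation, the grokking runs above train on a modular-arithmetic task with modulus `--p 97`. The sketch below shows what such a dataset could look like, assuming the operation is addition mod p and the train/test split is a random partition of all pairs. The function name `make_modular_addition_split`, the `train_fraction` parameter, and the split strategy are illustrative assumptions, not the package's actual `training.py` API.

```python
import random


def make_modular_addition_split(p=97, train_fraction=0.3, seed=0):
    """Return (train, test) lists of ((a, b), (a + b) % p) examples.

    Hypothetical helper: the real dataset in training.py may use a
    different operation, token encoding, or split.
    """
    # Enumerate every ordered pair (a, b) with 0 <= a, b < p.
    examples = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]

    # Shuffle deterministically so that seeds give reproducible splits.
    rng = random.Random(seed)
    rng.shuffle(examples)

    n_train = int(train_fraction * len(examples))
    return examples[:n_train], examples[n_train:]


train, test = make_modular_addition_split(p=97, train_fraction=0.3)
print(len(train), len(test))
```

Holding out a large share of the p² pairs is what makes delayed generalisation observable: the model can memorise the training pairs long before it gets the held-out pairs right.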

SLURM cluster

mkdir -p logs

# P1 — 4 seeds in parallel
sbatch slurm/p1_entropy_rise.sh

# Temperature sweep — 7 temperatures in parallel (SLURM array)
sbatch slurm/temperature_sweep.sh

# P2, P3, P4 — array over experiment type
sbatch slurm/p2_p3_p4.sh

MIB integration

First run MIB attribution (from the MIB repo):

python run_attribution.py \
    --models gpt2 qwen2.5 \
    --tasks ioi arithmetic_addition \
    --method EAP-IG-inputs \
    --level edge \
    --ablation patching

Then run the Circuit Collapse entropy analysis on the discovered circuits:

python -m circuit_collapse.scripts.run_experiment \
    --experiment mib_entropy \
    --model-name gpt2 \
    --task ioi \
    --circuit-path circuits/EAP-IG-inputs_patching_edge/ioi_gpt2/importances.json \
    --temperature 1e-4 \
    --output-dir results/mib

Package Structure

circuit_collapse/
├── circuit_collapse/
│   ├── __init__.py
│   ├── entropy.py          # Six entropy estimators + EntropyMonitor
│   ├── training.py         # GrokTrainer, GrokConfig, modular-arithmetic dataset
│   ├── circuits.py         # Circuit discovery, masking, solvent decomposition
│   ├── experiments.py      # High-level runners for P1–P4
│   ├── analysis.py         # Plotting, statistics, summary tables
│   ├── mib.py              # MIB circuit evaluation + entropy augmentation
│   └── scripts/
│       └── run_experiment.py   # CLI entry point
├── tests/
│   ├── conftest.py
│   ├── test_entropy.py     # 30 unit tests for all estimators
│   └── test_training.py    # Training, dataset, circuits tests
├── slurm/
│   ├── p1_entropy_rise.sh
│   ├── temperature_sweep.sh
│   └── p2_p3_p4.sh
├── configs/
│   └── experiments.yaml    # Canonical hyperparameters for all experiments
└── pyproject.toml

Entropy Estimators

Estimator	Class	Complexity	Memory	Circuit-decomposable
Diagonal Gaussian	`DiagonalGaussianEstimator`	O(d)	O(d)	✓ exact
SWAG low-rank+diag	`SWAGEstimator`	O(dK)	O(dK)	✓ approx
Spectral / eRank	`SpectralEntropyEstimator`	O(mn·r)	O(mn)	✓ per-layer
KFAC Laplace	`KFACLaplaceEstimator`	O(d·s)	O(m²+n²)	✓ per-layer
Full Laplace	`FullLaplaceEstimator`	O(d³)	O(d²)	✓ (toy only)
KDE / KNIFE	`KDEEstimator`	O(dN²)	O(dN)	✓ (low-dim only)
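As a rough illustration of the Spectral / eRank row, the effective rank of a matrix is commonly defined as the exponential of the Shannon entropy of its normalised singular values. The sketch below operates on a precomputed list of singular values (obtainable, for example, from `numpy.linalg.svd(W, compute_uv=False)`). The function name `effective_rank` is a hypothetical stand-in and is not claimed to match the internals of `SpectralEntropyEstimator`.

```python
import math


def effective_rank(singular_values):
    """Effective rank: exp of the Shannon entropy of normalised singular values.

    Illustrative sketch only; assumes non-negative singular values.
    """
    total = sum(singular_values)
    if total <= 0.0:
        # A zero matrix has no spectrum to spread; report rank 0.
        return 0.0

    # Normalise to a probability distribution, skipping exact zeros
    # (their contribution p * log(p) tends to 0).
    probs = [s / total for s in singular_values if s > 0.0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


# A flat spectrum uses every direction equally; a single spike uses one.
flat = effective_rank([1.0, 1.0, 1.0, 1.0])
spiked = effective_rank([5.0, 0.0, 0.0])
print(flat, spiked)
```

A layer whose effective rank falls during training is concentrating its weights into fewer directions, which is one per-layer signal of the sparsification the hypothesis describes.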

Use EntropyMonitor to run Diagonal + SWAG + Spectral in parallel:

from circuit_collapse.entropy import EntropyMonitor

monitor = EntropyMonitor(model, swag_rank=20, spectral_interval=500)

# In training loop:
monitor.update(model)

# Get snapshot:
snap = monitor.snapshot()
# → {'H_diagonal': ..., 'H_swag': ..., 'gini': ..., 'effective_ranks': {...}}

# Circuit/solvent decomposition:
decomp = monitor.decompose(flat_bool_mask, circuit_layer_names=["blocks.0.attn.W_Q"])
# → {'H_circuit_diagonal': ..., 'H_solvent_diagonal': ..., ...}
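The table marks the diagonal Gaussian estimator as exactly circuit-decomposable. One way to see why: the differential entropy of a diagonal Gaussian is a sum of per-weight terms, H = ½ Σ log(2πe σᵢ²), so splitting the weights by a boolean mask splits the entropy without remainder. The sketch below illustrates this under the assumption that a per-weight variance estimate is already available. The helper names and the `H_total_diagonal` key are hypothetical; only the `H_circuit_diagonal` and `H_solvent_diagonal` keys appear in the snippet above.

```python
import math


def diagonal_gaussian_entropy(variances):
    """Differential entropy (nats) of a Gaussian with diagonal covariance.

    Assumes every variance is strictly positive.
    """
    return 0.5 * sum(math.log(2.0 * math.pi * math.e * v) for v in variances)


def decompose_entropy(variances, circuit_mask):
    """Split the diagonal entropy into circuit and solvent parts.

    Hypothetical helper, not the package's EntropyMonitor.decompose.
    """
    circuit = [v for v, m in zip(variances, circuit_mask) if m]
    solvent = [v for v, m in zip(variances, circuit_mask) if not m]
    return {
        "H_circuit_diagonal": diagonal_gaussian_entropy(circuit),
        "H_solvent_diagonal": diagonal_gaussian_entropy(solvent),
        "H_total_diagonal": diagonal_gaussian_entropy(variances),
    }


# Toy values: two "circuit" weights and two "solvent" weights.
variances = [1.0, 0.5, 2.0, 0.1]
mask = [True, False, True, False]
result = decompose_entropy(variances, mask)
print(result)
```

Because the terms are additive, the circuit and solvent entropies sum to the total. Estimators with off-diagonal structure (SWAG, KFAC) lose this property, which is consistent with the table listing them as approximate or per-layer only.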

Free-Energy Proxy

The circuit-collapse hypothesis predicts:

ΔF(t) = ΔL(t) - T · ΔH(t) → negative just before grokking

where T = η/B is the effective SGD temperature, with η the learning rate and B the batch size. Circuit Collapse records ΔF at every evaluation step (record.free_energy_gap) and logs the step at which it first turns negative (sign_change_step in the temperature sweep results).
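The free-energy proxy can be sketched in a few lines. The version below assumes that ΔL and ΔH are measured relative to the first recorded evaluation step; the README does not state the reference point, so treat that choice as an assumption. The function names `free_energy_gap` and `first_negative_step` and the toy numbers are illustrative and are not the package's API or results.

```python
def free_energy_gap(losses, entropies, lr, batch_size):
    """Return dF(t) = dL(t) - T * dH(t) for each evaluation step.

    Assumption: deltas are taken against the first recorded step.
    """
    temperature = lr / batch_size  # effective SGD temperature T = eta / B
    loss_ref, entropy_ref = losses[0], entropies[0]
    return [
        (loss - loss_ref) - temperature * (entropy - entropy_ref)
        for loss, entropy in zip(losses, entropies)
    ]


def first_negative_step(steps, gaps):
    """Return the first step at which the gap is negative, or None."""
    for step, gap in zip(steps, gaps):
        if gap < 0.0:
            return step
    return None


# Toy trajectory: entropy rises at step 2000, the loss only drops at 3000.
steps = [0, 1000, 2000, 3000]
losses = [2.0, 2.0, 2.0, 0.1]
entropies = [100.0, 100.0, 5000.0, 5000.0]

gaps = free_energy_gap(losses, entropies, lr=1e-3, batch_size=512)
sign_change_step = first_negative_step(steps, gaps)
print(gaps, sign_change_step)
```

In this toy trajectory the gap turns negative at the step where the entropy rises, one evaluation before the loss drops, which is the ordering the hypothesis predicts.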

Running Tests

# All tests (fast; skip slow/GPU):
pytest tests/ -v -m "not slow and not gpu and not mib"

# Full suite (requires GPU + MIB):
pytest tests/ -v

# With coverage:
pytest tests/ --cov=circuit_collapse --cov-report=html

Citation

If you use this code, please cite:

@article{omeara2026,
  title   = {Entropic Collapse and Circuit Formation in Transformers:
             A Polymer Physics Analogy for Generalisation Under
             Free-Energy Minimisation},
  author  = {Matthew J O'Meara},
  year    = {2026},
  journal = {Technical Report},
}

Also cite the MIB benchmark if using MIB integration:

@article{mib-2025,
  title   = {{MIB}: A Mechanistic Interpretability Benchmark},
  author  = {Aaron Mueller and Atticus Geiger and Sarah Wiegreffe and others},
  year    = {2025},
  journal = {CoRR},
  volume  = {arXiv:2504.13151},
}

License

Apache 2.0 — see LICENSE.
