maomlab/circuit_collapse

Circuit Collapse

Testing the Entropic Collapse / Circuit-Formation Hypothesis in Transformers

"Entropic Collapse and Circuit Formation in Transformers: A Polymer Physics Analogy for Generalisation Under Free-Energy Minimisation"


Overview

The Circuit Collapse package implements a suite of experiments that test four predictions (P1 to P4) of the circuit-collapse hypothesis, together with a free-energy proxy (ΔF). The hypothesis holds that SGD training drives transformers toward a free-energy minimum by concentrating computational logic into sparse, reusable circuits while releasing the remaining weight space to a high-entropy "weight solvent", in direct analogy to the hydrophobic collapse of polymers.

| Hypothesis | Prediction |
|---|---|
| P1 | Total weight entropy rises at the grokking transition |
| P2 | Larger models grok more easily (lower ΔF barrier) |
| P3 | Circuit sparsity ↑ ⟺ solvent entropy ↑ as weight decay increases |
| P4 | Superposition density increases post-collapse |
| ΔF | Free-energy gap ΔF(t) = ΔL − T·ΔH turns negative just before grokking |

The package also integrates with the MIB circuit localisation track to measure entropy decompositions on circuits discovered in pretrained models (GPT-2, Qwen-2.5, Gemma-2).


Installation

# 1. Clone with MIB submodule
git clone --recurse-submodules https://github.com/maomlab/circuit_collapse.git
cd circuit_collapse

# 2. Create conda environment
conda create -n circuit_collapse python=3.10 -y
conda activate circuit_collapse

# 3. Install core package
pip install -e .

# 4. Install MIB integration extras (requires EAP-IG)
pip install -e ".[mib]"
# or manually:
git submodule update --init --recursive
pip install -e EAP-IG/
pip install tabulate

# 5. (Optional) dev tools
pip install -e ".[dev]"

Hardware Requirements

| Experiment | Min GPU | Recommended |
|---|---|---|
| P1, P3, P4 (d=128) | RTX 6000 (24 GB) | Any |
| P2 (d=512) | A40 (48 GB) | A40 |
| MIB GPT-2/Qwen | RTX 6000 | A40 |
| MIB Gemma-2 | A40 | A40 |
| MIB Llama-3 | 2× A40 | A100 |

Quick Start

Run a single experiment (local)

# P1: entropy rise at grokking (tiny model; the ~5 min estimate is for CPU, this command selects a GPU via --device cuda)
python -m circuit_collapse.scripts.run_experiment \
    --experiment p1 \
    --output-dir intermediate/p1 \
    --p 97 --d-model 128 --n-layers 1 \
    --lr 1e-3 --weight-decay 1.0 \
    --n-steps 50000 --device cuda

# Temperature sweep (all temperatures, one job)
python -m circuit_collapse.scripts.run_experiment \
    --experiment temperature_sweep \
    --output-dir results/tsweep
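For orientation, the kind of task behind `--p 97` can be sketched as follows. This is a minimal illustration and not the package's dataset code. The choice of addition as the operation, the function name `modular_addition_dataset`, and the 50% train fraction are all assumptions made for the example.

```python
import random


def modular_addition_dataset(p=97, train_fraction=0.5, seed=0):
    """Enumerate every (a, b) pair mod p and split into train/test sets.

    Assumption: the task is addition mod p; the real package may use a
    different operation or split.
    """
    pairs = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(pairs)
    n_train = int(train_fraction * len(pairs))
    return pairs[:n_train], pairs[n_train:]


train, test = modular_addition_dataset(p=97)
print(len(train), len(test))  # 97 * 97 = 9409 examples -> 4704 4705
```

Holding out part of the full table of pairs is what makes a grokking transition observable: the model can memorise the training pairs long before it generalises to the held-out ones.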

SLURM cluster

mkdir -p logs

# P1 — 4 seeds in parallel
sbatch slurm/p1_entropy_rise.sh

# Temperature sweep — 7 temperatures in parallel (SLURM array)
sbatch slurm/temperature_sweep.sh

# P2, P3, P4 — array over experiment type
sbatch slurm/p2_p3_p4.sh

MIB integration

First run MIB attribution (from the MIB repo):

python run_attribution.py \
    --models gpt2 qwen2.5 \
    --tasks ioi arithmetic_addition \
    --method EAP-IG-inputs \
    --level edge \
    --ablation patching

Then run the Circuit Collapse entropy analysis on the discovered circuits:

python -m circuit_collapse.scripts.run_experiment \
    --experiment mib_entropy \
    --model-name gpt2 \
    --task ioi \
    --circuit-path circuits/EAP-IG-inputs_patching_edge/ioi_gpt2/importances.json \
    --temperature 1e-4 \
    --output-dir results/mib

Package Structure

circuit_collapse/
├── circuit_collapse/
│   ├── __init__.py
│   ├── entropy.py          # Six entropy estimators + EntropyMonitor
│   ├── training.py         # GrokTrainer, GrokConfig, modular-arithmetic dataset
│   ├── circuits.py         # Circuit discovery, masking, solvent decomposition
│   ├── experiments.py      # High-level runners for P1–P4
│   ├── analysis.py         # Plotting, statistics, summary tables
│   ├── mib.py              # MIB circuit evaluation + entropy augmentation
│   └── scripts/
│       └── run_experiment.py   # CLI entry point
├── tests/
│   ├── conftest.py
│   ├── test_entropy.py     # 30 unit tests for all estimators
│   └── test_training.py    # Training, dataset, circuits tests
├── slurm/
│   ├── p1_entropy_rise.sh
│   ├── temperature_sweep.sh
│   └── p2_p3_p4.sh
├── configs/
│   └── experiments.yaml    # Canonical hyperparameters for all experiments
└── pyproject.toml

Entropy Estimators

| Estimator | Class | Complexity | Memory | Circuit-decomposable |
|---|---|---|---|---|
| Diagonal Gaussian | DiagonalGaussianEstimator | O(d) | O(d) | ✓ exact |
| SWAG low-rank+diag | SWAGEstimator | O(dK) | O(dK) | ✓ approx |
| Spectral / eRank | SpectralEntropyEstimator | O(mn·r) | O(mn) | ✓ per-layer |
| KFAC Laplace | KFACLaplaceEstimator | O(d·s) | O(m²+n²) | ✓ per-layer |
| Full Laplace | FullLaplaceEstimator | O(d³) | O(d²) | ✓ (toy only) |
| KDE / KNIFE | KDEEstimator | O(dN²) | O(dN) | ✓ (low-dim only) |
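As an illustration of the "Spectral / eRank" row, the usual definition of effective rank is the exponential of the Shannon entropy of a matrix's normalised singular values. The sketch below uses that definition on a list of singular values supplied by the caller. Whether `SpectralEntropyEstimator` uses exactly this formula is an assumption, and `effective_rank` is a hypothetical helper name.

```python
import math


def effective_rank(singular_values):
    """exp(Shannon entropy) of the singular values normalised to sum to 1."""
    total = sum(singular_values)
    if total <= 0:
        raise ValueError("need at least one positive singular value")
    # Zero singular values contribute nothing (0 * log 0 is taken as 0).
    probs = [s / total for s in singular_values if s > 0]
    entropy = -sum(q * math.log(q) for q in probs)
    return math.exp(entropy)


flat = effective_rank([1.0, 1.0, 1.0, 1.0])    # equal spectrum: close to 4
peaked = effective_rank([1.0, 0.0, 0.0, 0.0])  # one direction only: 1
print(flat, peaked)
```

A flat spectrum gives an effective rank near the full dimension, while a spectrum dominated by one direction gives a value near 1.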

Use EntropyMonitor to track the Diagonal, SWAG, and Spectral estimators side by side:

from circuit_collapse.entropy import EntropyMonitor

monitor = EntropyMonitor(model, swag_rank=20, spectral_interval=500)

# In training loop:
monitor.update(model)

# Get snapshot:
snap = monitor.snapshot()
# → {'H_diagonal': ..., 'H_swag': ..., 'gini': ..., 'effective_ranks': {...}}

# Circuit/solvent decomposition:
decomp = monitor.decompose(flat_bool_mask, circuit_layer_names=["blocks.0.attn.W_Q"])
# → {'H_circuit_diagonal': ..., 'H_solvent_diagonal': ..., ...}
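The diagonal Gaussian estimator is marked as exactly decomposable because its entropy is a sum of independent per-parameter terms. Any boolean mask therefore splits the total into a circuit part and a solvent part with no remainder. The sketch below shows this on plain Python lists. It is not the package's implementation. The function names are hypothetical, and the per-parameter variances are assumed to be given.

```python
import math


def diagonal_gaussian_entropy(variances):
    """Differential entropy (nats) of a Gaussian with independent coordinates:
    H = sum_i 0.5 * log(2 * pi * e * var_i)."""
    return sum(0.5 * math.log(2.0 * math.pi * math.e * v) for v in variances)


def decompose_entropy(variances, circuit_mask):
    """Split the entropy into circuit (mask True) and solvent (mask False)."""
    circuit = [v for v, in_circuit in zip(variances, circuit_mask) if in_circuit]
    solvent = [v for v, in_circuit in zip(variances, circuit_mask) if not in_circuit]
    return {
        "H_circuit_diagonal": diagonal_gaussian_entropy(circuit),
        "H_solvent_diagonal": diagonal_gaussian_entropy(solvent),
    }


# Toy values: two tightly constrained "circuit" weights, three loose ones.
variances = [1e-4, 2e-4, 0.5, 1.0, 2.0]
mask = [True, True, False, False, False]
parts = decompose_entropy(variances, mask)
total = diagonal_gaussian_entropy(variances)
print(parts, total)
```

Estimators that model correlations between parameters (SWAG, KFAC, full Laplace) lose this exact additivity, which is consistent with the "approx" and "per-layer" entries in the table above.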

Free-Energy Proxy

The circuit-collapse hypothesis predicts:

ΔF(t) = ΔL(t) - T · ΔH(t) → negative just before grokking

where T = η/B is the effective SGD temperature (η is the learning rate and B the batch size). Circuit Collapse records ΔF at every evaluation step (record.free_energy_gap) and logs the step at which it first turns negative (sign_change_step in the temperature sweep results).
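The bookkeeping behind this proxy can be sketched in a few lines. The README does not say which reference point the Δ quantities are measured against, so the sketch takes ΔL and ΔH as inputs. The function names, the batch size of 10, and the toy numbers are assumptions for illustration only.

```python
def free_energy_gap(delta_loss, delta_entropy, lr, batch_size):
    """ΔF = ΔL - T * ΔH with effective SGD temperature T = lr / batch_size."""
    temperature = lr / batch_size
    return delta_loss - temperature * delta_entropy


def first_sign_change(steps, gaps):
    """Return the first step at which the gap is negative, or None."""
    for step, gap in zip(steps, gaps):
        if gap < 0:
            return step
    return None


# Toy trajectory (made-up numbers); lr / batch_size is about 1e-4 here.
steps = [0, 100, 200, 300]
delta_loss = [0.0, -0.01, -0.02, 0.01]
delta_entropy = [0.0, -500.0, -100.0, 400.0]
gaps = [free_energy_gap(dl, dh, lr=1e-3, batch_size=10)
        for dl, dh in zip(delta_loss, delta_entropy)]
print(first_sign_change(steps, gaps))  # -> 200
```

At step 100 the entropy term outweighs the loss improvement and the gap is positive. By step 200 the loss term dominates and the gap turns negative.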


Running Tests

# Fast tests only (skips tests marked slow, gpu, or mib):
pytest tests/ -v -m "not slow and not gpu and not mib"

# Full suite (requires GPU + MIB):
pytest tests/ -v

# With coverage:
pytest tests/ --cov=circuit_collapse --cov-report=html
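The `-m "not slow and not gpu and not mib"` filter relies on the tests carrying `slow`, `gpu`, and `mib` markers. A common way to register such markers in a `conftest.py` is sketched below. The repository's own `tests/conftest.py` may do this differently (or register them in `pyproject.toml`), and the marker descriptions are assumptions.

```python
# Hypothetical conftest.py fragment: registers the three custom markers so
# that pytest does not warn about unknown marks.
MARKERS = [
    "slow: long-running training tests",
    "gpu: tests that need a CUDA device",
    "mib: tests that need the MIB / EAP-IG extras",
]


def pytest_configure(config):
    """pytest hook: declare each custom marker via the 'markers' ini option."""
    for marker in MARKERS:
        config.addinivalue_line("markers", marker)
```

A test opts in with a decorator such as `@pytest.mark.gpu`, after which the `-m` expression can include or exclude it.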

Citation

If you use this code, please cite:

@article{omeara2026,
  title   = {Entropic Collapse and Circuit Formation in Transformers:
             A Polymer Physics Analogy for Generalisation Under
             Free-Energy Minimisation},
  author  = {Matthew J O'Meara},
  year    = {2026},
  journal = {Technical Report},
}

Also cite the MIB benchmark if using MIB integration:

@article{mib-2025,
  title   = {{MIB}: A Mechanistic Interpretability Benchmark},
  author  = {Aaron Mueller and Atticus Geiger and Sarah Wiegreffe and others},
  year    = {2025},
  journal = {CoRR},
  volume  = {arXiv:2504.13151},
}

License

Apache 2.0 — see LICENSE.

About

Experiments to characterize changes in free energy associated with formation of transformer circuits
