Chronicle monitors a transformer model during text generation and, when a specified trigger token is emitted, reports which source tokens most influenced it — before the attention mechanism collapses them into a single output vector.
Standard attention outputs sum weighted value vectors together:
```
output_i = sum_j(w_ij * v_j)
```
Once summed, traceability is lost. Chronicle captures contributions before summation:
```
c_ij = w_ij * v_j
```
Each c_ij is a vector in residual stream space representing how much source token j contributed to destination token i. Chronicle scores these by L2 norm and reports the top-k per head (or aggregated across heads) at the moment a trigger token is generated.
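In code, the before-summation idea is small. Here is a minimal PyTorch sketch for a single head at one destination position (shapes assumed; this is an illustration, not Chronicle's actual implementation):

```python
import torch

def contributions(w: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """w: attention weights for one destination token, shape [src].
    v: value vectors, shape [src, d_head].
    Returns one contribution vector per source token, shape [src, d_head]."""
    return w[:, None] * v  # c_j = w_j * v_j, kept per source instead of summed

def top_sources(w: torch.Tensor, v: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Rank source tokens by the L2 norm of their contribution."""
    return contributions(w, v).norm(dim=-1).topk(k).indices
```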
Requires Python 3.11+ and uv.
```
git clone <repo>
cd chronicle
uv sync
```

```
uv run chronicle \
  --prompt-file prompt.txt \
  --trigger " the" \
  --top-k 5
```

The trigger must be a single token. Leading spaces matter: " mortality" and "mortality" are different tokens in most tokenizers. Chronicle will error if the trigger tokenizes to more than one token, and will print a message to stderr (including the resolved token ID) if the trigger is never generated within `--max-new-tokens`.
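The single-token rule is easy to check up front. A sketch of the validation, assuming a Hugging Face-style tokenizer object (Chronicle's real check lives in `model.py`):

```python
def resolve_trigger(tokenizer, trigger: str) -> int:
    """Return the trigger's token ID, or exit if it isn't exactly one token."""
    ids = tokenizer.encode(trigger)
    if len(ids) != 1:
        raise SystemExit(
            f"trigger {trigger!r} tokenizes to {len(ids)} tokens {ids}; "
            "it must be exactly one (check the leading space)"
        )
    return ids[0]

# " mortality" and "mortality" usually resolve to different single tokens:
# trigger_id = resolve_trigger(tokenizer, " mortality")
```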
| Option | Default | Description |
|---|---|---|
| `--prompt-file` | (required) | Path to a text file containing the prompt |
| `--trigger` | (required) | Token to watch for (must be a single token) |
| `--model` | `pythia-1b` | TransformerLens model name |
| `--top-k` | `5` | Contributors to report per head |
| `--aggregated` | off | Add an aggregated summary section across all heads and layers |
| `--layers` | last 2 | Comma-separated layer indices to hook, e.g. `--layers 10,11` |
| `--max-new-tokens` | `200` | Give up after this many tokens without seeing the trigger |
| `--score` | `norm` | Scoring mode: `norm` (L2 norm) or `logit_projection` (project contributions through `W_O` onto the trigger token's unembedding direction) |
```
Prompt: 'The relationship between poverty and...'

=== Chronicle Report: trigger ' mortality' (id=1234) at step 47 ===
Generated: "...poverty and inequality predict[ mortality]"

Pattern:
  Dominant: ' poverty' (pos 3) leads in 12/16 heads
  Lexical cluster: ' poverty', ' impoverished'
  Signal is concentrated (one source dominates)

Layer 14, Head 0:
  1. ' poverty'     pos 3    score=5.210
  2. ' inequality'  pos 5    score=2.538
  3. ' predict'     pos 7    score=0.961

Layer 14, Head 5:
  1. ' poverty and' pos 3–4  score=7.180
     context: "...relationship between poverty and inequality..."
  2. ' rates'       pos 12   score=0.449

Head summary (16 heads monitored):
  ' poverty'    (pos 3): top-3 in 14/16 heads
  ' inequality' (pos 5): top-3 in 9/16 heads
  ' and'        (pos 4): top-3 in 6/16 heads
```
The Pattern section at the top identifies the dominant contributor, any lexical cluster around it (morphological variants sharing a 4+ character substring), and whether signal is concentrated or diffuse. The trigger token is shown in [brackets] in the destination context line, along with its token ID.
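The lexical-cluster test can be approximated as a shared-substring check. A rough sketch using the 4-character threshold described above (the helper name is invented for illustration):

```python
def same_cluster(a: str, b: str, min_len: int = 4) -> bool:
    """True if two token strings share a substring of at least min_len characters,
    e.g. ' poverty' and ' impoverished' share 'pover'."""
    a, b = a.strip().lower(), b.strip().lower()
    if min(len(a), len(b)) < min_len:
        return False
    return any(a[i:i + min_len] in b for i in range(len(a) - min_len + 1))
```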
Adjacent high-scoring tokens are automatically grouped into spans (e.g. ' poverty and'). A context: line shows the surrounding text for any multi-token span. Tokenizer artifacts like Ġ and Ċ are normalized to readable characters. Punctuation-only tokens are suppressed from output by default.
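Span grouping is a merge of consecutive positions. A sketch of the idea (in Chronicle this lives in `display.py`, alongside the context windows):

```python
def group_spans(positions: list[int]) -> list[tuple[int, int]]:
    """Merge token positions into contiguous (start, end) spans,
    e.g. [3, 4, 12] -> [(3, 4), (12, 12)]."""
    spans: list[list[int]] = []
    for p in sorted(positions):
        if spans and p == spans[-1][1] + 1:
            spans[-1][1] = p       # extend the current span
        else:
            spans.append([p, p])   # start a new span
    return [(s, e) for s, e in spans]
```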
With `--aggregated` (adds a summary block before per-head output):

```
Aggregated across 8 heads × 2 layers:
  1. ' poverty'     pos 3  score=19.343  heads=14/16
  2. ' inequality'  pos 5  score=3.940   heads=9/16
  3. ' predict'     pos 7  score=2.241   heads=7/16
```
Each aggregated entry includes a `heads=` count showing how many monitored heads ranked that position highly.
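One plausible shape for the aggregation, as a sketch that assumes per-position scores are summed across heads (the real logic is in `contrib.py`):

```python
from collections import defaultdict

def aggregate(per_head: dict[tuple[int, int], dict[int, float]], top_k: int = 3):
    """per_head maps (layer, head) -> {source_pos: score}.
    Returns summed scores per position plus how many heads rank it in their top-k."""
    totals: dict[int, float] = defaultdict(float)
    head_counts: dict[int, int] = defaultdict(int)
    for scores in per_head.values():
        for pos, s in scores.items():
            totals[pos] += s
        for pos in sorted(scores, key=scores.get, reverse=True)[:top_k]:
            head_counts[pos] += 1  # this head ranked `pos` highly
    return totals, head_counts
```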
- Load the model and tokenizer via TransformerLens.
- Validate the trigger resolves to exactly one token ID.
- Generate tokens one at a time using greedy decoding.
- At each step, hook two points in the attention mechanism (see the sketch after this list):
  - `blocks.N.attn.hook_pattern` — attention weights `w_ij`, shape `[n_heads, dest, src]`
  - `blocks.N.attn.hook_v` — value vectors `v_j`, shape `[src, n_heads, d_head]`
- Compute `c_ij = w_ij * v_j` for the last destination position (the token just generated), yielding shape `[n_heads, src, d_head]`.
- Score each contribution by its L2 norm (or logit projection with `--score logit_projection`) and keep the top candidates per head.
- When the generated token matches the trigger, print the report and exit.
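A condensed sketch of the capture step, using TransformerLens's `HookedTransformer` and the hook names above (the real version is split across `hooks.py` and `contrib.py`; the prompt and layer index here are placeholders):

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("pythia-1b")
tokens = model.to_tokens("The relationship between poverty and")
captured: dict[str, torch.Tensor] = {}

def save(name: str):
    def hook(tensor, hook):
        captured[name] = tensor.detach()
    return hook

layer = 11  # example layer index
with model.hooks(fwd_hooks=[
    (f"blocks.{layer}.attn.hook_pattern", save("pattern")),
    (f"blocks.{layer}.attn.hook_v", save("v")),
]):
    logits = model(tokens)

w = captured["pattern"][0, :, -1, :]    # [n_heads, src]: weights for the last dest pos
v = captured["v"][0]                    # [src, n_heads, d_head]
c = w[:, :, None] * v.permute(1, 0, 2)  # c_ij, shape [n_heads, src, d_head]
```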
The buffer is intentionally short-lived — it stores only the current generation step and is wiped on each reset, so there is no accumulation across steps.
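The buffer contract is small enough to state directly. A minimal sketch of the semantics described above (Chronicle's `buffer.py` also handles the internal oversampling):

```python
class StepBuffer:
    """Holds captured tensors for the current generation step only."""

    def __init__(self) -> None:
        self._store: dict[str, object] = {}

    def put(self, key: str, value: object) -> None:
        self._store[key] = value

    def get(self, key: str) -> object:
        return self._store[key]

    def reset(self) -> None:
        # Called at the start of each step: nothing survives across steps.
        self._store.clear()
```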
`--score norm` (default): ranks contributions by L2 norm ‖c_ij‖. Fast, model-agnostic, and works for any trigger.

`--score logit_projection`: projects each contribution through the head's output matrix `W_O` and then onto the trigger token's unembedding direction `W_U[:, trigger_id]`. This scores contributions by their directional alignment with the emitted token rather than by raw magnitude. Reports are labeled `[score=logit_projection]` in the header.
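In TransformerLens terms, the projection looks roughly like this (a sketch reusing `model`, `layer`, and the `c` tensor from the capture sketch above; `trigger_id` is the resolved trigger token ID):

```python
import torch

W_O = model.W_O[layer]        # [n_heads, d_head, d_model], per-head output matrices
u = model.W_U[:, trigger_id]  # [d_model], unembedding direction of the trigger token
resid = torch.einsum("hsd,hdm->hsm", c, W_O)  # contributions in residual-stream space
logit_scores = resid @ u      # [n_heads, src]: signed push toward the trigger's logit
```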
```
make install       # uv sync --extra dev
make test          # full test suite (71 tests, no model download needed)
make test-unit     # unit tests only (~1s)
make run-example   # runs against pythia-1b with prompt.txt.example
```

To run a single test file:

```
make test-one FILE=tests/unit/test_contrib.py
```

```
src/chronicle/
  cli.py       generation loop, CLI entry point
  model.py     model loading, trigger token validation
  hooks.py     TransformerLens hook registration
  contrib.py   c_ij computation, norm/logit-projection scoring, aggregation
  buffer.py    per-step tensor storage with internal oversampling
  display.py   token normalization, noise filtering, span grouping, context windows
  report.py    stdout formatting (pattern section, per-head, aggregated, head summary)
tests/
  unit/        trigger validation, contribution math, buffer semantics
  integration/ full CLI tests with a mock model (no GPU required)
```
- TransformerLens ≥ 3.0.0
- Pythia (default model, downloaded on first run)