omarmung/chronicle
Chronicle

Chronicle monitors a transformer model during text generation and, when a specified trigger token is emitted, reports which source tokens most influenced it — before the attention mechanism collapses them into a single output vector.

The idea

Standard attention sums the weighted value vectors for each destination position into a single output:

output_i = sum_j(w_ij * v_j)

Once summed, traceability is lost. Chronicle captures contributions before summation:

c_ij = w_ij * v_j

Each c_ij is a vector in residual stream space representing how much source token j contributed to destination token i. Chronicle scores these by L2 norm and reports the top-k per head (or aggregated across heads) at the moment a trigger token is generated.
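As a minimal sketch of the idea (NumPy, toy dimensions, a single head and destination position; all names here are illustrative, not Chronicle's actual code), the per-source contributions can be kept before they are summed away:

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, d_head = 4, 8

w = rng.random(n_src)                      # attention weights w_ij for one destination i
w /= w.sum()                               # normalized weights, as after softmax
v = rng.standard_normal((n_src, d_head))   # value vectors v_j

contribs = w[:, None] * v                  # c_ij = w_ij * v_j, shape [n_src, d_head]
output = contribs.sum(axis=0)              # standard attention output: sum over sources j

# scoring by L2 norm gives a per-source influence ranking
scores = np.linalg.norm(contribs, axis=-1)
ranked = np.argsort(scores)[::-1]          # source positions, most influential first
```

Summing `contribs` over sources recovers the ordinary attention output exactly; the ranking is information that would otherwise be lost in that sum.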

Install

Requires Python 3.11+ and uv.

git clone <repo>
cd chronicle
uv sync

Usage

uv run chronicle \
  --prompt-file prompt.txt \
  --trigger " the" \
  --top-k 5

The trigger must be a single token. Leading spaces matter — " mortality" and "mortality" are different tokens in most tokenizers. Chronicle will error if the trigger tokenizes to more than one token, and will print a message to stderr (including the resolved token ID) if the trigger is never generated within --max-new-tokens.
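The single-token check can be sketched as follows. This is an illustration of the validation described above, not Chronicle's implementation; `resolve_trigger`, `encode`, and the toy vocabulary are all hypothetical:

```python
def resolve_trigger(trigger: str, encode) -> int:
    """Validate that `trigger` encodes to exactly one token ID.

    `encode` is any str -> list[int] tokenizer function. Mirrors the
    check described in the text: error out on multi-token triggers.
    """
    ids = encode(trigger)
    if len(ids) != 1:
        raise ValueError(
            f"trigger {trigger!r} tokenizes to {len(ids)} tokens ({ids}); "
            "it must be a single token -- check leading spaces"
        )
    return ids[0]

# toy vocabulary standing in for a real tokenizer
vocab = {" the": 262, "the": 1169}
fake_encode = lambda s: [vocab[s]] if s in vocab else [0, 1]
```

With a real tokenizer, " the" and "the" would resolve to different IDs, which is why leading spaces matter.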

Options

| Option | Default | Description |
| --- | --- | --- |
| --prompt-file | (required) | Path to a text file containing the prompt |
| --trigger | (required) | Token to watch for (must be a single token) |
| --model | pythia-1b | TransformerLens model name |
| --top-k | 5 | Contributors to report per head |
| --aggregated | off | Add an aggregated summary section across all heads and layers |
| --layers | last 2 | Comma-separated layer indices to hook, e.g. --layers 10,11 |
| --max-new-tokens | 200 | Give up after this many tokens without seeing the trigger |
| --score | norm | Scoring mode: norm (L2 norm) or logit_projection (project contributions through W_O onto the trigger token's unembedding direction) |

Example output

Prompt: 'The relationship between poverty and...'

=== Chronicle Report: trigger ' mortality' (id=1234) at step 47 ===
Generated: "...poverty and inequality predict[ mortality]"

Pattern:
  Dominant: ' poverty' (pos 3) leads in 12/16 heads
  Lexical cluster: ' poverty', ' impoverished'
  Signal is concentrated (one source dominates)

Layer 14, Head 0:
   1. ' poverty'               pos   3       score=5.210
   2. ' inequality'            pos   5       score=2.538
   3. ' predict'               pos   7       score=0.961

Layer 14, Head 5:
   1. ' poverty and'           pos 3–4       score=7.180
       context: "...relationship between poverty and inequality..."
   2. ' rates'                 pos  12       score=0.449

Head summary (16 heads monitored):
  ' poverty' (pos 3): top-3 in 14/16 heads
  ' inequality' (pos 5): top-3 in 9/16 heads
  ' and' (pos 4): top-3 in 6/16 heads

The Pattern section at the top identifies the dominant contributor, any lexical cluster around it (morphological variants sharing a 4+ character substring), and whether signal is concentrated or diffuse. The trigger token is shown in [brackets] in the destination context line, along with its token ID.

Adjacent high-scoring tokens are automatically grouped into spans (e.g. ' poverty and'). A context: line shows the surrounding text for any multi-token span. Tokenizer artifacts like Ġ and Ċ are normalized to readable characters. Punctuation-only tokens are suppressed from output by default.
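The span grouping can be sketched as a run-length pass over sorted positions. This is illustrative only (`group_spans` is a hypothetical helper); Chronicle's actual grouping also merges scores and renders the context window:

```python
def group_spans(positions):
    """Group token positions into runs of adjacent indices.

    Returns (start, end) pairs, inclusive, e.g. [3, 4, 12] -> [(3, 4), (12, 12)].
    """
    spans = []
    for pos in sorted(positions):
        if spans and pos == spans[-1][-1] + 1:
            spans[-1].append(pos)      # extend the current run
        else:
            spans.append([pos])        # start a new run
    return [(s[0], s[-1]) for s in spans]
```

A span whose start and end differ is what gets rendered as a multi-token entry like ' poverty and' with a context line.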

With --aggregated (adds a summary block before per-head output):

Aggregated across 8 heads × 2 layers:
   1. ' poverty'               pos   3       score=19.343  heads=14/16
   2. ' inequality'            pos   5       score=3.940   heads=9/16
   3. ' predict'               pos   7       score=2.241   heads=7/16

Each aggregated entry includes a heads= count showing how many monitored heads ranked that position highly.

How it works

  1. Load the model and tokenizer via TransformerLens.
  2. Validate the trigger resolves to exactly one token ID.
  3. Generate tokens one at a time using greedy decoding.
  4. At each step, hook two points in the attention mechanism:
    • blocks.N.attn.hook_pattern — attention weights w_ij, shape [n_heads, dest, src]
    • blocks.N.attn.hook_v — value vectors v_j, shape [src, n_heads, d_head]
  5. Compute c_ij = w_ij * v_j for the last destination position (the token just generated), yielding shape [n_heads, src, d_head].
  6. Score each contribution by its L2 norm (or logit projection with --score logit_projection) and keep the top candidates per head.
  7. When the generated token matches the trigger, print the report and exit.
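Step 5 can be sketched with toy shapes. The arrays below are NumPy stand-ins for the hook_pattern and hook_v activations named above (real tensors come from the TransformerLens hooks); dimensions are illustrative:

```python
import numpy as np

n_heads, dest, src, d_head = 2, 5, 5, 4
rng = np.random.default_rng(1)

pattern = rng.random((n_heads, dest, src))        # attention weights w_ij
v = rng.standard_normal((src, n_heads, d_head))   # value vectors v_j

# last destination row = the step's newest position
w_last = pattern[:, -1, :]                        # [n_heads, src]
contribs = w_last[:, :, None] * v.transpose(1, 0, 2)  # c_ij, [n_heads, src, d_head]

scores = np.linalg.norm(contribs, axis=-1)        # [n_heads, src]
```

Summing `contribs` over the src axis reproduces the per-head attention output for that position, which is the consistency property the whole approach rests on.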

The buffer is intentionally short-lived — it stores only the current generation step and is wiped on each reset, so there is no accumulation across steps.

Scoring modes

--score norm (default): ranks contributions by L2 norm ‖c_ij‖. Fast, model-agnostic, and works for any trigger.

--score logit_projection: projects each contribution through the head's output matrix W_O and then onto the trigger token's unembedding direction W_U[:, trigger_id]. Scores contributions by their directional alignment with the emitted token rather than by raw magnitude. Reports are labeled [score=logit_projection] in the header.
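The projection is two matrix products per contribution. A minimal sketch with toy dimensions (random stand-ins for W_O and W_U; real matrices would come from the loaded model):

```python
import numpy as np

rng = np.random.default_rng(2)
d_head, d_model, d_vocab = 4, 8, 10

c = rng.standard_normal(d_head)                 # one contribution c_ij in head space
W_O = rng.standard_normal((d_head, d_model))    # head's output projection
W_U = rng.standard_normal((d_model, d_vocab))   # unembedding matrix
trigger_id = 3

# map the contribution into the residual stream, then onto the
# trigger token's unembedding direction
score = c @ W_O @ W_U[:, trigger_id]
```

Unlike the L2 norm, this score can be negative: a contribution can push against the trigger token's logit as well as toward it.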

Development

make install       # uv sync --extra dev
make test          # full test suite (71 tests, no model download needed)
make test-unit     # unit tests only (~1s)
make run-example   # runs against pythia-1b with prompt.txt.example

To run a single test file:

make test-one FILE=tests/unit/test_contrib.py

Project structure

src/chronicle/
  cli.py        generation loop, CLI entry point
  model.py      model loading, trigger token validation
  hooks.py      TransformerLens hook registration
  contrib.py    c_ij computation, norm/logit-projection scoring, aggregation
  buffer.py     per-step tensor storage with internal oversampling
  display.py    token normalization, noise filtering, span grouping, context windows
  report.py     stdout formatting (pattern section, per-head, aggregated, head summary)
tests/
  unit/         trigger validation, contribution math, buffer semantics
  integration/  full CLI tests with a mock model (no GPU required)
