diff --git a/CLAUDE.md b/CLAUDE.md
index 264a81f..1d2544b 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -208,3 +208,4 @@ Build the site after changes: `cd site && pnpm run build` (should produce 7 page
 - Pi provider: `toolpath-pi` reads Pi session JSONL from `~/.pi/agent/sessions/`. Sessions use a tree (id/parentId) in a single file, and may link to a parent file via `parentSession` in the header. The tree is preserved as a DAG in the derived `Path`.
 - Codex provider: `toolpath-codex` reads Codex CLI rollout files from `~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl`. Sessions are date-bucketed (not project-keyed). File-change fidelity is excellent — Codex's `patch_apply_end` events carry either the unified diff (for updates) or the full file content (for adds), so the derived `Path` gets a real `raw` perspective on every file artifact. See `docs/agents/formats/codex.md` for the full format reference.
 - opencode provider: `toolpath-opencode` reads a SQLite database at `~/.local/share/opencode/opencode.db` (opened read-only). Each session's messages and 12 typed part variants (text, reasoning, tool, step-start/-finish, snapshot, patch, file, agent, subtask, retry, compaction) land as one step per message with tool invocations attached. File diffs come from a sibling bare git repo at `snapshot/<project-id>/[<sha1(worktree)>]/` via `git2` tree↔tree diffs — opencode respects the user's `.gitignore`, so changes under gitignored paths fall back to tool-input-derived structural changes with no `raw` perspective. Project id is the SHA of the repo's first root commit. See `docs/agents/formats/opencode.md` for the full format reference.
+- Format references for the agent on-disk formats we derive from live at `docs/agents/formats/`. The Claude Code format (`~/.claude/projects/…` JSONL) gets the deepest treatment — twelve focused docs at `docs/agents/formats/claude-code/` covering envelope, entry types, tools, session chains, compaction, writing-compatible JSONL, a linear walkthrough, and a version-keyed changelog. Sibling single-file references: `codex.md`, `gemini.md`, `opencode.md`. Keep them in sync with their derive crates when fields or behaviors change.
diff --git a/README.md b/README.md
index 9894add..7c5d318 100644
--- a/README.md
+++ b/README.md
@@ -206,6 +206,8 @@ let md_string = render(&doc, &RenderOptions::default());
 - [CHANGELOG.md](CHANGELOG.md) -- Release history
 - [schema/toolpath.schema.json](schema/toolpath.schema.json) -- JSON Schema
 - [examples/](examples/) -- 11 example documents covering steps, paths, and graphs
+- [docs/agents/formats/](docs/agents/formats/README.md) -- Reference for the on-disk
+  formats emitted by agents we derive from (Claude Code today; more as they land)
 
 ## Requirements
 
diff --git a/crates/toolpath-claude/README.md b/crates/toolpath-claude/README.md
index 5cfdc2f..e98eff7 100644
--- a/crates/toolpath-claude/README.md
+++ b/crates/toolpath-claude/README.md
@@ -16,6 +16,12 @@ Reads Claude Code conversation data from `~/.claude/projects/` and provides:
 - **Derivation**: Map conversations to Toolpath Path documents
 - **Watching**: Monitor conversation files for live updates (feature-gated)
 
+For the on-disk format itself — envelope fields, entry types, session chains,
+compaction, and the empirical rules for writing Claude-compatible JSONL — see
+[`docs/agents/formats/claude-code/`](https://github.com/empathic/toolpath/tree/main/docs/agents/formats/claude-code).
+That directory is the authoritative reference; this crate is its reference
+implementation.
+
 ## Derivation
 
 Convert Claude conversations into Toolpath documents:
diff --git a/crates/toolpath-claude/src/derive.rs b/crates/toolpath-claude/src/derive.rs
index 62a15de..b2c7068 100644
--- a/crates/toolpath-claude/src/derive.rs
+++ b/crates/toolpath-claude/src/derive.rs
@@ -1687,22 +1687,26 @@ mod tests {
         // steps[0] = assistant turn, steps[1] = tool step (siblings).
         let tool_step = &path.steps[1];
         let ch = &tool_step.change["/src/login.rs"];
-        let raw = ch.raw.as_deref().expect("edit tool should emit unified diff");
+        let raw = ch
+            .raw
+            .as_deref()
+            .expect("edit tool should emit unified diff");
         // Leading `/` is stripped from the header so `a/`/`b/` don't double up
         // (git-style prefixes already denote the repo root). See #36.
         assert!(raw.contains("--- a/src/login.rs"), "{}", raw);
         assert!(raw.contains("+++ b/src/login.rs"), "{}", raw);
-        assert!(!raw.contains("a//"), "header should not double-slash: {}", raw);
+        assert!(
+            !raw.contains("a//"),
+            "header should not double-slash: {}",
+            raw
+        );
         assert!(raw.contains("-validate_token()"), "{}", raw);
         assert!(raw.contains("+validate_token_v2()"), "{}", raw);
 
         // Sanity-check the parent wiring that the chat view relies on:
         // the tool step's parent is the assistant step, and they share
         // the same `entry.uuid` root so the frontend splice works.
-        assert_eq!(
-            tool_step.step.parents,
-            vec![path.steps[0].step.id.clone()]
-        );
+        assert_eq!(tool_step.step.parents, vec![path.steps[0].step.id.clone()]);
     }
 
     // ── tool result assembly ──────────────────────────────────────────
diff --git a/crates/toolpath-cli/src/bin/gen_synthetic_path.rs b/crates/toolpath-cli/src/bin/gen_synthetic_path.rs
index 678fa0d..b77c9b9 100644
--- a/crates/toolpath-cli/src/bin/gen_synthetic_path.rs
+++ b/crates/toolpath-cli/src/bin/gen_synthetic_path.rs
@@ -60,11 +60,7 @@ const LOREM: &[&str] = &[
     "sed ut perspiciatis unde omnis iste natus error sit voluptatem",
 ];
 
-const TOOLS: &[(&str, f64)] = &[
-    ("Edit", 0.50),
-    ("Write", 0.30),
-    ("MultiEdit", 0.20),
-];
+const TOOLS: &[(&str, f64)] = &[("Edit", 0.50), ("Write", 0.30), ("MultiEdit", 0.20)];
 
 const FILES: &[&str] = &[
     "src/main.rs",
@@ -101,7 +97,10 @@ fn pick_tool(rng: &mut StdRng) -> &'static str {
 
 fn synth_diff(rng: &mut StdRng, path: &str) -> String {
     let lines = rng.random_range(3..12);
-    let mut s = format!("--- a/{}\n+++ b/{}\n@@ -1,{} +1,{} @@\n", path, path, lines, lines);
+    let mut s = format!(
+        "--- a/{}\n+++ b/{}\n@@ -1,{} +1,{} @@\n",
+        path, path, lines, lines
+    );
     for i in 0..lines {
         if rng.random_bool(0.5) {
             s.push_str(&format!("-old_line_{} = value;\n", i));
diff --git a/crates/toolpath-cli/src/cmd_incept.rs b/crates/toolpath-cli/src/cmd_incept.rs
index 546ba54..d4f2a1b 100644
--- a/crates/toolpath-cli/src/cmd_incept.rs
+++ b/crates/toolpath-cli/src/cmd_incept.rs
@@ -1,5 +1,10 @@
 //! `path incept` — project a toolpath document into a Claude session
 //! that Claude Code can load and resume.
+//!
+//! Format rules this command obeys are documented at
+//! `docs/agents/formats/claude-code/writing-compatible-jsonl.md`. When a new
+//! empirical constraint is discovered here, capture it there in the same
+//! change.
 
 use anyhow::Result;
 use std::io::Read;
diff --git a/crates/toolpath-convo/src/derive.rs b/crates/toolpath-convo/src/derive.rs
index 6d9171a..cdda1d5 100644
--- a/crates/toolpath-convo/src/derive.rs
+++ b/crates/toolpath-convo/src/derive.rs
@@ -326,7 +326,10 @@ fn file_write_change(
         extra.insert("edits".to_string(), serde_json::Value::Array(edits.clone()));
     }
 
-    (file_write_diff(&tool.name, input, path, before_state), extra)
+    (
+        file_write_diff(&tool.name, input, path, before_state),
+        extra,
+    )
 }
 
 /// Compute a unified diff string for a file-write tool invocation, given the
@@ -675,8 +678,8 @@ mod tests {
             "file_path": "hello.txt",
             "content": "hi\nthere\n",
         });
-        let raw = file_write_diff("Write", &input, "hello.txt", None)
-            .expect("write should emit diff");
+        let raw =
+            file_write_diff("Write", &input, "hello.txt", None).expect("write should emit diff");
         assert!(raw.contains("+hi"));
         assert!(raw.contains("+there"));
         // No `-` lines — nothing was there before.
diff --git a/crates/toolpath-desktop/src/tray.rs b/crates/toolpath-desktop/src/tray.rs
index aa9b08d..01f17f2 100644
--- a/crates/toolpath-desktop/src/tray.rs
+++ b/crates/toolpath-desktop/src/tray.rs
@@ -362,11 +362,7 @@ pub fn tray_open_trace(
                 basename_slug(&project),
                 short(&session_id)
             );
-            (
-                value,
-                format!("Claude: {}", basename(&project)),
-                filename,
-            )
+            (value, format!("Claude: {}", basename(&project)), filename)
         }
         "pi" => {
             let value = crate::commands::derive::derive_pi(
@@ -380,11 +376,7 @@ pub fn tray_open_trace(
                 basename_slug(&project),
                 short(&session_id)
             );
-            (
-                value,
-                format!("pi.dev: {}", basename(&project)),
-                filename,
-            )
+            (value, format!("pi.dev: {}", basename(&project)), filename)
         }
         // Not wired up in the desktop backend yet. The popover disables
         // rows for these, but we still reject politely if one slips through.
@@ -574,7 +566,10 @@ mod tests {
         // produce a well-formed snapshot with all five provider slots.
         let s = collect_stats();
         let providers: Vec<_> = s.counts.iter().map(|c| c.provider).collect();
-        assert_eq!(providers, vec!["claude", "gemini", "codex", "opencode", "pi"]);
+        assert_eq!(
+            providers,
+            vec!["claude", "gemini", "codex", "opencode", "pi"]
+        );
     }
 
     #[test]
diff --git a/docs/agents/formats/README.md b/docs/agents/formats/README.md
new file mode 100644
index 0000000..ff53f88
--- /dev/null
+++ b/docs/agents/formats/README.md
@@ -0,0 +1,64 @@
+# Agent session formats
+
+This directory holds our working reference for the on-disk formats emitted by
+coding agents whose sessions we derive `toolpath` documents from. These are the
+documents we would like external consumers (other toolpath crates, workshop,
+etc.) to be able to trust without having to reverse-engineer the format
+themselves from a sampled `~/.claude/projects/…` directory.
+
+The goal is **practitioner-grade reference**: exactly what fields appear, what
+they mean, where the format has quirks or bugs, and how our own code copes with
+them. Not a spec — we don't own any of these formats. But close enough that a
+new contributor can add a derivation or a projector without a week of cargo-
+culting.
+
+## Contents
+
+- **[`claude-code/`](claude-code/README.md)** — Claude Code
+  (`~/.claude/projects/…` JSONL). Split into focused docs covering
+  directory layout, the JSONL line envelope, entry types, the `message`
+  object and content parts, tool invocation lifecycle, session chains
+  and compaction, peripheral files, writing-compatible JSONL, known
+  issues, a line-by-line walkthrough, and a version-keyed format
+  changelog. Each revision of the reference carries a date stamp at
+  the top of the subdirectory's README.
+- **[`codex.md`](codex.md)** — Codex CLI rollout files under
+  `~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl`. Single-file reference
+  covering the date-bucketed session format and the `patch_apply_end`
+  events that drive file-change fidelity.
+- **[`gemini.md`](gemini.md)** — Gemini CLI chats under
+  `~/.gemini/tmp/<project>/chats/`, including the main-file + sibling
+  sub-agent UUID directory layout.
+- **[`opencode.md`](opencode.md)** — opencode's SQLite database
+  (`~/.local/share/opencode/opencode.db`), its 12 typed message-part
+  variants, and the sibling bare-git snapshot repo used for file diffs.
+
+The Claude Code reference is the most detailed because it's the
+longest-standing provider and has the most moving parts (JSONL
+envelope variants, session chaining, compaction, sidechains, and the
+loader's own undocumented strictness on what it will accept). The
+other three sit in single files because their formats are either
+simpler or sufficiently covered there.
+
+## Conventions used in these docs
+
+- **"In the wild"** = observed in real JSONL files on disk, not just in types
+  we've defined.
+- **Field tables** show the name as it appears in JSON (so `parentUuid`, not
+  `parent_uuid`), its shape, and whether it's optional. "Optional" means we've
+  seen entries without it; "required" means we've never seen an entry missing
+  it (not that the format promises it'll always be there).
+- **Citations** point either to files under this repo (`crates/<name>/src/…`)
+  or to external sources (marked with URLs). Repo citations dominate — we
+  trust our own parsers and tests more than we trust blog posts.
+- **Version numbers** when quoted (e.g. "Claude Code 2.1.90") are what we've
+  seen in sample data, not what Anthropic has officially tagged a format
+  change at.
+
+## Maintenance
+
+When `toolpath-claude` (or its siblings) learns about a new field, entry type,
+or edge case, update the corresponding doc here in the same change. The point
+of this directory is to be the single place where format knowledge
+accumulates; if the knowledge only lives in code comments or commit messages,
+it effectively doesn't exist.
diff --git a/docs/agents/formats/claude-code/README.md b/docs/agents/formats/claude-code/README.md
new file mode 100644
index 0000000..a877377
--- /dev/null
+++ b/docs/agents/formats/claude-code/README.md
@@ -0,0 +1,169 @@
+# Claude Code on-disk format
+
+> **Reference revision:** 2026-04-23
+> **Tracks:** Claude Code 2.1.x
+> **First-hand samples:** 2.1.37, 2.1.90, 2.1.110, 2.1.112
+>
+> When you change anything in this directory, bump the revision date
+> here and in [format-changelog.md](format-changelog.md) so downstream
+> readers can tell whether a cited rule is current.
+
+Claude Code (Anthropic's CLI coding agent) persists every conversation,
+every settings change, and a fair amount of supporting state to disk
+under `~/.claude/`. Anthropic documents some of this in passing but has
+never published a specification of the JSONL line format, the session
+directory layout, or the rules that govern things like session chaining
+and compaction.
+
+These documents are our working reference for what Claude Code actually
+writes to disk, at what version, and why. The target audience is anyone
+building a tool that reads, writes, or transforms Claude Code session
+data.
+
+## How the docs are organized
+
+Each doc is focused on one aspect of the format. Read them in this order
+if you're new; otherwise, skip to what you need. If you prefer concrete
+examples to field catalogues, start with the **walkthrough** (#11) and
+use the reference docs for lookup.
+
+1. **[directory-layout.md](directory-layout.md)** — what files and
+   directories live under `~/.claude/` and how they're named.
+2. **[jsonl-envelope.md](jsonl-envelope.md)** — the top-level fields that
+   wrap every line of a session JSONL.
+3. **[entry-types.md](entry-types.md)** — the `type` discriminant and
+   every entry variant we've observed, including sidechains.
+4. **[messages.md](messages.md)** — the `message` object, content-part
+   types (text/thinking/tool_use/tool_result), and role values.
+5. **[tools.md](tools.md)** — how tool calls are recorded:
+   `tool_use`/`tool_result` pairing, `tool_use_id` linkage, and the
+   per-tool shape of the top-level `toolUseResult` summary.
+6. **[usage.md](usage.md)** — `message.usage` and the prompt-cache TTL
+   breakdown.
+7. **[session-chains.md](session-chains.md)** — when Claude Code rotates
+   to a new file, how continuations are signalled, and the `compact_boundary`
+   mechanic.
+8. **[peripheral-files.md](peripheral-files.md)** — everything that
+   isn't a session JSONL: `history.jsonl`, `todos/`, `shell-snapshots/`,
+   `file-history/`, `statsig/`, etc.
+9. **[writing-compatible-jsonl.md](writing-compatible-jsonl.md)** —
+   empirically discovered constraints if you want Claude Code to load
+   JSONL that your tool produced.
+10. **[known-issues.md](known-issues.md)** — format-level bugs,
+    corruption modes, and version drift to defend against.
+11. **[walkthrough.md](walkthrough.md)** — a representative session
+    read linearly, line by line, with cross-links back to the
+    reference docs at each step.
+12. **[format-changelog.md](format-changelog.md)** — version-keyed
+    record of field and behavior changes across Claude Code releases.
+
+## Scope and sourcing
+
+The format is undocumented at the envelope level. These docs are
+compiled from:
+
+- Inspection of real session files under `~/.claude/projects/` across
+  Claude Code versions 2.1.37 through 2.1.112.
+- Reading code that produces or consumes the format (both ours and other
+  tools in the local empathic monorepo).
+- Round-trip experiments: writing JSONL and observing whether Claude
+  Code's loader accepts it.
+
+Where a claim is empirical or version-dependent, the doc says so. Where
+it's a guess, the doc says so.
+
+## Conventions
+
+- **Field names** are shown as they appear in JSON. Envelope keys are
+  camelCase (`parentUuid`), API-adjacent keys inside `message` are
+  snake_case (`stop_reason`).
+- **"Observed"** means seen in real files on disk.
+- **"Expected"** means we have structural reasons to believe it's
+  there (e.g. a loader rejects it when absent) even if we haven't
+  enumerated it in samples.
+- **Versions in parentheses** (e.g. "2.1.90+") indicate when a field or
+  behavior first appeared or changed, to the extent we know.
+- **Keep headings anchor-stable.** Cross-links use GitHub's auto-anchors
+  (lowercased, punctuation stripped, spaces to hyphens). Avoid em-dashes
+  and other decorative punctuation in a heading that is, or might be,
+  linked to — they render inconsistently and tend to bite renames.
+
+## Field index
+
+Quick lookup: which doc defines a given field? Fields local to a single
+entry type are grouped under it.
+
+| Field                          | Defined in |
+|--------------------------------|------------|
+| `agentId`                      | [jsonl-envelope.md](jsonl-envelope.md), [entry-types.md §Sidechains](entry-types.md#sidechains) |
+| `attachment`                   | [jsonl-envelope.md](jsonl-envelope.md), [entry-types.md](entry-types.md) |
+| `cache_creation` / `cache_*_input_tokens` | [usage.md](usage.md) |
+| `caller` (on `tool_use`)       | [messages.md](messages.md) |
+| `compactMetadata`              | [entry-types.md](entry-types.md), [session-chains.md](session-chains.md) |
+| `content` (envelope, on `queue-operation`) | [entry-types.md](entry-types.md) |
+| `content` (inside `message` / `tool_result`) | [messages.md](messages.md) |
+| `cwd`                          | [jsonl-envelope.md](jsonl-envelope.md), [writing-compatible-jsonl.md](writing-compatible-jsonl.md) |
+| `durationMs`                   | [entry-types.md](entry-types.md) |
+| `entrypoint`                   | [jsonl-envelope.md](jsonl-envelope.md) |
+| `gitBranch`                    | [jsonl-envelope.md](jsonl-envelope.md) |
+| `hookCount` / `hookInfos` / `hookErrors` | [jsonl-envelope.md](jsonl-envelope.md) |
+| `id` (inside `message`)        | [messages.md](messages.md) |
+| `id` (on `tool_use`)           | [messages.md](messages.md), [tools.md](tools.md) |
+| `iterations`                   | [usage.md](usage.md) |
+| `inference_geo`                | [usage.md](usage.md) |
+| `input_tokens` / `output_tokens` | [usage.md](usage.md) |
+| `isCompactSummary`             | [entry-types.md](entry-types.md), [session-chains.md](session-chains.md) |
+| `isMeta`                       | [jsonl-envelope.md](jsonl-envelope.md) |
+| `isSidechain`                  | [jsonl-envelope.md](jsonl-envelope.md), [entry-types.md §Sidechains](entry-types.md#sidechains) |
+| `isSnapshotUpdate`             | [entry-types.md](entry-types.md) |
+| `isVisibleInTranscriptOnly`    | [entry-types.md](entry-types.md), [session-chains.md](session-chains.md) |
+| `lastPrompt`                   | [entry-types.md](entry-types.md) |
+| `leafUuid`                     | [entry-types.md](entry-types.md) |
+| `level`                        | [jsonl-envelope.md](jsonl-envelope.md) |
+| `logicalParentUuid`            | [entry-types.md](entry-types.md), [session-chains.md](session-chains.md) |
+| `message`                      | [messages.md](messages.md) |
+| `messageCount` (envelope)      | [entry-types.md](entry-types.md) |
+| `messageId`                    | [jsonl-envelope.md](jsonl-envelope.md), [entry-types.md](entry-types.md), [known-issues.md](known-issues.md) |
+| `model` (inside `message`)     | [messages.md](messages.md) |
+| `operation`                    | [entry-types.md](entry-types.md) |
+| `parentUuid`                   | [jsonl-envelope.md](jsonl-envelope.md), [known-issues.md](known-issues.md) |
+| `permissionMode`               | [entry-types.md](entry-types.md), [writing-compatible-jsonl.md](writing-compatible-jsonl.md) |
+| `preventedContinuation`        | [jsonl-envelope.md](jsonl-envelope.md) |
+| `requestId`                    | [jsonl-envelope.md](jsonl-envelope.md) |
+| `role`                         | [messages.md](messages.md) |
+| `server_tool_use`              | [usage.md](usage.md) |
+| `service_tier`                 | [usage.md](usage.md) |
+| `sessionId`                    | [jsonl-envelope.md](jsonl-envelope.md), [session-chains.md](session-chains.md) |
+| `signature` (on `thinking`)    | [messages.md](messages.md), [writing-compatible-jsonl.md](writing-compatible-jsonl.md) |
+| `slug`                         | [jsonl-envelope.md](jsonl-envelope.md), [session-chains.md](session-chains.md) |
+| `snapshot`                     | [entry-types.md](entry-types.md) |
+| `sourceToolAssistantUUID`      | [jsonl-envelope.md](jsonl-envelope.md), [tools.md](tools.md) |
+| `speed`                        | [usage.md](usage.md) |
+| `stop_reason` / `stop_sequence` | [messages.md](messages.md), [known-issues.md](known-issues.md) |
+| `stopReason` (envelope)        | [jsonl-envelope.md](jsonl-envelope.md) |
+| `subtype`                      | [entry-types.md](entry-types.md) |
+| `summary` / `leafUuid`         | [entry-types.md](entry-types.md) |
+| `thinking` / `signature`       | [messages.md](messages.md) |
+| `thinkingMetadata`             | [jsonl-envelope.md](jsonl-envelope.md) |
+| `timestamp`                    | [jsonl-envelope.md](jsonl-envelope.md) |
+| `tool_use` / `tool_result`     | [messages.md](messages.md), [tools.md](tools.md) |
+| `tool_use_id`                  | [messages.md](messages.md), [tools.md](tools.md) |
+| `toolUseResult`                | [jsonl-envelope.md](jsonl-envelope.md), [tools.md](tools.md) |
+| `trackedFileBackups`           | [entry-types.md](entry-types.md), [peripheral-files.md](peripheral-files.md) |
+| `type` (envelope)              | [entry-types.md](entry-types.md) |
+| `type` (content-part)          | [messages.md](messages.md) |
+| `usage`                        | [usage.md](usage.md) |
+| `userType`                     | [jsonl-envelope.md](jsonl-envelope.md) |
+| `uuid`                         | [jsonl-envelope.md](jsonl-envelope.md) |
+| `version`                      | [jsonl-envelope.md](jsonl-envelope.md) |
+
+For the mapping from these JSON keys to Rust fields in
+`ConversationEntry`, see the parser-surface table in
+[jsonl-envelope.md](jsonl-envelope.md#parser-surface-vs-format-surface).
+
+## Maintenance
+
+When a new field, entry type, or behavior shows up in the wild, update
+the relevant doc in the same change. The index above is the table of
+contents; keep it in sync. When you add or rename a field, update the
+field index in this README too.
diff --git a/docs/agents/formats/claude-code/directory-layout.md b/docs/agents/formats/claude-code/directory-layout.md
new file mode 100644
index 0000000..cdcbbc1
--- /dev/null
+++ b/docs/agents/formats/claude-code/directory-layout.md
@@ -0,0 +1,80 @@
+# `~/.claude/` directory layout
+
+Everything Claude Code persists to disk lives under `~/.claude/` (on
+Windows, `%USERPROFILE%\.claude\`). The directory is created on first
+run and populated lazily as features are used.
+
+## The tree
+
+```
+~/.claude/
+├── projects/                  # per-project session storage — the main artifact
+│   └── <sanitized-cwd>/
+│       ├── <session-uuid>.jsonl
+│       ├── <session-uuid>.jsonl
+│       └── …
+├── history.jsonl              # global user-prompt history (separate format)
+├── settings.json              # user-level config (permissions, hooks, etc.)
+├── todos/                     # per-session TodoWrite state (legacy; see peripheral-files.md)
+├── shell-snapshots/           # zsh/bash env dumps captured at session start
+├── file-history/              # content-addressed file backups for undo/rollback
+├── session-env/               # per-session env-var snapshots
+├── plans/                     # plan-mode artifacts
+├── sessions/                  # session cache / index
+├── statsig/                   # Anthropic feature-flag cache
+├── plugins/                   # installed plugins
+├── debug/                     # per-session debug logs
+├── paste-cache/               # clipboard history
+├── backups/                   # backups of ~/.claude.json
+└── ide/                       # per-IDE lock files (e.g. VS Code attach points)
+```
+
+Only `projects/` is necessary for reconstructing a conversation. The
+rest is supporting state; see [peripheral-files.md](peripheral-files.md)
+for the bits that matter.
+
+## `projects/` directory naming
+
+Each project directory is named after the **resolved absolute path** of
+the working directory in which Claude Code was invoked, with `/` and
+`_` characters replaced by `-`.
+
+| Original cwd                                      | Directory name                         |
+|---------------------------------------------------|----------------------------------------|
+| `/Users/alex/Devel/empathic/toolpath`             | `-Users-alex-Devel-empathic-toolpath`  |
+| `/Users/alex/my_project`                          | `-Users-alex-my-project`               |
+| `/home/bob/code`                                  | `-home-bob-code`                       |
+
+### Sanitization is lossy
+
+Both `/` and `_` map to `-`, so you cannot round-trip perfectly. Any
+tool unsanitizing a directory name can only guess where path separators
+were. For the typical case (no underscores in directory names) it works.
+
+### Path canonicalization before sanitization
+
+Claude Code resolves the cwd to a canonical path before sanitizing. On
+macOS in particular, `/tmp` is a symlink to `/private/tmp`:
+
+| Launched in         | Resolves to            | Directory name       |
+|---------------------|------------------------|----------------------|
+| `/tmp/foo`          | `/private/tmp/foo`     | `-private-tmp-foo`   |
+
+If you're building test fixtures or synthesizing directory paths, you
+must canonicalize first. A tool that writes to `-tmp-foo/` will not
+show up when Claude Code is launched in `/tmp/foo`.
+
+### Session files
+
+Each conversation is a single JSONL file named `<session-uuid>.jsonl`
+where `session-uuid` is a UUIDv4. A single *logical* conversation can
+span multiple files if Claude Code rotated mid-session — see
+[session-chains.md](session-chains.md).
+
+## `~/.claude.json`
+
+Distinct from `~/.claude/` (note the missing dot-dir). This is a single
+JSON file at `$HOME/.claude.json` holding OAuth tokens, per-project
+state (MCP servers, allowed tools, `lastSessionId`, trust-dialog state),
+and other cross-session settings. Backups of it land in
+`~/.claude/backups/`.
diff --git a/docs/agents/formats/claude-code/entry-types.md b/docs/agents/formats/claude-code/entry-types.md
new file mode 100644
index 0000000..b139aac
--- /dev/null
+++ b/docs/agents/formats/claude-code/entry-types.md
@@ -0,0 +1,405 @@
+# Entry types
+
+Every session-JSONL line has a `type` field that identifies what kind of
+entry it is. A real parser must handle all of the following; a parser
+that assumes only `user` / `assistant` / `system` will silently drop
+a large fraction of the file.
+
+## Summary table
+
+| `type`                  | Has `message`? | Has `uuid`? | Purpose |
+|-------------------------|----------------|-------------|---------|
+| `user`                  | yes            | yes         | User prompt, slash command, or a synthesized user entry carrying tool results. |
+| `assistant`             | yes            | yes         | Claude's response. Content is almost always an array of parts. |
+| `system`                | varies         | yes         | Metadata entries. Discriminated further by `subtype`. |
+| `attachment`            | no             | yes         | Tool-availability delta (e.g. a deferred tool being loaded). |
+| `file-history-snapshot` | no             | no          | File-state snapshot for undo/rollback. |
+| `permission-mode`       | no             | no          | Records the active permission mode. |
+| `queue-operation`       | no             | no          | Enqueue/dequeue of a typed-ahead message while an assistant turn is in flight. |
+| `last-prompt`           | no             | no          | Cached last user prompt. |
+| `summary`               | no             | no          | Conversation summary. May live in a different JSONL file than the conversation it describes. |
+| `compact_boundary`      | no (usually)   | yes         | Marks an autocompaction event. Also appears as `system.subtype` in some versions. |
+| `progress`              | no             | yes         | Long-running tool progress event. Should be skipped when reconstructing a transcript. |
+
+---
+
+## `user`
+
+User prompts and synthesized user-role entries. More variety than the
+name suggests. Several subclassifications exist:
+
+### Direct user prompt
+
+`message.content` is a bare string. No distinguishing field beyond that.
+
+```json
+{"type": "user", "message": {"role": "user", "content": "what does this file do?"}, ...}
+```
+
+### Slash command
+
+`message.content` is a string containing XML-ish tags:
+
+```
+<command-message>jevan</command-message>
+<command-name>/jevan</command-name>
+<command-args>please enumerate…</command-args>
+```
+
+No separate `type` — the tags inside the content are the discriminator.
+
+### Tool result carrier
+
+The user entry that follows an assistant `tool_use`. `message.content`
+is an array containing one or more `tool_result` blocks (see
+[tools.md](tools.md)). Additional envelope fields:
+
+- `toolUseResult` — top-level structured summary.
+- `sourceToolAssistantUUID` — points at the assistant entry that issued
+  the tool call.
+
+The human did not type this turn; a consumer rendering the transcript
+should typically fold it into the preceding assistant turn rather than
+display it.
+
+### Command output injection
+
+Output from a local command (e.g. `!ls`) is injected back as a user
+entry. Distinguishable by the content format; exact shape varies.
+
+### Hook result
+
+Output from `UserPromptSubmit` and similar hooks is injected as a user
+entry.
+
+### System caveat
+
+An internal system-inserted user-role note. Rare; discriminator is a
+subfield inside the content, not a top-level type.
+
+### Classifying a user entry
+
+The envelope alone does not tell you which subclass a `user` entry is —
+you have to look at a handful of fields together. Decision tree:
+
+```python
+def classify_user(entry):
+    # 1. Tool-result carrier: synthesized, not human-typed.
+    if entry.get("toolUseResult") is not None:
+        return "tool_result_carrier"
+    if entry.get("sourceToolAssistantUUID") is not None:
+        return "tool_result_carrier"
+
+    msg = entry.get("message") or {}
+    content = msg.get("content")
+
+    # Array-form content with only tool_result parts: same class.
+    if isinstance(content, list) and content and all(
+        (p.get("type") == "tool_result") for p in content
+    ):
+        return "tool_result_carrier"
+
+    # 2. Compaction summary: synthetic, flagged explicitly.
+    if entry.get("isCompactSummary") is True:
+        return "compact_summary"
+
+    # 3. Slash command: string content with the XML-ish command tags.
+    if isinstance(content, str) and "<command-name>" in content:
+        return "slash_command"
+
+    # 4. Command output injection: string content with the local-command-output tags.
+    if isinstance(content, str) and (
+        "<local-command-stdout>" in content or
+        "<local-command-stderr>" in content
+    ):
+        return "command_output"
+
+    # 5. Hook result: string content wrapped in UserPromptSubmit-style tags.
+    if isinstance(content, str) and "<user-prompt-submit-hook>" in content:
+        return "hook_result"
+
+    # 6. System caveat: string content in <system-reminder>-class tags.
+    if isinstance(content, str) and "<system-reminder>" in content:
+        return "system_caveat"
+
+    # 7. Otherwise: a direct user prompt.
+    return "direct_prompt"
+```
+
+The tags embedded in content strings are the most reliable way to
+distinguish synthesized-user entries from real ones. Order matters —
+check `toolUseResult` / `sourceToolAssistantUUID` first, then
+`isCompactSummary`, then fall through to content inspection.
+
+---
+
+## `assistant`
+
+Claude's response. `message.content` is almost always an array of parts
+(text, thinking, tool_use). See [messages.md](messages.md) for part
+types.
+
+Envelope specifics:
+- Carries `requestId` (Anthropic API request ID).
+- Carries `message.model`, `message.id`, `message.usage`,
+  `message.stop_reason` (though `stop_reason` is frequently `null` on
+  disk, even for completed turns — see
+  [known-issues.md](known-issues.md)).
+
+A single logical assistant turn can span **multiple `assistant` entries**
+— Claude Code splits thinking, text, and tool_use into separate entries
+within a turn. You cannot assume one `assistant` entry = one turn.
+
+---
+
+## `system`
+
+Metadata entries. Discriminated by `subtype`. Treat `subtype` as an
+open enumeration; new values appear across versions.
+
+### `subtype: "turn_duration"`
+
+Emitted after an assistant turn completes. Carries `durationMs` and
+`messageCount`. Useful as an authoritative "turn ended" signal.
+
+```json
+{
+  "type": "system",
+  "subtype": "turn_duration",
+  "durationMs": 57560,
+  "messageCount": 24,
+  ...
+}
+```
+
+### `subtype: "compact_boundary"`
+
+Marks an autocompaction. Carries `compactMetadata` (`trigger`, `preTokens`)
+and `logicalParentUuid`. See [session-chains.md §Compaction](session-chains.md#compaction--compact_boundary).
+
+In some versions this appears as a top-level `type: "compact_boundary"`
+rather than `type: "system"` with this subtype; treat them as the same
+concept.
+
+### `subtype: "stop_hook_summary"`
+
+Emitted by `Stop` hooks. Carries `hookCount`, `hookInfos`, `hookErrors`,
+`preventedContinuation`, `stopReason`, `level: "suggestion"`.
+
+### `subtype: "task_started"` / `"task_progress"` / `"task_notification"`
+
+Task-tool lifecycle events.
+
+### Other subtypes
+
+Other values (e.g. `"init"`) have been observed. Log unknowns rather
+than reject.
+
+---
+
+## `attachment`
+
+Records a change in the available tool set mid-session. The primary
+case is `type: "deferred_tools_delta"`, emitted when a deferred tool
+is loaded into the active set.
+
+```json
+{
+  "type": "attachment",
+  "attachment": {
+    "type": "deferred_tools_delta",
+    "addedNames": ["WebFetch", "WebSearch"],
+    "addedLines": ["WebFetch", "WebSearch"],
+    "removedNames": []
+  },
+  "uuid": "...",
+  "parentUuid": "...",
+  ...
+}
+```
+
+---
+
+## `file-history-snapshot`
+
+Captures file state at a given point, keyed by `messageId`:
+
+```json
+{
+  "type": "file-history-snapshot",
+  "messageId": "67602940-a209-437d-a791-72bf4c09c0ea",
+  "snapshot": {
+    "messageId": "67602940-a209-437d-a791-72bf4c09c0ea",
+    "trackedFileBackups": {},
+    "timestamp": "2026-04-02T13:59:26.313Z"
+  },
+  "isSnapshotUpdate": false
+}
+```
+
+`trackedFileBackups` is usually empty in observed data; when populated,
+it maps file paths to references into the content-addressed backups
+under `~/.claude/file-history/`. These support Claude Code's undo
+machinery.
+
+Note: this entry type has **no `uuid`** — the `messageId` it carries
+references a different entry's `uuid`. On resume there's a known bug
+where these `messageId`s collide with real message `uuid`s; see
+[known-issues.md](known-issues.md).
+
+---
+
+## `permission-mode`
+
+Records the active permission mode. Strict three-field shape:
+
+```json
+{"type": "permission-mode", "permissionMode": "default", "sessionId": "..."}
+```
+
+Permission-mode values observed: `"default"`, `"acceptEdits"`, `"plan"`,
+`"bypassPermissions"`.
+
+**Strictness note:** adding any other fields to this entry (including
+`uuid: ""`, `isSidechain: false`, or a trailing `parentUuid`) causes
+Claude Code's loader to reject it. See
+[writing-compatible-jsonl.md](writing-compatible-jsonl.md).
+
+---
+
+## `queue-operation`
+
+Records typed-ahead message queueing during an assistant turn.
+
+```json
+{
+  "type": "queue-operation",
+  "operation": "enqueue",
+  "content": "Probably phrase them as \"virtual artifacts\"?",
+  "timestamp": "2026-04-16T19:33:16.814Z",
+  "sessionId": "..."
+}
+```
+
+`operation` values: `"enqueue"` (user queued a message), `"dequeue"`
+(Claude Code consumed it). Note the top-level `content` field —
+distinct from `message.content`.
+
+---
+
+## `last-prompt`
+
+Cached last user prompt, for resume / history purposes:
+
+```json
+{"type": "last-prompt", "lastPrompt": "the repo is…", "sessionId": "..."}
+```
+
+---
+
+## `summary`
+
+Conversation summary entries. Minimal shape:
+
+```json
+{"type": "summary", "summary": "...", "leafUuid": "..."}
+```
+
+Summaries are generated asynchronously relative to the conversation
+they describe, and a summary may appear **in a different JSONL file**
+than the session it summarizes. `leafUuid` is the UUID of the message
+being summarized.
+
+A parser that wants to associate summaries with their conversations
+must cross-match by `leafUuid` across all files in the project
+directory.
+
+---
+
+## `compact_boundary`
+
+Marks an autocompaction. May appear either as a top-level `type` or as
+`type: "system"` with `subtype: "compact_boundary"` — treat them as
+equivalent.
+
+```jsonc
+{
+  "type": "compact_boundary",
+  "uuid": "...",
+  "parentUuid": null,                     // always null
+  "logicalParentUuid": "...",             // the real prior message UUID
+  "compactMetadata": {
+    "trigger": "auto",                    // or "manual"
+    "preTokens": 180000
+  },
+  ...
+}
+```
+
+Immediately followed by a synthetic `user`-role message with
+`isCompactSummary: true` and `isVisibleInTranscriptOnly: true`
+carrying the compacted summary as its content. See
+[session-chains.md](session-chains.md) for how this interacts with
+file rotation.
+
+---
+
+## `progress`
+
+Long-running tool progress events. Emitted by tools that stream
+intermediate output. Should be **skipped** when reconstructing a
+conversation transcript — they represent incomplete state that is
+superseded by the eventual `tool_result`.
+
+---
+
+## Sidechains
+
+Sidechains aren't an entry type — they're a property (`isSidechain: true`)
+that applies to `user`, `assistant`, and other entries alike. A
+sidechain is a conversation thread spawned by the `Task` tool (a
+subagent) or by the `/btw` slash command (aside questions).
+
+### Representation
+
+Sidechain entries carry:
+- `isSidechain: true` on every entry in the thread.
+- `agentId` — short hash identifying the subagent, e.g. `"a7bf2fd"`.
+- The thread root has `parentUuid: null` and its `message.content` is
+  the Task tool's input prompt (for Task-spawned agents).
+
+Two layouts exist depending on Claude Code version:
+
+1. **Inline** (newer, 2.1.x+) — sidechain entries are written into the
+   same JSONL as the parent session, distinguished only by
+   `isSidechain: true`.
+2. **Separate file** (older) — subagents got their own
+   `<project>/subagents/agent-<hash>.jsonl` files, with matching
+   `.meta.json` sidecars.
+
+### Usage accounting
+
+Sidechain token usage does **not** count against the parent
+conversation's context. Cache-read tokens from sidechains often
+"mirror" the parent because the prompt cache is shared.
+
+### Parent linkage
+
+Sidechain entries carry their own `parentUuid` chain within the
+sidechain thread. The link *to* the parent conversation is implicit —
+established by the `Task` tool invocation in the parent, which carries
+the `agentId` in its result.
+
+---
+
+## Type discrimination pseudocode
+
+```python
+def classify(entry):
+    t = entry["type"]
+    if t == "system":
+        return ("system", entry.get("subtype", "unknown"))
+    if t == "user":
+        return ("user", classify_user(entry))   # see §Classifying a user entry
+    return (t, None)
+```
diff --git a/docs/agents/formats/claude-code/format-changelog.md b/docs/agents/formats/claude-code/format-changelog.md
new file mode 100644
index 0000000..78e27ed
--- /dev/null
+++ b/docs/agents/formats/claude-code/format-changelog.md
@@ -0,0 +1,164 @@
+# Format changelog
+
+A version-keyed record of field and behavior changes we've seen in the
+Claude Code on-disk format. This is compiled from samples on disk and
+from reading upstream code; it is **not** a changelog Anthropic
+publishes, and the exact patch version where something landed is
+usually a best guess.
+
+## How to read this
+
+- **"Observed across 2.1.37 – 2.1.112"** means every sampled version in
+  that range contained it. Versions outside that range are not first-hand.
+- **"2.1.x+ (origin unclear)"** means we saw it in 2.1.x samples but
+  don't know which patch introduced it.
+- **"2.0.x+"** means we believe from upstream code / changelogs that
+  the field was present in 2.0.x, even though we don't have first-hand
+  samples from that era.
+- **"Pre-sample era"** entries describe the pre-2.0 layout from code
+  reading and upstream references, not from files we've inspected.
+
+When precision matters, treat a version here as an upper bound ("no
+later than") unless the note says otherwise.
+
+## Format-revision stamp
+
+This reference tracks Claude Code **2.1.x**. First-hand samples span
+client versions 2.1.37, 2.1.90, 2.1.110, and 2.1.112. Reference
+revision: **2026-04-23**.
+
+---
+
+## Claude Code 2.1.x
+
+### Envelope
+
+| Field / behavior                                       | Status in 2.1.x                                    |
+|--------------------------------------------------------|----------------------------------------------------|
+| `type`, `uuid`, `timestamp`, `sessionId`, `parentUuid` | Observed across 2.1.37 – 2.1.112                   |
+| `cwd`, `gitBranch`, `version`, `userType`, `entrypoint`| Observed across 2.1.37 – 2.1.112                   |
+| `requestId` on assistant entries                       | Observed across 2.1.37 – 2.1.112                   |
+| `slug` (human-readable conversation slug)              | 2.1.x+ (origin unclear); persists across rotations |
+| `agentId` on sidechain entries                         | 2.1.x+ (inline-sidechain layout)                   |
+| `isSidechain: true` for inline sidechains              | 2.1.x+; replaces the older separate-file layout    |
+| `thinkingMetadata` on some user entries                | 2.1.x+ (origin unclear)                            |
+| Hook-injected envelope fields (`hookCount`, `hookInfos`, `hookErrors`, `preventedContinuation`, `stopReason`, `level`) | 2.1.x+ (origin unclear) |
+
+See [jsonl-envelope.md](jsonl-envelope.md) for field definitions.
+
+### Entry types
+
+| Entry type                                             | Status in 2.1.x                                    |
+|--------------------------------------------------------|----------------------------------------------------|
+| `user`, `assistant`, `system`                          | Observed across 2.1.37 – 2.1.112                   |
+| `file-history-snapshot`                                | Observed across 2.1.37 – 2.1.112                   |
+| `permission-mode`                                      | Observed across 2.1.37 – 2.1.112                   |
+| `summary`                                              | Observed across 2.1.37 – 2.1.112                   |
+| `attachment` (deferred-tool deltas)                    | 2.1.x+ (origin unclear)                            |
+| `queue-operation` (typed-ahead message enqueue/dequeue)| 2.1.x+ (origin unclear)                            |
+| `progress` (streaming tool output)                     | 2.1.x+ (origin unclear)                            |
+| `last-prompt`                                          | 2.1.x+ (origin unclear)                            |
+| `compact_boundary` as top-level `type`                 | Newer variant; coexists with older `type: "system"` + `subtype: "compact_boundary"` |
+| `system.subtype` values: `turn_duration`, `stop_hook_summary`, `task_started`/`task_progress`/`task_notification` | 2.1.x+ (origin unclear) |
+
+See [entry-types.md](entry-types.md).
+
+### `message.usage` subfields
+
+| Subfield                                               | Since                                              |
+|--------------------------------------------------------|----------------------------------------------------|
+| `input_tokens`, `output_tokens`                        | Always                                             |
+| `cache_creation_input_tokens`, `cache_read_input_tokens` (flat) | Always (when caching was used)            |
+| `cache_creation: { ephemeral_5m_input_tokens, ephemeral_1h_input_tokens }` | 2.0.x+                         |
+| `service_tier`                                         | 2.0.x+                                             |
+| `server_tool_use: { web_search_requests, web_fetch_requests }` | 2.1.x+                                     |
+| `iterations`                                           | 2.1.x+                                             |
+| `inference_geo`, `speed`                               | 2.1.x+ (often empty on observed samples)           |
+
+See [usage.md](usage.md).
+
+### Content parts
+
+| Part / behavior                                        | Status in 2.1.x                                    |
+|--------------------------------------------------------|----------------------------------------------------|
+| `text`, `tool_use`, `tool_result`                      | Observed across 2.1.37 – 2.1.112                   |
+| `thinking` with `signature`                            | Observed on models that support extended thinking (`claude-opus-4-6`, later `-4-7`) |
+| `redacted_thinking`                                    | Format-defined; rarely on disk in coding sessions  |
+| `image`, `document`, `server_tool_use`, `web_search_tool_result` | Format-defined; rarely on disk in coding sessions |
+
+`thinking` parts require a valid Anthropic-issued `signature`; otherwise
+they are silently dropped on resume. See [messages.md](messages.md).
+
+### Session chains and compaction
+
+| Behavior                                               | Status in 2.1.x                                    |
+|--------------------------------------------------------|----------------------------------------------------|
+| Bridge entry (first real entry of successor file carries the **previous** session's `sessionId`) | Observed across 2.1.37 – 2.1.112 |
+| `slug` persistence across rotations                    | 2.1.x+                                             |
+| Duplicate `compact_boundary` at top of successor files | 2.1.x+ (observed intermittently)                   |
+| Inline compaction (`compact_boundary` + synthetic `user` summary with `isCompactSummary: true` / `isVisibleInTranscriptOnly: true`) | 2.1.x+ is the default |
+| Rotation on autocompact to a separate `acompact-<hash>.jsonl` | Older behavior; not observed in 2.1.x         |
+| Inline sidechains (`isSidechain: true` + `agentId` in the main file) | 2.1.x+ default                           |
+| Separate-file sidechains under `subagents/agent-<hash>.jsonl` | Older behavior; compatibility path only     |
+| `compactMetadata.trigger`: `"auto"` / `"manual"`       | Observed in 2.1.x                                  |
+| `compactMetadata.preTokens`                            | Observed in 2.1.x                                  |
+
+See [session-chains.md](session-chains.md).
+
+### Peripheral files
+
+| Path                                                   | Status in 2.1.x                                    |
+|--------------------------------------------------------|----------------------------------------------------|
+| `~/.claude/todos/`                                     | **Marked legacy**; current versions inline TodoWrite state into `tool_result` entries |
+| `~/.claude/history.jsonl`                              | Observed across 2.1.37 – 2.1.112; Unix-millis timestamps (distinct from session JSONL ISO-8601) |
+| `~/.claude/history.jsonl` per-entry `sessionId`        | Absent on older entries; present on newer ones     |
+| `~/.claude/sessions/sessions-index.json`               | Observed in some versions; can be stale or missing |
+| `~/.claude/file-history/<session-uuid>/<contentHash>@v<versionNumber>` | Observed across 2.1.37 – 2.1.112   |
+| `~/.claude/shell-snapshots/snapshot-<shell>-<unix-millis>-<random>.sh` | Observed across 2.1.37 – 2.1.112   |
+
+See [peripheral-files.md](peripheral-files.md).
+
+### Tools
+
+The built-in tool set is the fastest-moving surface. We do not track it
+as an enumerated changelog here — treat [tools.md §Common tool `input`
+shapes](tools.md#common-tool-input-shapes) as illustrative, and check
+Anthropic's tool documentation for the authoritative current list.
+
+What *is* stable enough to version-track:
+
+| Behavior                                               | Status in 2.1.x                                    |
+|--------------------------------------------------------|----------------------------------------------------|
+| `tool_use` / `tool_result` two-entry pairing           | Observed across 2.1.37 – 2.1.112                   |
+| `sourceToolAssistantUUID` back-reference on tool-result carrier | Observed across 2.1.37 – 2.1.112          |
+| Top-level `toolUseResult` as sibling of `message`      | Observed across 2.1.37 – 2.1.112 for structured-output tools |
+| Parallel tool calls (multiple `tool_use` in one assistant entry) | Observed across 2.1.37 – 2.1.112         |
+| Very-large-output spill to `projects/<project>/<session>/tool-results/` | Documented behavior; not observed first-hand in our samples |
+
+---
+
+## Pre-sample era (before 2.1.37)
+
+Drawn from upstream code reading and adjacent tooling, not from files
+we've inspected directly. Treat as orientation, not as verified fact.
+
+- **Subagents stored per-file** under
+  `projects/<project>/subagents/agent-<hash>.jsonl`, with a matching
+  `.meta.json` sidecar, and autocompacted subagents under
+  `agent-acompact-<hash>.jsonl`.
+- **Autocompaction rotated to a new file** named `acompact-<hash>.jsonl`
+  rather than recording an inline `compact_boundary`.
+- **`message.usage`** was flatter — no `cache_creation` TTL breakdown,
+  no `service_tier`, no `server_tool_use`, no `iterations`.
+- **`history.jsonl`** entries did not always carry `sessionId`.
+
+---
+
+## Process
+
+When you add a field, entry type, or behavior note to any other doc in
+this directory, add a corresponding row here in the same change. Cite
+the version where you can; cite "2.1.x+ (origin unclear)" when you
+can't. Bump the format-revision stamp at the top of this file *and*
+the matching stamp in [README.md](README.md) whenever you update the
+changelog.
diff --git a/docs/agents/formats/claude-code/jsonl-envelope.md b/docs/agents/formats/claude-code/jsonl-envelope.md
new file mode 100644
index 0000000..0a04996
--- /dev/null
+++ b/docs/agents/formats/claude-code/jsonl-envelope.md
@@ -0,0 +1,150 @@
+# Session JSONL: the line envelope
+
+A session JSONL is a file where each line is a standalone JSON object.
+No framing, no leading sentinel, no trailer. Lines are terminated with
+`\n`; blank lines occur but should be ignored.
+
+This document covers the **envelope** — the top-level fields that wrap
+each line. The `message` sub-object and tool-invocation details have
+their own docs ([messages.md](messages.md), [tools.md](tools.md)).
+
+## Field conventions
+
+- **Envelope keys are camelCase** (`parentUuid`, `sessionId`,
+  `isSidechain`).
+- **Keys inside `message` that map to the Anthropic API are snake_case**
+  (`stop_reason`, `input_tokens`). Some older samples use camelCase
+  aliases for API-side keys; a tolerant parser accepts both.
+- **Field presence depends on entry type.** There is no single union
+  that applies to every line; `type` is the discriminant. See
+  [entry-types.md](entry-types.md).
+
+## Complete field catalogue
+
+Every envelope field we have observed, in rough order of prominence:
+
+### Identity / position
+
+| Field          | Shape                      | Notes |
+|----------------|----------------------------|-------|
+| `type`         | string                     | Discriminant. Values in [entry-types.md](entry-types.md). Present on **every** line. |
+| `uuid`         | UUIDv4 string              | Per-entry ID. Empty string (`""`) or absent on some metadata entries (`permission-mode`, `queue-operation`, `last-prompt`, `file-history-snapshot`). |
+| `timestamp`    | ISO-8601 string            | e.g. `"2026-04-02T13:59:26.313Z"`. Millisecond precision. Absent on pure-metadata entries. |
+| `sessionId`    | UUIDv4 string              | Usually equals the filename stem. On continuation files, the first real entry carries the **previous** session's ID — see [session-chains.md](session-chains.md). |
+| `parentUuid`   | UUIDv4 string \| `null`    | Prior entry in the conversation DAG. `null` for the first entry of a session and for `compact_boundary` entries (which use `logicalParentUuid` instead). |
+
+### Context at time of entry
+
+| Field          | Shape             | Notes |
+|----------------|-------------------|-------|
+| `cwd`          | absolute path     | Working directory when the entry was written. Will match the pre-sanitization path used in `projects/`. |
+| `gitBranch`    | string            | Current branch. **Empty string** (not `null`, not absent) when the cwd isn't a git repo. |
+| `version`      | string            | Claude Code client version, e.g. `"2.1.90"`. Observed values include `2.1.37`, `2.1.90`, `2.1.110`, `2.1.112`. Absent on some metadata entries. |
+| `userType`     | string            | Almost always `"external"`. Other values (e.g. `"ant"` for Anthropic-internal) have been reported but are not common. |
+| `entrypoint`   | string            | How the session was launched: `"cli"` is typical. Other values (e.g. `"claude-desktop"`) exist. |
+
+### API correlation
+
+| Field        | Shape        | Appears on | Notes |
+|--------------|--------------|------------|-------|
+| `requestId`  | string       | `assistant` | Anthropic API request ID, e.g. `"req_011CZf4hD7fj…"`. Useful for deduping streamed messages. |
+| `messageId`  | UUID string  | `file-history-snapshot` (and as a reference from other entries) | The Anthropic API message ID, *distinct* from the entry's own `uuid`. Known collision bug on resume (see [known-issues.md](known-issues.md)). |
+
+### Conversation threading
+
+| Field             | Shape       | Notes |
+|-------------------|-------------|-------|
+| `isSidechain`     | bool        | `true` for entries belonging to a Task-tool-spawned subagent. See [entry-types.md §Sidechains](entry-types.md#sidechains). |
+| `agentId`         | short hex string | Identifies a subagent thread. Present on sidechain entries. Format is a short hash (e.g. `"a7bf2fd"`). |
+| `slug`            | string      | Human-readable conversation slug (e.g. `"crystalline-giggling-sunset"`). **Persists across rotations**, making it one of the more reliable signals for linking continuation files. |
+
+### Tool mechanics
+
+| Field                      | Shape   | Appears on | Notes |
+|----------------------------|---------|------------|-------|
+| `toolUseResult`            | object  | `user` entries carrying `tool_result` blocks | Top-level structured summary of the tool's output. Sibling of `message`, not nested. See [tools.md](tools.md). |
+| `sourceToolAssistantUUID`  | UUIDv4  | `user` entries carrying `tool_result` blocks | `uuid` of the assistant entry that issued the matching `tool_use`. |
+
+### Entry-type-specific fields
+
+| Field                | Shape   | Appears on | Notes |
+|----------------------|---------|------------|-------|
+| `message`            | object  | `user`, `assistant`  | Conversation payload. See [messages.md](messages.md). |
+| `snapshot`           | object  | `file-history-snapshot` | `{messageId, trackedFileBackups, timestamp}`. |
+| `isSnapshotUpdate`   | bool    | `file-history-snapshot` | Whether this snapshot updates a prior one. |
+| `permissionMode`     | string  | `permission-mode` | `"default"`, `"acceptEdits"`, `"plan"`, `"bypassPermissions"`. |
+| `attachment`         | object  | `attachment` | Delta to available tools, e.g. `{type: "deferred_tools_delta", addedNames: [...], addedLines: [...], removedNames: [...]}`. |
+| `operation`          | string  | `queue-operation` | `"enqueue"` / `"dequeue"`. |
+| `content`            | string  | `queue-operation` | Queued message text. **Conflicts in name with `message.content`** — distinguish by entry `type`. |
+| `lastPrompt`         | string  | `last-prompt` | Cached last user prompt. |
+| `subtype`            | string  | `system`, `compact_boundary` | Discriminant within metadata entries. See [entry-types.md](entry-types.md). |
+| `durationMs`         | number  | `system` (turn_duration) | Milliseconds the assistant turn took. |
+| `messageCount`       | number  | `system` (turn_duration) | Number of messages in the turn. |
+| `isMeta`             | bool    | various     | API-visibility flag. `true` marks entries the loader should hide from the transcript sent back to the API. |
+| `isCompactSummary`   | bool    | synthetic user message after `compact_boundary` | Always paired with `isVisibleInTranscriptOnly: true`. |
+| `isVisibleInTranscriptOnly` | bool | see above | Entry is visible in the UI but not replayed to the model. |
+| `logicalParentUuid`  | UUID    | `compact_boundary` | Points at the pre-compact last message. `parentUuid` is `null` on these. |
+| `compactMetadata`    | object  | `compact_boundary` | `{trigger: "auto"|"manual", preTokens: number}`. |
+| `thinkingMetadata`   | object  | some user entries | `{level, disabled, triggers[]}`. Indicates extended-thinking configuration. |
+
+### Hook-injected fields
+
+When a hook contributes to an entry (e.g. `Stop` hook writing a summary
+`system` entry), additional fields appear:
+
+| Field                    | Shape           | Notes |
+|--------------------------|-----------------|-------|
+| `hookCount`              | number          | How many hooks ran for this event. |
+| `hookInfos`              | array           | Per-hook info objects. |
+| `hookErrors`             | array           | Per-hook error objects. |
+| `preventedContinuation`  | bool            | Whether hook output blocked Claude from continuing. |
+| `stopReason`             | string          | Hook-provided reason (distinct from `message.stop_reason`). |
+| `level`                  | string          | Severity, e.g. `"suggestion"`, `"info"`. |
+
+## Unknown fields
+
+New fields appear across versions. A parser should flatten unknown keys
+into an `extra` map rather than reject the line.
+
+## Parser surface vs. format surface
+
+This doc catalogues fields at the *format* level — every key we have
+observed on an envelope. Our parser doesn't type all of them equally.
+`toolpath-claude`'s `ConversationEntry`
+([`crates/toolpath-claude/src/types.rs`](../../../../crates/toolpath-claude/src/types.rs))
+promotes the fields a typical consumer needs into named Rust fields and
+lets everything else land in a `#[serde(flatten)] extra: HashMap<String,
+Value>`.
+
+| Field name (JSON)          | Where it lives in `ConversationEntry` |
+|----------------------------|---------------------------------------|
+| `type`                     | `entry_type: String`                  |
+| `uuid`                     | `uuid: String`                        |
+| `timestamp`                | `timestamp: String`                   |
+| `sessionId`                | `session_id: Option<String>`          |
+| `parentUuid`               | `parent_uuid: Option<String>`         |
+| `cwd`                      | `cwd: Option<String>`                 |
+| `gitBranch`                | `git_branch: Option<String>`          |
+| `version`                  | `version: Option<String>`             |
+| `userType`                 | `user_type: Option<String>`           |
+| `isSidechain`              | `is_sidechain: bool` (default false)  |
+| `message`                  | `message: Option<Message>`            |
+| `requestId`                | `request_id: Option<String>`          |
+| `toolUseResult`            | `tool_use_result: Option<Value>`      |
+| `snapshot`                 | `snapshot: Option<Value>`             |
+| `messageId`                | `message_id: Option<String>`          |
+| *everything else*          | `extra: HashMap<String, Value>`       |
+
+Concretely, fields like `slug`, `agentId`, `entrypoint`,
+`sourceToolAssistantUUID`, `permissionMode`, `attachment`, `operation`,
+`content` (on `queue-operation`), `lastPrompt`, `subtype`, `durationMs`,
+`messageCount`, `isMeta`, `isCompactSummary`,
+`isVisibleInTranscriptOnly`, `logicalParentUuid`, `compactMetadata`,
+`thinkingMetadata`, and the hook-injected envelope fields
+(`hookCount`, `hookInfos`, `hookErrors`, `preventedContinuation`,
+`stopReason`, `level`) are all documented here but accessed via
+`entry.extra["<jsonKey>"]` in code. This is deliberate — it keeps the
+typed surface small while round-tripping arbitrary format drift.
+
+If you're building a new consumer in another language, the same split
+tends to work: a core struct for the hot fields, a map for the rest.
diff --git a/docs/agents/formats/claude-code/known-issues.md b/docs/agents/formats/claude-code/known-issues.md
new file mode 100644
index 0000000..bc12f86
--- /dev/null
+++ b/docs/agents/formats/claude-code/known-issues.md
@@ -0,0 +1,155 @@
+# Known issues and corruption modes
+
+A parser that wants to be robust against real-world Claude Code output
+needs to defend against format-level bugs, races, and drift. This is
+the list we've built up from upstream issue trackers, our own testing,
+and adjacent tools.
+
+## Format bugs Anthropic has shipped
+
+### Dangling `parentUuid` references
+
+An entry's `parentUuid` may reference a UUID that doesn't exist
+anywhere in the file. Observed intermittently; causes DAG reconstruction
+to fail if you assume every `parentUuid` resolves.
+
+**Defense:** during traversal, treat missing parents as roots rather
+than erroring out.
+
+### `file-history-snapshot.messageId` collisions on resume
+
+On resume, `file-history-snapshot` entries reuse the `uuid` of prior
+message entries as their `messageId`. This means if you index entries
+by UUID, the snapshot collides with the real message.
+
+**Impact:** up to 25% of entries unreachable via `parentUuid`
+traversal in heavily-resumed sessions.
+
+**Defense:** don't use `messageId` as a unique key across entry types.
+Key by `(type, uuid)` or by `(type, messageId)` specifically for
+snapshots.
+
+### Missing `stop_reason` on assistant entries
+
+Assistant entries are frequently persisted with `stop_reason: null`
+even for turns that completed normally — the entry is written before
+the streaming API response finalizes.
+
+**Defense:** don't treat `stop_reason == null` as "in-flight". Infer
+`"end_turn"` for assistant entries missing it, unless you have direct
+evidence the turn was interrupted.
+
+### Autocompact can destroy in-progress work
+
+Under some conditions an autocompact pass can replace in-progress
+scratch work with a summary, effectively losing it. This is a Claude
+Code bug, not a format bug — but if you're archiving sessions for
+replay, a post-compact file may be less useful than the pre-compact
+snapshot.
+
+**Defense:** if continuity matters, checkpoint before compaction.
+
+## Race conditions
+
+### Multi-terminal writes to the same project
+
+Multiple Claude Code instances launched in the same cwd may write to
+**the same project directory** at overlapping times, producing
+interleaved turns across session files or (more rarely) interleaved
+lines within a single file.
+
+**Defense:** a session reader should gate discovery by `created_at`
+timestamps or by content-match against known user input, so that one
+instance doesn't accidentally pick up another's session.
+
+### `sessions-index.json` drift
+
+The optional `~/.claude/sessions/sessions-index.json` can be missing,
+stale, or inconsistent with the files actually on disk.
+
+**Defense:** treat the index as a cache; rescan the filesystem when
+precision matters.
+
+## Format-level invisibles
+
+These aren't bugs, but they trip up consumers who expect the JSONL to
+be a complete record of what happened.
+
+### Permission prompts
+
+When Claude Code gates a tool call behind a permission prompt, the
+prompt itself and the user's accept/deny decision leave **no JSONL
+entry**. The only observable signal is a gap in timestamps between
+the `tool_use` and the corresponding `tool_result`.
+
+**Defense:** if you're trying to distinguish "Claude took 5 seconds to
+run Bash" from "user took 5 seconds deciding to allow Bash," you can't.
+Treat timestamp gaps around tool results with uncertainty.
+
+### Mid-turn text-only assistant entries
+
+A single logical assistant turn can span multiple `assistant` JSONL
+entries — thinking, text, and `tool_use` can each be separate entries.
+Text-only assistant entries can appear **between** tool-call batches,
+not only at end-of-turn.
+
+**Defense:** don't treat "assistant emitted text" as "turn is over."
+Use `stop_reason: "end_turn"` or a following `system/turn_duration`
+entry as the authoritative end-of-turn signal.
+
+### Agent-progress nesting variance
+
+Some agent-progress entries wrap the message double-nested
+(`data.message.message` instead of `data.message`) depending on
+version. If you're parsing Task-tool progress events, try both paths.
+
+## Drift between versions
+
+The format is stable at the coarse level but drifts in details.
+Patterns:
+
+- **New optional fields** appear additively. Unknown envelope fields
+  should be preserved through reads/writes.
+- **Entry-type enumerations** grow. New `system.subtype` values and
+  new top-level `type` values appear without notice.
+- **Tool input schemas** change when Anthropic ships built-in tool
+  updates. Preserve unknown input keys.
+- **`usage` subfields** (notably `server_tool_use`, `iterations`,
+  `cache_creation`) were added in 2.0.x → 2.1.x. Older files lack
+  them.
+
+**Defense:** always flatten unknown fields into a catch-all rather
+than rejecting. Version-gate any behavior that assumes a specific
+subfield is present.
+
+## Corruption modes to expect
+
+- **Truncated last line.** If Claude Code was killed mid-write, the
+  final line of a session JSONL may be partial or invalid JSON.
+  Expect an unterminated-string or unexpected-EOF parse error on
+  the tail.
+- **Empty lines.** Blank lines appear mid-file in some samples. Skip
+  them rather than error.
+- **Whitespace-only lines.** Same handling as empty lines.
+- **Large output overflow.** Very large tool outputs (pages of text)
+  may be spilled to `projects/<project>/<session>/tool-results/`
+  rather than inlined. A parser that assumes `tool_result.content` is
+  always inline may return stub content instead of the real output.
+- **Cycles in the parent DAG.** Rare, but possible with corrupted
+  data. Traverse defensively.
+
+## Things that look like bugs but aren't
+
+- **`gitBranch: ""`** — empty string, not null, is the correct
+  representation when cwd isn't a git repo.
+- **`stop_reason: null`** — see above. Not a crash; a timing artifact.
+- **Multiple assistant entries per turn** — intentional. Thinking,
+  text, and tool calls get their own entries.
+- **Tool-result-only user entries** — intentional. These are synthesized,
+  not user input. Fold them into the preceding assistant turn when
+  rendering.
+- **First entry's `sessionId` != filename stem** — intentional. Bridge
+  entry marking a session continuation. See
+  [session-chains.md](session-chains.md).
+- **`compact_boundary.parentUuid: null`** — intentional. The real
+  prior message is in `logicalParentUuid`.
diff --git a/docs/agents/formats/claude-code/messages.md b/docs/agents/formats/claude-code/messages.md
new file mode 100644
index 0000000..e8c2f8e
--- /dev/null
+++ b/docs/agents/formats/claude-code/messages.md
@@ -0,0 +1,179 @@
+# The `message` object
+
+`user` and `assistant` entries carry a `message` object containing the
+actual conversation payload. Metadata entries (`system`, `attachment`,
+`permission-mode`, etc.) do not.
+
+## Shape
+
+```jsonc
+{
+  "role":          "user" | "assistant" | "system",
+  "content":       string | ContentPart[],
+  "model":         "claude-opus-4-6",   // assistant only
+  "id":            "msg_01P33i…",       // assistant only — Anthropic API message ID
+  "type":          "message",           // assistant only — always the literal "message"
+  "stop_reason":   "end_turn" | "tool_use" | "stop_sequence" | null,
+  "stop_sequence": string | null,
+  "usage":         Usage                // assistant only — see usage.md
+}
+```
+
+Both `stop_reason` and `stop_sequence` can be `null`. `stop_reason` is
+frequently `null` on disk even for turns that completed normally,
+because the entry is persisted before the streaming API response
+finalizes — see [known-issues.md](known-issues.md).
+
+## The string-or-array `content` trap
+
+`message.content` is **either a bare string or an array of content
+parts**, depending on what the entry carries. A parser must handle both
+shapes.
+
+```jsonc
+// Bare string — typical for simple user prompts
+{"role": "user", "content": "what does this file do?"}
+
+// Array of parts — assistant responses, tool results, slash commands, paste
+{"role": "assistant", "content": [{"type": "text", "text": "Done."}]}
+```
+
+Empirically:
+- **User entries** use both shapes. Direct prompts are strings; tool
+  result carriers are arrays.
+- **Assistant entries** use arrays virtually always. Even a
+  plain-text response is wrapped as `[{"type": "text", "text": "…"}]`.
+  The Claude Code *loader* relies on this — see
+  [writing-compatible-jsonl.md](writing-compatible-jsonl.md).
+
+## Content part types
+
+Each part has a `type` discriminant.
+
+### `text`
+
+Plain text, the common case.
+
+```json
+{"type": "text", "text": "I'll help with that."}
+```
+
+### `thinking`
+
+Extended-thinking output. Only emitted by models that support it
+(`claude-opus-4-6` and later `-4-7`). Two fields:
+
+```jsonc
+{
+  "type": "thinking",
+  "thinking": "…reasoning text…",
+  "signature": "EoYDClkIDBgCKkDVU…"   // base64; ~450 chars; cryptographic proof
+}
+```
+
+The `signature` is an Anthropic-issued cryptographic proof of the
+thinking content. Thinking blocks without a valid signature are
+**rejected** as prior-turn context when the session is resumed — the
+API won't replay them back to the model. A tool that rewrites or
+truncates thinking content will break resume.
+
+Empty-string `thinking` values with a valid `signature` are observed
+and legal (they represent thinking content that was later redacted).
+
+### `tool_use`
+
+A tool call issued by the assistant.
+
+```jsonc
+{
+  "type": "tool_use",
+  "id": "toolu_01LJtcvehSnXfMztfk8b8ZLC",
+  "name": "Grep",
+  "input": {
+    "pattern": "crab.?city",
+    "-i": true,
+    "output_mode": "files_with_matches"
+  },
+  "caller": {"type": "direct"}       // optional; origin of the call
+}
+```
+
+- **`id`** — Anthropic tool-use ID with the `toolu_` prefix. Primary
+  key linking this `tool_use` to its eventual `tool_result`.
+- **`name`** — tool name. Built-in tools use PascalCase (`Read`,
+  `Bash`, `Grep`). MCP-provided tools are namespaced:
+  `mcp__<server>__<tool>`.
+- **`input`** — tool-specific parameter object. See [tools.md](tools.md).
+- **`caller`** — optional. `{type: "direct"}` for directly-issued
+  calls. The full enumeration of `caller.type` values isn't well-known.
+
+### `tool_result`
+
+The result of a prior tool call, carried by the following **user**
+entry.
+
+```jsonc
+{
+  "type": "tool_result",
+  "tool_use_id": "toolu_01LJtcvehSnXfMztfk8b8ZLC",
+  "content": "Found 79 files\npackages/…",  // string OR array of text parts
+  "is_error": false                          // default false
+}
+```
+
+- **`tool_use_id`** — matches the `id` from the issuing `tool_use`.
+- **`content`** — string, or an array of objects shaped
+  `{text: "..."}`. An array with multiple parts should be joined with
+  `\n` to recover the intended output.
+- **`is_error`** — defaults to `false`. `true` indicates the tool
+  raised or returned an error.
+
+See [tools.md](tools.md) for the full tool-invocation lifecycle.
+
+### Other content types (API-level, rarely seen on disk)
+
+The Anthropic API defines several content-block types that Claude Code
+can in principle persist but that don't commonly show up in typical
+coding sessions:
+
+- `image` — image blocks in messages
+- `document` — attached documents
+- `redacted_thinking` — thinking content redacted by Anthropic
+- `server_tool_use` — server-executed tool calls (e.g. built-in web search)
+- `web_search_tool_result` — results from server-side web search
+
+A tolerant parser preserves unknown `type` values rather than failing.
+
+## Role values
+
+Lowercase: `"user"`, `"assistant"`, `"system"`. The `role` inside
+`message` is distinct from the envelope `type` — they agree in every
+sample we've inspected (`type: "user"` ↔ `role: "user"`,
+`type: "assistant"` ↔ `role: "assistant"`), but they are separate
+fields in the format. Key type discrimination off the envelope `type`;
+`role` carries redundant information.
+
+## Convenience: detecting "empty" user turns
+
+A user entry that carries only `tool_result` parts (no text, no image,
+no other user input) is a synthesized tool-result carrier, not a real
+user turn. Detect it with:
+
+```python
+def is_tool_result_only(entry):
+    msg = entry.get("message")
+    if not msg or msg.get("role") != "user":
+        return False
+    content = msg.get("content")
+    if isinstance(content, str):
+        return False  # bare string is always a real user turn
+    # content is an array
+    for part in content:
+        t = part.get("type")
+        if t != "tool_result":
+            return False
+    return bool(content)
+```
+
+UIs rendering a transcript typically fold these into the preceding
+assistant turn. See [tools.md](tools.md).
diff --git a/docs/agents/formats/claude-code/peripheral-files.md b/docs/agents/formats/claude-code/peripheral-files.md
new file mode 100644
index 0000000..e0a1681
--- /dev/null
+++ b/docs/agents/formats/claude-code/peripheral-files.md
@@ -0,0 +1,185 @@
+# Peripheral files
+
+Everything under `~/.claude/` that isn't a session JSONL. Most of these
+are supporting state that a conversation reader can ignore, but they
+matter if you're building anything broader than a transcript viewer.
+
+## `~/.claude/history.jsonl`
+
+Global user-prompt history, across all projects. JSONL, same line
+framing as sessions.
+
+### Line shape
+
+```json
+{
+  "display": "The following fails with an error: bazel build --compilation_mode opt //packages/galaxy/...",
+  "pastedContents": {},
+  "timestamp": 1759167114955,
+  "project": "/Users/alex/Devel/empathic/empathic",
+  "sessionId": "..."
+}
+```
+
+Fields:
+- **`display`** — user's prompt text as shown in the UI.
+- **`pastedContents`** — object mapping paste-slot IDs to pasted
+  content (e.g. files the user dragged in). Usually empty.
+- **`timestamp`** — Unix milliseconds. Note: JSONL sessions use
+  ISO-8601; `history.jsonl` uses epoch ms.
+- **`project`** — absolute workspace path (the pre-sanitization form).
+- **`sessionId`** — session the prompt was submitted in. Absent in
+  older entries; treat as optional.
+
+Unlike session JSONLs, `history.jsonl` is not subject to the 30-day
+cleanup — prompts are kept indefinitely. Users can opt out with
+`CLAUDE_CODE_SKIP_PROMPT_HISTORY=1`.
+
+## `~/.claude/settings.json`
+
+User-level config. JSON, not JSONL. Holds permissions, hooks,
+attribution settings, `statusLine` config, env vars. Project-local
+overrides live at `<project>/.claude/settings.json` and
+`.claude/settings.local.json`.
+
+Not covered in detail here — see Claude Code's own settings docs when
+writing a tool that consumes this file.
+
+## `~/.claude.json`
+
+(Note: file in `$HOME`, not under `~/.claude/`.) A single JSON file
+holding:
+
+- OAuth tokens and account state.
+- Per-project state: `mcpServers`, `allowedTools`,
+  `hasTrustDialogAccepted`, `lastSessionId`, `projectOnboardingSeen`.
+- Cross-session preferences.
+
+Backed up automatically to `~/.claude/backups/.claude.json.backup.<unix-millis>`
+on changes.
+
+## `~/.claude/todos/`
+
+Per-session TodoWrite state. **Marked legacy** by Anthropic — current
+Claude Code versions no longer write here; TodoWrite state is inlined
+into the session's tool-result entries instead. Safe to delete.
+
+Filename convention: `<sessionId>-agent-<agentId>.json`.
+
+Content: a JSON array of todo objects.
+
+```json
+[
+  {"content": "Design a combined stream…", "status": "completed", "priority": "high", "id": "1"},
+  {"content": "Wire it to handle_connection", "status": "pending", "priority": "medium", "id": "2"}
+]
+```
+
+## `~/.claude/shell-snapshots/`
+
+Captured shell environment at session start. One file per snapshot.
+
+Filename: `snapshot-<shell>-<unix-millis>-<random>.sh`, e.g.
+`snapshot-zsh-1759194893070-ab12cd.sh`.
+
+Content: a shell script that recreates the user's environment —
+function definitions, aliases, exported variables. Claude Code sources
+these when running `Bash` tool commands so they operate in the same
+environment the user would see.
+
+## `~/.claude/file-history/`
+
+Content-addressed file backups supporting undo/rollback. When Claude
+Code edits a file, it writes the pre-edit content here before applying
+the change.
+
+Structure: `<session-uuid>/<contentHash>@v<versionNumber>`.
+
+The `trackedFileBackups` object on `file-history-snapshot` entries
+references entries here by (contentHash, versionNumber).
+
+## `~/.claude/session-env/`
+
+Per-session environment-variable snapshots. Typically one empty
+directory per session (the dir itself is a presence marker).
+
+## `~/.claude/plans/`
+
+Plan-mode artifacts. When a user enters plan mode, the plan drafts
+Claude produces land here.
+
+## `~/.claude/sessions/`
+
+A session cache / index. In some versions, contains a
+`sessions-index.json` with a global view of all sessions across
+projects. Fields per entry:
+
+```
+{version: 1, sessions: [
+  {sessionId, fullPath, fileMtime, firstPrompt, messageCount,
+   created, modified, gitBranch, projectPath, isSidechain}
+]}
+```
+
+Can be stale or missing. Treat it as a hint, not ground truth — rescan
+`projects/` if precision matters.
+
+## `~/.claude/statsig/`
+
+Anthropic's feature-flag cache. Fixed filenames:
+
+- `statsig.cached.evaluations.<id>`
+- `statsig.session_id.<id>`
+- `statsig.stable_id.<id>`
+
+Not useful to external tools.
+
+## `~/.claude/plugins/`
+
+Installed plugins directory. `plugins/blocklist.json` lists plugins
+Anthropic has flagged:
+
+```json
+{
+  "fetchedAt": "...",
+  "plugins": [{"plugin": "...", "added_at": "...", "reason": "...", "text": "..."}]
+}
+```
+
+## `~/.claude/debug/`
+
+Per-session debug logs. Text files, one per session. Verbose logging
+of what Claude Code did internally. Useful for investigating bugs.
+
+## `~/.claude/paste-cache/`
+
+Clipboard / paste history. Attachments pasted into Claude Code land
+here. Referenced by paste-slot IDs inside `history.jsonl`'s
+`pastedContents`.
+
+## `~/.claude/image-cache/`
+
+Cached images from pasted screenshots, URLs, etc.
+
+## `~/.claude/ide/`
+
+Per-IDE lock files. When Claude Code runs in an IDE extension (VS Code,
+JetBrains), each active IDE process creates a `.lock` file here to
+coordinate.
+
+## `~/.claude/backups/`, `~/.claude/cache/`, `~/.claude/chrome/`
+
+Internal state directories, generally not interesting to external
+tools.
+
+## `stats-cache.json`
+
+File at `~/.claude/stats-cache.json`. Cached usage statistics for the
+CLI's stats commands. Full schema varies; not commonly consumed
+externally.
+
+## Cleanup / retention
+
+Session files and most peripheral state are cleaned up after
+`cleanupPeriodDays` days (default 30). `history.jsonl` is not on that
+schedule. Plugins, settings, and OAuth state are never cleaned up.
diff --git a/docs/agents/formats/claude-code/session-chains.md b/docs/agents/formats/claude-code/session-chains.md
new file mode 100644
index 0000000..b14406f
--- /dev/null
+++ b/docs/agents/formats/claude-code/session-chains.md
@@ -0,0 +1,219 @@
+# Session chains, rotation, and compaction
+
+A single logical Claude Code conversation can span multiple JSONL files.
+Claude Code **rotates** to a new file under several conditions, and
+**compacts** the in-memory conversation mid-file when the context
+window fills. Reconstructing a full logical conversation means
+understanding both mechanics.
+
+## Rotation triggers
+
+Claude Code begins a new JSONL file (new session UUID, new filename) in
+at least these situations:
+
+- **Context overflow / autocompaction in some versions** — older
+  versions rotated to an `acompact-<hash>.jsonl` on autocompact.
+  Current versions usually keep compaction inline; see
+  [§Compaction](#compaction--compact_boundary).
+- **Exiting plan mode** — leaving plan mode reliably produces a new
+  file.
+- **Explicit resume** — `claude --resume <session-uuid>` continues
+  into a new file.
+- **`--fork-session`** — explicitly fork a session into a branch.
+- **Manual user action** (e.g. `/clear` followed by continuation in
+  some setups).
+
+Don't rely on any filename prefix as a rotation signal — modern
+versions give successor files the same UUID-naming scheme as fresh
+sessions.
+
+## Detecting a continuation
+
+When a file exists that continues a prior session, the continuation is
+signalled by one or more of these markers:
+
+### 1. The bridge entry (primary signal)
+
+The **first real entry** of a continuation file carries the **previous
+session's** `sessionId`, not its own filename UUID.
+
+```
+file:  session-B.jsonl
+line 1 (first real entry):  { "sessionId": "session-A", ... }  ← bridge
+line 2+:                    { "sessionId": "session-B", ... }
+```
+
+This is the most reliable signal. Algorithm for classifying any file:
+
+```python
+def is_continuation_file(path):
+    first_entry = read_first_entry(path)
+    file_uuid = filename_stem(path)
+    return first_entry.get("sessionId") != file_uuid
+```
+
+A file whose first entry's `sessionId` matches the filename is a
+**chain head** (fresh session or the start of a multi-file chain).
+A file whose first entry's `sessionId` is different is a **successor**
+of that earlier sessionId.
+
+### 2. The `slug` field
+
+A conversation slug (e.g. `"crystalline-giggling-sunset"`) persists
+across rotations. Files sharing the same `slug` belong to the same
+logical conversation.
+
+### 3. Duplicate `compact_boundary`
+
+A continuation file may begin with a byte-for-byte identical
+`compact_boundary` record copied from the parent, followed by the
+synthetic compaction summary.
+
+### 4. `parentUuid` bridge
+
+The new file's first real entry's `parentUuid` points to the parent
+file's last entry's `uuid`.
+
+## Chain resolution
+
+Given a set of files in a project directory, you can build the full
+chain graph by:
+
+1. For each file, read the first real entry and compute
+   `(filename_uuid, first_entry_sessionId)`.
+2. If `first_entry_sessionId != filename_uuid`, record an edge:
+   `first_entry_sessionId → filename_uuid` (predecessor → successor).
+3. Walk backward from any given file to the chain **head** (no
+   predecessor) and forward to the chain **tail** (no successor).
+
+A chain head's `sessionId` equals its filename UUID; every other
+file in the chain has an incoming edge.
+
+**Cycle defense:** don't assume the graph is always acyclic. Corrupt
+data has produced self-loops and cycles in practice. Track visited
+nodes during traversal.
+
+## Reading a logical conversation
+
+To present a multi-file conversation as a single stream:
+
+1. Read the chain in order (head → tail).
+2. Concatenate entries.
+3. **Skip bridge entries.** The first real entry of any non-head file
+   is a replay of the parent's last entry (it carries the parent's
+   `sessionId`) and would cause duplicates if kept.
+
+```python
+def read_chain(chain_files):
+    for i, path in enumerate(chain_files):
+        for j, entry in enumerate(read_jsonl(path)):
+            is_first_real = (j == first_real_line(path))
+            is_bridge = (
+                is_first_real and i > 0 and
+                entry.get("sessionId") != filename_stem(path)
+            )
+            if is_bridge:
+                continue
+            yield entry
+```
+
+Or simpler: **keep only entries whose `sessionId` matches the filename's
+session ID**. This drops bridge entries and any other replayed
+prefix records.
+
+## Compaction — `compact_boundary`
+
+When the context window fills, Claude Code runs an autocompaction pass
+that replaces the older portion of the conversation with a summary.
+The JSONL records this inline with a `compact_boundary` entry.
+
+### The boundary marker
+
+```jsonc
+{
+  "type": "compact_boundary",    // or "system" with "subtype": "compact_boundary"
+  "uuid": "...",
+  "parentUuid": null,            // always null on the boundary
+  "logicalParentUuid": "...",    // points at the real prior message
+  "compactMetadata": {
+    "trigger": "auto",           // "auto" or "manual" (user ran /compact)
+    "preTokens": 180000          // conversation size before compaction
+  },
+  "sessionId": "...",
+  "timestamp": "..."
+}
+```
+
+Key property: `parentUuid` is `null`, resetting the DAG. The actual
+prior message is referenced via `logicalParentUuid` so UIs can still
+render the pre-compact history.
+
+### The synthetic summary
+
+Immediately after the boundary, a synthetic `user`-role message carries
+the compacted-history summary as its content:
+
+```jsonc
+{
+  "type": "user",
+  "message": {
+    "role": "user",
+    "content": "...compacted summary text..."
+  },
+  "isCompactSummary": true,
+  "isVisibleInTranscriptOnly": true,
+  ...
+}
+```
+
+These flags tell the loader:
+- **`isCompactSummary: true`** — this is the synthesized summary, not
+  real user input.
+- **`isVisibleInTranscriptOnly: true`** — show it in the UI transcript
+  but don't replay it to the model (the model sees the actual
+  compacted context, which was already injected on the API side).
+
+When rendering a transcript, skip these synthetic summary entries or
+mark them specially; treating them as real user messages will confuse
+consumers.
+
+### Compaction strategies
+
+Several compaction strategies exist internally:
+
+- **Full compaction** — the default autocompact. Everything older than
+  a threshold is replaced by a summary.
+- **Session-memory compaction** — summary preserved in session memory
+  across rotations.
+- **Microcompaction** — after the prompt cache's 1h TTL expires for
+  older tool outputs, those outputs are cleared while keeping the last
+  ~5 tool results. Less aggressive than full compaction.
+
+From a format perspective, all three produce the same
+`compact_boundary` + synthetic-summary sequence; the `trigger` value
+and the `preTokens` count are the discriminators.
+
+## Subagent / sidechain files
+
+Older Claude Code versions stored subagents in separate files:
+
+```
+projects/<project>/subagents/agent-<hash>.jsonl
+projects/<project>/subagents/agent-<hash>.meta.json
+projects/<project>/subagents/agent-acompact-<hash>.jsonl   # compacted
+```
+
+Newer versions inline sidechain entries into the main session JSONL
+with `isSidechain: true` and an `agentId` field. See
+[entry-types.md §Sidechains](entry-types.md#sidechains).
+
+## Pitfalls
+
+- A file's `sessionId` on **non-first** entries is a less reliable
+  signal than the first entry's `sessionId`. Hook-injected or
+  cross-referenced entries sometimes carry a sessionId that matches a
+  different file.
+- The `sessions-index.json` index sometimes gets out of sync with the
+  actual files on disk. Re-scan rather than trust the index blindly.
+- A continuation can cross project directories if `cwd` changed
+  mid-session (rare but possible).
diff --git a/docs/agents/formats/claude-code/tools.md b/docs/agents/formats/claude-code/tools.md
new file mode 100644
index 0000000..4df026c
--- /dev/null
+++ b/docs/agents/formats/claude-code/tools.md
@@ -0,0 +1,229 @@
+# Tool calls
+
+Tool calls are the most complex part of the format. A single tool
+invocation produces **two JSONL entries** plus optional top-level
+summary data, and the pairing has to be reconstructed by the reader.
+
+## The two-entry lifecycle
+
+A tool call is always a pair:
+
+1. **Assistant entry** with a `tool_use` content part in
+   `message.content`. This is the call.
+2. **User entry** whose `message.content` is an array of `tool_result`
+   parts. This is the response.
+
+Example (abbreviated):
+
+```jsonc
+// 1) Assistant issues the call
+{
+  "type": "assistant",
+  "uuid": "3995c068-...",
+  "message": {
+    "role": "assistant",
+    "content": [
+      {
+        "type": "tool_use",
+        "id": "toolu_01LJtcvehSnXfMztfk8b8ZLC",
+        "name": "Grep",
+        "input": {"pattern": "TODO"}
+      }
+    ],
+    "stop_reason": "tool_use"
+  }
+}
+
+// 2) User entry (synthesized, not human-typed) carries the result
+{
+  "type": "user",
+  "uuid": "1e5e8efb-...",
+  "parentUuid": "3995c068-...",
+  "sourceToolAssistantUUID": "3995c068-...",
+  "message": {
+    "role": "user",
+    "content": [
+      {
+        "type": "tool_result",
+        "tool_use_id": "toolu_01LJtcvehSnXfMztfk8b8ZLC",
+        "content": "src/main.rs\nsrc/lib.rs\n"
+      }
+    ]
+  },
+  "toolUseResult": {
+    "mode": "files_with_matches",
+    "filenames": ["src/main.rs", "src/lib.rs"],
+    "numFiles": 2
+  }
+}
+```
+
+The **primary key** linking the two is `tool_use.id` ↔
+`tool_result.tool_use_id`. The envelope also provides a convenience
+back-reference: the tool-result-carrying user entry has
+`sourceToolAssistantUUID` pointing at the `uuid` of the issuing
+assistant entry.
+
+## One assistant entry, multiple tool calls
+
+An assistant entry can include multiple `tool_use` parts in its
+`content` array. Each gets its own matching `tool_result` part in
+the following user entry's `content` array.
+
+```jsonc
+// Assistant: two tool calls in one entry
+{"content": [
+  {"type": "tool_use", "id": "toolu_A", "name": "Read",  "input": {...}},
+  {"type": "tool_use", "id": "toolu_B", "name": "Grep",  "input": {...}}
+]}
+
+// User: two results matched by ID
+{"content": [
+  {"type": "tool_result", "tool_use_id": "toolu_A", "content": "..."},
+  {"type": "tool_result", "tool_use_id": "toolu_B", "content": "..."}
+]}
+```
+
+Parallel tool calls — multiple `tool_use` parts in a single assistant
+entry — are a normal pattern that agents use to fan out reads.
+
+## Tool result `content` shape
+
+The `content` field on a `tool_result` is either a string or an array
+of text-carrying objects:
+
+```jsonc
+// String form
+{"type": "tool_result", "tool_use_id": "...", "content": "file contents"}
+
+// Array form
+{"type": "tool_result", "tool_use_id": "...", "content": [
+  {"text": "line 1"},
+  {"text": "line 2"}
+]}
+```
+
+To recover the output, join array-form parts with `\n`.
+
+## The top-level `toolUseResult`
+
+The tool-result-carrying user entry may also have a top-level
+`toolUseResult` field (sibling of `message`, not nested inside it).
+This is a **structured summary** of the tool's output, populated for
+tools that have structured outputs.
+
+Whether `toolUseResult` is present depends on the tool. Tools with
+structured outputs emit it; tools whose output is a single blob of text
+or a status string leave it absent.
+
+| Tool        | Top-level `toolUseResult`? | Inline `tool_result.content`? |
+|-------------|----------------------------|-------------------------------|
+| `Read`      | no                         | yes — file contents as string |
+| `Write`     | no                         | yes — success string          |
+| `Edit`      | no                         | yes — diff or success string  |
+| `Bash`      | no                         | yes — stdout/stderr as string |
+| `Grep`      | **yes**                    | yes — human-readable summary  |
+| `Glob`      | **yes**                    | yes                           |
+| `TodoWrite` | **yes**                    | yes                           |
+| `Task`      | **yes**                    | yes — agent summary           |
+| `WebSearch` | **yes**                    | yes                           |
+| `WebFetch`  | **yes** (when the fetch succeeds) | yes                    |
+
+MCP-provided tools follow their server's conventions rather than
+Claude Code's. We have not seen `toolUseResult` populated for MCP tools
+in the samples we've inspected.
+
+### Observed `toolUseResult` shapes
+
+**`Grep`:**
+```json
+{
+  "mode": "files_with_matches",
+  "filenames": ["packages/.../MainHeader.svelte", "..."],
+  "numFiles": 79
+}
+```
+
+Alternative `mode` values: `"content"` (shows matching lines),
+`"count"` (per-file match counts), mirroring the tool's `output_mode`
+parameter.
+
+**`Glob`** (expected shape):
+```json
+{"filenames": ["src/a.rs", "src/b.rs"], "durationMs": 12}
+```
+
+**`TodoWrite`** (expected shape):
+```json
+{"todos": [{"content": "...", "status": "completed", "activeForm": "..."}, ...]}
+```
+
+**`Task`** (expected shape):
+```json
+{
+  "agentId": "a7bf2fd",
+  "totalTokens": 12345,
+  "totalToolUseCount": 7,
+  "usage": {...},
+  "result": "...summary text..."
+}
+```
+
+### Spilled-to-disk outputs
+
+Very large tool outputs may be spilled to files under
+`projects/<project>/<session>/tool-results/` rather than inlined into
+`tool_result.content`. In that case the content field contains a
+reference. We haven't captured a concrete example yet — parsers that
+assume content is always inline may break on very long outputs.
+
+## Common tool `input` shapes
+
+The built-in tool set is **not stable** — Anthropic ships new tools,
+renames old ones, and adds/removes input fields between Claude Code
+releases. The list below is illustrative, not canonical; it captures
+what we've observed in Claude Code 2.1.x samples. Consult Anthropic's
+tool documentation for the authoritative current shapes, and treat any
+unfamiliar field on a known tool as additive drift rather than a
+parsing error.
+
+- **`Read`** — `{file_path, offset?, limit?, pages?}`
+- **`Write`** — `{file_path, content}`
+- **`Edit`** — `{file_path, old_string, new_string, replace_all?}`
+- **`Bash`** — `{command, description?, timeout?, run_in_background?}`
+- **`Grep`** — `{pattern, path?, glob?, type?, output_mode?, -i?, -A?, -B?, -C?, -n?, head_limit?, offset?, multiline?}`
+- **`Glob`** — `{pattern, path?}`
+- **`WebFetch`** — `{url, prompt}`
+- **`WebSearch`** — `{query, allowed_domains?, blocked_domains?}`
+- **`Task`** / **`Agent`** — `{description, prompt, subagent_type?}`
+- **`TodoWrite`** — `{todos: [{content, status, activeForm}]}`
+- **`NotebookEdit`** — `{notebook_path, cell_id?, new_source, cell_type?, edit_mode?}`
+
+Tools provided by MCP servers appear as `mcp__<server>__<tool>`.
+The `input` schema is controlled by the MCP server, not Claude Code —
+the list above has nothing to say about them.
+
+## Consumer concerns
+
+### Cross-boundary pairing
+
+When watching a live session, a `tool_use` may appear in one read
+window and its `tool_result` in a later one. A consumer that emits
+turns eagerly needs a "late join" step that merges the result back
+into the earlier assistant turn once it arrives.
+
+### Permission prompts leave no trace
+
+Claude Code can gate tool execution behind a permission prompt. The
+prompt itself and the user's accept/deny decision are **not** recorded
+in the JSONL. The only signal is a gap in timestamps between the
+`tool_use` and the `tool_result`. A denied tool call still eventually
+produces a `tool_result` (with `is_error: true` or a denial message).
+
+### Synthesized user entries
+
+The user entry carrying `tool_result`s was not typed by a human. A
+transcript-rendering UI should usually fold it into the preceding
+assistant turn rather than show it as "the user said …". Detection:
+the entry is `type: "user"`, the only content-part kinds are
+`tool_result`, and there's a `sourceToolAssistantUUID` envelope field.
diff --git a/docs/agents/formats/claude-code/usage.md b/docs/agents/formats/claude-code/usage.md
new file mode 100644
index 0000000..a471376
--- /dev/null
+++ b/docs/agents/formats/claude-code/usage.md
@@ -0,0 +1,128 @@
+# Token usage accounting
+
+Assistant entries carry a `usage` object inside `message.usage` that
+records the token counts Anthropic billed for that turn, including
+prompt-cache statistics. The shape has grown over time and now mixes
+flat fields with nested breakdowns that duplicate the flat totals.
+
+## Full observed shape
+
+```jsonc
+{
+  "input_tokens": 2,
+  "output_tokens": 21,
+
+  "cache_creation_input_tokens": 12291,
+  "cache_read_input_tokens":     12361,
+
+  "cache_creation": {
+    "ephemeral_5m_input_tokens": 0,
+    "ephemeral_1h_input_tokens": 12291
+  },
+
+  "server_tool_use": {
+    "web_search_requests": 0,
+    "web_fetch_requests":  0
+  },
+
+  "service_tier":   "standard",
+  "inference_geo":  "",
+  "speed":          "standard",
+
+  "iterations": [
+    {
+      "input_tokens": 2,
+      "output_tokens": 21,
+      "cache_read_input_tokens":     12361,
+      "cache_creation_input_tokens": 12291,
+      "cache_creation": {"ephemeral_5m_input_tokens": 0, "ephemeral_1h_input_tokens": 12291},
+      "type": "message"
+    }
+  ]
+}
+```
+
+## Field reference
+
+### Core token counts
+
+- **`input_tokens`** — tokens sent in the request (input side, after
+  prompt-cache reads).
+- **`output_tokens`** — tokens generated by the model in this response.
+
+### Prompt caching
+
+Prompt-cache counts are **separate from** `input_tokens`. The model
+sees all three, but they're billed at different rates (cache writes
+are more expensive than regular inputs; cache reads are cheaper).
+
+- **`cache_creation_input_tokens`** — tokens written to the prompt
+  cache during this call.
+- **`cache_read_input_tokens`** — tokens retrieved from the prompt
+  cache during this call.
+- **`cache_creation`** — a nested breakdown of
+  `cache_creation_input_tokens` by TTL bucket:
+  - `ephemeral_5m_input_tokens` — cached for ~5 minutes.
+  - `ephemeral_1h_input_tokens` — cached for ~1 hour.
+
+  The two TTL buckets sum to the flat `cache_creation_input_tokens`.
+  Older versions of Claude Code only emitted the flat field; newer
+  versions emit both.
+
+### Server-side tool use
+
+- **`server_tool_use`** — counts of tool calls executed
+  server-side by Anthropic (as opposed to client-side by Claude Code).
+  Observed subfields: `web_search_requests`, `web_fetch_requests`. Other
+  server-tool subfields may appear as Anthropic ships new built-ins.
+
+### Billing / routing
+
+- **`service_tier`** — `"standard"` or `"batch"`. Determines billing
+  rate. Sessions run through batch API land in `"batch"`; everything
+  else is `"standard"`.
+- **`inference_geo`** — geographic region the request was routed to.
+  Often empty string; otherwise a region code.
+- **`speed`** — inference speed tier; `"standard"` is the only value
+  we've seen.
+
+### Agentic loop iterations
+
+- **`iterations`** — array of per-iteration usage objects for turns
+  that ran through an internal agentic loop (multi-step tool-use
+  cycles inside a single message API call). Each iteration has the
+  same shape as the enclosing `usage`.
+
+  The enclosing totals (`input_tokens`, `output_tokens`, etc.) are
+  typically the sum of the iterations' corresponding fields. Not all
+  turns have `iterations`.
+
+## Older vs. newer shapes
+
+Versioning is gradual and largely additive. Rules of thumb:
+
+| Shape feature                                             | Since        |
+|-----------------------------------------------------------|--------------|
+| Flat `input_tokens` / `output_tokens`                     | always       |
+| Flat `cache_creation_input_tokens` / `cache_read_input_tokens` | always (for cached prompts) |
+| Nested `cache_creation: {ephemeral_5m, ephemeral_1h}`     | 2.0.x+       |
+| `service_tier`                                            | 2.0.x+       |
+| `server_tool_use`                                         | 2.1.x+       |
+| `iterations`                                              | 2.1.x+       |
+| `inference_geo`, `speed`                                  | 2.1.x+ (often empty) |
+
+The practical consequence: any field may be absent on an older entry.
+A tool doing token accounting should treat everything but
+`input_tokens` / `output_tokens` as `Optional<number>` and default to
+zero when summing.
+
+## Not recorded in `usage`
+
+- **Thinking tokens** are not reported separately. They are included
+  in the `output_tokens` total.
+- **Sidechain usage** is recorded on the sidechain's own assistant
+  entries, not rolled up into the parent conversation's totals.
+  Cache-read tokens on sidechain entries often "mirror" the parent
+  because the prompt cache is shared.
+- **User-entry token counts** are not recorded. Only assistant entries
+  carry `usage`.
diff --git a/docs/agents/formats/claude-code/walkthrough.md b/docs/agents/formats/claude-code/walkthrough.md
new file mode 100644
index 0000000..1bd5247
--- /dev/null
+++ b/docs/agents/formats/claude-code/walkthrough.md
@@ -0,0 +1,282 @@
+# Walkthrough: a session, line by line
+
+Every other doc in this directory is a field-by-field reference. This
+one is the complement: a single representative session, read linearly,
+with commentary at each line about what you're seeing and which doc
+defines it.
+
+Use this to:
+
+- Orient yourself before diving into the reference docs.
+- Check that your parser handles every common shape in order.
+- Have a concrete example to point at when explaining the format to a
+  new contributor.
+
+The JSONL below is **synthesized for illustration** — UUIDs,
+timestamps, and `tool_use` IDs are not taken from a real file. The
+shapes, field names, and ordering follow what we've observed in real
+sessions. For the exact field definitions, follow the cross-links.
+
+## Setting
+
+Claude Code was launched in `/Users/alex/demo`. The file we're reading
+is at `~/.claude/projects/-Users-alex-demo/c0ff2e4a-1111-4222-9333-444455556666.jsonl`.
+See [directory-layout.md](directory-layout.md) for how that path was
+chosen.
+
+The user asked Claude to read a file and then summarize it. Then the
+session got long enough to trigger a compaction, and the user asked a
+follow-up.
+
+---
+
+## Line 1 — `permission-mode`
+
+```json
+{"type":"permission-mode","permissionMode":"default","sessionId":"c0ff2e4a-1111-4222-9333-444455556666"}
+```
+
+**Purpose:** records which permission mode was active when the session
+started.
+
+**Things to notice:**
+
+- Exactly three fields. Adding anything — even `uuid: ""` or
+  `timestamp` — causes the loader to reject the line. See
+  [writing-compatible-jsonl.md §`permission-mode` entries must be
+  exactly three fields](writing-compatible-jsonl.md#2-permission-mode-entries-must-be-exactly-three-fields).
+- `sessionId` matches the filename stem, so this is a **chain head**
+  (not a continuation file). See [session-chains.md](session-chains.md).
+
+---
+
+## Line 2 — `user` (direct prompt)
+
+```json
+{"type":"user","uuid":"u-0001","parentUuid":null,"sessionId":"c0ff2e4a-1111-4222-9333-444455556666","timestamp":"2026-04-23T09:00:01.123Z","isSidechain":false,"userType":"external","entrypoint":"cli","cwd":"/Users/alex/demo","gitBranch":"","version":"2.1.112","message":{"role":"user","content":"what's in src/main.rs?"}}
+```
+
+**Purpose:** the first real user prompt.
+
+**Things to notice:**
+
+- `parentUuid: null` because this is the first conversation entry in
+  the session. See [jsonl-envelope.md §Identity /
+  position](jsonl-envelope.md#identity--position).
+- `gitBranch: ""` (empty string, not `null`) because the cwd isn't a
+  git repo. See [known-issues.md §Things that look like bugs but
+  aren't](known-issues.md#things-that-look-like-bugs-but-arent).
+- `message.content` is a **bare string**. Legal on user entries;
+  **illegal on assistant entries** — see line 3. See
+  [messages.md §The string-or-array `content`
+  trap](messages.md#the-string-or-array-content-trap).
+- Envelope keys are camelCase (`parentUuid`, `sessionId`,
+  `isSidechain`, `userType`, `gitBranch`). API-adjacent keys inside
+  `message` will be snake_case when they appear (`stop_reason`,
+  `input_tokens`). See
+  [jsonl-envelope.md §Field conventions](jsonl-envelope.md#field-conventions).
+
+---
+
+## Line 3 — `assistant` (tool call)
+
+```json
+{"type":"assistant","uuid":"a-0001","parentUuid":"u-0001","sessionId":"c0ff2e4a-1111-4222-9333-444455556666","timestamp":"2026-04-23T09:00:02.456Z","isSidechain":false,"userType":"external","cwd":"/Users/alex/demo","gitBranch":"","version":"2.1.112","requestId":"req_01ABCD","message":{"role":"assistant","type":"message","id":"msg_01EFGH","model":"claude-opus-4-6","content":[{"type":"text","text":"I'll read it."},{"type":"tool_use","id":"toolu_001","name":"Read","input":{"file_path":"/Users/alex/demo/src/main.rs"}}],"stop_reason":"tool_use","stop_sequence":null,"usage":{"input_tokens":12,"output_tokens":48,"cache_creation_input_tokens":8421,"cache_read_input_tokens":0,"cache_creation":{"ephemeral_5m_input_tokens":0,"ephemeral_1h_input_tokens":8421},"service_tier":"standard"}}}
+```
+
+**Purpose:** Claude acknowledges and calls `Read`.
+
+**Things to notice:**
+
+- `message.content` is an **array**. Even a plain-text response is
+  wrapped as `[{"type":"text","text":"..."}]`. The loader will crash
+  on a bare-string content here. See
+  [writing-compatible-jsonl.md §Assistant `message.content` must always
+  be an array](writing-compatible-jsonl.md#1-assistant-messagecontent-must-always-be-an-array).
+- Two parts in one entry — a `text` part *and* a `tool_use` part.
+  That's common: Claude narrates the action in the same entry that
+  issues the call. See [messages.md §Content part
+  types](messages.md#content-part-types).
+- `stop_reason: "tool_use"` signals that the assistant expects a tool
+  result next.
+- `requestId` is the Anthropic API request ID (prefix `req_`).
+  `message.id` is the Anthropic API message ID (prefix `msg_`).
+  Both are distinct from the envelope `uuid`. See
+  [jsonl-envelope.md §API correlation](jsonl-envelope.md#api-correlation).
+- `usage.cache_creation` breaks down `cache_creation_input_tokens`
+  into 5-minute and 1-hour TTL buckets. See
+  [usage.md §Prompt caching](usage.md#prompt-caching).
+
+---
+
+## Line 4 — `user` (tool result carrier)
+
+```json
+{"type":"user","uuid":"u-0002","parentUuid":"a-0001","sessionId":"c0ff2e4a-1111-4222-9333-444455556666","timestamp":"2026-04-23T09:00:02.891Z","isSidechain":false,"userType":"external","cwd":"/Users/alex/demo","gitBranch":"","version":"2.1.112","sourceToolAssistantUUID":"a-0001","message":{"role":"user","content":[{"type":"tool_result","tool_use_id":"toolu_001","content":"fn main() { println!(\"hello\"); }\n","is_error":false}]}}
+```
+
+**Purpose:** carries the tool result back. **Not human-typed** — a
+transcript UI should fold this into the previous assistant turn rather
+than render it as a user message.
+
+**Things to notice:**
+
+- `type: "user"` but every content part is `tool_result`. Use the
+  `classify_user()` decision tree in
+  [entry-types.md §Classifying a user entry](entry-types.md#classifying-a-user-entry)
+  to recognize this.
+- `sourceToolAssistantUUID: "a-0001"` is the convenience back-reference
+  to the issuing assistant entry. The primary link is
+  `tool_result.tool_use_id` ↔ `tool_use.id`. See
+  [tools.md §The two-entry lifecycle](tools.md#the-two-entry-lifecycle).
+- No top-level `toolUseResult` — `Read` doesn't have a structured-
+  output summary. Compare to `Grep` in
+  [tools.md §The top-level `toolUseResult`](tools.md#the-top-level-tooluseresult).
+
+---
+
+## Line 5 — `assistant` (text only, same logical turn)
+
+```json
+{"type":"assistant","uuid":"a-0002","parentUuid":"u-0002","sessionId":"c0ff2e4a-1111-4222-9333-444455556666","timestamp":"2026-04-23T09:00:03.210Z","isSidechain":false,"userType":"external","cwd":"/Users/alex/demo","gitBranch":"","version":"2.1.112","requestId":"req_01IJKL","message":{"role":"assistant","type":"message","id":"msg_01MNOP","model":"claude-opus-4-6","content":[{"type":"text","text":"It's a minimal Rust program that prints \"hello\"."}],"stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":3,"output_tokens":14,"cache_read_input_tokens":8421,"service_tier":"standard"}}}
+```
+
+**Things to notice:**
+
+- `stop_reason: null` even though the turn ended cleanly. That's a
+  timing artifact — the entry was written before the streaming API
+  response finalized. Don't infer "turn still in flight" from this.
+  See [known-issues.md §Missing `stop_reason` on assistant
+  entries](known-issues.md#missing-stop_reason-on-assistant-entries).
+- One logical assistant turn spans **two** `assistant` entries
+  (`a-0001` with the tool call, `a-0002` with the text reply). Never
+  assume one `assistant` JSONL entry = one turn. See
+  [entry-types.md §`assistant`](entry-types.md#assistant).
+- `cache_read_input_tokens` is high while `cache_creation_input_tokens`
+  is zero — the prompt cache warmed up on the previous turn is now
+  being read.
+
+---
+
+## Line 6 — `system` (turn_duration)
+
+```json
+{"type":"system","subtype":"turn_duration","uuid":"s-0001","parentUuid":"a-0002","sessionId":"c0ff2e4a-1111-4222-9333-444455556666","timestamp":"2026-04-23T09:00:03.215Z","durationMs":2092,"messageCount":4}
+```
+
+**Purpose:** authoritative "turn ended" signal. More reliable than
+`stop_reason` for detecting end-of-turn.
+
+**Things to notice:**
+
+- `subtype` is an open enumeration. Treat unknown values as "log and
+  keep." See
+  [entry-types.md §`system`](entry-types.md#system).
+- `messageCount: 4` refers to the four entries in this turn
+  (`u-0001`, `a-0001`, `u-0002`, `a-0002`).
+
+---
+
+## …much later in the session…
+
+After many more turns, the conversation approaches the context window
+limit and triggers an autocompaction.
+
+## Line 117 — `compact_boundary`
+
+```json
+{"type":"compact_boundary","uuid":"cb-0001","parentUuid":null,"logicalParentUuid":"a-0116","sessionId":"c0ff2e4a-1111-4222-9333-444455556666","timestamp":"2026-04-23T10:14:52.000Z","compactMetadata":{"trigger":"auto","preTokens":180000}}
+```
+
+**Things to notice:**
+
+- `parentUuid: null` — the DAG resets here. The real prior message
+  lives in `logicalParentUuid`. See
+  [session-chains.md §The boundary
+  marker](session-chains.md#the-boundary-marker).
+- In some versions this shows up as `type: "system"` with
+  `subtype: "compact_boundary"` instead of the top-level
+  `compact_boundary` type. Treat as equivalent. See
+  [format-changelog.md](format-changelog.md).
+- `compactMetadata.trigger: "auto"` — the client compacted on its own,
+  not because the user ran `/compact`.
+
+## Line 118 — synthetic summary (user entry)
+
+```json
+{"type":"user","uuid":"u-0118","parentUuid":"cb-0001","sessionId":"c0ff2e4a-1111-4222-9333-444455556666","timestamp":"2026-04-23T10:14:52.010Z","isSidechain":false,"userType":"external","cwd":"/Users/alex/demo","gitBranch":"","version":"2.1.112","isCompactSummary":true,"isVisibleInTranscriptOnly":true,"message":{"role":"user","content":"Summary: user asked to read src/main.rs; I reported it's a minimal hello-world Rust program. We then…"}}
+```
+
+**Things to notice:**
+
+- `isCompactSummary: true` flags this as the synthesized summary, not
+  real user input.
+- `isVisibleInTranscriptOnly: true` tells the loader: show it in the
+  UI but don't replay it to the model (the model already sees the
+  real compacted context on the API side).
+- UIs should mark this specially — treating it as "the user said X"
+  will confuse consumers.
+
+---
+
+## Line 119 — `user` (real follow-up after compaction)
+
+```json
+{"type":"user","uuid":"u-0119","parentUuid":"u-0118","sessionId":"c0ff2e4a-1111-4222-9333-444455556666","timestamp":"2026-04-23T10:15:10.000Z","isSidechain":false,"userType":"external","entrypoint":"cli","cwd":"/Users/alex/demo","gitBranch":"","version":"2.1.112","message":{"role":"user","content":"can you add error handling?"}}
+```
+
+Back to a normal direct prompt. Life resumes.
+
+---
+
+## What a rotation would look like
+
+Say this session gets long enough that Claude Code rotates to a new
+file. The next file (`d1eeeeee-2222-4333-a444-555566667777.jsonl`)
+would open with a **bridge entry**:
+
+```json
+{"type":"user","uuid":"u-0200","parentUuid":"a-0199","sessionId":"c0ff2e4a-1111-4222-9333-444455556666","timestamp":"2026-04-23T10:40:00.000Z","...":"..."}
+```
+
+Note the `sessionId` is the **previous** session's ID, not the new
+file's. That mismatch is the primary signal that this file is a
+continuation. See [session-chains.md §Detecting a
+continuation](session-chains.md#detecting-a-continuation). A reader
+reconstructing the logical conversation should **skip the bridge
+entry** — it's a replay of the parent's last line.
+
+Subsequent entries in the new file carry the new `sessionId`
+(`d1eeeeee-…`).
+
+## What a sidechain would look like
+
+A `Task` tool call spawns a subagent. In 2.1.x the sidechain entries
+are **inlined into the same JSONL**, marked by:
+
+- `isSidechain: true` on every entry in the subagent thread
+- `agentId` — a short hash identifying the subagent (e.g. `"a7bf2fd"`)
+- The thread root has `parentUuid: null`; its `message.content` is
+  the Task tool's input prompt
+
+Sidechain `usage` is accounted on the sidechain's own assistant
+entries and does not roll up into the parent conversation's totals.
+See [entry-types.md §Sidechains](entry-types.md#sidechains).
+
+---
+
+## If you made it here
+
+You now have a mental model for the line-level shape of a Claude Code
+session. The reference docs fill in the corners:
+
+- The full envelope field catalogue: [jsonl-envelope.md](jsonl-envelope.md).
+- Every entry type and its variants: [entry-types.md](entry-types.md).
+- The `message` object and content parts: [messages.md](messages.md).
+- Tool-call lifecycle in detail: [tools.md](tools.md).
+- Token accounting: [usage.md](usage.md).
+- Multi-file chains and compaction: [session-chains.md](session-chains.md).
+- Everything outside the session JSONL: [peripheral-files.md](peripheral-files.md).
+- Rules for writing your own: [writing-compatible-jsonl.md](writing-compatible-jsonl.md).
+- Bugs, drift, corruption modes: [known-issues.md](known-issues.md).
+- Version-keyed field history: [format-changelog.md](format-changelog.md).
diff --git a/docs/agents/formats/claude-code/writing-compatible-jsonl.md b/docs/agents/formats/claude-code/writing-compatible-jsonl.md
new file mode 100644
index 0000000..30d3782
--- /dev/null
+++ b/docs/agents/formats/claude-code/writing-compatible-jsonl.md
@@ -0,0 +1,182 @@
+# Writing JSONL that Claude Code will load
+
+If your tool produces session JSONL that you want Claude Code to load
+(for resume, for replay, for synthesizing a starting state), there are
+rules beyond "valid JSON per line." The loader is strict about several
+things. This document captures the constraints we've discovered
+empirically; the list is not exhaustive.
+
+## Where Claude Code looks
+
+Claude Code reads sessions from
+`~/.claude/projects/<sanitized-cwd>/*.jsonl`. To make a file visible to
+`claude --resume` or the session picker:
+
+1. Determine the **canonical** absolute path of the intended working
+   directory (on macOS, resolve `/tmp` → `/private/tmp`).
+2. Sanitize by replacing `/` and `_` with `-`.
+3. Create the directory at `~/.claude/projects/<sanitized>/` if it
+   doesn't exist.
+4. Write `<session-uuid>.jsonl` inside it. The UUID must be a UUIDv4;
+   the filename stem is the session ID.
+
+See [directory-layout.md](directory-layout.md) for details.
+
+## Hard rules
+
+These are empirically confirmed. Violating them causes the loader to
+reject or mis-render the session.
+
+### 1. Assistant `message.content` must always be an array
+
+Claude Code's loader unconditionally calls array methods
+(`.map(…)` and similar) on `message.content` for `assistant` entries.
+A bare-string content crashes the loader — even for a plain text
+response.
+
+```jsonc
+// ❌ Broken
+{"type": "assistant", "message": {"role": "assistant", "content": "Done."}}
+
+// ✅ Correct
+{"type": "assistant", "message": {"role": "assistant",
+  "content": [{"type": "text", "text": "Done."}]}}
+```
+
+User entries tolerate both shapes.
+
+### 2. `permission-mode` entries must be exactly three fields
+
+Observed legal shape:
+
+```json
+{"type": "permission-mode", "permissionMode": "default", "sessionId": "..."}
+```
+
+Adding any additional field — even seemingly benign ones like `uuid: ""`,
+`isSidechain: false`, or a `timestamp` — causes the loader to reject
+the entry. If you're using a typed serializer that emits default
+values, emit this entry as raw JSON instead.
+
+### 3. `cwd` must be the canonical path
+
+The `cwd` on an entry must match the pre-sanitization form of the
+directory name under `projects/`. If you place a file in
+`projects/-private-tmp-foo/` but put `"cwd": "/tmp/foo"` on the
+entries, Claude Code's behavior is inconsistent — the session may
+appear but tool invocations may fail.
+
+## Strong conventions
+
+Real Claude Code entries always carry these envelope fields. The
+loader does not reject their absence outright, but UI rendering and
+downstream tools may behave oddly without them:
+
+| Field         | Why it matters |
+|---------------|----------------|
+| `userType`    | `"external"` on all typical user-written entries. |
+| `entrypoint`  | `"cli"` (or appropriate value for the launching surface). |
+| `cwd`         | Used for git/branch context and to validate project alignment. |
+| `version`     | Claude Code client version string. |
+| `gitBranch`   | Current branch; **empty string**, not null, when cwd isn't a repo. |
+| `sessionId`   | Must match the file's session UUID (or be a bridge entry — see below). |
+| `uuid`        | UUIDv4 per entry. |
+| `timestamp`   | ISO-8601 with millisecond precision. |
+| `parentUuid`  | The prior entry's `uuid` (or `null` for the first entry). |
+| `isSidechain` | Explicit `false` on non-sidechain entries. |
+
+## Structural rules
+
+### Preserve `tool_use` / `tool_result` pairing
+
+Every `tool_use` part in an assistant entry must be followed by a
+corresponding user entry carrying a `tool_result` part with the same
+`tool_use_id`. Assistant entries whose `stop_reason` is `"tool_use"`
+without a matching result in the following user entry confuse the
+loader.
+
+If you're constructing synthetic conversation state with tool calls,
+you must synthesize plausible results.
+
+### Preserve `thinking` signatures
+
+If you include `thinking` parts in assistant content, each must carry
+a valid `signature`. A `thinking` part without a signature (or with a
+signature that doesn't verify) will be **dropped** when the session is
+resumed — the API won't replay it to the model. If you're constructing
+synthetic state that includes thinking, consider whether you actually
+need it; constructing a valid signature is not possible without an
+Anthropic-signed source.
+
+### Assistant entries typically have non-null `model`, `id`, `type`
+
+- `model` — e.g. `"claude-opus-4-6"`.
+- `id` — Anthropic-format API message ID (prefix `msg_`). Synthesized
+  IDs should follow the shape `msg_<base62>`.
+- `type: "message"` — literal.
+
+Omitting these may render OK in the transcript but break resume.
+
+### `stop_reason` tolerance
+
+`stop_reason` is often `null` on real entries (written before the
+stream finalizes), so the loader tolerates `null`. Safe values for
+synthetic entries: `"end_turn"`, `"tool_use"`, `null`.
+
+## Round-trip rules
+
+If you read a session and want to write it back out losslessly:
+
+- **Preserve unknown fields.** A field you don't recognize on read must
+  be round-tripped on write. New Claude Code fields arrive
+  additively.
+- **Preserve field order at your peril.** The loader doesn't care, but
+  diff tools and humans do. Emit envelope fields before `message`
+  for readability.
+- **Preserve `content` array ordering.** Text-before-tool_use and
+  thinking-before-text within a single assistant entry matter for
+  replay semantics.
+
+## Starting a fresh session file
+
+Minimum ingredients to make `claude --resume <uuid>` pick up a new
+session:
+
+1. Directory `~/.claude/projects/<sanitized-cwd>/` exists.
+2. File `<uuid>.jsonl` exists inside it.
+3. First line is a valid `permission-mode` entry for that session
+   (three fields only).
+4. Subsequent lines form a valid conversation — at minimum, a single
+   `user` entry with the envelope fields from the strong-conventions
+   table and a string or array `message.content`.
+
+That's enough for Claude Code to see the session. A functioning
+"continuable" session needs at least one assistant turn with a clean
+`stop_reason` for resume to know where to pick up.
+
+## What we haven't verified
+
+- Exact tolerance for truncated or missing `usage` objects on
+  synthetic assistant entries.
+- Whether `toolUseResult` top-level summaries are ever consulted by
+  the loader, or are purely for UI/debugging.
+- How strict the loader is about `messageId` uniqueness across
+  `file-history-snapshot` entries (given known collision bugs exist
+  in Anthropic's own output).
+
+When in doubt: copy a real entry's envelope shape field-for-field.
+
+## Reference implementation
+
+`path incept` is the in-tree tool that produces Claude-compatible JSONL
+from a Toolpath document. Its implementation is the working example of
+every rule in this doc:
+
+- CLI entry point: [`crates/toolpath-cli/src/cmd_incept.rs`](../../../../crates/toolpath-cli/src/cmd_incept.rs)
+- Projection logic: `ClaudeProjector` in
+  [`crates/toolpath-claude/src/project.rs`](../../../../crates/toolpath-claude/src/project.rs)
+  (implements `toolpath_convo::ConversationProjector`)
+
+When you add a new rule here, update `cmd_incept.rs` or `project.rs`
+in the same change; when you discover a rule empirically by fixing a
+bug there, capture it here.