A state preservation system for AI coding agents that lose context on compaction.
AI coding agents (Codex, Claude Code, Cursor, etc.) work in bounded context windows. When conversations get long, older context gets compacted or dropped. The agent wakes up after compaction and doesn't know:
- What it just finished
- What it was supposed to do next
- What files matter for the current task
- Whether it's re-doing work that's already done
If you've ever had to tell your agent "do you remember where you left off?" — this is why.
A checkpoint script that writes relay batons, not breadcrumbs.
Every checkpoint captures:
- `step` — what phase of work just completed
- `note` — what happened and what was learned
- `next_cmd` — the exact command or action to take next
- `validation` — proof status (green, pending, failed)
- `branch` + `head` — git state at checkpoint time
- extra `key=value` pairs — artifact paths, file references, failure counts, whatever the next agent instance needs
The critical field is `next_cmd`. Without it, the agent reads the checkpoint and re-investigates work that's already done. With it, the agent picks up the baton and runs.
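Put together, a checkpoint record built from these fields might look like the following sketch (field names come from the list above; the exact on-disk schema is defined by `context_checkpoint.py` itself):

```python
# Illustrative checkpoint record using the fields described above.
# The real JSON layout is whatever context_checkpoint.py writes; this is a sketch.
checkpoint = {
    "step": "verifier schedule fix landed",
    "note": "focused_schedule_clause_fragment added, Utah one-case proof green",
    "next_cmd": "python3 -m pytest -q tests/unit/test_document_research_verifier.py",
    "validation": "green",  # one of: green | pending | failed
    "branch": "main",
    "head": "fa124d5",
    # --extra key=value pairs land in a free-form dict
    "extra": {
        "artifact": "/var/tmp/proof_run_20260412/run_report.json",
        "defect_count": "0",
    },
}
```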
Two files, two purposes:
- Timestamped snapshots (`context_checkpoint_{timestamp}_{label}.md` + `.json`) — the permanent history
- `LATEST.md` + `LATEST.json` — the hot pointer that the next agent instance reads on cold start
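One way to maintain this two-file layout (a sketch, not the script's actual implementation): write the timestamped snapshot first, then atomically replace `LATEST.json` so a reader never sees a half-written pointer. The `write_checkpoint` helper below is hypothetical; only the file-naming pattern comes from the README.

```python
import json
import os
import pathlib
import tempfile
import time

def write_checkpoint(checkpoint_dir: pathlib.Path, label: str, record: dict) -> pathlib.Path:
    """Write a timestamped snapshot, then atomically repoint LATEST.json.

    Sketch only: the real script also writes the .md companions and
    per-track state.
    """
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d_%H%M%S")
    snapshot = checkpoint_dir / f"context_checkpoint_{stamp}_{label}.json"
    snapshot.write_text(json.dumps(record, indent=2))
    # Atomic replace: write to a temp file in the same directory, then
    # rename, so a concurrent reader sees either the old LATEST or the new.
    fd, tmp = tempfile.mkstemp(dir=checkpoint_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(record, f, indent=2)
    os.replace(tmp, checkpoint_dir / "LATEST.json")
    return snapshot

snap = write_checkpoint(
    pathlib.Path(tempfile.mkdtemp()) / "checkpoints",
    "demo",
    {"step": "demo", "next_cmd": "echo hi"},
)
```

`os.replace` is atomic on POSIX when source and destination are on the same filesystem, which is why the temp file is created inside the checkpoint directory rather than in the system temp location.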
The agent's system instructions tell it to read `LATEST` after any compaction, run the `next_cmd`, and checkpoint again before it dies. The loop is:
checkpoint → compaction kills context → new context reads LATEST → picks up next_cmd → works → checkpoint → repeat
The `--track` flag lets multiple agents checkpoint independently. Each track has its own entry in `LATEST.json`. Parallel agents use `--no-set-current` to avoid stomping the primary track. `--track` is required for every snapshot.
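A per-track `LATEST.json` could be shaped something like this (a hypothetical layout; the script defines the real one). A `current` pointer names the primary track, which is what `--no-set-current` leaves untouched:

```python
from typing import Optional

# Hypothetical LATEST.json contents with two independent tracks.
# "current" marks the primary track that a plain `resume` would pick up.
latest = {
    "current": "MIXED_RETRIEVAL",
    "tracks": {
        "MIXED_RETRIEVAL": {
            "step": "verifier schedule fix landed",
            "next_cmd": "python3 -m pytest -q tests/unit",
        },
        "DOCS_REFRESH": {
            "step": "outline drafted",
            "next_cmd": "render docs preview",
        },
    },
}

def resume_track(latest: dict, track: Optional[str] = None) -> dict:
    """Return the requested track's entry, falling back to the current track."""
    return latest["tracks"][track or latest["current"]]
```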
```shell
python /path/to/context_checkpoint.py --repo-root . snapshot \
  --step "verifier schedule fix landed" \
  --note "focused_schedule_clause_fragment added, Utah one-case proof green" \
  --next-cmd "python3 -m pytest -q tests/unit/test_document_research_verifier.py && run mixed-family forward proof" \
  --label "verifier-fix" \
  --track "MIXED_RETRIEVAL" \
  --validation "green" \
  --extra "artifact=/var/tmp/proof_run_20260412/run_report.json" \
  --extra "defect_count=0"
```

By default, checkpoints are written to `<repo-root>/runtime/checkpoints/`. You can override this with `--checkpoint-dir` (relative to `--repo-root` or absolute).
`--extra` entries must be valid `key=value` pairs. Malformed entries fail fast.
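A fail-fast parse of repeated `--extra` arguments can be as simple as the sketch below (the script's own validation may differ; `parse_extra` is an illustrative name):

```python
def parse_extra(pairs):
    """Parse repeated --extra key=value arguments, failing fast on bad input."""
    out = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            # Reject entries with no "=" or an empty key rather than
            # silently dropping them.
            raise ValueError(f"--extra expects key=value, got {pair!r}")
        out[key] = value
    return out
```

`str.partition` splits on the first `=` only, so values containing `=` (say, a URL query string) survive intact.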
```shell
python /path/to/context_checkpoint.py --repo-root . resume --live
```

Or resume a specific track:

```shell
python /path/to/context_checkpoint.py --repo-root . resume --track "MIXED_RETRIEVAL" --live
```

```
track=MIXED_RETRIEVAL
step=verifier schedule fix landed
note=focused_schedule_clause_fragment added, Utah one-case proof green
branch=main
head=fa124d5
next_cmd=python3 -m pytest -q tests/unit/test_document_research_verifier.py && run mixed-family forward proof
```
That's everything it needs. No re-investigation. No re-discovery. Just pick up and go.
Add something like this to your agent's system instructions (see `checkpoint_protocol.md` for the full SOP):
After any compaction or context loss:
1. Read `LATEST.md` + `LATEST.json` from `<repo-root>/runtime/checkpoints/` (or your `--checkpoint-dir`)
2. Verify git state matches checkpoint
3. Run `next_cmd` from your track
4. Write a fresh checkpoint before starting new work
Before any phase transition:
1. Write a checkpoint with `next_cmd` pointing to the next phase
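Step 2 of the cold-start sequence (verify git state, then hand over the command) can be sketched as a pure helper. Everything here is illustrative — `next_cmd_for` is not part of the script, and in practice the current head would come from `git rev-parse --short HEAD`:

```python
def next_cmd_for(latest: dict, track: str, current_head: str) -> str:
    """Return the track's next_cmd after checking the recorded git head.

    Illustrative sketch of the cold-start check; the real protocol
    lives in checkpoint_protocol.md.
    """
    entry = latest["tracks"][track]
    if entry.get("head") and entry["head"] != current_head:
        # Refusing to run next_cmd on a mismatched tree avoids acting
        # on stale instructions after someone else moved the branch.
        raise RuntimeError(
            f"checkpoint head {entry['head']} does not match "
            f"working tree {current_head}"
        )
    return entry["next_cmd"]
```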
- `context_checkpoint.py` — the checkpoint script
- `checkpoint_protocol.md` — system instruction SOP for agents
- Python 3.9+
- No external dependencies (stdlib only)
Built out of necessity. An AI coding agent was running multi-day autonomous development sessions, hitting context compactions, and re-doing its own work because it couldn't remember what came next. The checkpoint system turned 2,065 context deaths into 2,065 clean handoffs. The agent on the other end shipped a qualified enterprise document research system with formal governance, 11 corpus families, and a three-layer retrieval hardening stack.
The script is 426 lines of Python. The system it enabled is considerably larger.
MIT