Give your local Claude agent a sleep cycle. Every night it reviews your past sessions offline, replays your recurring tasks on your own API budget, and consolidates what it learns into validated memory (
CLAUDE.md) and skills (SKILL.md). Your agent gets better the more you use it — no model-weight training.
SkillOpt-Sleep is the deployment-time companion to SkillOpt. SkillOpt trains a skill offline on a benchmark; SkillOpt-Sleep applies the same discipline to your own daily usage: bounded text edits, accepted only through a held-out validation gate, with rejected edits kept as negative feedback.
It synthesizes three ideas:
| Idea | Contribution |
|---|---|
| SkillOpt | skill/memory = trainable text; bounded add/delete/replace edits; held-out gate keeps only changes that help. |
| Claude Dreams | offline consolidation over past sessions; input never mutated; output reviewed then adopted. |
| Agent sleep | periodic offline replay turns short-term episodes into long-term skill. |
harvest ~/.claude transcripts → mine recurring tasks → replay offline
→ consolidate (reflect → bounded edit → GATE) → stage proposal → (you) adopt
Nothing live is modified until you run /sleep adopt (the Dreams "review,
then adopt or discard" contract). Every adopt backs up the prior file first.
Requirements: Python ≥ 3.10, and the claude CLI (and/or codex CLI) on PATH.
# 1) get the code (the plugin ships inside the SkillOpt repo)
git clone https://github.com/microsoft/SkillOpt.git
cd SkillOpt
# 2) add the plugin to Claude Code as a local marketplace
/plugin marketplace add ./skillopt-sleep-plugin
/plugin install skillopt-sleep@skillopt-sleep
# 3) verify
/sleep statusThe plugin's bundled runner (scripts/sleep.sh) auto-selects a Python ≥ 3.10
interpreter and calls the skillopt_sleep engine in the repo. No pip install
is required for the default mock backend or for claude/codex backends —
they shell out to the CLIs you already have.
# from inside any project you use with Claude Code:
/sleep dry-run # safe preview: what it would learn, no changes staged
/sleep run # full cycle: stages a reviewed proposal (still no live edits)
/sleep status # see history + the latest staged proposal
/sleep adopt # apply the staged proposal to CLAUDE.md / SKILL.md (with backup)Or call the engine directly (Python ≥ 3.10):
python -m skillopt_sleep run --project "$(pwd)" --scope invoked --backend mock
python -m skillopt_sleep run --project "$(pwd)" --backend claude # real lift via Claude
python -m skillopt_sleep run --project "$(pwd)" --backend codex # real lift via CodexDefault backend is mock — deterministic, no API spend — so you can try the
plumbing for free. Switch to --backend claude or --backend codex for genuine
improvement on your own budget.
SkillOpt-Sleep is validated against gbrain-evals'
public skillopt-v1 suite — the same benchmark gbrain scores its own skill
optimizer against. We take a deliberately deficient skill and run one sleep
night; held-out scoring is done by a local rule judge (no judge-API, no way to
grade its own homework).
| Backend | Seed | Held-out before → after | Nights |
|---|---|---|---|
| Claude (Haiku 4.5) | brief-writer | 0.00 → 1.00 | 1 |
| Codex | brief-writer | 0.00 → 1.00 | 2 |
Both took a brief-writer with no risks section / no confidence level and, within
1–2 nights, proposed gated edits that lifted the held-out score to perfect —
into the protected LEARNED block, nothing else touched. The Codex 2-night
trace even shows the optimizer diagnosing its own residual failure and
adding a meta-rule to fix it. Full writeup + reproduction:
docs/sleep/real_api_results.md.
Reproduce:
git clone https://github.com/garrytan/gbrain-evals /tmp/gbrain-evals
python -m skillopt_sleep.experiments.run_gbrain --backend claude --model haiku \
--seeds brief-writer --data-root /tmp/gbrain-evals/eval/data/skillopt-v1 \
--nights 1 --limit-replay 3 --limit-holdout 3
python -m skillopt_sleep.experiments.run_gbrain --backend codex \
--seeds brief-writer --data-root /tmp/gbrain-evals/eval/data/skillopt-v1 \
--nights 1 --limit-replay 3 --limit-holdout 3python -m skillopt_sleep.experiments.run_experiment --persona researcher --assert-improves
python -m skillopt_sleep.experiments.run_experiment --persona programmer --assert-improvesEach prints the held-out score rising from baseline toward 1.0 as the gate
accepts the general rules your tasks need, and confirms the gate rejects an
injected harmful edit. Recorded output: docs/sleep/experiment_results.md.
"${CLAUDE_PLUGIN_ROOT}/scripts/install-cron.sh" "$(pwd)" # prints a crontab line; installs nothing- Read-only harvest of
~/.claude.mockreplay has no side effects. - Proposals are staged, never auto-applied (unless you opt in with
--auto-adopt). - Every adopt writes a backup under the staging dir's
backup/. - Per-night token/task budget caps; secrets redacted from prompts.
freshreplay (Phase 3) runs only in throwaway git worktrees.
Phase 1 (engine + deterministic experiment + plugin surface) is complete.
Phase 3 adds the real-API miner/judge and fresh worktree replay. See
docs/superpowers/specs/2026-06-07-skillopt-sleep-claude-code-plugin-design.md.