SkillOpt-Sleep (Claude Code plugin)

Give your local Claude agent a sleep cycle. Every night it reviews your past sessions offline, replays your recurring tasks on your own API budget, and consolidates what it learns into validated memory (CLAUDE.md) and skills (SKILL.md). Your agent gets better the more you use it, with no model-weight training.

SkillOpt-Sleep is the deployment-time companion to SkillOpt. SkillOpt trains a skill offline on a benchmark; SkillOpt-Sleep applies the same discipline to your own daily usage: bounded text edits, accepted only through a held-out validation gate, with rejected edits kept as negative feedback.

It synthesizes three ideas:

Idea	Contribution
SkillOpt	skill/memory = trainable text; bounded add/delete/replace edits; held-out gate keeps only changes that help.
Claude Dreams	offline consolidation over past sessions; input never mutated; output reviewed then adopted.
Agent sleep	periodic offline replay turns short-term episodes into long-term skill.
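
To make "bounded add/delete/replace edits" concrete, here is a minimal sketch of how such edits could be applied to a skill held as a list of lines. The `Edit` record, `apply_edit`, and the `MAX_EDIT_CHARS` bound are hypothetical names chosen for illustration; the engine's actual schema and limits may differ.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit record, not the engine's real schema."""
    op: Literal["add", "delete", "replace"]
    target: str = ""  # existing line to delete or replace (unused for "add")
    text: str = ""    # new line for "add" or "replace"


MAX_EDIT_CHARS = 200  # assumed bound on how much text one edit may introduce


def apply_edit(lines: list[str], edit: Edit) -> list[str]:
    """Return a new list of lines with one bounded edit applied; the input is not mutated."""
    if len(edit.text) > MAX_EDIT_CHARS:
        raise ValueError("edit exceeds the size bound")
    if edit.op == "add":
        return lines + [edit.text]
    if edit.target not in lines:
        raise ValueError(f"target line not found: {edit.target!r}")
    idx = lines.index(edit.target)
    if edit.op == "delete":
        return lines[:idx] + lines[idx + 1:]
    if edit.op == "replace":
        return lines[:idx] + [edit.text] + lines[idx + 1:]
    raise ValueError(f"unknown op: {edit.op}")


original = ["Write a brief.", "Keep it short."]
skill = apply_edit(original, Edit("add", text="Always include a risks section."))
skill = apply_edit(skill, Edit("replace", target="Keep it short.", text="Keep it under one page."))
```

Returning a new list rather than editing in place mirrors the "input never mutated" contract in the table: a candidate can be scored and discarded without touching the current skill.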

What it does (one "night")

harvest ~/.claude transcripts → mine recurring tasks → replay offline
   → consolidate (reflect → bounded edit → GATE) → stage proposal → (you) adopt
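
The "mine recurring tasks" step can be pictured as grouping near-identical prompts and keeping the ones that repeat. The sketch below is one simple way to do that, assuming prompts have already been harvested as plain strings; `normalize` and `mine_recurring` are illustrative names, and the real miner may use a different notion of similarity.

```python
import re
from collections import Counter


def normalize(prompt: str) -> str:
    """Lowercase, replace digit runs with a placeholder, and collapse whitespace
    so that near-duplicate prompts map to the same key."""
    prompt = re.sub(r"\d+", "<n>", prompt.lower())
    return re.sub(r"\s+", " ", prompt).strip()


def mine_recurring(prompts: list[str], min_count: int = 2) -> list[tuple[str, int]]:
    """Return (normalized task, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter(normalize(p) for p in prompts)
    return [(task, n) for task, n in counts.most_common() if n >= min_count]


prompts = [
    "Write a brief for ticket 101",
    "write a brief for ticket  204",
    "Fix the failing test",
    "Write a brief for ticket 7",
]
recurring = mine_recurring(prompts)
```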

Nothing live is modified until you run /sleep adopt (the Dreams "review, then adopt or discard" contract), unless you explicitly opt in with --auto-adopt. Every adopt backs up the prior file first.
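
A minimal sketch of the adopt step's backup-first behaviour, under stated assumptions: the `adopt` function, the file layout, and the timestamped `.bak` naming are all hypothetical, not the plugin's actual implementation.

```python
import shutil
import tempfile
import time
from pathlib import Path
from typing import Optional


def adopt(staged: Path, live: Path, backup_dir: Path) -> Optional[Path]:
    """Copy the staged proposal over the live file, backing up the prior version first.

    Returns the backup path, or None when there was no prior live file.
    """
    backup = None
    if live.exists():
        backup_dir.mkdir(parents=True, exist_ok=True)
        stamp = time.strftime("%Y%m%d-%H%M%S")
        backup = backup_dir / f"{live.name}.{stamp}.bak"  # assumed naming scheme
        shutil.copy2(live, backup)
    live.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(staged, live)  # the staged file itself is left in place
    return backup


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    live = root / "CLAUDE.md"
    live.write_text("old memory\n")
    staged = root / "staging" / "CLAUDE.md"
    staged.parent.mkdir()
    staged.write_text("new memory\n")

    backup = adopt(staged, live, root / "staging" / "backup")
    demo = (live.read_text(), backup.read_text(), staged.exists())
```

Taking the backup before the copy means a failed or unwanted adopt can always be undone by restoring the `.bak` file.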

Install

Requirements: Python ≥ 3.10, and the claude CLI (and/or codex CLI) on PATH.

# 1) get the code (the plugin ships inside the SkillOpt repo)
git clone https://github.com/microsoft/SkillOpt.git
cd SkillOpt

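# 2) add the plugin to Claude Code as a local marketplace
#    (slash commands: run these inside a Claude Code session, not in your shell)
/plugin marketplace add ./skillopt-sleep-plugin
/plugin install skillopt-sleep@skillopt-sleep

# 3) verify (also inside Claude Code)
/sleep status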

The plugin's bundled runner (scripts/sleep.sh) auto-selects a Python ≥ 3.10 interpreter and calls the skillopt_sleep engine in the repo. No pip install is required for the default mock backend or for the claude/codex backends, which shell out to the CLIs you already have.
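
For readers curious what "auto-selects a Python ≥ 3.10 interpreter" might involve, the following is a rough Python sketch of the idea. The real runner is a shell script whose logic is not shown here; the candidate list and both function names are assumptions.

```python
import shutil
import subprocess
from typing import Optional

# Assumed search order: newest specific versions first, generic names last.
CANDIDATES = ["python3.13", "python3.12", "python3.11", "python3.10", "python3", "python"]


def parse_version(text: str) -> tuple[int, int]:
    """Parse output such as 'Python 3.11.4' into (3, 11)."""
    major, minor = text.strip().split()[-1].split(".")[:2]
    return int(major), int(minor)


def pick_interpreter(minimum: tuple[int, int] = (3, 10)) -> Optional[str]:
    """Return the path of the first interpreter on PATH that meets the minimum version."""
    for name in CANDIDATES:
        path = shutil.which(name)
        if path is None:
            continue
        try:
            out = subprocess.run([path, "--version"], capture_output=True, text=True, timeout=10)
            # Very old interpreters print the version to stderr instead of stdout.
            if parse_version(out.stdout or out.stderr) >= minimum:
                return path
        except (OSError, subprocess.SubprocessError, ValueError, IndexError):
            continue
    return None
```

Comparing `(major, minor)` tuples rather than strings matters here: as strings, "3.9" sorts after "3.10", which would wrongly accept Python 3.9.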

Quick start

# from inside any project you use with Claude Code:
/sleep dry-run     # safe preview: what it would learn, no changes staged
/sleep run         # full cycle: stages a reviewed proposal (still no live edits)
/sleep status      # see history + the latest staged proposal
/sleep adopt       # apply the staged proposal to CLAUDE.md / SKILL.md (with backup)

Or call the engine directly (Python ≥ 3.10):

python -m skillopt_sleep run --project "$(pwd)" --scope invoked --backend mock
python -m skillopt_sleep run --project "$(pwd)" --backend claude   # real lift via Claude
python -m skillopt_sleep run --project "$(pwd)" --backend codex    # real lift via Codex

The default backend is mock (deterministic, no API spend), so you can try the plumbing for free. Switch to --backend claude or --backend codex for genuine improvement on your own budget.

Does it actually improve? (real models, public benchmark)

SkillOpt-Sleep is validated against gbrain-evals' public skillopt-v1 suite, the same benchmark gbrain scores its own skill optimizer against. We take a deliberately deficient skill and run one or two sleep nights; held-out scoring is done by a local rule judge (no judge API, so the optimizer cannot grade its own homework).

Backend	Seed	Held-out before → after	Nights
Claude (Haiku 4.5)	brief-writer	0.00 → 1.00	1
Codex	brief-writer	0.00 → 1.00	2

Both backends started from a brief-writer skill with no risks section and no confidence level and, within 1–2 nights, proposed gated edits that lifted the held-out score to 1.00. The edits went only into the protected LEARNED block; nothing else was touched. The Codex 2-night trace even shows the optimizer diagnosing its own residual failure and adding a meta-rule to fix it. Full writeup + reproduction: docs/sleep/real_api_results.md.
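
The idea of a protected LEARNED block can be sketched as a marker-delimited region that is the only part of the file an edit may rewrite. The marker strings and `replace_learned` below are hypothetical; the plugin's real delimiters are not shown in this README.

```python
BEGIN = "<!-- LEARNED:BEGIN -->"  # hypothetical marker, for illustration only
END = "<!-- LEARNED:END -->"      # hypothetical marker, for illustration only


def replace_learned(doc: str, rules: list[str]) -> str:
    """Rewrite only the text between the markers; append a block if no complete one exists."""
    block = "\n".join([BEGIN, *rules, END])
    start = doc.find(BEGIN)
    end = doc.find(END, start + len(BEGIN)) if start != -1 else -1
    if start == -1 or end == -1:
        return doc.rstrip("\n") + "\n\n" + block + "\n"
    # Everything before BEGIN and after END is copied through byte for byte.
    return doc[:start] + block + doc[end + len(END):]


doc = "# brief-writer\nWrite a brief.\n\n" + BEGIN + "\n" + END + "\n"
updated = replace_learned(doc, ["- Always include a risks section.", "- State a confidence level."])
```

Because the hand-written part of the skill sits outside the markers, a learned rule can be added or rolled back without any risk to the text you wrote yourself.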

Reproduce:

git clone https://github.com/garrytan/gbrain-evals /tmp/gbrain-evals
python -m skillopt_sleep.experiments.run_gbrain --backend claude --model haiku \
  --seeds brief-writer --data-root /tmp/gbrain-evals/eval/data/skillopt-v1 \
  --nights 1 --limit-replay 3 --limit-holdout 3
python -m skillopt_sleep.experiments.run_gbrain --backend codex \
  --seeds brief-writer --data-root /tmp/gbrain-evals/eval/data/skillopt-v1 \
  --nights 2 --limit-replay 3 --limit-holdout 3

Deterministic proof (no API, no keys)

python -m skillopt_sleep.experiments.run_experiment --persona researcher --assert-improves
python -m skillopt_sleep.experiments.run_experiment --persona programmer  --assert-improves

Each prints the held-out score rising from baseline toward 1.0 as the gate accepts the general rules your tasks need, and confirms the gate rejects an injected harmful edit. Recorded output: docs/sleep/experiment_results.md.
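
The gate behaviour described above (accept an edit only if the held-out score rises, reject a harmful one) can be illustrated with a toy, fully deterministic setup. Everything here is a stand-in: `rule_judge`, `mock_agent`, and `gate` are hypothetical names, and the real experiments use their own personas and rules.

```python
from statistics import mean

REQUIRED = ("risks", "confidence")  # elements the toy rule judge looks for


def rule_judge(output: str) -> float:
    """Local rule judge: fraction of required elements present (no model call)."""
    text = output.lower()
    return sum(word in text for word in REQUIRED) / len(REQUIRED)


def mock_agent(skill: str, task: str) -> str:
    """Deterministic stand-in for a replay: writes only the sections the skill asks for."""
    parts = [f"Brief: {task}"]
    if "risks section" in skill:
        parts.append("Risks: none identified")
    if "confidence level" in skill:
        parts.append("Confidence: medium")
    return "\n".join(parts)


def held_out_score(skill: str, tasks: list[str]) -> float:
    return mean(rule_judge(mock_agent(skill, task)) for task in tasks)


def gate(current: str, candidate: str, tasks: list[str], rejected: list[str]) -> str:
    """Keep the candidate only if it strictly improves the held-out score."""
    if held_out_score(candidate, tasks) > held_out_score(current, tasks):
        return candidate
    rejected.append(candidate)  # kept as negative feedback
    return current


HELD_OUT = ["launch plan", "vendor review"]
rejected: list[str] = []
baseline = "Write a brief."
helpful = baseline + "\nAlways include a risks section and a confidence level."
harmful = helpful.replace("a risks section and ", "")  # injected edit that drops a rule

skill = gate(baseline, helpful, HELD_OUT, rejected)  # accepted
skill = gate(skill, harmful, HELD_OUT, rejected)     # rejected
```

The strict `>` comparison is a design choice in this sketch: an edit that leaves the score unchanged is treated as not worth the added text.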

Schedule it nightly

"${CLAUDE_PLUGIN_ROOT}/scripts/install-cron.sh" "$(pwd)"   # prints a crontab line; installs nothing

Safety

Read-only harvest of ~/.claude. mock replay has no side effects.
Proposals are staged, never auto-applied (unless you opt in with --auto-adopt).
Every adopt writes a backup under the staging dir's backup/.
Per-night token/task budget caps; secrets redacted from prompts.
fresh replay (Phase 3) runs only in throwaway git worktrees.
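
As an illustration of "secrets redacted from prompts", here is a small regex-based sketch. The patterns are examples only and certainly incomplete; the plugin's actual redaction rules are not documented in this README, and `redact` is a hypothetical name.

```python
import re

# Illustrative patterns only; a real redactor would cover many more formats.
TOKEN_PATTERNS = [
    re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"),  # API-key-like tokens with an "sk-" prefix
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),     # shape of an AWS access key ID
]
# key=value or key: value assignments whose key name suggests a secret
ASSIGNMENT = re.compile(r"(?i)\b(password|token|secret|api_key)(\s*[=:]\s*)\S+")


def redact(text: str) -> str:
    """Replace secret-looking substrings with a placeholder before text reaches a prompt."""
    for pattern in TOKEN_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    # Keep the key name and separator so the prompt still reads naturally.
    return ASSIGNMENT.sub(r"\1\2[REDACTED]", text)


sample = "export API_KEY=abc123 and use sk-abcdefghijklmnop1234"
clean = redact(sample)
```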

Status

Phase 1 (engine + deterministic experiment + plugin surface) is complete. Phase 3 adds the real-API miner/judge and fresh worktree replay. See docs/superpowers/specs/2026-06-07-skillopt-sleep-claude-code-plugin-design.md.
