English | 中文 | 한국어 | 日本語 | Español | Tiếng Việt | Português
Watch the agent. Catch the lies. Stop only when the work is actually done.
A Claude Code plugin that keeps the current agent in a self-referential loop inside a single session and refuses to let it quit until the task genuinely stops producing file edits — no "completion flag", no way for the agent to cheat its way out.
Quick Start • Why Watchdog? • How It Works • Commands • Installation • Inspired By
| Role | Name | GitHub |
|---|---|---|
| Creator & Maintainer | Jonyan Dunh | @JonyanDunh |
Step 1: Install
/plugin marketplace add https://github.com/JonyanDunh/claude-code-watchdog
/plugin install watchdog
/reload-pluginsStep 2: Verify
/watchdog:helpStep 3: Start a watchdog
/watchdog:start "Fix the flaky auth tests in tests/auth/*.ts. Keep iterating until the whole suite passes." --max-iterations 20That's it. Watchdog re-feeds your prompt after every turn until Claude either:
- finishes a turn without modifying any file, or
- hits the
--max-iterationssafety cap, or - you manually run
/watchdog:stop.
Everything else is automatic. The agent never knows a loop is running.
- Zero agent cheating — The agent is never told it is inside a loop. No
systemMessage, no iteration counter, no setup banner. It cannot short-circuit by emitting a fake completion signal. - Forced tool verification — A pure-text turn ("I've checked, all good") never ends the loop. The agent must actually invoke a tool before exit is even considered.
- LLM-judged, project-aware file-change detection — On every hook fire, watchdog spawns a short-lived Claude Code subprocess (
claude -p --model haiku) and asks it the single question: "did this turn modify any project file?". The subprocess sees every tool invocation's full input and decides semantically. Haiku is the model — the important part is that it's an isolated, stateless Claude Code process, not a custom API client, so your existingclaudeauthentication is reused as-is. - Per-session isolation — State file is keyed by the parent Claude Code process ID, discovered by walking the process ancestry. 100 concurrent watchdogs in the same project directory never collide.
- Hidden by design — All diagnostic output goes to stderr. The JSONL transcript never leaks loop metadata into the agent's context.
- Apache 2.0 — Cleanly derived from Anthropic's own
ralph-loopplugin, with full attribution in NOTICE.
You run the command once, then Claude Code handles the rest:
# You run ONCE:
/watchdog:start "Your task description" --max-iterations 20
# Then Claude Code automatically:
# 1. Works on the task
# 2. Tries to exit
# 3. Stop hook blocks the exit and re-feeds the SAME prompt
# 4. Claude iterates on the same task, seeing its own previous edits
# 5. Repeat until a turn finishes without modifying any project file
# (or --max-iterations is reached)The loop happens inside your current session — no external while true, no orchestrator process. The Stop hook in hooks/stop-hook.js blocks normal session exit and re-injects the prompt as a new user turn using Claude Code's native {"decision": "block", "reason": ...} protocol.
This creates a self-referential feedback loop where:
- The prompt never changes between iterations
- Claude's previous work persists in files
- Each iteration sees modified files and git history
- Claude autonomously improves by reading its own past work
The loop exits when both of these are true for the latest assistant turn:
| Check | Requirement |
|---|---|
| Tool usage precondition | The turn must have invoked at least one tool. Pure-text turns never exit. |
| Classifier subprocess verdict | A short-lived Claude Code subprocess (claude -p --model haiku) returns NO_FILE_CHANGES. The subprocess reads every tool invocation's full input and decides semantically whether the turn directly modified any project file. |
If either check fails, the loop continues. Additional exit paths:
--max-iterationsreached (hard cap, always respected)- User runs
/watchdog:stop(removes the state file) - State file manually removed from disk
| Command | Effect | Example |
|---|---|---|
/watchdog:start <PROMPT> [--max-iterations N] |
Start a watchdog in the current session | /watchdog:start "Refactor services/cache.ts. Iterate until pnpm test:cache passes." --max-iterations 20 |
/watchdog:stop |
Cancel the watchdog in the current session | /watchdog:stop |
/watchdog:help |
Print the full reference inside Claude Code | /watchdog:help |
If your prompt contains newlines, quotes, backticks, $, or other characters that would break shell-argument parsing inside the slash command's ! block — for example a multi-paragraph Markdown task spec — pass it as a file instead:
/watchdog:start --prompt-file ./tmp/my-task.md --max-iterations 20The file is read directly by Node (fs.readFileSync), bypassing shell escaping entirely. Relative paths resolve against the Claude Code session's current working directory. UTF-8 BOM is stripped automatically (so Windows Notepad files are safe), CRLF content is preserved byte-for-byte, and leading/trailing whitespace is trimmed. Mutually exclusive with an inline <PROMPT> — pick one or the other.
Works with Linux/macOS/WSL POSIX paths (/home/you/…, ./tmp/…), Windows absolute paths (C:\Users\you\…, C:/Users/you/…), and UNC paths (\\server\share\…). ~ is expanded by your shell (bash/zsh), not by Watchdog — on cmd.exe use %USERPROFILE%\… or an absolute path. Paths with spaces must be quoted as usual: --prompt-file "./my prompts/task.md". See /watchdog:help for the full path-handling reference.
By default the loop exits the moment the Haiku classifier returns its first NO_FILE_CHANGES verdict. For high-stakes work where you want belt-and-suspenders confirmation that the agent has really converged, raise the bar:
/watchdog:start "Refactor services/cache.ts. Iterate until pnpm test:cache passes." --exit-confirmations 3 --max-iterations 20The loop will now require three consecutive clean turns before exiting. The streak counter is reset to 0 the moment the classifier returns anything other than NO_FILE_CHANGES — including FILE_CHANGES, AMBIGUOUS, classifier failures (CLI_MISSING / CLI_FAILED), or a pure-text turn (no tool invocations). Convergence has to be unbroken to count.
Default is 1, identical to pre-1.3.0 behavior. Mutually exclusive with --no-classifier.
If you started the loop with --prompt-file and want to refine the task while it runs, add --watch-prompt-file:
/watchdog:start --prompt-file ./tmp/task.md --watch-prompt-file --max-iterations 30The Stop hook now re-reads the prompt file at the start of every iteration. If the content has changed since the previous turn, the new version becomes the next user turn and the --exit-confirmations streak counter is reset to 0 (a redefined task should not inherit convergence from the old task).
Hot-reload never crashes the loop: if the file is missing, empty, or unreadable when the hook fires, the cached prompt is silently kept and the loop continues. You can edit, rename, or temporarily move the file mid-loop without breaking anything — the next iteration picks up whatever the file looks like at that moment.
Requires --prompt-file. Passing --watch-prompt-file alone is an error.
For ralph-loop-style runs where you don't want any LLM judging convergence — you'll stop the loop manually or via --max-iterations:
/watchdog:start "Keep iterating until I /watchdog:stop." --no-classifierThe Stop hook skips the Haiku call entirely. The only ways to exit become --max-iterations and /watchdog:stop. --max-iterations is optional — if you omit it (as in the example above), the loop is truly unbounded and only stops when you say so.
The claude CLI is not even required in this mode (the Haiku subprocess is never spawned). Compatible with --prompt-file and --watch-prompt-file. Mutually exclusive with --exit-confirmations — the streak counter is meaningless when there is no classifier returning verdicts.
Per-session state lives at .claude/watchdog.claudepid.<PID>.local.json, where <PID> is the parent Claude Code process ID discovered by walking the process ancestry. Example:
{
"active": true,
"iteration": 3,
"max_iterations": 20,
"claude_pid": 1119548,
"started_at": "2026-04-11T12:00:00Z",
"prompt": "Fix the flaky auth tests..."
}Every Claude Code session has a distinct PID, so 100 concurrent watchdogs in the same project directory never collide — each gets its own state file, and /watchdog:stop in any one of them only cancels that specific session's loop.
Monitor active watchdogs:
# List all active per-session state files in this project
ls .claude/watchdog.claudepid.*.local.json
# Inspect one via jq or node
node -e "console.log(JSON.parse(require('fs').readFileSync('.claude/watchdog.claudepid.<PID>.local.json','utf8')))"Manually kill everything in this project:
rm -f .claude/watchdog.claudepid.*.local.json/plugin marketplace add https://github.com/JonyanDunh/claude-code-watchdog
/plugin install watchdog
/reload-pluginsVerify with /watchdog:help.
To try Watchdog without touching your global config, load it for one session only:
claude --plugin-dir /absolute/path/to/claude-code-watchdogFor CI/CD, corporate deployments, or offline use, clone the repo and wire it up manually in ~/.claude/settings.json:
{
"extraKnownMarketplaces": {
"claude-code-watchdog": {
"source": {
"source": "directory",
"path": "/absolute/path/to/claude-code-watchdog"
}
}
},
"enabledPlugins": {
"watchdog@claude-code-watchdog": true
}
}Then run /reload-plugins inside Claude Code.
By design, the agent must not know it is inside a loop. If it knew, it would be tempted to short-circuit by claiming completion from memory on the first turn. Watchdog enforces this by:
-
No
systemMessageemitted from the Stop hook — no iteration counter, no status banner. -
Setup script writes only the user's prompt to stdout — no "Loop activated, iteration 1" header, no initialization output the agent would see.
-
Re-fed prompt is the original text + a single verification reminder, in plain English:
Please re-run the verification by actually invoking tools. Do not, without performing any real tool calls, base your answer on prior context and tell me the check is complete.
-
All diagnostics go to stderr (
>&2) — Claude Code's transcript does not capture them as agent context.
From the agent's point of view, the same user is asking the same question over and over, occasionally adding "please actually re-run the checks". There is no visible Stop hook, no iteration counter, no loop metadata. The agent cannot cheat what it does not know exists.
Write the prompt so "no more edits needed" is a genuine, verifiable answer.
❌ Bad: "Build a todo API and make it good."
✅ Good:
Build a REST API for todos in `src/api/todos.ts`.
Requirements:
- All CRUD endpoints working
- Input validation in place
- 80%+ test coverage in `tests/todos.test.ts`
- All tests pass with `pnpm test`The loop exits on "no files modified". If your task has no verifiable end state, it will just spin.
✅ Good:
Refactor `services/cache.ts` to remove the legacy LRU implementation.
Steps:
1. Delete the old LRU class and its tests
2. Update all callers in `src/` to use the new cache API
3. Run `pnpm typecheck && pnpm test:cache` after each change
4. Iterate until both pass without warningsTell the agent how to notice failure and adapt.
Implement feature X using TDD:
1. Write failing tests in tests/feature-x.test.ts
2. Write minimum code to pass
3. Run `pnpm test:feature-x`
4. If any test fails, read the failure, fix, re-run
5. Refactor only after all tests are greenThe classifier subprocess is not infallible. A stuck agent that keeps making meaningless edits, or one that gets confused and stops editing prematurely, should fall through to a hard stop. --max-iterations 20 is a reasonable default for most work.
The flag is optional, though. If you genuinely want an unlimited loop (e.g., a long-running maintenance loop you intend to stop manually with /watchdog:stop, or a --no-classifier run where convergence is judged by you, not Haiku), just omit the flag entirely.
Good for:
- Tasks with clear, automated success criteria (tests, lints, typechecks)
- Iterative refinement: fix → test → fix → test
- Greenfield implementations you can walk away from
- Systematic code review with fixes
Not good for:
- Tasks requiring human judgment or design decisions
- One-shot operations (a single command, a single file edit)
- Anything where "done" is subjective
- Production debugging that needs external context
Watchdog needs both claude and node in your PATH — node runs the plugin's hook and setup scripts, and claude is what watchdog spawns (claude -p --model haiku) to judge whether each turn modified any project file.
| Requirement | Why |
|---|---|
| Claude Code 2.1+ | Uses the Stop hook system and marketplace plugin format |
node 18+ in PATH |
Runtime for the plugin's hook and setup scripts |
claude CLI in PATH |
Watchdog spawns a short-lived claude -p --model haiku subprocess on every hook fire to classify the turn. Must be authenticated (OAuth or ANTHROPIC_API_KEY) — the subprocess reuses your existing session credentials. |
If you installed Claude Code via npm install -g @anthropic-ai/claude-code, you get both claude and node as a package deal — the npm install adds claude to your PATH, and Node.js is npm's own runtime so it's already there. Nothing else to install.
If you installed Claude Code some other way (standalone binary, Homebrew, Windows installer), claude is already in your PATH but you may need to install Node.js 18+ separately:
macOS (Homebrew):
brew install node
# claude CLI: see https://docs.anthropic.com/claude-codeDebian / Ubuntu / WSL2:
# Option 1: distro package (may be older than 18)
sudo apt update && sudo apt install -y nodejs
# Option 2: NodeSource (current LTS)
curl -fsSL https://deb.nodesource.com/setup_lts.x | sudo -E bash -
sudo apt install -y nodejsFedora / RHEL:
sudo dnf install -y nodejsArch / Manjaro:
sudo pacman -S --needed nodejsWindows (native PowerShell / cmd):
# winget
winget install OpenJS.NodeJS.LTS
# or scoop
scoop install nodejs-lts
# or download the installer from https://nodejs.org| Platform | Status |
|---|---|
| Linux (Node 18 / 20 / 22) | ✅ Tested in CI |
| macOS (Node 18 / 20 / 22) | ✅ Tested in CI |
| Windows (Node 18 / 20 / 22) | ✅ Tested in CI |
This repo is both the marketplace and the plugin — marketplace.json points to ./.
claude-code-watchdog/
├── .claude-plugin/
│ ├── marketplace.json # marketplace manifest
│ └── plugin.json # plugin manifest
├── commands/
│ ├── start.md # /watchdog:start
│ ├── stop.md # /watchdog:stop
│ └── help.md # /watchdog:help
├── hooks/
│ ├── hooks.json # registers the Stop hook (invokes node)
│ └── stop-hook.js # the core loop logic
├── scripts/
│ ├── setup-watchdog.js # creates the state file
│ └── stop-watchdog.js # removes the state file
├── lib/ # shared modules (reused by all entry points)
│ ├── constants.js # state path pattern, marker tokens, prompt templates
│ ├── log.js # stderr diagnostics
│ ├── stdin.js # sync stdin reader
│ ├── state.js # atomic state file lifecycle
│ ├── transcript.js # JSONL parser + current-turn tool extraction
│ ├── judge.js # Claude Code classifier subprocess + verdict parser
│ └── claude-pid.js # process ancestry walk
├── test/ # node:test unit + integration tests
│ ├── fixtures/ # transcript JSONL fixtures
│ ├── transcript.test.js
│ ├── state.test.js
│ ├── judge.test.js
│ ├── claude-pid.test.js
│ ├── setup.test.js
│ ├── stop-watchdog.test.js
│ ├── stop-hook.test.js
│ └── stop-hook-haiku.test.js
├── .github/ # CI workflow (node --test matrix, jsonlint, markdownlint) + issue/PR templates
├── .gitattributes # forces LF line endings
├── LICENSE # Apache License 2.0
├── NOTICE # attribution to ralph-loop
├── README.md # this file
└── README.{zh,ja,ko,es,vi,pt}.md # translations
Watchdog is a derivative work of Anthropic's ralph-loop plugin (Apache License 2.0, © Anthropic, PBC). The original ralph-loop used a <promise>COMPLETE</promise> XML-tag protocol where the agent explicitly declared completion.
Watchdog keeps the core mechanic — a Stop hook that re-feeds the prompt — and changes these things on top:
| Watchdog | ralph-loop | |
|---|---|---|
| Exit trigger | A short-lived Claude Code subprocess (claude -p --model haiku) is the sole judge. It reads every tool invocation's full input and decides semantically whether any project file was directly modified. |
The agent must emit a <promise>…</promise> XML tag in its final text. The phrase inside the tags is configurable via --completion-promise "…" (e.g. COMPLETE, DONE). A Stop hook grep matches the exact string. |
| Exit precondition | Tools must have been called AND the classifier subprocess says NO_FILE_CHANGES |
Just the <promise> text match. The agent can cheat by emitting the tag prematurely; ralph-loop's only defense is a prompt that asks the agent not to lie. |
| Agent visibility | Completely hidden (no systemMessage, no banner, stderr-only diagnostics) | Agent is told about the loop and the promise protocol |
| State scoping | One state file per Claude Code session — unlimited concurrent watchdogs in the same project | One state file per project — only ONE ralph-loop can run per project at a time |
| State file format | JSON (parsed with native JSON.parse) |
Markdown with YAML frontmatter (parsed with sed/awk/grep) |
| Runtime | Node.js 18+ | Bash + jq + POSIX coreutils |
| Prompt input | Inline via $ARGUMENTS, or --prompt-file <path> — reads the file directly with Node's fs.readFileSync, bypassing shell argument parsing entirely. Safe for multi-paragraph Markdown containing newlines, quotes, backticks, $, etc. UTF-8 BOM is stripped automatically; CRLF is preserved byte-for-byte. |
Inline via $ARGUMENTS in the slash command's ! shell block only. Any unescaped ", `, $, or newline in the prompt breaks bash parsing with unexpected EOF. No file or stdin fallback — multi-paragraph Markdown task specs must be mangled into a single-line, shell-safe string first. |
| Convergence flexibility | --exit-confirmations N requires N consecutive clean NO_FILE_CHANGES verdicts before exit (default 1). --no-classifier skips Haiku entirely for ralph-loop-style runs that exit only via --max-iterations or /watchdog:stop. |
A single <promise>…</promise> tag-emit-then-grep mechanism with no tunable strictness — either the agent emits the configured promise phrase or it doesn't. |
| Prompt evolution | --watch-prompt-file hot-reloads --prompt-file on every iteration. You can edit the task spec mid-loop and the next turn picks it up (and resets the convergence streak, since the task changed). Missing/empty/unreadable file silently keeps the cached prompt — hot-reload never crashes the loop. |
Prompt is fixed at /ralph-loop "..." time and cannot be changed without canceling and restarting the loop. |
See NOTICE for the full attribution and the complete list of modifications.
Apache License 2.0. See LICENSE and NOTICE.
Watchdog is a derivative work of ralph-loop (© Anthropic, PBC, Apache 2.0). This project is not affiliated with or endorsed by Anthropic.
Inspired by: ralph-loop (Anthropic, PBC)
Watch the agent. Catch the lies. Stop only when the work is truly done.