Dual-phase Upfront-plan & Unit-verify Loop — an MCP server that uses LLMs as peer reviewers for development plans and code. Supports OpenAI, Anthropic, Google, OpenRouter, and any OpenAI-compatible provider.
DUUL is a Model Context Protocol server that enables any MCP client (such as Claude Desktop or Claude Code) to request structured peer reviews from external LLMs. It implements a 2-phase review loop:
- Upfront-plan Review -- A Senior Architect persona reviews the implementation plan before any code is written.
- Unit-verify Review -- A Strict QA Engineer persona reviews the code against the approved plan.
The calling agent iterates with the reviewer on each phase until it receives an APPROVE verdict, then moves to the next phase. This creates a cross-model peer review workflow where one LLM checks the work of another.
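The iterate-until-APPROVE loop can be sketched from the caller's side as follows. This is a minimal illustration; the function and parameter names here are hypothetical, not the actual MCP client API:

```typescript
// Hypothetical sketch of the calling agent's review loop.
type Verdict = "APPROVE" | "REVISE";

interface ReviewResult {
  verdict: Verdict;
  blocking_issues: string[]; // issues the reviewer wants fixed
}

function reviewUntilApproved(
  submit: (artifact: string) => ReviewResult,              // one review round
  revise: (artifact: string, issues: string[]) => string,  // apply fixes
  artifact: string,
  maxIterations = 7,
): { artifact: string; approved: boolean } {
  for (let i = 0; i < maxIterations; i++) {
    const result = submit(artifact);
    if (result.verdict === "APPROVE") {
      return { artifact, approved: true }; // move on to the next phase
    }
    artifact = revise(artifact, result.blocking_issues);
  }
  // limit exhausted: the real server signals requires_human_review instead
  return { artifact, approved: false };
}
```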
Token-efficient by design: Phase 1 (plan authoring) is delegated to a Sonnet-class subagent, since the reviewer catches any plan issues anyway. Phase 2 (code implementation) stays on Opus for maximum code quality. This typically reduces Phase 1 token costs by ~80%.
The reviewer has workspace-aware file exploration -- when given a workspace_root, it can autonomously browse the codebase using 7 built-in tools (read files, search code, list directories, etc.) to make informed review decisions instead of speculating.
- Node.js 20+
- API key for at least one supported provider (OpenAI, Anthropic, Google, or OpenRouter)
- Recommended: `ripgrep` (`rg`) for faster code search within the reviewer's workspace exploration. Without it, the reviewer falls back to `git grep` or `grep`, which are significantly slower on large codebases.
```sh
# macOS
brew install ripgrep

# Ubuntu / Debian
sudo apt install ripgrep

# Windows (Scoop)
scoop install ripgrep
```

```sh
claude mcp add duul \
  -e OPENAI_API_KEY=sk-... \
  -- npx -y @planningo/duul
```

Or add manually to your project-level `.mcp.json`:

```json
{
  "mcpServers": {
    "duul": {
      "command": "npx",
      "args": ["-y", "@planningo/duul"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```

Add the following to your `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "duul": {
      "command": "npx",
      "args": ["-y", "@planningo/duul"],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "REVIEW_PROVIDER": "openai"
      }
    }
  }
}
```

```sh
git clone https://github.com/Planningo/duul.git
cd duul
npm install
npm run build
```

Then point the MCP config at `node /absolute/path/to/duul/build/index.js` instead of `npx -y @planningo/duul`.
Once installed, just ask in natural language: "run DUUL" or "use DUUL for this".
All configuration is done via environment variables, passed through the MCP env block (not a .env file).
| Variable | Required | Default | Description |
|---|---|---|---|
| `REVIEW_PROVIDER` | No | `openai` | Provider: `openai`, `anthropic`, `google`, `openrouter`, `compatible` |
| `REVIEW_MODEL` | No | Provider default | Model ID (e.g. `gpt-5.4`, `claude-opus-4-20250514`, `gemini-3.1-pro-preview`) |
| `OPENAI_API_KEY` | Conditional | -- | Required for `openai` or `compatible` provider |
| `ANTHROPIC_API_KEY` | Conditional | -- | Required for `anthropic` provider |
| `GOOGLE_API_KEY` | Conditional | -- | Required for `google` provider |
| `OPENROUTER_API_KEY` | Conditional | -- | Required for `openrouter` provider |
| `REVIEW_API_KEY` | No | -- | API key for `compatible` provider (falls back to `OPENAI_API_KEY`) |
Default models per provider:
- OpenAI: `gpt-5.4`
- Anthropic: `claude-opus-4-20250514`
- Google: `gemini-3.1-pro-preview`
Each phase has a maximum number of review iterations. When exceeded, the server returns requires_human_review: true so the caller can escalate to a human.
| Variable | Default | Description |
|---|---|---|
| `MAX_PLAN_REVIEW_ITERATIONS` | `7` | Max plan review rounds before human escalation |
| `MAX_CODE_REVIEW_ITERATIONS` | `7` | Max code review rounds before human escalation |
| `MAX_PARTITION_ITERATIONS` | `5` | Max execution partition rounds before human escalation |
Example: relaxed limits for complex projects

```json
{
  "mcpServers": {
    "duul": {
      "command": "node",
      "args": ["/absolute/path/to/duul/build/index.js"],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "MAX_PLAN_REVIEW_ITERATIONS": "10",
        "MAX_CODE_REVIEW_ITERATIONS": "10",
        "MAX_PARTITION_ITERATIONS": "7"
      }
    }
  }
}
```

Example: tight limits for quick tasks
```json
{
  "env": {
    "MAX_PLAN_REVIEW_ITERATIONS": "3",
    "MAX_CODE_REVIEW_ITERATIONS": "3"
  }
}
```

Opt-in cap on the total bytes the reviewer can pull from the workspace via its file-exploration tools per review call. Once the cap is exceeded, subsequent tool calls return a budget-exhausted message, prompting the reviewer to submit its verdict instead of continuing to request files.
| Variable | Default | Description |
|---|---|---|
| `DUUL_MAX_REVIEWER_BYTES` | (unset = no cap) | Max cumulative bytes returned by reviewer file tools per review call |
Unset by default: early measurements showed a 200KB default tripped ~1/3 of code reviews into spurious REVISEs, which actually cost more rounds. If you want the cap, set it explicitly — 200000–500000 is a reasonable starting range for cost-conscious setups. Raise or lower based on how complex your typical review is.
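The cap behaves like a cumulative per-call budget. A minimal sketch of the idea (illustrative only, not the server's actual implementation):

```typescript
// Illustrative byte-budget gate for reviewer file-tool results.
class ByteBudget {
  private used = 0;
  // cap undefined = no cap, matching the unset default
  constructor(private readonly cap?: number) {}

  take(content: string): string {
    if (this.cap !== undefined && this.used >= this.cap) {
      // signal the reviewer to stop exploring and submit a verdict
      return "[budget exhausted: submit your verdict with the evidence so far]";
    }
    this.used += new TextEncoder().encode(content).length; // count bytes, not chars
    return content;
  }
}
```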
You can also override the iteration limit on individual review calls via the max_review_iterations input parameter (range: 1–20). This takes priority over the environment variable.
```json
{
  "plan": "...",
  "max_review_iterations": 3,
  "iteration_count": 1
}
```

Priority order: per-request `max_review_iterations` > environment variable > default.
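That resolution order reduces to a small helper. The names below are hypothetical; the real logic lives in `src/services/review-limits.ts`:

```typescript
// Sketch of iteration-limit resolution: per-request > env var > default.
function resolveIterationLimit(
  perRequest: number | undefined, // max_review_iterations from the request
  envValue: string | undefined,   // e.g. MAX_PLAN_REVIEW_ITERATIONS
  fallback = 7,
): number {
  if (perRequest !== undefined) {
    // per-request values are documented as 1–20, so clamp to that range
    return Math.min(20, Math.max(1, perRequest));
  }
  const parsed = envValue === undefined ? NaN : Number(envValue);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}
```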
Each review request can include a reviewer_config object to override provider and model settings:
```json
{
  "reviewer_config": {
    "provider": "anthropic",
    "model": "claude-opus-4-20250514",
    "temperature": 0.3,
    "top_p": 0.2
  }
}
```

| Field | Type | Default | Description |
|---|---|---|---|
| `provider` | `string` | env / `openai` | `openai`, `anthropic`, `google`, `openrouter`, `compatible` |
| `model` | `string` \| `{ plan?, code?, partition? }` | env / provider default | Model identifier. Pass an object to use different models per tool (see below). |
| `base_url` | `string` | -- | Custom API endpoint (for `compatible` or self-hosted) |
| `api_key` | `string` | -- | Per-request API key (overrides env) |
| `temperature` | `number` | `0.2` | Sampling temperature (0–2) |
| `top_p` | `number` | `0.1` | Nucleus sampling (0–1) |
`model` can be a single string (applied to every review tool) or an object with per-tool overrides. Tools that are not listed fall back to `REVIEW_MODEL` / the provider default.
```json
{
  "reviewer_config": {
    "model": {
      "code": "claude-opus-4-20250514"
    }
  }
}
```

Intended direction: upgrade, not downgrade. Plan-phase defects compound through implementation, so the default for plan should stay on a strong model. This knob is for users who want to spend MORE on code_review (e.g. use Opus for code while keeping plan on the default), not to save money by weakening plan.
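The per-tool lookup can be sketched as a small resolver (assumed names, not the server's actual code):

```typescript
// Sketch of per-tool model resolution with fallback to the env/provider default.
type ReviewTool = "plan" | "code" | "partition";
type ModelConfig = string | Partial<Record<ReviewTool, string>>;

function resolveModel(
  config: ModelConfig | undefined,
  tool: ReviewTool,
  envDefault: string, // REVIEW_MODEL or the provider default
): string {
  if (typeof config === "string") return config; // one model for every tool
  return config?.[tool] ?? envDefault;           // per-tool override or fallback
}
```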
Empirical numbers from real DUUL usage in this repo (42 reviewer calls, gpt-5.4, prompt caching enabled). Treat as a rough budgeting guide — your numbers will vary with project size and review complexity.
| Tool | Avg tokens/call | Avg cost/call | Cache hit rate |
|---|---|---|---|
| `plan_review` | 100,966 | $0.065 | 79% |
| `code_review` | 179,837 | $0.122 | 79% |
| Combined avg | 132,890 | $0.088 | 79% |
A typical task (1–3 plan rounds + 1–2 code rounds) usually lands around $0.30–$0.50 in reviewer cost.
What drives cost down:
- Anthropic / OpenAI prompt caching (~30% reduction on iterating sessions; cache reads billed at 0.1× input rate)
- Per-tool model override (`reviewer_config.model = { code: "claude-opus-4" }` to escalate code-only)
- Optional file-read budget (`DUUL_MAX_REVIEWER_BYTES`) for hard cost ceilings
Measure your own usage:
```sh
node scripts/token-report.mjs --plan max20 --all-time
```

Reads `~/.duul/usage.jsonl` (set `DUUL_DEBUG_TOKEN=1` in your MCP env to enable logging) and `~/.claude/projects/<encoded-cwd>/*.jsonl` for a combined Claude Code + reviewer breakdown.
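If you want to post-process the log yourself, aggregating a JSONL usage file is straightforward. The field names below are assumptions for illustration, not the actual `usage.jsonl` schema:

```typescript
// Aggregate per-tool token and call counts from a JSONL log (assumed schema).
interface UsageRecord {
  tool: string;   // e.g. "plan_review" or "code_review" (assumed field)
  tokens: number; // total tokens for the call (assumed field)
}

function summarizeUsage(jsonl: string): Map<string, { tokens: number; calls: number }> {
  const byTool = new Map<string, { tokens: number; calls: number }>();
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const rec = JSON.parse(line) as UsageRecord;
    const agg = byTool.get(rec.tool) ?? { tokens: 0, calls: 0 };
    agg.tokens += rec.tokens;
    agg.calls += 1;
    byTool.set(rec.tool, agg);
  }
  return byTool;
}
```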
```mermaid
flowchart TD
    Start(["User: 'run DUUL'"]):::trigger --> Plan["Write implementation plan\n(Sonnet subagent)"]:::sonnet

    subgraph Phase1["Phase 1: Plan Ping-Pong — Sonnet (max 7 iterations)"]
        Plan --> PR["request_plan_review"]
        PR --> IterCheck1{iteration\nlimit?}
        IterCheck1 -- "exceeded" --> Human1["⏸ requires_human_review: true"]
        IterCheck1 -- "within limit" --> Review1[/"LLM Reviewer\n(Senior Architect)"/]
        Review1 --> Status1{review_status?}
        Status1 -- "incomplete" --> Narrow1["Retry with narrower scope\n(fewer artifact_refs)"]
        Narrow1 --> PR
        Status1 -- "completed" --> Verdict1{verdict?}
        Verdict1 -- "REVISE" --> Fix1["Fix plan based on\nblocking_issues"]
        Fix1 --> PR
        Verdict1 -- "APPROVE" --> PlanOK(["Plan Approved ✓"]):::approved
    end

    PlanOK --> Impl["Implement code\n(write actual files)"]:::opus

    subgraph Phase2["Phase 2: Code Ping-Pong — Opus (max 7 iterations)"]
        Impl --> CR["request_code_review\n+ approved_plan"]
        CR --> IterCheck2{iteration\nlimit?}
        IterCheck2 -- "exceeded" --> Human2["⏸ requires_human_review: true"]
        IterCheck2 -- "within limit" --> Review2[/"LLM Reviewer\n(Strict QA Engineer)"/]
        Review2 --> Status2{review_status?}
        Status2 -- "incomplete" --> Narrow2["Retry with narrower scope"]
        Narrow2 --> CR
        Status2 -- "completed" --> Verdict2{verdict?}
        Verdict2 -- "REVISE" --> Fix2["Fix code based on\nblocking_issues + vulnerabilities"]
        Fix2 --> CR
        Verdict2 -- "APPROVE" --> CodeOK(["Code Approved ✓"]):::approved
    end

    CodeOK --> Done(["Done: Plan approved & code review passed"]):::done

    classDef trigger fill:#e1f5fe,stroke:#0288d1,color:#01579b
    classDef approved fill:#e8f5e9,stroke:#388e3c,color:#1b5e20
    classDef done fill:#c8e6c9,stroke:#2e7d32,color:#1b5e20,stroke-width:2px
    classDef sonnet fill:#fff3e0,stroke:#f57c00,color:#e65100
    classDef opus fill:#ede7f6,stroke:#7b1fa2,color:#4a148c
```
After Phase 1 approval, large plans can be split into parallelizable subtasks before Phase 2:
```mermaid
flowchart LR
    PlanOK(["Plan Approved"]) --> EP["request_execution_partition"]
    EP --> Mode{execution_mode?}
    Mode -- "serial" --> Serial["Single agent\nexecutes all"]
    Mode -- "parallel" --> Parallel["Spawn N agents\n(new workspaces)"]
    Mode -- "hybrid" --> Hybrid["Mix: parallel groups\n+ serial checkpoints"]
    Serial --> Phase2["Phase 2 per subtask"]
    Parallel --> Phase2
    Hybrid --> Phase2
```
The DUUL loop is activated by mentioning "DUUL" in conversation. The server embeds workflow instructions that the MCP client picks up automatically.
Trigger examples:
- "run DUUL", "use DUUL for this", "start DUUL"
Not triggers (these are normal requests the agent handles itself):
- "review my code", "check this", "look over my plan"
DUUL Phase 1: Submit a development plan for review by an LLM acting as a Senior Software Architect.
Input Schema:
| Field | Type | Required | Description |
|---|---|---|---|
| `plan` | `string` | Yes | Detailed implementation plan |
| `project_context` | `object` | No | Structured project context |
| `project_context.file_tree` | `string` | No | Project file tree summary (max 2000 chars) |
| `project_context.changed_files` | `string[]` | No | List of files related to this change |
| `project_context.package_versions` | `Record<string, string>` | No | Key package versions |
| `project_context.relevant_code` | `Array<{ file_path, code }>` | No | Existing code snippets for context |
| `constraints` | `string[]` | No | Special constraints: performance, memory, security, etc. |
| `notes_to_reviewer` | `string` | No | Context or rebuttals for the reviewer |
| `workspace_root` | `string` | No | Absolute path to workspace root (enables file exploration) |
| `project_root` | `string` | No | Deprecated -- use `workspace_root` |
| `working_directories` | `string[]` | No | Subdirectories to restrict file access to |
| `linked_roots` | `string[]` | No | Read-only external workspace roots (max 5) |
| `changed_files` | `string[]` | No | Files changed in this review scope (top-level) |
| `entrypoints` | `string[]` | No | Entry point files the reviewer should start from |
| `artifact_refs` | `Array<{ path, reason, priority }>` | No | Important file references with priority (max 30) |
| `tracked_only` | `boolean` | No | Only allow access to git-tracked files |
| `git_head_sha` | `string` | No | Current git HEAD SHA |
| `previous_git_head_sha` | `string` | No | Previous review round's git HEAD SHA |
| `previous_review_id` | `string` | No | Response ID from previous review call |
| `iteration_count` | `number` | No | Current iteration number (caller tracks, server enforces limit) |
| `max_review_iterations` | `number` | No | Override default iteration limit (1–20) |
| `reviewer_config` | `object` | No | Per-request reviewer configuration |
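For illustration, a minimal first-round request using only a few of these fields (all values are placeholders):

```json
{
  "plan": "1. Add input validation to the upload endpoint. 2. Cover it with unit tests.",
  "workspace_root": "/absolute/path/to/project",
  "changed_files": ["src/upload.ts"],
  "iteration_count": 1
}
```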
Output Schema:
| Field | Type | Description |
|---|---|---|
| `verdict` | `"APPROVE"` \| `"REVISE"` | Final verdict |
| `review_status` | `"completed"` \| `"incomplete"` | Whether the review was fully completed |
| `confidence` | `number` (0–1) | Confidence in the verdict, advisory only |
| `requires_human_review` | `boolean` | Whether a human should review this |
| `architectural_analysis` | `string` | Structural pros/cons analysis |
| `blocking_issues` | `Array<{ description, suggestion }>` | Issues that must be fixed before proceeding |
| `merge_blockers` | `Array<{ description, suggestion }>` \| `null` | Subset of `blocking_issues` that should block merge |
| `non_blocking_suggestions` | `string[]` | Optional improvement suggestions |
| `edge_cases` | `string[]` | Unconsidered edge cases |
| `checklist_for_implementation` | `string[]` | Must-follow checklist for implementation |
| `follow_up_todos` | `string[]` \| `null` | Follow-up tasks after implementation |
| `missing_context` | `string[]` \| `null` | Files or context the reviewer could not access |
| `evidence_files` | `string[]` \| `null` | Files the reviewer examined as evidence |
| `used_tools` | `string[]` \| `null` | Tool calls made during review |
| `tool_exhaustion_reason` | `"budget"` \| `"repeat"` \| `"round_limit"` \| `null` | Why the tool loop was exhausted (if incomplete) |
| `review_id` | `string` | Response ID for maintaining context across rounds |
| `iteration_count` | `number` | Current iteration count (echoed back) |
| `iteration_limit` | `number` | Effective iteration limit for this phase |
| `iteration_limit_reached` | `boolean` | Whether the iteration limit was reached |
| `parallelization_hint` | `"serial"` \| `"parallel"` \| `"hybrid"` \| `null` | Whether the plan can be parallelized |
| `coordination_risks` | `string[]` \| `null` | Risks if parallelizing |
| `recommended_subtask_boundaries` | `string[]` \| `null` | Suggested subtask splits |
DUUL Phase 2: Submit code for review by an LLM acting as a Strict QA Engineer. Requires the previously approved plan.
Input Schema:
| Field | Type | Required | Description |
|---|---|---|---|
| `code` | `string` | Yes | The code to review |
| `approved_plan` | `string` | Yes | The previously approved plan this code implements |
| `file_path` | `string` | No | File path for contextual feedback |
| `dependencies` | `object` | No | Related library version info |
| `relevant_code` | `Array<{ file_path, code }>` | No | Related code snippets for context |
| `notes_to_reviewer` | `string` | No | Context or rebuttals for the reviewer |
| `workspace_root` | `string` | No | Absolute path to workspace root (enables file exploration) |
| `working_directories` | `string[]` | No | Subdirectories to restrict file access to |
| `linked_roots` | `string[]` | No | Read-only external workspace roots (max 5) |
| `changed_files` | `string[]` | No | Files changed in this review scope |
| `entrypoints` | `string[]` | No | Entry point files |
| `artifact_refs` | `Array<{ path, reason, priority }>` | No | Important file references (max 30) |
| `tracked_only` | `boolean` | No | Only allow access to git-tracked files |
| `git_head_sha` | `string` | No | Current git HEAD SHA |
| `previous_review_id` | `string` | No | Response ID from previous review call |
| `iteration_count` | `number` | No | Current iteration number |
| `max_review_iterations` | `number` | No | Override default iteration limit (1–20) |
| `reviewer_config` | `object` | No | Per-request reviewer configuration |
Output Schema:
| Field | Type | Description |
|---|---|---|
| `verdict` | `"APPROVE"` \| `"REVISE"` | Final verdict |
| `review_status` | `"completed"` \| `"incomplete"` | Whether the review was fully completed |
| `confidence` | `number` (0–1) | Confidence in the verdict, advisory only |
| `requires_human_review` | `boolean` | Whether a human should review this |
| `logic_validation` | `string` | How accurately the code implements the approved plan |
| `blocking_issues` | `Array<{ description, suggestion }>` | Issues that must be fixed |
| `merge_blockers` | `Array<{ description, suggestion }>` \| `null` | Subset that should block merge |
| `non_blocking_suggestions` | `string[]` | Optional improvement suggestions |
| `vulnerabilities` | `Array<{ type, description, severity }>` | Security/performance vulnerabilities |
| `optimized_snippet` | `string` \| `null` | Optimized code block, or null |
| `follow_up_todos` | `string[]` \| `null` | Follow-up tasks |
| `missing_context` | `string[]` \| `null` | Context the reviewer could not access |
| `review_id` | `string` | Response ID for context continuity |
| `iteration_count` | `number` | Current iteration count |
| `iteration_limit` | `number` | Effective iteration limit |
| `iteration_limit_reached` | `boolean` | Whether the limit was reached |
When workspace_root is provided, the reviewer gains access to 7 file exploration tools:
| Tool | Description |
|---|---|
| `read_file` | Read entire file content (warns if > 50KB) |
| `list_directory` | List files and directories |
| `search_in_files` | Regex search across files (uses `rg` > `git grep` > `grep`) |
| `read_file_range` | Read specific line range (max 200 lines) |
| `stat_file` | Get file size, modification time, and type |
| `read_json` | Read JSON file with optional JSON pointer |
| `list_tracked_files` | List git-tracked files with optional prefix filter |
- Blocked paths: `.git/`, `build/`, `dist/`, `*.log`
- `linked_roots` are read-only
- `tracked_only: true` restricts access to git-tracked files only
- Symlink escape from workspace/linked roots is prevented
- System directories and shallow paths (< 3 depth) are rejected
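The path rules above can be sketched as a single check. This is an illustration, not the real enforcement in `src/services/filesystem.ts`, which also has to resolve symlinks before checking (a step this sketch omits):

```typescript
import * as path from "node:path";

// Sketch of workspace path validation: stay inside the root, reject blocked paths.
function isAllowed(workspaceRoot: string, requested: string): boolean {
  const resolved = path.resolve(workspaceRoot, requested);
  // reject ../ escapes out of the workspace root
  if (!resolved.startsWith(workspaceRoot + path.sep)) return false;
  const rel = path.relative(workspaceRoot, resolved);
  // blocked prefixes and extensions from the list above
  const blocked = [".git", "build", "dist"];
  if (blocked.some((p) => rel === p || rel.startsWith(p + path.sep))) return false;
  if (rel.endsWith(".log")) return false;
  return true;
}
```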
| Provider | Structured Outputs | Tool Calling | Previous Response ID | JSON Schema Strict |
|---|---|---|---|---|
| OpenAI | Yes | Yes | Yes | Yes |
| Anthropic | No (JSON prompt + zod) | No | No | No |
| No (JSON mode + zod) | No | No | No | |
| OpenRouter | Yes (via OpenAI API) | Yes | Yes | Yes |
| Compatible | Yes (via OpenAI API) | Yes | Yes | Yes |
Degradation behavior:
- No structured outputs: JSON prompting + zod validation fallback.
- No tool calling: Reviewer cannot explore the workspace. Provide more context via `relevant_code` and `artifact_refs`.
- No previous response ID: Each review call is independent (no conversation memory).
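The degradation rules can be made explicit as a small capability record. This sketch mirrors the table above with assumed names; the actual capability interface lives in `src/providers/types.ts`:

```typescript
// Sketch of capability-driven degradation mirroring the provider table.
interface ProviderCapabilities {
  structuredOutputs: boolean;
  toolCalling: boolean;
  previousResponseId: boolean;
}

function degradationPlan(caps: ProviderCapabilities) {
  return {
    // without structured outputs: JSON prompting + zod validation fallback
    outputMode: caps.structuredOutputs ? "structured" : "json-prompt+zod",
    // without tool calling the reviewer cannot explore the workspace
    workspaceExploration: caps.toolCalling,
    // without a previous response ID each review call is independent
    conversationMemory: caps.previousResponseId,
  };
}
```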
```
src/
  index.ts                         Entry point. MCP server + stdio transport.
  schemas/
    common.ts                      Shared schemas (ArtifactRef, ReviewerConfig, IterationMeta).
    plan-review.ts                 Plan review input/output schemas.
    code-review.ts                 Code review input/output schemas.
    execution-partition.ts         Execution partition input/output schemas.
  prompts/
    plan-review-system.ts          Senior Architect system prompt.
    code-review-system.ts          Strict QA Engineer system prompt.
    execution-partition-system.ts  Project Manager system prompt.
  services/
    reviewer.ts                    Provider factory + callReview() dispatcher.
    review-limits.ts               Iteration limit resolution and enforcement.
    filesystem.ts                  Workspace-scoped file operations + security.
  providers/
    types.ts                       ReviewerProvider interface + capabilities.
    openai.ts                      OpenAI: structured outputs + tool loop.
    anthropic.ts                   Anthropic: JSON prompt + zod.
    google.ts                      Google: JSON mode + zod.
  tools/
    plan-review.ts                 request_plan_review MCP tool.
    code-review.ts                 request_code_review MCP tool.
    execution-partition.ts         request_execution_partition MCP tool.
```
MIT
