Skip to content

feat: agent self-correction via validation feedback loop#57

Merged
nicknisi merged 14 commits intomainfrom
nicknisi/cli-agent-resiliance
Feb 14, 2026
Merged

feat: agent self-correction via validation feedback loop#57
nicknisi merged 14 commits intomainfrom
nicknisi/cli-agent-resiliance

Conversation

@nicknisi
Copy link
Member

@nicknisi nicknisi commented Feb 14, 2026

Summary

  • Add fast typecheck/build validation that runs between agent turns, giving the agent structured feedback to self-correct within the same session (up to 2 retries)
  • Auto-detect build systems across ecosystems: JS (package.json), Go (go.mod), Elixir (mix.exs), .NET (*.csproj), Kotlin/Java (build.gradle) — interpreted languages pass through silently
  • Unify eval executor with production runAgent so evals exercise the actual retry path
  • Track three-tier pass rates: first-attempt, with-correction, with-retry

Why

The installer ran its agent as a single-shot operation — when validation caught fixable issues, the results went to the user, not back to the agent. The agent never got a chance to fix its own mistakes.

Eval results (14 frameworks, --state=example):

Metric Value
First-attempt pass rate 92.9%
With-correction pass rate 100%
Self-corrected scenarios 1 of 14
Quality score 4.5/5

Architecture

Agent writes code
    ↓
Typecheck (tsc --noEmit, TS only, ~5s)
    ↓
Pass? → Build (auto-detected per ecosystem)
    ↓         ↓
    ↓    Pass? → Full validation (env vars, files, patterns)
    ↓      ↓
    ↓   Format errors → yield back into same SDK conversation
    ↓      ↓
    ↓   Agent fixes (retains full context, max 2 retries)

The retry loop uses an async generator that yields follow-up user messages into the SDK's query(). The agent retains full conversation context.

Changes

Quick checks (src/lib/validation/quick-checks.ts): Typecheck + build as composable steps. Short-circuits on typecheck failure. quickCheckValidateAndFormat shared between production and evals.

Multi-ecosystem build detection (src/lib/validation/build-validator.ts): detectBuildCommand checks package.json, go.mod, mix.exs, *.csproj, build.gradle. Returns null for interpreted languages.

Retry loop (src/lib/agent-interface.ts): Async generator yields correction prompts on validation failure. Promise-based turn coordination. Exports AgentRunConfig + onMessage hook for evals.

Evals (tests/evals/agent-executor.ts): Delegates to production runAgent. Three-tier success criteria: first-attempt (80%), with-correction (90%), with-retry (95%). --no-correction flag.

Validator composability (src/lib/validation/validator.ts): Exported validatePackages, validateEnvVars, validateFiles, validateFrameworkSpecific with return-based signatures.

Notes

  • dotnet eval scenario disabled (broken SDK)
  • Quality grader JSON parsing fixed (greedy regex matched braces inside <thinking> tags)

Restructure validation into composable steps so typecheck (~5s) runs
independently before full validation. Quick checks short-circuit on
typecheck failure and format errors as actionable agent prompts,
laying the foundation for the agent retry loop.
Extend the async generator in agent-interface to yield follow-up
correction prompts when quick-checks (typecheck/build) fail. The agent
retains full conversation context and gets up to 2 chances to fix its
own mistakes before results surface to the user. Configurable via
maxRetries option (default 2, 0 to disable).
Add retry-aware execution to AgentExecutor using the same async
generator + quick-checks pattern from production. Evals now track
three tiers: first-attempt, with-correction, and with-retry pass
rates. Adds --no-correction flag to disable for baseline comparison.
AgentExecutor now delegates to the production runAgent instead of
reimplementing the retry-aware async generator. Exports AgentRunConfig
so evals can construct it directly, adds onMessage hook for latency
tracking. Includes 13 tests verifying the wiring.
…rics

First-attempt now means zero corrections, which is stricter than before.
Lower threshold to 30% (aspirational), add withCorrectionPassRate at 90%
as the primary quality gate, keep withRetryPassRate at 95%.
Two eval runs show ~21-27% first-attempt rate. The correction loop
consistently brings it to 93-100%. Set threshold at 20% to catch
regressions without failing on normal variance.
…hreshold

detectTypecheckCommand was falling back to npx tsc --noEmit for every
project including Python, Ruby, Go, etc. Now checks for tsconfig.json
before falling back — no tsconfig means skip typecheck entirely. This
eliminates false correction triggers on non-JS frameworks.

Raises first-attempt threshold to 50% since the false positives were
the main driver of the low rate.
…port

Extend quick-checks to auto-detect Go (go.mod), Elixir (mix.exs),
.NET (*.csproj), and Kotlin/Java (build.gradle) build commands from
project files. Interpreted languages (Python, Ruby, PHP) pass through
silently — no universal build command exists for them.
…parsing

Raise firstAttemptPassRate from 50% to 80% now that false positives
from non-TS projects are eliminated (85.7% observed in latest run).

Fix quality grader parsing: the greedy regex matched braces inside
<thinking> tags. Now extracts JSON only after </thinking> and uses
a non-greedy pattern to avoid capturing nested objects.
…move dead code

Extract passResult helper (4 identical object literals → 1 function),
unify parseTypecheckErrors into single regex with Set dedup, extract
quickCheckValidateAndFormat shared between agent-runner and eval
executor, remove getIntegration indirection and dead continueUrl param.
@nicknisi nicknisi merged commit 920fc87 into main Feb 14, 2026
5 checks passed
@nicknisi nicknisi deleted the nicknisi/cli-agent-resiliance branch February 14, 2026 21:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant