diff --git a/docs/cli-reference.md b/docs/cli-reference.md
index c3597e4..41704fc 100644
--- a/docs/cli-reference.md
+++ b/docs/cli-reference.md
@@ -288,6 +288,8 @@ When the model invokes a tool, a dim `> tool_name(arg)` line is printed and the
 | `/safe-mode off` | Restore tool categories to their state before safe mode was enabled |
 | `/provider` | Show the current model, endpoint, and API key store |
 | `/provider setup` | Reconfigure provider URL, model ID, and API key; saves immediately |
+| `/max-tokens <n>` | Cap the model's output to `n` tokens per response |
+| `/max-tokens reset` | Restore the provider's default max output tokens |
 | `/exit` | End the session |
 
 When safe mode is active, the prompt gains a `[safe]` prefix as a persistent visual reminder.
diff --git a/docs/strategies.md b/docs/strategies.md
index ce92453..78cb4df 100644
--- a/docs/strategies.md
+++ b/docs/strategies.md
@@ -71,7 +71,7 @@ Selection:
 
 1. **Tool-call routing (preferred):** If the agent calls `handoff(route_keyword: "...")` via the `Handoff` plugin, the argument is used directly as the routing keyword — no text scanning occurs. This is the most reliable signal because it is a typed function argument, not free text. fuseraft also terminates the agent's tool loop immediately when `handoff` is called, so the agent cannot accidentally call other tools after signalling completion. Add `- Handoff` to an agent's `Plugins` list and instruct it to `call handoff(route_keyword: "KEYWORD")` instead of emitting the keyword as text.
 2. **Text scanning (fallback):** If no `handoff` tool call is present, the response text is scanned for every keyword configured in `Routes`.
-3. **Strict matching** — a text keyword matches only when it appears **alone on its own line** (after stripping markdown formatting characters `*` and `_`). A keyword embedded in a sentence or used as a prose section header (e.g. `BUGS FOUND: 3 failures`) does not match. This prevents accidental routing when agents reference another role's keyword in their output.
+3. **Line matching** — a text keyword matches when it appears **alone on its own line** (exact match) or at the **start of a line followed by whitespace or punctuation** (e.g. `BUGS FOUND: all issues fixed`), after stripping markdown formatting characters `*` and `_`. A keyword embedded mid-sentence does not match. This prevents accidental routing when agents reference another role's keyword in their output.
 4. If **multiple** text keywords appear on their own lines in the same response, the response is rejected as ambiguous and a correction is injected asking the agent to use exactly one keyword. This prevents silent first-match bias from config ordering.
 5. The **single** matched keyword is checked against `SourceAgents` — the route only fires if the message author is in that list (or if `SourceAgents` is omitted).
 6. If a route has validators (`Validator` or `Validators`), they run before the route fires. If validation fails, the source agent is re-invoked with an error message injected.
@@ -267,7 +267,7 @@ Selection:
 6. If no signal is detected, the current state's agent is re-invoked with a nudge listing the available signals.
 7. A `Terminal: true` state re-invokes its agent every turn until the `Termination` strategy fires — it has no outgoing transitions.
 
-**Signal detection rules** are the same as keyword routing: the signal must appear alone on its own line (after stripping `*`/`_` markdown). Agents may also use the `Handoff` plugin (`handoff(route_keyword: "SIGNAL")`) for typed, unambiguous signalling.
+**Signal detection rules** are the same as keyword routing: the signal must appear alone on its own line or at the start of a line followed by whitespace or punctuation (after stripping `*`/`_` markdown). Agents may also use the `Handoff` plugin (`handoff(route_keyword: "SIGNAL")`) for typed, unambiguous signalling.
 
 **`StateMachineConfig` fields**
 
@@ -338,50 +338,50 @@ A declarative directed graph where each agent is bound to a named node and edges
 Selection:
   Type: graph
   Graph:
-    Entry: planner
+    EntryNode: planner
     Nodes:
       - Id: planner
         Agent: Planner
-        Edges:
-          - To: developer
-            Keyword: "HANDOFF TO DEVELOPER"
-            Validators:
-              - RequireBrief
-
       - Id: developer
         Agent: Developer
-        Edges:
-          - To: tester
-            Keyword: "HANDOFF TO TESTER"
-            Validators:
-              - RequireWriteFile
-              - RequireShellPass
-          - To: planner
-            Keyword: REPLAN REQUIRED
-
       - Id: tester
         Agent: Tester
-        Edges:
-          - To: reviewer
-            Keyword: "HANDOFF TO REVIEWER"
-            Validators:
-              - TestReportValid
-          - To: developer
-            Keyword: BUGS FOUND
-
       - Id: reviewer
         Agent: Reviewer
-        Edges:
-          - To: approved
-            Keyword: APPROVED
-            Validators:
-              - RequireReviewJudgement
-          - To: developer
-            Keyword: REVISION REQUIRED
-
       - Id: approved
         Agent: Reviewer
         Terminal: true
+    Edges:
+      - From: planner
+        To: developer
+        Keyword: "HANDOFF TO DEVELOPER"
+        Validators:
+          - RequireBrief
+      - From: developer
+        To: tester
+        Keyword: "HANDOFF TO TESTER"
+        Validators:
+          - RequireWriteFile
+          - RequireShellPass
+      - From: developer
+        To: planner
+        Keyword: REPLAN REQUIRED
+      - From: tester
+        To: reviewer
+        Keyword: "HANDOFF TO REVIEWER"
+        Validators:
+          - TestReportValid
+      - From: tester
+        To: developer
+        Keyword: BUGS FOUND
+      - From: reviewer
+        To: approved
+        Keyword: APPROVED
+        Validators:
+          - RequireReviewJudgement
+      - From: reviewer
+        To: developer
+        Keyword: REVISION REQUIRED
 ```
 
 **How it works**
@@ -389,7 +389,7 @@ Selection:
 1. **BFS layer assignment:** at startup, fuseraft computes a BFS layer for every node from the entry node following only forward edges. Edges are classified as *forward* (target layer > source layer) or *back-edges* (target layer ≤ source layer).
 2. **Forward edges** activate the target agent in the current multi-agent phase via normal framework messaging.
 3. **Back-edges** break the current phase. When a back-edge keyword is detected and all validators pass, the orchestrator terminates the active phase and restarts execution from the target node.
-4. **Keyword detection** uses the same rules as keyword routing: the keyword must appear alone on its own line (after stripping `*`/`_` markdown), or be emitted via the `Handoff` plugin (`handoff(route_keyword: "KEYWORD")`). Only the current node's outgoing edges are checked — keywords that belong to other nodes are ignored.
+4. **Keyword detection** uses strict line matching: the keyword must appear **alone on its own line** with no trailing text (after stripping `*`/`_` markdown), or be emitted via the `Handoff` plugin (`handoff(route_keyword: "KEYWORD")`). This is stricter than keyword routing — a keyword at the start of a line followed by punctuation (e.g. `APPROVED: see notes`) does not match in graph mode. Only the current node's outgoing edges are checked — keywords that belong to other nodes are ignored.
 5. **Terminal nodes** (`Terminal: true`) invoke the termination check before keyword detection. Back-edges on a terminal node are unreachable — if you need a terminal outcome with evidence gating, use a routing node whose forward edge points to a separate terminal node with validators on that edge.
 6. **Unconditional edges** (no `Keyword`) fire automatically after the agent's turn without keyword scanning. Unconditional forward edges hand off immediately; unconditional back-edges break the phase immediately.
 
@@ -398,21 +398,24 @@ Selection:
 A single node may declare back-edges to different target nodes — the key differentiator from keyword routing's loop-back conventions. In the example below the `reviewer` node routes back to two different targets depending on which keyword fires:
 
 ```yaml
-- Id: reviewer
-  Agent: Reviewer
-  Edges:
-    - To: approved
-      Keyword: APPROVED
-      Validators:
-        - RequireReviewJudgement
-    - To: developer
-      Keyword: REVISION REQUIRED  # back-edge → developer
-    - To: planner
-      Keyword: REPLAN REQUIRED  # back-edge → planner (different target)
-
-- Id: approved
-  Agent: Reviewer
-  Terminal: true
+Nodes:
+  - Id: reviewer
+    Agent: Reviewer
+  - Id: approved
+    Agent: Reviewer
+    Terminal: true
+Edges:
+  - From: reviewer
+    To: approved
+    Keyword: APPROVED
+    Validators:
+      - RequireReviewJudgement
+  - From: reviewer
+    To: developer
+    Keyword: REVISION REQUIRED  # back-edge → developer
+  - From: reviewer
+    To: planner
+    Keyword: REPLAN REQUIRED  # back-edge → planner (different target)
 ```
 
 In keyword routing this pattern requires two separate loop-back routes and depends on keyword scanning order. In graph routing the topology is explicit: each edge has a distinct target.
@@ -421,29 +424,33 @@ In keyword routing this pattern requires two separate loop-back routes and depen
 
 | Field | Type | Required | Description |
 |-------|------|----------|-------------|
-| `Entry` | string | yes | Node ID of the first node to execute. |
-| `Nodes` | array | yes | Ordered list of `GraphNodeConfig`. At least one node required. |
+| `EntryNode` | string | no | Node ID of the first node to execute. Defaults to the first node when omitted. |
+| `Nodes` | array | yes | Node definitions. Each binds an agent role to a named position in the graph. |
+| `Edges` | array | yes | Directed edges. Evaluated in declaration order — the first matching edge fires. |
+| `MaxRetries` | int | `4` | Maximum consecutive correction attempts per node before a `ValidatorStuckException` is thrown. |
 
 **`GraphNodeConfig` fields**
 
 | Field | Type | Default | Description |
 |-------|------|---------|-------------|
-| `Id` | string | — | Unique node identifier. Referenced by edges' `To` field and by `Entry`. |
+| `Id` | string | — | Unique node identifier. Referenced by `EntryNode` and by edges' `From`/`To` fields. |
 | `Agent` | string | — | Agent name from the `Agents` list to invoke at this node. Multiple nodes may share the same agent. |
-| `Terminal` | bool | `false` | When `true`, the termination check fires before keyword detection. No outgoing edges are evaluated after termination fires. |
-| `Edges` | array | `[]` | Outgoing edges from this node. Empty means the agent runs until the `Termination` strategy fires. |
+| `Terminal` | bool | `false` | When `true`, the session terminates after the agent executes once. Outgoing edges are not evaluated. |
+| `Parallel` | bool | `false` | When `true`, the node participates in a parallel fan-out group — runs concurrently with other `Parallel` nodes sharing the same triggering keyword. |
+| `Validators` | array | — | Validators that must all pass before a `Terminal` node ends the session. Ignored on non-terminal nodes. |
 
 **`GraphEdgeConfig` fields**
 
 | Field | Type | Default | Description |
 |-------|------|---------|-------------|
-| `To` | string | — | Target node ID. Must exist in `Graph.Nodes`. Forward vs. back-edge classification is computed automatically from BFS layer topology. |
+| `From` | string | — | Source node ID. Must match a `GraphNodeConfig.Id`. |
+| `To` | string | — | Target node ID. Must match a `GraphNodeConfig.Id`. Forward vs. back-edge classification is computed automatically from BFS layer topology. |
 | `Keyword` | string | — | Routing keyword. Must appear alone on its own line. When omitted, the edge is *unconditional* — it fires after the agent's turn without keyword scanning. |
 | `Validator` | string | — | Optional single validator. Blocks the edge until validation passes. |
-| `Validators` | array | — | Optional multiple validators (AND semantics). |
+| `Validators` | array | — | Optional multiple validators (AND semantics). Takes precedence over `Validator` when both are set. |
 | `SourceAgents` | array | any | Optional. Edge only fires when the message author is in this list. |
 | `RequiredCommandPattern` | string | — | Used with `RequireShellPass`. The passing command must contain at least one pipe-separated substring. |
-| `ShellFallbackPattern` | string | — | Fallback command pattern if `RequiredCommandPattern` fails. |
+| `ShellFallbackPattern` | string | — | Used with `RequireWriteFile`. A shell command matching this pattern is accepted in lieu of `write_file`. |
 | `RequireHumanApproval` | bool | `false` | When `true`, the operator must explicitly approve (`y`) before this edge fires. If rejected, the source agent is re-invoked with a "route blocked" message. Applies to both forward edges and back-edges. |
 | `RecoveryAgent` | string | — | Optional. Agent to invoke for one intervention turn when a validator has failed two or more consecutive times on this edge. The recovery agent receives a diagnostic message and may fix the blocking issue. Activates at most once per edge per session. |
 
@@ -663,7 +670,7 @@ Graph fits naturally when:
 - The pipeline is a directed graph, not a strict linear sequence — phases fan out or converge in ways that are cleaner to express as nodes and edges than as a flat route table
 - You still want validators on individual edges (graph edges support the full `Validators` / `RequiredCommandPattern` surface, the same as keyword routes)
 
-Graph and keyword routing use the same signal mechanism (keyword on own line, or `handoff()` plugin). Migrating an existing keyword config to graph requires mapping agents to node IDs and routes to edges. The main addition is the explicit `Entry` node and the `Id`/`To` structure on each edge.
+Graph and keyword routing use the same `handoff()` plugin for typed signalling, but their text-scan rules differ: keyword routing uses relaxed matching (keyword at start of line followed by whitespace or punctuation also fires), while graph uses strict matching (keyword must be alone on its own line, no trailing text). Migrating an existing keyword config to graph requires mapping agents to node IDs and routes to edges. The main addition is the explicit `EntryNode` and the flat `Edges` list with `From`/`To` fields.
 
 **What graph trades away:** lossless compaction and Verifier integration. For hallucination-resistant routing where agents cannot route themselves to an unexpected node, state machine remains the stronger choice.
 
@@ -673,7 +680,7 @@ Graph and keyword routing use the same signal mechanism (keyword on own line, or
 
 | | Keyword | State machine | Structured | Graph |
 |---|---|---|---|---|
-| Handoff signal | Keyword on own line | Signal on own line (same matching) | JSON field value | Keyword on own line (same matching) |
+| Handoff signal | Keyword on own line (relaxed) | Signal on own line (same as keyword) | JSON field value | Keyword alone on own line (strict) |
 | Evidence gating | Validators (per-route) | Contracts (per-transition, typed) | Instructions only | Validators (per-edge) |
 | Routing topology | All routes active at once | Only current state's transitions active | All routes active at once | Only current node's edges active |
 | Ghost signals | Possible — any agent can emit any keyword | Impossible — wrong-state signals are ignored | N/A | Reduced — wrong-node keywords are ignored |
diff --git a/src/Cli/Commands/InitTemplates.Designer.cs b/src/Cli/Commands/InitTemplates.Designer.cs
index 25814ba..c7a623e 100644
--- a/src/Cli/Commands/InitTemplates.Designer.cs
+++ b/src/Cli/Commands/InitTemplates.Designer.cs
@@ -1,3 +1,4 @@
+#nullable enable
 using fuseraft.Core;
 
 namespace fuseraft.Cli.Commands;
diff --git a/src/Cli/Commands/Repl/ReplCommand.cs b/src/Cli/Commands/Repl/ReplCommand.cs
index a9b307f..d689798 100644
--- a/src/Cli/Commands/Repl/ReplCommand.cs
+++ b/src/Cli/Commands/Repl/ReplCommand.cs
@@ -194,6 +194,7 @@ private static string BuildSystemPrompt(
             "- Prefer tools over guessing.\n" +
             "- Read before writing or mutating.\n" +
             "- Avoid destructive actions (rm, overwrite, force-push) unless explicitly requested.\n" +
+            "- Only write files the user explicitly requests — never create unsolicited summaries, changelogs, or status files.\n" +
             "- For multi-step work, briefly state intent first.\n" +
             "- If a command fails due to missing project/config file: search subdirs for the entry point, then run `cd && ` in one shell_run call. Note the directory used.\n" +
             "- Always return to the original working directory for subsequent commands unless the task explicitly requires otherwise.\n"
diff --git a/src/Cli/Commands/Repl/ReplCommands.cs b/src/Cli/Commands/Repl/ReplCommands.cs
index a9aa0e4..427cd7d 100644
--- a/src/Cli/Commands/Repl/ReplCommands.cs
+++ b/src/Cli/Commands/Repl/ReplCommands.cs
@@ -30,6 +30,7 @@ internal static async Task<CommandResult> HandleAsync(
             case "/events": await CmdEventsAsync(ctx, arg); return CommandResult.Continue;
             case "/safe-mode": return await CmdSafeModeAsync(ctx, arg);
             case "/memory": return await CmdMemoryAsync(ctx, arg, cancellationToken);
+            case "/max-tokens": return CmdMaxTokens(ctx, arg);
             default:
                 AnsiConsole.MarkupLine(
                     $"[yellow]Unknown command:[/] {Markup.Escape(command)} [dim](type /help for commands)[/]");
@@ -410,6 +411,37 @@ private static CommandResult CmdRecover(ReplSessionContext ctx)
         return CommandResult.Continue;
     }
 
+    private static CommandResult CmdMaxTokens(ReplSessionContext ctx, string arg)
+    {
+        if (string.IsNullOrEmpty(arg))
+        {
+            AnsiConsole.MarkupLine(ctx.MaxOutputTokens > 0
+                ? $"[dim]Max output tokens:[/] [bold]{ctx.MaxOutputTokens:N0}[/]"
+                : "[dim]Max output tokens:[/] provider default");
+            AnsiConsole.MarkupLine("[dim]Run[/] [bold]/max-tokens <n>[/] [dim]to set, or[/] [bold]/max-tokens reset[/] [dim]to restore the provider default.[/]");
+            return CommandResult.Continue;
+        }
+
+        if (arg.Equals("reset", StringComparison.OrdinalIgnoreCase))
+        {
+            ctx.MaxOutputTokens = 0;
+            ctx.ChatOptions = ctx.BuildChatOptions();
+            AnsiConsole.MarkupLine("[dim]Max output tokens reset to provider default.[/]");
+            return CommandResult.Continue;
+        }
+
+        if (!int.TryParse(arg, out var n) || n <= 0)
+        {
+            AnsiConsole.MarkupLine($"[yellow]Invalid value:[/] {Markup.Escape(arg)} [dim](must be a positive integer)[/]");
+            return CommandResult.Continue;
+        }
+
+        ctx.MaxOutputTokens = n;
+        ctx.ChatOptions = ctx.BuildChatOptions();
+        AnsiConsole.MarkupLine($"[dim]Max output tokens set to[/] [bold]{n:N0}[/][dim].[/]");
+        return CommandResult.Continue;
+    }
+
     private static async Task CmdEventsAsync(ReplSessionContext ctx, string arg)
     {
         if (!File.Exists(ctx.EventsPath))
@@ -697,6 +729,8 @@ private static void PrintHelp()
         AnsiConsole.MarkupLine(" [bold cyan]/memory show [/] Show full body of a memory");
         AnsiConsole.MarkupLine(" [bold cyan]/memory delete [/] Delete a stored memory");
         AnsiConsole.MarkupLine(" [bold cyan]/memory save[/] Extract and save memories from the current session now");
+        AnsiConsole.MarkupLine(" [bold cyan]/max-tokens <n>[/] Set max output tokens for each response");
+        AnsiConsole.MarkupLine(" [bold cyan]/max-tokens reset[/] Restore provider default max output tokens");
         AnsiConsole.MarkupLine(" [bold cyan]/exit[/] Exit the REPL (auto-saves memories)");
     }
 
diff --git a/src/Cli/Commands/Repl/ReplSessionContext.cs b/src/Cli/Commands/Repl/ReplSessionContext.cs
index f0ab5b6..c328f44 100644
--- a/src/Cli/Commands/Repl/ReplSessionContext.cs
+++ b/src/Cli/Commands/Repl/ReplSessionContext.cs
@@ -76,6 +76,9 @@ public IChatClient StepClient
     public bool SafeMode;
     public HashSet? PreSafeDisabled;
 
+    // Max output tokens (0 = provider default)
+    public int MaxOutputTokens;
+
     // Context growth tracking
     public int PrevCtxEstimate;
     public readonly List TurnTokenDeltas = [];
@@ -131,9 +134,13 @@ public List GetActiveTools() => [.. ToolsByCategory
     public ChatOptions? BuildChatOptions()
     {
         var active = GetActiveTools();
-        return active.Count > 0
-            ? new ChatOptions { Tools = active.Cast<AITool>().ToList() }
-            : null;
+        var hasTools = active.Count > 0;
+        var hasMax = MaxOutputTokens > 0;
+        if (!hasTools && !hasMax) return null;
+        var opts = new ChatOptions();
+        if (hasTools) opts.Tools = active.Cast<AITool>().ToList();
+        if (hasMax) opts.MaxOutputTokens = MaxOutputTokens;
+        return opts;
     }
 
     public int EstimateTokens() =>
diff --git a/src/Cli/ConsoleHumanApprovalService.cs b/src/Cli/ConsoleHumanApprovalService.cs
index 3466e14..9c5e001 100644
--- a/src/Cli/ConsoleHumanApprovalService.cs
+++ b/src/Cli/ConsoleHumanApprovalService.cs
@@ -10,7 +10,7 @@ public sealed class ConsoleHumanApprovalService : IHumanApprovalService
 {
     public Task<string?> PromptContinueAsync()
     {
-        AnsiConsole.Markup("[dim] ↩ Enter to continue · type a message to redirect · q to stop:[/] ");
+        AnsiConsole.Markup("[dim] ↩ Enter or y to continue · type a message to redirect · q to stop:[/] ");
         var input = Console.ReadLine()?.Trim() ?? string.Empty;
 
         if (string.IsNullOrEmpty(input)) return Task.FromResult<string?>(null);
@@ -18,6 +18,13 @@ public sealed class ConsoleHumanApprovalService : IHumanApprovalService
             input.Equals("quit", StringComparison.OrdinalIgnoreCase))
             return Task.FromResult("\x00");
 
+        // Treat affirmative short inputs as "continue" so that a 'y' typed for
+        // a preceding shell-approval prompt that was consumed before the user could
+        // respond doesn't accidentally inject "y" as a redirect message.
+        if (input.Equals("y", StringComparison.OrdinalIgnoreCase) ||
+            input.Equals("yes", StringComparison.OrdinalIgnoreCase))
+            return Task.FromResult<string?>(null);
+
         return Task.FromResult(input);
     }
 
@@ -39,8 +46,7 @@ public Task<bool> PromptShellCommandAsync(string command)
         var input = Console.ReadLine()?.Trim() ?? string.Empty;
         var allowed = input.Equals("y", StringComparison.OrdinalIgnoreCase) ||
                       input.Equals("yes", StringComparison.OrdinalIgnoreCase);
-        if (!allowed)
-            AnsiConsole.MarkupLine("[dim]Command blocked.[/]");
+        AnsiConsole.MarkupLine(allowed ? "[dim]Command allowed.[/]" : "[dim]Command blocked.[/]");
         return Task.FromResult(allowed);
     }
 
diff --git a/src/Infrastructure/Plugins/FileSystemPlugin.cs b/src/Infrastructure/Plugins/FileSystemPlugin.cs
index 33bd57e..79c25ba 100644
--- a/src/Infrastructure/Plugins/FileSystemPlugin.cs
+++ b/src/Infrastructure/Plugins/FileSystemPlugin.cs
@@ -25,6 +25,11 @@ public sealed class FileSystemPlugin : ITurnResettable
     // content into the model's context.
     private readonly HashSet<string> _readThisTurn = new(StringComparer.OrdinalIgnoreCase);
 
+    // Paths that were successfully patch_file'd this turn. A write_file to any of these
+    // paths is blocked: the write is derived from the agent's stale mental model, not the
+    // current disk state, so it would silently clobber the patch that was just applied.
+    private readonly HashSet<string> _patchedThisTurn = new(StringComparer.OrdinalIgnoreCase);
+
     // Per-turn cumulative read budget (chars). Prevents individual tool calls from
     // individually respecting the per-call size limit while still collectively flooding
     // the in-turn context with hundreds of thousands of chars of file content — the
@@ -48,6 +53,7 @@ public FileSystemPlugin(string? sandboxRoot = null, int readFileSizeLimit = 20_0
     void ITurnResettable.BeginTurn()
     {
         _readThisTurn.Clear();
+        _patchedThisTurn.Clear();
         _readBudgetUsed = 0;
     }
 
@@ -287,6 +293,9 @@ public async Task PatchFileAsync(
         // Invalidate the read cache — content has changed.
         _readThisTurn.Remove(resolved);
 
+        // Record that this path was patched so write_file can detect the pattern.
+        _patchedThisTurn.Add(resolved);
+
         var oldLines = normalOld.Split('\n').Length;
         var newLines = normalNew.Split('\n').Length;
         return PluginResult.Ok(
@@ -418,6 +427,15 @@ public async Task WriteFileAsync(
         var denial = ResolveSafe(path, out var resolved);
         if (denial is not null) return denial;
 
+        // Block write_file on a path that was already patch_file'd this turn. The agent's
+        // full-file content is derived from its pre-patch mental model and would silently
+        // overwrite the patch that was just applied.
+        if (_patchedThisTurn.Contains(resolved))
+            return PluginResult.Error(
+                $"WRITE BLOCKED — '{resolved}' was already patched this turn. " +
+                $"Calling write_file now would overwrite that patch with stale content. " +
+                $"Use patch_file again for any additional edits.");
+
         // Version conflict check: when baseVersion > 0, reject the write if the current
         // stored version differs so agents cannot silently overwrite concurrent changes.
         if (baseVersion > 0 && _versionStore is not null)
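
Reviewer note on the `docs/strategies.md` hunks: the two text-scan rules described there (relaxed matching for keyword routing and the state machine, strict matching for graph) are easy to confuse, so here is a minimal sketch of how they differ. It is an illustration only and not part of this patch. `KeywordLineMatcher` and its method names are hypothetical, and ordinal, case-sensitive comparison is an assumption the docs do not state. The sketch also leaves out the ambiguity rejection that applies when several keywords appear in one response.

```csharp
using System;
using System.Linq;

// Demo calls using the example strings from the docs hunks.
Console.WriteLine(KeywordLineMatcher.MatchesRelaxed("**BUGS FOUND**: all issues fixed", "BUGS FOUND")); // True
Console.WriteLine(KeywordLineMatcher.MatchesStrict("APPROVED: see notes", "APPROVED"));                 // False
Console.WriteLine(KeywordLineMatcher.MatchesStrict("Looks good.\n**APPROVED**", "APPROVED"));           // True
Console.WriteLine(KeywordLineMatcher.MatchesRelaxed("No BUGS FOUND in this run", "BUGS FOUND"));        // False

// Hypothetical helper for illustration; fuseraft's real matcher may differ.
static class KeywordLineMatcher
{
    // Drop the markdown emphasis characters (* and _), then trim whitespace.
    private static string Normalize(string line) =>
        line.Replace("*", "").Replace("_", "").Trim();

    // Strict rule (graph): the keyword must be the entire line.
    public static bool MatchesStrict(string response, string keyword) =>
        response.Split('\n').Select(Normalize)
            .Any(line => line.Equals(keyword, StringComparison.Ordinal));

    // Relaxed rule (keyword routing, state machine): the keyword is the entire
    // line, or starts the line and is followed by whitespace or punctuation.
    public static bool MatchesRelaxed(string response, string keyword) =>
        response.Split('\n').Select(Normalize).Any(line =>
            line.StartsWith(keyword, StringComparison.Ordinal) &&
            (line.Length == keyword.Length ||
             char.IsWhiteSpace(line[keyword.Length]) ||
             char.IsPunctuation(line[keyword.Length])));
}
```

One consequence worth checking during review: because `_` is stripped before comparison in this sketch, a keyword that itself contains an underscore would never match. The keywords in the docs examples all use spaces, so they are unaffected.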