feat(acp): ACP session protocol — init, lifecycle, modes, loadSession, streaming, tool display, command output by marton78 · Pull Request #116 · dirac-run/dirac

marton78 · 2026-05-31T20:39:09Z

Summary

This branch implements and stabilises the ACP (Agent Communication Protocol) session layer, taking the server from a bare skeleton to a fully functional ACP endpoint. The branch builds on top of fix-tests (PR #115) and contains 11 logical commits.

1. `feat(acp)` — acp-probe CLI script for end-to-end testing

Adds cli/scripts/acp-probe.mts, a standalone ACP client that drives the --acp server without an editor. Speaks JSON-RPC 2.0 over stdin/stdout — same transport path as Zed.

prompt subcommand: session/create → session/prompt, streams updates, exits on stopReason
load-prompt subcommand: loadSession → session/prompt, exercises the session continuation path
--pre-set key=value: apply session config (e.g. mode, apiProvider) before the prompt
--approve-delay-ms: configurable delay before auto-approving permissions
DIRAC_ACP_DEBUG=1: log raw JSON-RPC to stderr

2. `fix(acp)` — init refactor: shared helper, ErrorService, surface I/O errors

Extracts cli/src/initCoreServices.ts called by both ACP and interactive paths. Surfaces init failures (missing API key, bad config path, etc.) as JSON-RPC errors instead of silent EOF. Adds test coverage for the init-failure path.

3. `fix(acp)` — robust turn lifecycle: error paths, error semantics, cancel, retry cleanup

Terminate every error path: unhandled runPromise rejections now resolve the pending prompt via handleUnhandledHandlerError() instead of hanging. Exposes task.runError from core.
Errors as failed tool_call: failures emit tool_call("error") + tool_call_update(status:"failed") instead of looking like model text.
session/cancel → stopReason:"cancelled": as required by the ACP spec.
Retry/terminal-failure state reset at the start of each new turn.

4. `feat(acp)` — session mode picker with per-session Plan/Act/Auto/YOLO overrides

session/setMode RPC lets clients switch modes per session. Auto-approve and YOLO flags are now stored in a per-session SessionOverrides map in StateManager instead of global state — no more cross-session leakage.

5. `fix(acp)` — atomic cancel flag claim and listener leak on task reinit

Cancel atomicity: cancel() claims pending.resolved.value = true synchronously before await cancelTask() so no concurrent path can steal the slot.
Listener leak: diracMessagesChanged listener now captures the task reference once at registration, not via controller.task which can be replaced by cancelTask().

6. `fix(acp)` — file edits apply after permission approval

ACPDiffViewProvider.saveChanges() was never called on approval, so edits were silently discarded. Fixed.

7. `fix(acp)` — loadSession continuity: sessionId as taskId, history replay, user messages, timing fix

sessionId = taskId: tasks are now findable after restart. New acp-session-tasks.ts helper; dead code in backfill.ts / utils.ts removed.
History replay on loadSession: re-emits all prior tool_call / agent_message_chunk events so reconnecting clients can reconstruct state.
User messages in replay: previously skipped.
Timing fix: handleWebviewAskResponse was called before ask("resume_task") fired, causing TaskMessenger.ask() to wipe the pre-set response. Now waits for the diracMessagesChanged event before delivering the prompt.

8. `fix(acp)` — format tool results as markdown

Read tools (readFile, getFunction, etc., newFileCreated): content replaced with * path: [File Hash: xxx] bullets; full source still in rawOutput.
List tools (listFilesTopLevel, etc.): converted to * item bullet lists.
Tool call titles: file paths deduplicated (preserving insertion order).

9. `fix(acp)` — streaming text deduplication

Doubled completion_result text
Delta tracker race producing duplicate streamed chunks
execute_command output leaking through the generic streaming path as plain text
Duplicate followup / plan_mode_respond text in streaming turns

10. `fix(acp)` — multi-command permission: variable shadowing and wrong tool attribution

Permission storm: shadowed command variable caused approval of one command to trigger requests for all. Fixed + refactored ask:command handling in messageTranslator.ts.
Wrong toolCallId on permission requests in multi-tool turns: now correctly attributed per command.

11. `fix(acp)` — surface execute_command output and completion in ACP stream

execute_command mutates ask:command in place (no say:command_output); the translator previously ignored commandCompleted and output entirely. Now emits tool_call_update with aggregated output and terminal status (completed/failed) once done.

Also fixes a Zed rendering issue: the in_progress update no longer includes a placeholder content string when there is no output yet — Zed renders the first content it sees and ignores subsequent updates, so the placeholder was permanently shown.

…ctor Tool-call label changed from "Tool:" to "Tool Call", command and browser_action_launch ask labels changed to "Use tool?", and browser_action moved to the verbose-only fallback path. Update the four affected assertions in display.test.ts to match the current output.

ink v7 causes TerminalInfoProvider to emit �[16t (terminal pixel-size query) during tests, corrupting stdout and breaking assertions. Add vi.mock("ink-picture", ...) to App, App.resize-initial-prompt, ChatView, and QuitCommand tests. Also rewrite QuitCommand's stdin.write-based keyboard simulation to use a useChatInputHandler wrapper mock, which the ink v7 event model requires.

…mock ink v7 no longer delivers stdin events to useInput handlers via ink-testing-library, so all keyboard-interaction tests in SkillsPanelContent.test.tsx were failing. Replace stdin.write() calls with a vi.mock("ink") that captures the registered useInput handler into a ref, then invoke it directly via a pressKey() helper.

…pt text changes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…arameters Previously only added in strict mode, causing the OpenAI no-params tool test to fail. Also switches tools snapshot comparison to deep object equality so JSON property ordering no longer causes spurious failures. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…sage output openRouterDefaultModelInfo now has non-zero pricing, so stub models with zero prices and assert totalCost:0 explicitly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- debug_syntax: add missing messageState stub (syncCache reads conversation history) - RenameSymbol: resolve displayPath to absolute before writing in saveChanges mock, and add showReview/hideReview stubs required by the non-auto-approve path Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ialMessageIfExistsWithType The wrapper was always passing the optional third argument explicitly, even when undefined. sinon's calledWithExactly treats f("a","b",undefined) and f("a","b") as different calls, causing SubagentToolHandler partial-streaming tests to fail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds cli/scripts/acp-probe.mts, a standalone ACP client that drives the --acp server without an editor, enabling reproducible end-to-end tests of the full ACP session lifecycle. The probe speaks JSON-RPC 2.0 over stdin/stdout, exercising the same transport path as a real ACP client (e.g. Zed). Subcommands: - prompt <text>: session/create → session/prompt, streams all session updates to stdout and exits once a stopReason is received. - load-prompt <sessionId> <text>: loadSession → session/prompt, exercising the session continuation path (resumeTaskFromHistory). Flags: - --pre-set <key=value>: apply session-config entries (e.g. mode, apiProvider) before the first prompt, so tests can exercise different session modes without interactive setup. - --approve-delay-ms <ms>: insert a configurable delay before auto-approving permission requests, useful for testing long-running permission waits. - DIRAC_ACP_DEBUG=1: emit raw JSON-RPC messages to stderr for tracing.

…rors Extracts a new cli/src/initCoreServices.ts helper called by both the ACP path and the interactive CLI path, ensuring ErrorService, workspace paths, and core service registrations are initialised consistently regardless of which mode is active. Also surfaces initialisation I/O failures (missing API key, bad config path, workspace resolution errors, etc.) as structured JSON-RPC error responses instead of crashing the process or hanging on stdin. Clients receive an error object with a descriptive message rather than a silent EOF. Adds DiracAgent.review.test.ts with test coverage for the init-failure path.

…l, retry cleanup Several fixes to the prompt-turn state machine in DiracAgent: 1. Terminate every error path: previously, certain runPromise rejections and unhandled handler errors left the prompt promise permanently unresolved, causing the client to hang. A universal catch handler plus handleUnhandledHandlerError() now resolves the pending prompt with a structured failure on any un-caught exception. Also exposes task.runError from src/core/task/index.ts and src/core/controller/index.ts so the agent can distinguish provider errors from task errors. 2. Render errors as a failed tool_call: instead of emitting error text as an agent_message_chunk (which looks like model output in the UI), failed turns now emit a tool_call("error") + tool_call_update(status: "failed") pair. Clients can distinguish structured failures from real model output. 3. session/cancel responds with stopReason "cancelled": the cancel() handler now resolves the pending prompt with { stopReason: "cancelled" } as required by the ACP spec, rather than leaving the turn unresolved. 4. Clean up retry/terminal-failure state across turns: retry and terminal-failure tracking variables are reset at the start of each new message and turn, preventing stale state from a previous turn bleeding into the next. The relevant tracking fields are added to public-types.ts and the message-handling logic in messageTranslator.ts.

…errides Exposes a session/setMode RPC that clients use to switch between Plan, Act, Auto, and YOLO modes on a per-session basis. - DiracAgent handles a new "setMode" action in the session request handler, mapping it through setSessionOverride() so the mode is stored as a per-session override rather than a global state mutation. - Auto-approve and YOLO flag overrides are now scoped per session via a SessionOverrides map in StateManager. Previously these were written to global state, meaning a YOLO session leaked its approvals to any subsequent session on the same process. Each session now carries its own overrides; cancelling or ending a session clears them.

…reinit Two correctness fixes in DiracAgent: 1. Atomic cancel flag claim: cancel() now claims pending.resolved.value = true synchronously before awaiting cancelTask(), ensuring no concurrent resolution path can steal the slot during the async gap. The handler then calls pending.resolve({ stopReason: "cancelled" }) after the await, so the response is sent once the task is truly stopped. Satisfies the invariant: claim-then-act, never act-then-claim. 2. Listener leak on cancel→reinit: the diracMessagesChanged listener previously closed over controller.task, but cancelTask() can replace controller.task with a freshly initialised task. The stale reference caused the listener to fire against the old task and never subscribe to the new one. The fix captures the task reference once at listener registration time.

ACPDiffViewProvider.saveChanges() was not being called when the user approved a file-edit permission request in the ACP flow. AcpAgent now invokes saveChanges() on the provider after the permission is granted, so edits are actually written to disk instead of being silently discarded. Also removes a stale event-listener registration that was no longer needed.

…y, user messages, timing fix Four related fixes to make loadSession + prompt work correctly after a process restart: 1. sessionId as taskId: after a restart the agent must locate the task by ID. Previously taskId was derived from a controller-internal counter, causing a mismatch on reload. Now the ACP sessionId IS the taskId, routed through a new acp-session-tasks.ts helper and stored in state-keys. Dead code in backfill.ts / utils.ts that worked around the old mismatch is removed. 2. Replay session history on loadSession: AcpAgent.loadSession now calls replayHistory() in DiracAgent to re-emit all prior tool_call / agent_message_chunk events, so a client that reconnects after a restart can reconstruct the full conversation state. 3. Restore user messages in history replay: the replay loop previously skipped "user" role messages, causing gaps in clients that display the full thread. All message types are now emitted. 4. Timing fix for continuation: the previous implementation called handleWebviewAskResponse immediately after reinitExistingTaskFromId, but resumeTaskFromHistory() runs asynchronously — TaskMessenger.ask() clears taskState.askResponse at its start, wiping the pre-set response before pWaitFor sees it and causing the task to hang. The fix subscribes to task.messageStateHandler's diracMessagesChanged event and delivers the user's prompt only once ask("resume_task" | "resume_completed_task") actually fires.

…s, dedup paths Three display fixes so tool results render usefully in clients like Zed: 1. Format read-tool content as hash bullets: instead of passing full file source verbatim into the content field, read tools (readFile, readLineRange, getFunction, getFileSkeleton, newFileCreated) now produce "* path: [File Hash: xxx]" bullets. Full source is still available in rawOutput for programmatic use. 2. Format list-tool content as markdown bullets: directory listings from listFilesTopLevel, listFilesRecursive, and listCodeDefinitionNames are converted to "* item" bullet lists, stripping the header/count lines. 3. Deduplicate file paths in tool call titles: multi-file operations (e.g. readFile with paths=[a, b, a]) previously showed duplicate paths in the title. Paths are now deduplicated while preserving insertion order. Adds test coverage for all three cases.

…ecute_command leakage, duplicate followup Four related streaming/deduplication fixes: 1. Prevent doubled completion_result text: completion_result content was emitted twice — once as an agent_message_chunk and again as a tool_call_update content field. Added a guard that skips re-emitting text already surfaced as a completion_result. 2. Fix streaming text duplication from delta tracker race: two concurrent delta-tracker flushes each emitted the full accumulated text, producing doubled output on the client. The tracker is now reset correctly after each flush. 3. Stop execute_command streaming from leaking as raw text: the command output branch in messageTranslator.ts was inadvertently falling through to the generic streaming path, showing raw command text as plain agent text in the client. The command-output branch now handles its own content and explicitly skips the generic path. 4. Fix duplicate followup/plan_mode_respond text in streaming turns: followup and plan_mode_respond messages appeared twice in streaming turns (once streamed, once as a final agent_message_chunk). DiracAgent now tracks whether text was already streamed and skips the terminal re-emit.

… tool attribution Two bugs in the command-execution permission path: 1. Permission storm from variable shadowing: a let/const shadowing bug in the command handler caused the approval callback to close over the outer `command` binding instead of the inner loop variable, so approving any single command triggered permission requests for all pending commands simultaneously. Fixed by renaming the inner binding. Also refactors the ask:command handling in messageTranslator.ts to correctly distinguish the approval, preview, executing, and completed states. 2. Wrong tool call for permission on multi-command turns: in turns with multiple tool calls the permission request was always attributed to the first toolCallId, not the one actually requesting permission. The handler now tracks and surfaces the correct toolCallId per command. Adds test coverage in acp/index.ts and messageTranslator.test.ts.

execute_command results were invisible in the ACP stream: the tool_call stayed permanently "pending" with no output and no terminal status. Root cause: execute_command does not emit a say:command_output message — it mutates the ask:command message in place, accumulating output in multiCommandState.commands[].output and flipping commandCompleted false→true. The ask:command translator ignored those fields entirely. Fix (messageTranslator.ts): once commandCompleted is defined, emit tool_call_update(s) carrying aggregated output and a terminal status (in_progress → completed/failed). New helpers formatMultiCommandOutput and buildCommandExecutionUpdates produce the content path plus terminal_output/terminal_exit metadata for terminal-capable clients. A pendingToolCalls status guard prevents a late update from regressing a completed command back to in_progress. Also fixes a Zed rendering issue: the initial in_progress update no longer includes a placeholder content string for commands with no output yet. Only include content when output is non-empty; Zed renders the first content it sees and ignores subsequent updates, so the placeholder was permanently shown instead of the actual output. Adds 5 unit tests for the command-output path in messageTranslator.test.ts.

marton78 · 2026-05-31T20:40:27Z

@pleibers I spent the few days testing ACP in Zed, this branch makes the ACP connection pretty much functional!

pleibers · 2026-06-01T09:24:55Z

Amazing!
Thank you

dirac-run · 2026-06-05T11:50:27Z

tried to merge this but it conflicts with a just PR #114

also, a lot of this will change fundamentally after the modular tooling changes are pushed today. I will pin this for now since the change is quite large. is there any way to redo some of those after pulling the latest changes? if not, i will do another pass in the new master and try to absorb these later

pleibers · 2026-06-05T13:25:39Z

I think this PR does it better than what I did, since it makes the ACP fully functional and not just polish what was there. So if possible prefer this over #114.

dirac-run · 2026-06-05T13:31:41Z

thanks! okay so I just finished with the merge conflicts from other PRs. I think what i will do is, just update the master first and then separately work on absorbing this PR. it is valuable i agree

marton78 and others added 19 commits May 30, 2026 15:06

test(snapshots): update system prompt snapshots for OS/shell and prom…

6ada44e

…pt text changes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test(providers): fix usage chunk tests after totalCost was added to u…

c639367

…sage output openRouterDefaultModelInfo now has non-zero pricing, so stub models with zero prices and assert totalCost:0 explicitly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

pleibers mentioned this pull request Jun 1, 2026

Polish ACP #114

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(acp): ACP session protocol — init, lifecycle, modes, loadSession, streaming, tool display, command output#116

feat(acp): ACP session protocol — init, lifecycle, modes, loadSession, streaming, tool display, command output#116
marton78 wants to merge 19 commits into
dirac-run:masterfrom
marton78:acp-fixes

marton78 commented May 31, 2026

Uh oh!

marton78 commented May 31, 2026

Uh oh!

pleibers commented Jun 1, 2026

Uh oh!

dirac-run commented Jun 5, 2026

Uh oh!

pleibers commented Jun 5, 2026

Uh oh!

dirac-run commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

marton78 commented May 31, 2026

Summary

1. feat(acp) — acp-probe CLI script for end-to-end testing

2. fix(acp) — init refactor: shared helper, ErrorService, surface I/O errors

3. fix(acp) — robust turn lifecycle: error paths, error semantics, cancel, retry cleanup

4. feat(acp) — session mode picker with per-session Plan/Act/Auto/YOLO overrides

5. fix(acp) — atomic cancel flag claim and listener leak on task reinit

6. fix(acp) — file edits apply after permission approval

7. fix(acp) — loadSession continuity: sessionId as taskId, history replay, user messages, timing fix

8. fix(acp) — format tool results as markdown

9. fix(acp) — streaming text deduplication

10. fix(acp) — multi-command permission: variable shadowing and wrong tool attribution

11. fix(acp) — surface execute_command output and completion in ACP stream

Uh oh!

marton78 commented May 31, 2026

Uh oh!

pleibers commented Jun 1, 2026

Uh oh!

dirac-run commented Jun 5, 2026

Uh oh!

pleibers commented Jun 5, 2026

Uh oh!

dirac-run commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

1. `feat(acp)` — acp-probe CLI script for end-to-end testing

2. `fix(acp)` — init refactor: shared helper, ErrorService, surface I/O errors

3. `fix(acp)` — robust turn lifecycle: error paths, error semantics, cancel, retry cleanup

4. `feat(acp)` — session mode picker with per-session Plan/Act/Auto/YOLO overrides

5. `fix(acp)` — atomic cancel flag claim and listener leak on task reinit

6. `fix(acp)` — file edits apply after permission approval

7. `fix(acp)` — loadSession continuity: sessionId as taskId, history replay, user messages, timing fix

8. `fix(acp)` — format tool results as markdown

9. `fix(acp)` — streaming text deduplication

10. `fix(acp)` — multi-command permission: variable shadowing and wrong tool attribution

11. `fix(acp)` — surface execute_command output and completion in ACP stream