feat(acp): ACP session protocol — init, lifecycle, modes, loadSession, streaming, tool display, command output#116
feat(acp): ACP session protocol — init, lifecycle, modes, loadSession, streaming, tool display, command output#116marton78 wants to merge 19 commits into
Conversation
…ctor Tool-call label changed from "Tool:" to "Tool Call", command and browser_action_launch ask labels changed to "Use tool?", and browser_action moved to the verbose-only fallback path. Update the four affected assertions in display.test.ts to match the current output.
ink v7 causes TerminalInfoProvider to emit �[16t (terminal pixel-size
query) during tests, corrupting stdout and breaking assertions. Add
vi.mock("ink-picture", ...) to App, App.resize-initial-prompt, ChatView,
and QuitCommand tests. Also rewrite QuitCommand's stdin.write-based
keyboard simulation to use a useChatInputHandler wrapper mock, which the
ink v7 event model requires.
…mock
ink v7 no longer delivers stdin events to useInput handlers via
ink-testing-library, so all keyboard-interaction tests in
SkillsPanelContent.test.tsx were failing. Replace stdin.write() calls with
a vi.mock("ink") that captures the registered useInput handler into a ref,
then invoke it directly via a pressKey() helper.
…pt text changes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…arameters Previously only added in strict mode, causing the OpenAI no-params tool test to fail. Also switches tools snapshot comparison to deep object equality so JSON property ordering no longer causes spurious failures. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sage output openRouterDefaultModelInfo now has non-zero pricing, so stub models with zero prices and assert totalCost:0 explicitly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- debug_syntax: add missing messageState stub (syncCache reads conversation history) - RenameSymbol: resolve displayPath to absolute before writing in saveChanges mock, and add showReview/hideReview stubs required by the non-auto-approve path Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ialMessageIfExistsWithType
The wrapper was always passing the optional third argument explicitly, even
when undefined. sinon's calledWithExactly treats f("a","b",undefined) and
f("a","b") as different calls, causing SubagentToolHandler partial-streaming
tests to fail.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds cli/scripts/acp-probe.mts, a standalone ACP client that drives the --acp server without an editor, enabling reproducible end-to-end tests of the full ACP session lifecycle. The probe speaks JSON-RPC 2.0 over stdin/stdout, exercising the same transport path as a real ACP client (e.g. Zed). Subcommands: - prompt <text>: session/create → session/prompt, streams all session updates to stdout and exits once a stopReason is received. - load-prompt <sessionId> <text>: loadSession → session/prompt, exercising the session continuation path (resumeTaskFromHistory). Flags: - --pre-set <key=value>: apply session-config entries (e.g. mode, apiProvider) before the first prompt, so tests can exercise different session modes without interactive setup. - --approve-delay-ms <ms>: insert a configurable delay before auto-approving permission requests, useful for testing long-running permission waits. - DIRAC_ACP_DEBUG=1: emit raw JSON-RPC messages to stderr for tracing.
…rors Extracts a new cli/src/initCoreServices.ts helper called by both the ACP path and the interactive CLI path, ensuring ErrorService, workspace paths, and core service registrations are initialised consistently regardless of which mode is active. Also surfaces initialisation I/O failures (missing API key, bad config path, workspace resolution errors, etc.) as structured JSON-RPC error responses instead of crashing the process or hanging on stdin. Clients receive an error object with a descriptive message rather than a silent EOF. Adds DiracAgent.review.test.ts with test coverage for the init-failure path.
…l, retry cleanup
Several fixes to the prompt-turn state machine in DiracAgent:
1. Terminate every error path: previously, certain runPromise rejections
and unhandled handler errors left the prompt promise permanently
unresolved, causing the client to hang. A universal catch handler plus
handleUnhandledHandlerError() now resolves the pending prompt with a
structured failure on any un-caught exception. Also exposes
task.runError from src/core/task/index.ts and
src/core/controller/index.ts so the agent can distinguish provider
errors from task errors.
2. Render errors as a failed tool_call: instead of emitting error text as
an agent_message_chunk (which looks like model output in the UI),
failed turns now emit a tool_call("error") + tool_call_update(status:
"failed") pair. Clients can distinguish structured failures from real
model output.
3. session/cancel responds with stopReason "cancelled": the cancel()
handler now resolves the pending prompt with { stopReason: "cancelled" }
as required by the ACP spec, rather than leaving the turn unresolved.
4. Clean up retry/terminal-failure state across turns: retry and
terminal-failure tracking variables are reset at the start of each new
message and turn, preventing stale state from a previous turn bleeding
into the next. The relevant tracking fields are added to public-types.ts
and the message-handling logic in messageTranslator.ts.
…errides Exposes a session/setMode RPC that clients use to switch between Plan, Act, Auto, and YOLO modes on a per-session basis. - DiracAgent handles a new "setMode" action in the session request handler, mapping it through setSessionOverride() so the mode is stored as a per-session override rather than a global state mutation. - Auto-approve and YOLO flag overrides are now scoped per session via a SessionOverrides map in StateManager. Previously these were written to global state, meaning a YOLO session leaked its approvals to any subsequent session on the same process. Each session now carries its own overrides; cancelling or ending a session clears them.
…reinit
Two correctness fixes in DiracAgent:
1. Atomic cancel flag claim: cancel() now claims pending.resolved.value =
true synchronously before awaiting cancelTask(), ensuring no concurrent
resolution path can steal the slot during the async gap. The handler
then calls pending.resolve({ stopReason: "cancelled" }) after the await,
so the response is sent once the task is truly stopped. Satisfies the
invariant: claim-then-act, never act-then-claim.
2. Listener leak on cancel→reinit: the diracMessagesChanged listener
previously closed over controller.task, but cancelTask() can replace
controller.task with a freshly initialised task. The stale reference
caused the listener to fire against the old task and never subscribe to
the new one. The fix captures the task reference once at listener
registration time.
ACPDiffViewProvider.saveChanges() was not being called when the user approved a file-edit permission request in the ACP flow. AcpAgent now invokes saveChanges() on the provider after the permission is granted, so edits are actually written to disk instead of being silently discarded. Also removes a stale event-listener registration that was no longer needed.
…y, user messages, timing fix
Four related fixes to make loadSession + prompt work correctly after a
process restart:
1. sessionId as taskId: after a restart the agent must locate the task by
ID. Previously taskId was derived from a controller-internal counter,
causing a mismatch on reload. Now the ACP sessionId IS the taskId,
routed through a new acp-session-tasks.ts helper and stored in
state-keys. Dead code in backfill.ts / utils.ts that worked around the
old mismatch is removed.
2. Replay session history on loadSession: AcpAgent.loadSession now calls
replayHistory() in DiracAgent to re-emit all prior tool_call /
agent_message_chunk events, so a client that reconnects after a restart
can reconstruct the full conversation state.
3. Restore user messages in history replay: the replay loop previously
skipped "user" role messages, causing gaps in clients that display the
full thread. All message types are now emitted.
4. Timing fix for continuation: the previous implementation called
handleWebviewAskResponse immediately after reinitExistingTaskFromId,
but resumeTaskFromHistory() runs asynchronously — TaskMessenger.ask()
clears taskState.askResponse at its start, wiping the pre-set response
before pWaitFor sees it and causing the task to hang. The fix subscribes
to task.messageStateHandler's diracMessagesChanged event and delivers
the user's prompt only once ask("resume_task" | "resume_completed_task")
actually fires.
…s, dedup paths Three display fixes so tool results render usefully in clients like Zed: 1. Format read-tool content as hash bullets: instead of passing full file source verbatim into the content field, read tools (readFile, readLineRange, getFunction, getFileSkeleton, newFileCreated) now produce "* path: [File Hash: xxx]" bullets. Full source is still available in rawOutput for programmatic use. 2. Format list-tool content as markdown bullets: directory listings from listFilesTopLevel, listFilesRecursive, and listCodeDefinitionNames are converted to "* item" bullet lists, stripping the header/count lines. 3. Deduplicate file paths in tool call titles: multi-file operations (e.g. readFile with paths=[a, b, a]) previously showed duplicate paths in the title. Paths are now deduplicated while preserving insertion order. Adds test coverage for all three cases.
…ecute_command leakage, duplicate followup Four related streaming/deduplication fixes: 1. Prevent doubled completion_result text: completion_result content was emitted twice — once as an agent_message_chunk and again as a tool_call_update content field. Added a guard that skips re-emitting text already surfaced as a completion_result. 2. Fix streaming text duplication from delta tracker race: two concurrent delta-tracker flushes each emitted the full accumulated text, producing doubled output on the client. The tracker is now reset correctly after each flush. 3. Stop execute_command streaming from leaking as raw text: the command output branch in messageTranslator.ts was inadvertently falling through to the generic streaming path, showing raw command text as plain agent text in the client. The command-output branch now handles its own content and explicitly skips the generic path. 4. Fix duplicate followup/plan_mode_respond text in streaming turns: followup and plan_mode_respond messages appeared twice in streaming turns (once streamed, once as a final agent_message_chunk). DiracAgent now tracks whether text was already streamed and skips the terminal re-emit.
… tool attribution Two bugs in the command-execution permission path: 1. Permission storm from variable shadowing: a let/const shadowing bug in the command handler caused the approval callback to close over the outer `command` binding instead of the inner loop variable, so approving any single command triggered permission requests for all pending commands simultaneously. Fixed by renaming the inner binding. Also refactors the ask:command handling in messageTranslator.ts to correctly distinguish the approval, preview, executing, and completed states. 2. Wrong tool call for permission on multi-command turns: in turns with multiple tool calls the permission request was always attributed to the first toolCallId, not the one actually requesting permission. The handler now tracks and surfaces the correct toolCallId per command. Adds test coverage in acp/index.ts and messageTranslator.test.ts.
execute_command results were invisible in the ACP stream: the tool_call stayed permanently "pending" with no output and no terminal status. Root cause: execute_command does not emit a say:command_output message — it mutates the ask:command message in place, accumulating output in multiCommandState.commands[].output and flipping commandCompleted false→true. The ask:command translator ignored those fields entirely. Fix (messageTranslator.ts): once commandCompleted is defined, emit tool_call_update(s) carrying aggregated output and a terminal status (in_progress → completed/failed). New helpers formatMultiCommandOutput and buildCommandExecutionUpdates produce the content path plus terminal_output/terminal_exit metadata for terminal-capable clients. A pendingToolCalls status guard prevents a late update from regressing a completed command back to in_progress. Also fixes a Zed rendering issue: the initial in_progress update no longer includes a placeholder content string for commands with no output yet. Only include content when output is non-empty; Zed renders the first content it sees and ignores subsequent updates, so the placeholder was permanently shown instead of the actual output. Adds 5 unit tests for the command-output path in messageTranslator.test.ts.
|
@pleibers I spent the few days testing ACP in Zed, this branch makes the ACP connection pretty much functional! |
|
Amazing! |
|
tried to merge this but it conflicts with a just PR #114 also, a lot of this will change fundamentally after the modular tooling changes are pushed today. I will pin this for now since the change is quite large. is there any way to redo some of those after pulling the latest changes? if not, i will do another pass in the new master and try to absorb these later |
|
I think this PR does it better than what I did, since it makes the ACP fully functional and not just polish what was there. So if possible prefer this over #114. |
|
thanks! okay so I just finished with the merge conflicts from other PRs. I think what i will do is, just update the master first and then separately work on absorbing this PR. it is valuable i agree |
Summary
This branch implements and stabilises the ACP (Agent Communication Protocol) session layer, taking the server from a bare skeleton to a fully functional ACP endpoint. The branch builds on top of
fix-tests(PR #115) and contains 11 logical commits.1.
feat(acp)— acp-probe CLI script for end-to-end testingAdds
cli/scripts/acp-probe.mts, a standalone ACP client that drives the--acpserver without an editor. Speaks JSON-RPC 2.0 over stdin/stdout — same transport path as Zed.promptsubcommand:session/create → session/prompt, streams updates, exits onstopReasonload-promptsubcommand:loadSession → session/prompt, exercises the session continuation path--pre-set key=value: apply session config (e.g.mode,apiProvider) before the prompt--approve-delay-ms: configurable delay before auto-approving permissionsDIRAC_ACP_DEBUG=1: log raw JSON-RPC to stderr2.
fix(acp)— init refactor: shared helper, ErrorService, surface I/O errorsExtracts
cli/src/initCoreServices.tscalled by both ACP and interactive paths. Surfaces init failures (missing API key, bad config path, etc.) as JSON-RPC errors instead of silent EOF. Adds test coverage for the init-failure path.3.
fix(acp)— robust turn lifecycle: error paths, error semantics, cancel, retry cleanuprunPromiserejections now resolve the pending prompt viahandleUnhandledHandlerError()instead of hanging. Exposestask.runErrorfrom core.tool_call: failures emittool_call("error") + tool_call_update(status:"failed")instead of looking like model text.session/cancel→stopReason:"cancelled": as required by the ACP spec.4.
feat(acp)— session mode picker with per-session Plan/Act/Auto/YOLO overridessession/setModeRPC lets clients switch modes per session. Auto-approve and YOLO flags are now stored in a per-sessionSessionOverridesmap inStateManagerinstead of global state — no more cross-session leakage.5.
fix(acp)— atomic cancel flag claim and listener leak on task reinitcancel()claimspending.resolved.value = truesynchronously beforeawait cancelTask()so no concurrent path can steal the slot.diracMessagesChangedlistener now captures the task reference once at registration, not viacontroller.taskwhich can be replaced bycancelTask().6.
fix(acp)— file edits apply after permission approvalACPDiffViewProvider.saveChanges()was never called on approval, so edits were silently discarded. Fixed.7.
fix(acp)— loadSession continuity: sessionId as taskId, history replay, user messages, timing fixacp-session-tasks.tshelper; dead code inbackfill.ts/utils.tsremoved.tool_call/agent_message_chunkevents so reconnecting clients can reconstruct state.handleWebviewAskResponsewas called beforeask("resume_task")fired, causingTaskMessenger.ask()to wipe the pre-set response. Now waits for thediracMessagesChangedevent before delivering the prompt.8.
fix(acp)— format tool results as markdownreadFile,getFunction, etc.,newFileCreated): content replaced with* path: [File Hash: xxx]bullets; full source still inrawOutput.listFilesTopLevel, etc.): converted to* itembullet lists.9.
fix(acp)— streaming text deduplicationcompletion_resulttextexecute_commandoutput leaking through the generic streaming path as plain textfollowup/plan_mode_respondtext in streaming turns10.
fix(acp)— multi-command permission: variable shadowing and wrong tool attributioncommandvariable caused approval of one command to trigger requests for all. Fixed + refactoredask:commandhandling inmessageTranslator.ts.toolCallIdon permission requests in multi-tool turns: now correctly attributed per command.11.
fix(acp)— surface execute_command output and completion in ACP streamexecute_commandmutatesask:commandin place (nosay:command_output); the translator previously ignoredcommandCompletedandoutputentirely. Now emitstool_call_updatewith aggregated output and terminal status (completed/failed) once done.Also fixes a Zed rendering issue: the
in_progressupdate no longer includes a placeholder content string when there is no output yet — Zed renders the first content it sees and ignores subsequent updates, so the placeholder was permanently shown.