Skip to content

refactor(harnesses): extract Harness interface + thread store/events refactor#3381

Open
tlgimenes wants to merge 32 commits into
mainfrom
tlgimenes/harness-shared-interface
Open

refactor(harnesses): extract Harness interface + thread store/events refactor#3381
tlgimenes wants to merge 32 commits into
mainfrom
tlgimenes/harness-shared-interface

Conversation

@tlgimenes
Copy link
Copy Markdown
Contributor

@tlgimenes tlgimenes commented May 15, 2026

Summary

Re-opens the work from #3358 (accidentally merged then reverted) plus the follow-up thread-store and chat refactors.

  • Harness interface — introduces a shared Harness abstraction with factory-based registry and localDispatch; migrates decopilot, claude-code, and codex into the new harnesses layout (system-prompt, built-in-tools, agents/prompts/connections blocks, title-generator, tool/prompt assembly, streamText loop). stream-core now dispatches through the harness registry.
  • Threads module — new thread-filter, thread-store, thread-actions, useThreads, useThreadActions, and ThreadEventsBridge SSE patcher. useTaskManager collapses to a thin facade; useTasksAutoRefresh/useTasks/updateTaskInCache removed.
  • ThreadChatStore — new client-side store: snapshot/subscribe, optimistic sendMessage, sub-stream demuxer with finish promotion, SSE finish backstop, continuation seeding on start.messageId match, tool output/approval handlers + auto-send, lifecycle (connect/disconnect/stop) with reconnect discard. useThreadChat collapses to a thin wrapper.
  • Chat fixes — thread toolApprovalLevel through the approval continuation cascade; patch server-snapshot assistant when opening mid requires_action.

Test plan

  • bun run check
  • bun run lint
  • bun test
  • Smoke: open a chat mid requires_action and confirm the assistant snapshot patches correctly
  • Smoke: tool-approval level persists through the accept/continuation cascade
  • Smoke: tasks panel + thread list still update via ThreadEventsBridge

🤖 Generated with Claude Code


Summary by cubic

Introduces a shared Harness interface and registry with localDispatch, moving decopilot, claude-code, and codex into modular harnesses. Rebuilds thread/chat flow with a single thread store and SSE bridge, and fixes tool-approval continuation and mid-requires_action open cases.

  • New Features

    • harnesses: factory-based registry, localDispatch(id, input, ctx), shared types and usage accumulator.
    • decopilot harness: extracted system-prompt, built-in tools, prompt assembly, and streamText loop.
    • CLI harnesses: claude-code and codex wrappers with UI→model message conversion, resume (Claude) and safe provider cleanup.
    • language-model factory: builds models from provider + ModelInfo, auto-enables reasoning when supported.
    • Threads: ThreadEventsBridge SSE patcher and useThreads + useThreadActions for fast, optimistic updates.
    • Chat: ThreadChatStore with snapshot/subscribe, optimistic sendMessage, sub-stream demux, continuation seeding, and lifecycle management.
    • Models: Codex list updated (adds codex:gpt-5.5, revises tiers).
  • Migration

    • Replace useTasks/useTasksAutoRefresh with useThreads and ThreadEventsBridge (mounted in org shell).
    • Use useThreadActions for thread CRUD; legacy updateTaskInCache and related hooks are removed.
    • Switch query keys to KEYS.threads(...)/KEYS.threadsPrefix(...); filter dimensions are client-side only.

Written for commit 84239e5. Summary will update on new commits. Review in cubic

tlgimenes and others added 30 commits May 15, 2026 12:31
…ecopilot/claude-code/codex

Establishes a Harness abstraction with factory-based registry and localDispatch.
Migrates decopilot, claude-code, and codex into the new harnesses layout
(system-prompt, built-in-tools, agents/prompts/connections blocks, title-generator,
tool/prompt assembly, streamText loop). stream-core now dispatches through the
harness registry.

Includes:
- CLI harness fixes (UIMessage->ModelMessage conversion, usage tracking, thread_id emission)
- Shared stream-error helpers
- AI SDK system-in-messages warning suppression
- Codex tier defaults and model list sync
- Sandbox image corepack retry
- Various dependency bumps (React, AI SDK, OpenRouter, Anthropic, MCP SDK)

Squashed from 11 commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…cade

Two coupled changes that together preserve the user's tool-approval-level
selection through the synchronous accept/continuation cascade:

* highlight/approval.tsx + highlight/index.tsx: pass the freshly-selected
  approval level explicitly when the yolo dropdown triggers Accept All.
  setPreferences hasn't propagated to React state (or, in mutation-timing
  edge cases, to localStorage) by the time the cascade fires this same
  tick, so the explicit argument carries the value the user just chose.
* chat-context.tsx: trust requestMetadata.toolApprovalLevel when supplied;
  fall back to localStorage only for legacy callers.
* use-thread-chat.ts: add continuation seeding for tool output / approval
  responses. When the backend resumes the same assistant message after
  addToolOutput / addToolApprovalResponse, the next sub-stream's start
  chunk carries the existing messageId — promote the local assistant
  into streamingStore so readUIMessageStream seeds from it instead of
  throwing "No tool invocation found" on the tool-output-available chunk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…egacy KEYS.tasks

Migrate 4 useTaskActions callers to useThreadActions (absorbs the full low-level
CRUD surface). Replace tasksPrefix cache invalidations with targeted prepends
(for creates) or bridge-handles comments (for updates). Remove legacy tasks() and
tasksPrefix() keys from query-keys.ts. Delete use-tasks.ts hook. Update test
assertions to use threadsPrefix. Fix unused variable in use-task-expanded-tools.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tlgimenes and others added 2 commits May 15, 2026 12:31
…tStore

The hook now owns only React-level concerns:
- useRef for latest-handler closures
- useState for the stable ThreadChatStore instance
- one useEffect for connect/disconnect lifecycle
- useSyncExternalStore for snapshot subscription
- mergeWithServer in React because it depends on the initialMessages prop

All imperative state — SSE persistent loop, sub-stream demuxer, SSE-finish
backstop counter, continuation seeding, optimistic message buffer — moves
into ThreadChatStore (apps/mesh/src/web/components/chat/hooks/thread-chat-store.ts)
where it is unit-testable without a DOM.

The single useEffect is justified for connect/disconnect lifecycle (the
imperative connection cannot be modeled as derived state). Suppressed
ban-use-effect via oxlint-disable-next-line.

File shrinks from 843 to 88 lines. Public hook signature unchanged;
chat-context.tsx consumer typechecks without modification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coupled fixes for the case where a thread is opened while a
`requires_action` is already pending (no in-flight streaming, no local
assistant — the message we need to patch lives only in initialMessages
delivered by useTaskMessages).

* ThreadChatStore now accepts an optional `initialMessagesRef` and reads
  it as a fallback in patchLastAssistant: if streaming is empty and
  local has no trailing assistant, promote a patched copy of the server
  snapshot's last assistant into local. The store remains prop-agnostic
  — initialMessagesRef is just a live read, not a coupling.
* useThreadChat passes the ref and rewrites the render-time merge so a
  streaming message that shares an id with the server snapshot replaces
  the matching entry in place instead of duplicating it. The previous
  code appended streaming unconditionally, which produced "Encountered
  two children with the same key" once T13 made the continuation path
  reachable.

Verified end-to-end in the browser: opening a thread with 3 pending
approvals, clicking Accept All, the backend resumes the run, tool calls
execute, and the final report renders with zero React warnings and zero
console errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

🧪 Benchmark

Should we run the Virtual MCP strategy benchmark for this PR?

React with 👍 to run the benchmark.

Reaction Action
👍 Run quick benchmark (10 & 128 tools)

Benchmark will run on the next push after you react.

@github-actions
Copy link
Copy Markdown
Contributor

Release Options

Suggested: Patch (2.327.6) — based on refactor: prefix

React with an emoji to override the release type:

Reaction Type Next Version
👍 Prerelease 2.327.6-alpha.1
🎉 Patch 2.327.6
❤️ Minor 2.328.0
🚀 Major 3.0.0

Current version: 2.327.5

Note: If multiple reactions exist, the smallest bump wins. If no reactions, the suggested bump is used (default: patch).

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 issues found across 102 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/mesh/src/harnesses/decopilot/stream-error.ts">

<violation number="1" location="apps/mesh/src/harnesses/decopilot/stream-error.ts:29">
P2: When all sentences are filtered, sanitization falls back to the original message and leaks provider-specific content.</violation>

<violation number="2" location="apps/mesh/src/harnesses/decopilot/stream-error.ts:57">
P2: Non-credit errors are returned unsanitized, which can leak provider-specific details despite this helper’s sanitization contract.</violation>
</file>

<file name="apps/mesh/src/web/hooks/use-ensure-task.ts">

<violation number="1" location="apps/mesh/src/web/hooks/use-ensure-task.ts:102">
P2: Avoid unconditionally prepending `createdRow` to threads caches; add an `id` dedupe guard to prevent duplicate thread entries.</violation>
</file>

<file name="apps/mesh/src/web/components/chat/task/thread-events.tsx">

<violation number="1" location="apps/mesh/src/web/components/chat/task/thread-events.tsx:33">
P1: Avoid inserting synthetic rows for unknown thread IDs when patching all thread scopes; this can leak a thread into unrelated agent-scoped caches.</violation>

<violation number="2" location="apps/mesh/src/web/components/chat/task/thread-events.tsx:59">
P2: Re-sort the cached thread list after updating `updated_at`; otherwise recent threads can stay out of order.</violation>
</file>

<file name="apps/mesh/src/web/components/chat/task/thread-store.ts">

<violation number="1" location="apps/mesh/src/web/components/chat/task/thread-store.ts:45">
P2: Include `status` in the React Query key; otherwise open and archived thread lists collide in cache and can show stale/wrong data.</violation>
</file>

Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.
On a pro plan you can use ultrareview for larger PRs.
Re-trigger cubic

if (!data) return data;
const idx = data.items.findIndex((t) => t.id === patch.id);

if (idx === -1) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Avoid inserting synthetic rows for unknown thread IDs when patching all thread scopes; this can leak a thread into unrelated agent-scoped caches.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/mesh/src/web/components/chat/task/thread-events.tsx, line 33:

<comment>Avoid inserting synthetic rows for unknown thread IDs when patching all thread scopes; this can leak a thread into unrelated agent-scoped caches.</comment>

<file context>
@@ -0,0 +1,87 @@
+  if (!data) return data;
+  const idx = data.items.findIndex((t) => t.id === patch.id);
+
+  if (idx === -1) {
+    // Thread not in this cache yet (created in another tab, or never
+    // loaded). Insert at the top with a minimal synthetic row; the next
</file context>

const cleaned = sentences.filter(
(s) => !/https?:\/\//i.test(s) && !/openrouter/i.test(s),
);
if (cleaned.length === 0) return message;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: When all sentences are filtered, sanitization falls back to the original message and leaks provider-specific content.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/mesh/src/harnesses/decopilot/stream-error.ts, line 29:

<comment>When all sentences are filtered, sanitization falls back to the original message and leaks provider-specific content.</comment>

<file context>
@@ -0,0 +1,60 @@
+  const cleaned = sentences.filter(
+    (s) => !/https?:\/\//i.test(s) && !/openrouter/i.test(s),
+  );
+  if (cleaned.length === 0) return message;
+  const result = cleaned.join(". ").trim();
+  return result.endsWith(".") ? result : `${result}.`;
</file context>

// without fragile string matching on provider-specific messages.
return `[CREDITS] ${stripProviderSpecificDetails(error.message)}`;
}
return error.message;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Non-credit errors are returned unsanitized, which can leak provider-specific details despite this helper’s sanitization contract.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/mesh/src/harnesses/decopilot/stream-error.ts, line 57:

<comment>Non-credit errors are returned unsanitized, which can leak provider-specific details despite this helper’s sanitization contract.</comment>

<file context>
@@ -0,0 +1,60 @@
+      // without fragile string matching on provider-specific messages.
+      return `[CREDITS] ${stripProviderSpecificDetails(error.message)}`;
+    }
+    return error.message;
+  }
+  return stringifyError(error);
</file context>

if (!cached) return cached;
return {
...cached,
items: [createdRow, ...cached.items],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Avoid unconditionally prepending createdRow to threads caches; add an id dedupe guard to prevent duplicate thread entries.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/mesh/src/web/hooks/use-ensure-task.ts, line 102:

<comment>Avoid unconditionally prepending `createdRow` to threads caches; add an `id` dedupe guard to prevent duplicate thread entries.</comment>

<file context>
@@ -83,20 +83,26 @@ export function useEnsureTask(id: string, virtualMcpId: string): State {
+          if (!cached) return cached;
+          return {
+            ...cached,
+            items: [createdRow, ...cached.items],
+          };
+        },
</file context>

Comment on lines +59 to +60
items[idx] = next;
return { ...data, items };
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Re-sort the cached thread list after updating updated_at; otherwise recent threads can stay out of order.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/mesh/src/web/components/chat/task/thread-events.tsx, line 59:

<comment>Re-sort the cached thread list after updating `updated_at`; otherwise recent threads can stay out of order.</comment>

<file context>
@@ -0,0 +1,87 @@
+    branch: "branch" in patch ? (patch.branch ?? null) : current.branch,
+  };
+  const items = [...data.items];
+  items[idx] = next;
+  return { ...data, items };
+}
</file context>
Suggested change
items[idx] = next;
return { ...data, items };
items[idx] = next;
items.sort((a, b) => b.updated_at.localeCompare(a.updated_at));
return { ...data, items };

});

const { data, refetch } = useSuspenseQuery<TasksQueryData>({
queryKey: KEYS.threads(locator, scope),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Include status in the React Query key; otherwise open and archived thread lists collide in cache and can show stale/wrong data.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/mesh/src/web/components/chat/task/thread-store.ts, line 45:

<comment>Include `status` in the React Query key; otherwise open and archived thread lists collide in cache and can show stale/wrong data.</comment>

<file context>
@@ -0,0 +1,86 @@
+  });
+
+  const { data, refetch } = useSuspenseQuery<TasksQueryData>({
+    queryKey: KEYS.threads(locator, scope),
+    queryFn: async () => {
+      if (!client) throw new Error("MCP client is not available");
</file context>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant