test(multi-pod): cross-pod /attach + mock-ai provider scaffolding by viktormarinho · Pull Request #3392 · decocms/studio

viktormarinho · 2026-05-17T18:57:38Z

What is this contribution about?

Builds out the LLM-dependent half of the multi-pod test framework added in #3391. A ~140-LOC Bun mock-ai service speaks the OpenAI streaming wire protocol so mesh's existing openai-compatible adapter can drive streamText against it — no real provider budget needed. Setup helpers gain `wireMockProvider`, `createTestAgent`, and `createTestThread` for the three calls every dispatch scenario needs.
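For readers unfamiliar with the wire format, the following is a minimal sketch of how an OpenAI-style chat-completions stream is framed as server-sent events. It is not the PR's mock-ai code: `toSseFrames`, the `chatcmpl-mock` id, and the `mock-model` name are hypothetical, and the HTTP server part is left out on purpose.

```typescript
// Sketch only: frames text pieces the way an OpenAI-compatible streaming endpoint does.
// toSseFrames, "chatcmpl-mock" and "mock-model" are illustrative names, not from the PR.
interface ChatChunk {
  id: string;
  object: "chat.completion.chunk";
  created: number;
  model: string;
  choices: {
    index: number;
    delta: { role?: "assistant"; content?: string };
    finish_reason: "stop" | null;
  }[];
}

function toSseFrames(pieces: string[], model = "mock-model"): string[] {
  const base = {
    id: "chatcmpl-mock",
    object: "chat.completion.chunk" as const,
    created: 0,
    model,
  };
  const out: string[] = [];
  pieces.forEach((content, i) => {
    // The first delta conventionally carries the assistant role.
    const chunk: ChatChunk = {
      ...base,
      choices: [
        {
          index: 0,
          delta: i === 0 ? { role: "assistant", content } : { content },
          finish_reason: null,
        },
      ],
    };
    out.push(`data: ${JSON.stringify(chunk)}\n\n`);
  });
  // A final chunk with an empty delta signals the stop reason.
  const last: ChatChunk = {
    ...base,
    choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
  };
  out.push(`data: ${JSON.stringify(last)}\n\n`);
  // The stream is terminated by a literal [DONE] sentinel.
  out.push("data: [DONE]\n\n");
  return out;
}
```

A server would write these frames to a `text/event-stream` response, optionally sleeping between them to simulate a slow provider.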

The headline scenario, `attach-cross-pod`, directly validates the deliverPolicy fix from #3387: POST a slowed-down run on pod-1, attach on pods 2 and 3, assert both see the buffered prefix. Passes locally.

The second scenario, `pod-death-dbos-replay`, is fully wired but `.skip`ped — it surfaced a real architectural finding (see below) that needs its own follow-up to fix.

Architectural finding (worth a separate issue)

The pod-death scenario times out at 120s. Two compounding gaps:

1. DBOS replay alone can't recover the run. The recovery executor re-runs `dispatchRunAndWaitStep` without `isResume: true`, which hits `claimRunStart`'s strict CAS. The dead pod still owns `run_owner_pod`, so every replay attempt fails with `RunClaimError: ... already running on another pod`.
2. Heartbeat watcher (the path that WOULD handle this via `claimOrphanedRun` + `isResume: true`) never fires. The `KV_POD_HEARTBEATS` JetStream bucket is missing from NATS, even though other JS resources initialize fine. Looks like bucket creation fails silently at `app.ts:867`.
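The interaction between the two claim paths can be modeled in memory. The sketch below reuses the names `claimRunStart`, `claimOrphanedRun`, and `RunClaimError` from the description, but the bodies, the `ThreadRow` shape, and the `livePods` set standing in for the heartbeat bucket are assumptions made for illustration, not the mesh implementation.

```typescript
// In-memory model only; the real functions operate on the threads table and NATS KV.
class RunClaimError extends Error {}

interface ThreadRow {
  runOwnerPod: string | null;
}

// Strict compare-and-set (assumed semantics): succeeds only when no pod owns the run.
function claimRunStart(row: ThreadRow, pod: string): void {
  if (row.runOwnerPod !== null) {
    throw new RunClaimError(`already running on another pod (${row.runOwnerPod})`);
  }
  row.runOwnerPod = pod;
}

// Orphan takeover (assumed semantics): allowed when the current owner has no live heartbeat.
function claimOrphanedRun(row: ThreadRow, pod: string, livePods: Set<string>): boolean {
  if (row.runOwnerPod !== null && livePods.has(row.runOwnerPod)) {
    return false; // owner is still alive, nothing to take over
  }
  row.runOwnerPod = pod;
  return true;
}
```

In this model a replay on `mesh-2` throws while the dead `mesh-1` is still recorded as owner, and only the orphan path (which depends on heartbeat data being available) can move ownership.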

Net: permanent pod death is currently only recovered when the dead pod itself restarts. The PR #3387 architectural claim ("DBOS replay covers what /attach orphan-resume used to do") is incomplete — full repro + diagnostic notes are in the test's docstring.

How to Test

1. Start Docker Desktop.
2. `./tests/multi-pod/run.sh`: boots cluster, runs all scenarios, tears down.
3. Expected: `6 pass / 1 skip / 0 fail` across 5 files (~5s after cluster up).

For iteration: `docker compose -f tests/multi-pod/docker-compose.yml up -d` once, then `bun test tests/multi-pod/scenarios/` repeatedly.

Migration Notes

None. Self-contained under `tests/multi-pod/` and the existing CI workflow picks up the new scenarios automatically.

Review Checklist

- PR title is clear and descriptive
- Changes are tested and working (6/6 active scenarios pass locally)
- Documentation is updated (if needed): none; per-file docstrings carry the structure
- No breaking changes

Summary by cubic

Adds a mock OpenAI-compatible provider and a cross-pod /attach test to run the dispatch pipeline end-to-end across multiple pods without a real LLM, plus reliability improvements and clearer failure surfacing in the test harness.

New Features
- mock-ai service (OpenAI chat-completions streaming) at http://mock-ai:9000/v1; used by openai-compatible.
- Setup helpers: wireMockProvider, createTestAgent, createTestThread; DB helper getThreadRunOwnerPod; compose POD_NAME mapping to target specific pods.
- Scenarios: attach-cross-pod validates buffered-prefix replay across pods; pod-death-dbos-replay remains skipped with an updated note on three gaps (no cross-pod DBOS recovery scan, resume purges JetStream, heartbeat bucket fragility).
Bug Fixes
- mcpCall now prefers non-empty structuredContent and correctly falls back to text payloads.
- dbQuery CLI/docstring synced to --csv; attach-cross-pod header timing corrected to slow:5x500 (~2.5s).
- attach-cross-pod replaces a racy sleep with a JetStream-confirmation probe to deterministically test buffered replay.
- Test hooks auto-restore stopped pods and now fail fast when docker compose ps errors, avoiding misleading waitReady timeouts.

Written for commit d0b9e8a. Summary will update on new commits.

Builds out the LLM-dependent half of the multi-pod framework so we can exercise the decopilot dispatch pipeline end-to-end without burning real provider budget.

**Mock-ai service** (tests/multi-pod/mock-ai/) is a ~140-LOC Bun HTTP server that speaks the OpenAI chat-completions wire protocol, enough for mesh's `openai-compatible` adapter to drive `streamText`. Test-time controls (chunk count + delay) come from the user message text because mesh strips request headers before calling the provider; the mock parses "slow:NxMS" / "many:N" hints out of the prompt.

**Setup helpers** (lib/setup.ts) gain `wireMockProvider`, `createTestAgent`, `createTestThread` for the three calls every dispatch scenario needs: register an openai-compatible credential pointing at mock-ai, pin the org's "smart" tier to it, create a virtual MCP, pre-create the thread row (without which /attach 404s before the workflow's prepareRun gets to insert it).

**DB helper** (lib/db.ts) shells out to `docker compose exec postgres psql` for the rare cases where a scenario needs internal state not surfaced by the public API; currently used to look up the actual dispatch owner from `threads.run_owner_pod` for targeted-kill scenarios.

**POD_NAME wired into compose** so `run_owner_pod` matches the compose service name ("mesh-1") and tests can map straight from a thread row to a `docker compose kill` target.

**Scenarios:**
- `attach-cross-pod` (✅ passing): the headline test that directly validates the deliverPolicy fix from PR #3387. POSTs on pod-1, attaches on pods 2 and 3, asserts both see the buffered prefix of a slowed-down run. Catches the bug regardless of which pod DBOS picks for dispatch.
- `pod-death-dbos-replay` (⚠️ skipped, fully wired): would validate that SIGKILLing the run-owning pod still completes the run via DBOS replay. Surfaced a real architectural finding: DBOS replay on another pod fails `claimRunStart`'s strict CAS because the dead pod still owns `run_owner_pod`; the heartbeat watcher that DOES handle this (via `claimOrphanedRun`) never fires because the `KV_POD_HEARTBEATS` JetStream bucket is missing; bucket creation appears to fail silently at app.ts:867. Full repro + diagnostic notes in the test's docstring; remove `.skip` to verify once one of those is fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
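As an illustration of the prompt-embedded controls mentioned in this commit message, here is a small sketch of a parser for the "slow:NxMS" and "many:N" hints. The `parseHints` name, the `StreamPlan` shape, and the defaults used when no hint is present are assumptions; the actual mock may parse or default differently.

```typescript
// Sketch of a hint parser; parseHints, StreamPlan and DEFAULT_PLAN are hypothetical names.
interface StreamPlan {
  chunks: number; // how many content chunks to emit
  delayMs: number; // pause before each chunk
}

// Assumed fallback when the prompt carries no hint.
const DEFAULT_PLAN: StreamPlan = { chunks: 1, delayMs: 0 };

function parseHints(prompt: string): StreamPlan {
  // "slow:5x500" is read as 5 chunks with 500 ms between them.
  const slow = prompt.match(/slow:(\d+)x(\d+)/);
  if (slow) {
    return { chunks: Number(slow[1]), delayMs: Number(slow[2]) };
  }
  // "many:20" is read as 20 chunks with no artificial delay.
  const many = prompt.match(/many:(\d+)/);
  if (many) {
    return { chunks: Number(many[1]), delayMs: 0 };
  }
  return { ...DEFAULT_PLAN };
}
```

Under this reading, `slow:5x500` yields five chunks at 500 ms each, about 2.5 s of streaming in total.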

cubic-dev-ai

1 issue found across 7 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="tests/multi-pod/scenarios/attach-cross-pod.test.ts">

<violation number="1" location="tests/multi-pod/scenarios/attach-cross-pod.test.ts:121">
P2: Wait longer than the mock’s first chunk delay before attaching, otherwise this scenario can pass before any buffered content exists.</violation>
</file>


Three review-flagged issues:
- **dbQuery docstring/spawn cleanup**: psql was invoked with both `-A -F "|"` and `--csv`. `--csv` wins, so output was actually CSV (parser already split on `,`); the docstring's "pipe separator" warning would have misled callers picking column lists. Drop the dead flags and rewrite the warning around comma-safety.
- **attach-cross-pod header timing**: header claimed "300ms × 5 chunks ≈ 1.5s", actual hint is `slow:5x500` (≈ 2.5s). Sync the header.
- **mcpCall structuredContent short-circuit**: `if (sc) return sc` returned on `{}`, skipping the text-content fallback that some MCP tools rely on. Tighten to require a non-empty object so callers get the real payload (or a deliberate empty echo at the end of the chain) instead of a phantom `{}`.

Also revises the pod-death-dbos-replay docstring with the corrected architectural diagnosis: original write-up blamed the heartbeat bucket in isolation; deeper investigation shows three compounding gaps (no cross-pod DBOS recovery scan because `executor_id="local"` for all pods, unconditional `streamBuffer.purge()` on resume wiping survivor buffers, and the heartbeat bucket reliability issue). Test stays `.skip`ped pending the architectural fix.

Pulls in two helpers the pod-death scenario was already using:
- `lib/db.ts` for direct postgres inspection of `run_owner_pod`
- `lib/hooks.ts` auto-restores any pod left stopped by a previous scenario, so kill-style tests don't poison subsequent runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
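The structuredContent short-circuit is easy to reproduce in isolation. The following sketch shows one way to require a non-empty object before returning it; `pickPayload`, `isNonEmptyObject`, and the `ToolResult` shape are hypothetical, and the choice to JSON-parse the text fallback is an assumption rather than a description of the real `mcpCall`.

```typescript
// Sketch of the payload selection order; not the actual mcpCall helper.
interface ToolResult {
  structuredContent?: unknown;
  content?: { type: string; text?: string }[];
}

function isNonEmptyObject(v: unknown): v is Record<string, unknown> {
  return (
    typeof v === "object" &&
    v !== null &&
    !Array.isArray(v) &&
    Object.keys(v).length > 0
  );
}

function pickPayload(result: ToolResult): unknown {
  // 1. Prefer structuredContent, but only when it actually carries keys.
  if (isNonEmptyObject(result.structuredContent)) {
    return result.structuredContent;
  }
  // 2. Fall back to the first text item (parsed as JSON when possible; assumed behavior).
  const text = result.content?.find(
    (c) => c.type === "text" && typeof c.text === "string",
  )?.text;
  if (text !== undefined) {
    try {
      return JSON.parse(text);
    } catch {
      return text;
    }
  }
  // 3. Nothing better available: echo whatever structuredContent was, or {}.
  return result.structuredContent ?? {};
}
```

A plain truthiness check (`if (sc) return sc`) would stop at step 1 for `{}`, because an empty object is truthy in JavaScript.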

The 500ms sleep between POST and the cross-pod attach was right at the edge of the dispatch chain's typical latency. If DBOS dequeue + prepareRun + streamText init completed in under 500ms (the mock's per-chunk delay), the test's attaches would open BEFORE chunk-1 was published, and the `deliverPolicy: "new"` bug path would still deliver chunk-1 live, so the regression we claim to catch wouldn't manifest.

Fix: open a throwaway /attach on pod-1 first as a synchronization probe. Only proceed to opening the real test watchers once the probe has actually seen chunk-1 in its stream. At that point chunk-1 is provably already in JetStream, so receiving it from a cross-pod attach requires the `deliverPolicy: "all"` fix to work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
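The reasoning about deliver policies can be checked against a toy model. The sketch below is an in-memory stand-in for a stored stream, assuming only that "all" replays stored messages before going live and "new" delivers only messages published after the consumer attaches; `BufferedStream` is a hypothetical class and no NATS client is involved.

```typescript
// Toy model of a stored stream with two deliver policies; not JetStream itself.
type DeliverPolicy = "all" | "new";

class BufferedStream {
  private readonly log: string[] = [];
  private readonly subscribers: ((chunk: string) => void)[] = [];

  publish(chunk: string): void {
    this.log.push(chunk);
    for (const notify of this.subscribers) notify(chunk);
  }

  // Returns the (live-updating) list of chunks this consumer has received.
  attach(policy: DeliverPolicy): string[] {
    // "all" starts from the stored prefix; "new" starts empty.
    const seen: string[] = policy === "all" ? [...this.log] : [];
    this.subscribers.push((chunk) => seen.push(chunk));
    return seen;
  }
}
```

If a consumer attaches before chunk-1 is published, both policies deliver the same chunks, which is why the test needs proof that chunk-1 is already stored before opening the cross-pod watchers.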

cubic-dev-ai

1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="tests/multi-pod/lib/hooks.ts">

<violation number="1" location="tests/multi-pod/lib/hooks.ts:33">
P2: Check the `docker compose ps -a` exit code here; otherwise compose failures are swallowed and the hook times out later with a less useful error.</violation>
</file>


Before, a failing `docker compose ps -a` (daemon down, compose file moved, permission issue) silently produced empty output. The hook then treated it as "no pods need restoring" and handed control to waitReady, which would time out 2 minutes later with a misleading "mesh-1 not healthy", turning the real diagnosis into a needle in a haystack. Now we read stderr too and surface the actual compose error on a non-zero exit code, so the developer sees the root cause immediately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
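The fail-fast behavior described here comes down to checking the exit code before interpreting stdout. The sketch below separates that decision from process spawning so it can be reasoned about on its own; `stoppedServices`, `SpawnResult`, and the simplified "service state" line format are assumptions for illustration and do not reflect real `docker compose ps` output or the actual hook.

```typescript
// Sketch: interpret an already-collected process result. The "<service> <state>"
// line format is a simplified stand-in, not the real compose output format.
interface SpawnResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

function stoppedServices(result: SpawnResult): string[] {
  // Surface the tool's own error instead of treating empty stdout as "nothing to restore".
  if (result.exitCode !== 0) {
    const detail = result.stderr.trim() || "no stderr output";
    throw new Error(
      `docker compose ps -a failed (exit ${result.exitCode}): ${detail}`,
    );
  }
  return result.stdout
    .split("\n")
    .map((line) => line.trim().split(/\s+/))
    .filter((parts) => parts.length === 2 && parts[1] !== "running")
    .map((parts) => parts[0] as string);
}
```

With the exit-code check in place, a daemon or permission failure is reported at the point where it happens rather than as a later readiness timeout.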

cubic-dev-ai Bot reviewed May 17, 2026

Comment thread tests/multi-pod/scenarios/attach-cross-pod.test.ts Outdated

viktormarinho and others added 2 commits May 17, 2026 19:13

cubic-dev-ai Bot reviewed May 17, 2026

Comment thread tests/multi-pod/lib/hooks.ts Outdated

viktormarinho merged commit 3eee70b into main May 17, 2026
12 checks passed

viktormarinho deleted the viktormarinho/multi-pod-llm-scenarios branch May 17, 2026 22:26

viktormarinho merged 4 commits into main from viktormarinho/multi-pod-llm-scenarios
