
test(multi-pod): cross-pod /attach + mock-ai provider scaffolding#3392

Merged
viktormarinho merged 4 commits into
mainfrom
viktormarinho/multi-pod-llm-scenarios
May 17, 2026


@viktormarinho viktormarinho commented May 17, 2026

What is this contribution about?

Builds out the LLM-dependent half of the multi-pod test framework added in #3391. A ~140-LOC Bun mock-ai service speaks the OpenAI streaming wire protocol so mesh's existing openai-compatible adapter can drive streamText against it — no real provider budget needed. Setup helpers gain `wireMockProvider`, `createTestAgent`, and `createTestThread` for the three calls every dispatch scenario needs.
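For readers unfamiliar with the wire protocol, the framing the mock has to reproduce is sketched below. This is a minimal illustration that assumes the standard OpenAI `chat.completion.chunk` shape delivered over Server-Sent Events. `buildCompletionFrames` and `sseFrame` are hypothetical names, not taken from the mock-ai source.

```typescript
// Hypothetical sketch of OpenAI-style streaming frames; not the mock-ai source.
interface ChunkChoice {
  index: number;
  delta: { role?: "assistant"; content?: string };
  finish_reason: "stop" | null;
}

interface ChatCompletionChunk {
  id: string;
  object: "chat.completion.chunk";
  created: number;
  model: string;
  choices: ChunkChoice[];
}

/** Encode one JSON payload as a single Server-Sent Events frame. */
function sseFrame(payload: unknown): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

/**
 * Build every SSE frame for one streamed completion: an initial role chunk,
 * one content chunk per text piece, a final chunk carrying finish_reason,
 * and the literal [DONE] sentinel that ends the stream.
 */
function buildCompletionFrames(id: string, model: string, pieces: string[], created = 0): string[] {
  const base = { id, object: "chat.completion.chunk" as const, created, model };
  const chunk = (delta: ChunkChoice["delta"], finish: ChunkChoice["finish_reason"]): ChatCompletionChunk => ({
    ...base,
    choices: [{ index: 0, delta, finish_reason: finish }],
  });
  return [
    sseFrame(chunk({ role: "assistant", content: "" }, null)),
    ...pieces.map((text) => sseFrame(chunk({ content: text }, null))),
    sseFrame(chunk({}, "stop")),
    "data: [DONE]\n\n",
  ];
}
```

A Bun handler could write these frames into a `text/event-stream` response with a pause between them; that pause is what the prompt hints control.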

The headline scenario, `attach-cross-pod`, directly validates the deliverPolicy fix from #3387: POST a slowed-down run on pod-1, attach on pods 2 and 3, assert both see the buffered prefix. Passes locally.

The second scenario, `pod-death-dbos-replay`, is fully wired but `.skip`ped — it surfaced a real architectural finding (see below) that needs its own follow-up to fix.

Architectural finding (worth a separate issue)

The pod-death scenario times out at 120s. Two compounding gaps:

  1. DBOS replay alone can't recover the run. The recovery executor re-runs `dispatchRunAndWaitStep` without `isResume: true`, which hits `claimRunStart`'s strict CAS. The dead pod still owns `run_owner_pod`, so every replay attempt fails with `RunClaimError: ... already running on another pod`.
  2. The heartbeat watcher (the path that WOULD handle this via `claimOrphanedRun` + `isResume: true`) never fires. The `KV_POD_HEARTBEATS` JetStream bucket is missing from NATS, even though other JetStream resources initialize fine; bucket creation appears to fail silently at `app.ts:867`.
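The interaction between the two claim paths can be modeled in memory. This is a hypothetical sketch: only the names `claimRunStart`, `claimOrphanedRun`, and `RunClaimError` come from this PR. The row shape, both function bodies, and the `liveHeartbeats` set (standing in for what `KV_POD_HEARTBEATS` would report) are assumptions.

```typescript
// Hypothetical in-memory model of the two claim paths; not mesh's implementation.
interface RunRow {
  runOwnerPod: string | null;
}

class RunClaimError extends Error {}

/** Strict CAS: succeeds only when nobody owns the run, or the caller already does. */
function claimRunStart(row: RunRow, pod: string): void {
  if (row.runOwnerPod !== null && row.runOwnerPod !== pod) {
    throw new RunClaimError(`run already running on another pod (${row.runOwnerPod})`);
  }
  row.runOwnerPod = pod;
}

/**
 * Orphan takeover: replaces the owner only when the owner has no live heartbeat.
 * Returns false (and changes nothing) while the current owner is still alive.
 */
function claimOrphanedRun(row: RunRow, pod: string, liveHeartbeats: ReadonlySet<string>): boolean {
  if (row.runOwnerPod !== null && liveHeartbeats.has(row.runOwnerPod)) return false;
  row.runOwnerPod = pod;
  return true;
}
```

With the dead pod still recorded as owner, every replay through the strict CAS throws; only the orphan path can take over, and in mesh that path depends on the heartbeat watcher, which never fires while the bucket is missing.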

Net: permanent pod death is currently only recovered when the dead pod itself restarts. The PR #3387 architectural claim ("DBOS replay covers what /attach orphan-resume used to do") is incomplete — full repro + diagnostic notes are in the test's docstring.

How to Test

  1. Start Docker Desktop.
  2. `./tests/multi-pod/run.sh` — boots cluster, runs all scenarios, tears down.
  3. Expected: `6 pass / 1 skip / 0 fail` across 5 files (~5s after cluster up).

For iteration: `docker compose -f tests/multi-pod/docker-compose.yml up -d` once, then `bun test tests/multi-pod/scenarios/` repeatedly.

Migration Notes

None. Self-contained under `tests/multi-pod/` and the existing CI workflow picks up the new scenarios automatically.

Review Checklist

  • PR title is clear and descriptive
  • Changes are tested and working (6/6 active scenarios pass locally)
  • Documentation is updated (if needed) — none; per-file docstrings carry the structure
  • No breaking changes

Summary by cubic

Adds a mock OpenAI-compatible provider and a cross-pod /attach test to run the dispatch pipeline end-to-end across multiple pods without a real LLM, plus reliability improvements and clearer failure surfacing in the test harness.

  • New Features

    • mock-ai service (OpenAI chat-completions streaming) at http://mock-ai:9000/v1; consumed through mesh's openai-compatible adapter.
    • Setup helpers: wireMockProvider, createTestAgent, createTestThread; DB helper getThreadRunOwnerPod; compose POD_NAME mapping to target specific pods.
    • Scenarios: attach-cross-pod validates buffered-prefix replay across pods; pod-death-dbos-replay remains skipped with an updated note on three gaps (no cross-pod DBOS recovery scan, resume purges JetStream, heartbeat bucket fragility).
  • Bug Fixes

    • mcpCall now prefers non-empty structuredContent and correctly falls back to text payloads.
    • dbQuery CLI/docstring synced to --csv; attach-cross-pod header timing corrected to slow:5x500 (~2.5s).
    • attach-cross-pod replaces a racy sleep with a JetStream-confirmation probe to deterministically test buffered replay.
    • Test hooks auto-restore stopped pods and now fail fast when docker compose ps errors, avoiding misleading waitReady timeouts.

Written for commit d0b9e8a.

Builds out the LLM-dependent half of the multi-pod framework so we can
exercise the decopilot dispatch pipeline end-to-end without burning real
provider budget.

**Mock-ai service** (tests/multi-pod/mock-ai/) is a ~140-LOC Bun HTTP
server that speaks the OpenAI chat-completions wire protocol — enough
for mesh's `openai-compatible` adapter to drive `streamText`. Test-time
controls (chunk count + delay) come from the user message text because
mesh strips request headers before calling the provider; the mock
parses "slow:NxMS" / "many:N" hints out of the prompt.
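A parser for those prompt hints might look like the following. It is a sketch under assumptions: the regexes, the default plan, and treating `many:N` as N chunks with no delay are guesses at the mock's behavior, not its actual code.

```typescript
// Hypothetical hint parser; defaults and the "many" semantics are assumptions.
interface StreamPlan {
  chunkCount: number;
  chunkDelayMs: number;
}

const DEFAULT_PLAN: StreamPlan = { chunkCount: 3, chunkDelayMs: 0 };

/** Extract test-time stream controls ("slow:NxMS" or "many:N") from the prompt text. */
function parseHints(prompt: string): StreamPlan {
  const slow = /\bslow:(\d+)x(\d+)/.exec(prompt);
  if (slow) {
    return { chunkCount: Number(slow[1]), chunkDelayMs: Number(slow[2]) };
  }
  const many = /\bmany:(\d+)/.exec(prompt);
  if (many) {
    return { chunkCount: Number(many[1]), chunkDelayMs: DEFAULT_PLAN.chunkDelayMs };
  }
  return { ...DEFAULT_PLAN };
}

/** Approximate stream duration, assuming the delay is applied once per chunk. */
function expectedDurationMs(plan: StreamPlan): number {
  return plan.chunkCount * plan.chunkDelayMs;
}
```

With `slow:5x500`, the expected duration is 5 × 500 ms = 2.5 s, consistent with the corrected attach-cross-pod header.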

**Setup helpers** (lib/setup.ts) gain `wireMockProvider`,
`createTestAgent`, `createTestThread` for the three calls every
dispatch scenario needs: register an openai-compatible credential
pointing at mock-ai, pin the org's "smart" tier to it, create a
virtual MCP, and pre-create the thread row (without it, /attach 404s
until the workflow's prepareRun gets around to inserting the row).

**DB helper** (lib/db.ts) shells out to `docker compose exec postgres
psql` for the rare cases where a scenario needs internal state not
surfaced by the public API — currently used to look up the actual
dispatch owner from `threads.run_owner_pod` for targeted-kill
scenarios.
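The helper's two halves, building the `docker compose exec` argv and splitting psql's CSV output, can be sketched as below. The function names and the `postgres` user and database defaults are assumptions; `-T`, `-X`, `--csv`, and `-t` are standard docker compose and psql flags, and the compose file path is the one used elsewhere in this PR.

```typescript
// Hypothetical sketch of a psql-over-compose helper; names and defaults are assumptions.
function psqlArgs(
  sql: string,
  composeFile = "tests/multi-pod/docker-compose.yml",
  user = "postgres",
  db = "postgres",
): string[] {
  return [
    "compose", "-f", composeFile,
    "exec", "-T", "postgres", // -T: no pseudo-TTY, so stdout is clean for parsing
    "psql", "-X", "-U", user, "-d", db, // -X: ignore any ~/.psqlrc
    "--csv", "-t", // CSV output, tuples only (no header line)
    "-c", sql,
  ];
}

/**
 * Naive CSV split. Only safe for columns whose values never contain commas,
 * quotes, or newlines (e.g. pod names, ids, states).
 */
function parseCsvRows(stdout: string): string[][] {
  return stdout
    .split("\n")
    .map((line) => line.trimEnd())
    .filter((line) => line.length > 0)
    .map((line) => line.split(","));
}
```

The argv would be handed to `docker` by whatever process API the harness uses; keeping the selected columns comma-free is what makes the naive split acceptable.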

**POD_NAME wired into compose** so `run_owner_pod` matches the compose
service name ("mesh-1") and tests can map straight from a thread row
to a `docker compose kill` target.
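Because `run_owner_pod` now equals the compose service name, turning a thread row into a kill command is a direct mapping. A sketch, assuming the app services are named `mesh-1` through `mesh-N` (only `mesh-1` is confirmed above). `composeKillArgs` and `killTargetFor` are hypothetical; the guard is there to catch a row holding some other identifier because POD_NAME was not wired.

```typescript
// Hypothetical mapping from threads.run_owner_pod to a compose kill command.
const MESH_SERVICE = /^mesh-\d+$/;

function killTargetFor(runOwnerPod: string | null): string {
  if (runOwnerPod === null) throw new Error("thread has no run owner yet");
  if (!MESH_SERVICE.test(runOwnerPod)) {
    throw new Error(`run_owner_pod "${runOwnerPod}" is not a compose service name; is POD_NAME set?`);
  }
  return runOwnerPod;
}

/** Build `docker compose ... kill -s SIGKILL <service>` arguments for the run owner. */
function composeKillArgs(runOwnerPod: string | null, composeFile = "tests/multi-pod/docker-compose.yml"): string[] {
  return ["compose", "-f", composeFile, "kill", "-s", "SIGKILL", killTargetFor(runOwnerPod)];
}
```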

**Scenarios:**

- `attach-cross-pod` (✅ passing) — the headline test that directly
  validates the deliverPolicy fix from PR #3387. POSTs on pod-1,
  attaches on pods 2 and 3, asserts both see the buffered prefix of a
  slowed-down run. Catches the bug regardless of which pod DBOS picks
  for dispatch.

- `pod-death-dbos-replay` (⚠️ skipped, fully wired) — would validate
  that SIGKILLing the run-owning pod still completes the run via DBOS
  replay. Surfaced a real architectural finding: DBOS replay on
  another pod fails `claimRunStart`'s strict CAS because the dead pod
  still owns `run_owner_pod`; the heartbeat watcher that DOES handle
  this (via `claimOrphanedRun`) never fires because the
  `KV_POD_HEARTBEATS` JetStream bucket is missing — bucket creation
  appears to fail silently at app.ts:867. Full repro + diagnostic
  notes in the test's docstring; remove `.skip` to verify once one of
  those is fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 7 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="tests/multi-pod/scenarios/attach-cross-pod.test.ts">

<violation number="1" location="tests/multi-pod/scenarios/attach-cross-pod.test.ts:121">
P2: Wait longer than the mock’s first chunk delay before attaching, otherwise this scenario can pass before any buffered content exists.</violation>
</file>

viktormarinho and others added 2 commits May 17, 2026 19:13
Three review-flagged issues:

- **dbQuery docstring/spawn cleanup**: psql was invoked with both
  `-A -F "|"` and `--csv`. `--csv` wins, so output was actually CSV
  (parser already split on `,`); the docstring's "pipe separator" warning
  would have misled callers picking column lists. Drop the dead flags and
  rewrite the warning around comma-safety.

- **attach-cross-pod header timing**: header claimed "300ms × 5 chunks ≈
  1.5s", actual hint is `slow:5x500` (≈ 2.5s). Sync the header.

- **mcpCall structuredContent short-circuit**: `if (sc) return sc` returned
  on `{}`, skipping the text-content fallback that some MCP tools rely
  on. Tighten to require a non-empty object so callers get the real
  payload (or a deliberate empty echo at the end of the chain) instead
  of a phantom `{}`.
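The tightened selection order can be sketched against the MCP `CallToolResult` shape (`structuredContent` plus a `content` array of typed blocks). `extractPayload` is a hypothetical stand-in for the relevant part of `mcpCall`, and parsing the text block as JSON when possible is an assumption about what callers expect.

```typescript
// Hypothetical sketch of the payload-selection order; not the real mcpCall.
interface CallToolResult {
  content?: Array<{ type: string; text?: string }>;
  structuredContent?: Record<string, unknown>;
}

function isNonEmptyObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value) && Object.keys(value).length > 0;
}

function extractPayload(result: CallToolResult): unknown {
  // 1. Prefer structuredContent, but only when it actually carries fields.
  if (isNonEmptyObject(result.structuredContent)) return result.structuredContent;

  // 2. Fall back to the first text block, decoding JSON when it parses.
  const text = result.content?.find((block) => block.type === "text" && typeof block.text === "string")?.text;
  if (text !== undefined) {
    try {
      return JSON.parse(text);
    } catch {
      return text;
    }
  }

  // 3. Deliberate empty echo at the end of the chain.
  return result.structuredContent ?? {};
}
```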

Also revises the pod-death-dbos-replay docstring with the corrected
architectural diagnosis: original write-up blamed the heartbeat bucket
in isolation; deeper investigation shows three compounding gaps (no
cross-pod DBOS recovery scan because `executor_id="local"` for all
pods, unconditional `streamBuffer.purge()` on resume wiping survivor
buffers, and the heartbeat bucket reliability issue). Test stays
`.skip`ped pending the architectural fix.

Pulls in two helpers the pod-death scenario was already using:
- `lib/db.ts` for direct postgres inspection of `run_owner_pod`
- `lib/hooks.ts` auto-restores any pod left stopped by a previous
  scenario, so kill-style tests don't poison subsequent runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 500ms sleep between POST and the cross-pod attach was right at the
edge of the dispatch chain's typical latency. If DBOS dequeue + prepareRun
+ streamText init completed in under 500ms (the mock's per-chunk delay),
the test's attaches would open BEFORE chunk-1 was published, and the
`deliverPolicy: "new"` bug path would still deliver chunk-1 live — the
regression we claim to catch wouldn't manifest.

Fix: open a throwaway /attach on pod-1 first as a synchronization probe.
Only proceed to opening the real test watchers once the probe has actually
seen chunk-1 in its stream — at that point chunk-1 is provably already in
JetStream, so receiving it from a cross-pod attach requires the
`deliverPolicy: "all"` fix to work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
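The probe step reduces to "wait for the first item on a stream, with a deadline". A minimal sketch, assuming the attach response has already been adapted into an `AsyncIterable` of chunks; `firstItemWithin` is a hypothetical helper, and closing the probe connection is left to the caller.

```typescript
// Hypothetical helper: resolves with the first chunk, or rejects on timeout / early end.
async function firstItemWithin<T>(stream: AsyncIterable<T>, timeoutMs: number): Promise<T> {
  const iterator = stream[Symbol.asyncIterator]();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`no chunk within ${timeoutMs}ms`)), timeoutMs);
  });
  try {
    // Whichever settles first wins; a late iterator result is simply ignored.
    const next = await Promise.race([iterator.next(), timeout]);
    if (next.done) throw new Error("stream ended before the first chunk");
    return next.value;
  } finally {
    clearTimeout(timer);
  }
}
```

In the scenario, the watchers on pods 2 and 3 would be opened only after this resolves on the pod-1 probe, so any chunk-1 they receive has to come from replay of what is already in JetStream.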

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="tests/multi-pod/lib/hooks.ts">

<violation number="1" location="tests/multi-pod/lib/hooks.ts:33">
P2: Check the `docker compose ps -a` exit code here; otherwise compose failures are swallowed and the hook times out later with a less useful error.</violation>
</file>

Before, a failing `docker compose ps -a` (daemon down, compose file
moved, permission issue) silently produced empty output. The hook then
treated it as "no pods need restoring" and handed control to waitReady,
which would time out 2 minutes later with a misleading "mesh-1 not
healthy" — turning the real diagnosis into a needle in a haystack.

Now we read stderr too and surface the actual compose error on a
non-zero exit code, so the developer sees the root cause immediately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
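The fail-fast check can be kept as a pure function over the finished process result, which makes it testable without Docker. A sketch assuming the hook runs `docker compose ps -a --format "{{.Service}} {{.State}}"`; the format string, the `ComposeResult` type, and the set of states treated as stopped are assumptions, not the hook's actual code.

```typescript
// Hypothetical sketch of the restore hook's fail-fast parsing.
interface ComposeResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

const STOPPED_STATES = new Set(["exited", "dead", "created"]);

/** Return services that need restarting, or throw with compose's own error text. */
function stoppedServices(result: ComposeResult): string[] {
  if (result.exitCode !== 0) {
    // Surface the real cause instead of letting waitReady time out later.
    const detail = result.stderr.trim() || "(no stderr output)";
    throw new Error(`docker compose ps -a failed (exit ${result.exitCode}): ${detail}`);
  }
  const stopped: string[] = [];
  for (const line of result.stdout.split("\n")) {
    const [service, state] = line.trim().split(/\s+/);
    if (service && state && STOPPED_STATES.has(state)) stopped.push(service);
  }
  return stopped;
}
```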
@viktormarinho viktormarinho merged commit 3eee70b into main May 17, 2026
12 checks passed
@viktormarinho viktormarinho deleted the viktormarinho/multi-pod-llm-scenarios branch May 17, 2026 22:26