test(multi-pod): cross-pod /attach + mock-ai provider scaffolding #3392
Merged
Builds out the LLM-dependent half of the multi-pod framework so we can
exercise the decopilot dispatch pipeline end-to-end without burning real
provider budget.
**Mock-ai service** (tests/multi-pod/mock-ai/) is a ~140-LOC Bun HTTP
server that speaks the OpenAI chat-completions wire protocol — enough
for mesh's `openai-compatible` adapter to drive `streamText`. Test-time
controls (chunk count + delay) come from the user message text because
mesh strips request headers before calling the provider; the mock
parses "slow:NxMS" / "many:N" hints out of the prompt.
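Here is a minimal sketch of the two mock pieces described above. It assumes the hint grammar is exactly `slow:<chunks>x<ms>` and `many:<chunks>`. The names `parseHints`, `sseChunk` and `SSE_DONE`, and the default values, are hypothetical; they are not the mock's actual code. The frame layout follows the public OpenAI streaming format: `chat.completion.chunk` objects are sent as `data:` lines, and the stream ends with `data: [DONE]`.

```typescript
// Hypothetical defaults; the real mock may use different values.
const DEFAULT_CHUNKS = 3;
const DEFAULT_DELAY_MS = 0;

interface StreamHints {
  chunks: number;
  delayMs: number;
}

// Read test-time controls from the prompt text, since mesh strips request
// headers: "slow:5x500" => 5 chunks, 500ms apart; "many:20" => 20 chunks.
function parseHints(prompt: string): StreamHints {
  const slow = /slow:(\d+)x(\d+)/.exec(prompt);
  if (slow) {
    return { chunks: Number(slow[1]), delayMs: Number(slow[2]) };
  }
  const many = /many:(\d+)/.exec(prompt);
  if (many) {
    return { chunks: Number(many[1]), delayMs: DEFAULT_DELAY_MS };
  }
  return { chunks: DEFAULT_CHUNKS, delayMs: DEFAULT_DELAY_MS };
}

// Encode one OpenAI-style streaming delta as a Server-Sent Events frame.
function sseChunk(id: string, model: string, content: string): string {
  const payload = {
    id,
    object: "chat.completion.chunk",
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [{ index: 0, delta: { content }, finish_reason: null }],
  };
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Terminal frame that tells OpenAI-compatible clients the stream is complete.
const SSE_DONE = "data: [DONE]\n\n";
```

A Bun request handler would write one `sseChunk(...)` per chunk and sleep `delayMs` between writes. After the last chunk it writes `SSE_DONE`.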
**Setup helpers** (lib/setup.ts) gain `wireMockProvider`,
`createTestAgent`, `createTestThread` for the three calls every
dispatch scenario needs: register an openai-compatible credential
pointing at mock-ai, pin the org's "smart" tier to it, create a
virtual MCP, pre-create the thread row (without which /attach 404s
before the workflow's prepareRun gets to insert it).
**DB helper** (lib/db.ts) shells out to `docker compose exec postgres
psql` for the rare cases where a scenario needs internal state not
surfaced by the public API — currently used to look up the actual
dispatch owner from `threads.run_owner_pod` for targeted-kill
scenarios.
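One way to shell out like this is sketched below as pure functions, so the argv and the parsing can be checked without Docker. The user and database names (`postgres`, `mesh`) and the helper names are assumptions. `-T` (no TTY) is a real `docker compose exec` flag. psql's `--csv` combined with `-t` prints CSV rows without a header line.

```typescript
// Hypothetical connection details; the real helper's user/database may differ.
const PG_USER = "postgres";
const PG_DB = "mesh";

// argv for `docker compose exec` running psql inside the postgres service.
// -T disables TTY allocation so stdout is clean; --csv + -t emits CSV rows
// with no header line.
function psqlArgs(composeFile: string, sql: string): string[] {
  return [
    "compose", "-f", composeFile,
    "exec", "-T", "postgres",
    "psql", "-U", PG_USER, "-d", PG_DB,
    "--csv", "-t", "-c", sql,
  ];
}

// Naive CSV split: only safe when no selected column can contain a comma,
// quote, or newline (psql would quote such values).
function parseCsvRows(stdout: string): string[][] {
  return stdout
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => line.split(","));
}
```

The argv would be handed to something like `Bun.spawn(["docker", ...psqlArgs(file, sql)])`. Because the split is naive, callers must pick only comma-free columns.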
**POD_NAME wired into compose** so `run_owner_pod` matches the compose
service name ("mesh-1") and tests can map straight from a thread row
to a `docker compose kill` target.
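Because the pod name and the compose service name coincide, a kill target can be derived directly from the row. This sketch assumes three services named `mesh-1` to `mesh-3` and uses a hypothetical `killArgs` helper. It rejects anything that is not a known service, so a null or unexpected owner fails loudly instead of producing a confusing compose error. `docker compose kill` already sends SIGKILL by default; `-s SIGKILL` just makes that explicit.

```typescript
// Assumed service names; the PR's compose file defines the real set.
const MESH_SERVICES = new Set(["mesh-1", "mesh-2", "mesh-3"]);

// Map threads.run_owner_pod straight to a `docker compose kill` argv.
function killArgs(composeFile: string, runOwnerPod: string | null): string[] {
  if (runOwnerPod === null || !MESH_SERVICES.has(runOwnerPod)) {
    throw new Error(`run_owner_pod ${String(runOwnerPod)} is not a known compose service`);
  }
  return ["compose", "-f", composeFile, "kill", "-s", "SIGKILL", runOwnerPod];
}
```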
**Scenarios:**
- `attach-cross-pod` (✅ passing) — the headline test that directly
validates the deliverPolicy fix from PR #3387. POSTs on pod-1,
attaches on pods 2 and 3, asserts both see the buffered prefix of a
slowed-down run. Catches the bug regardless of which pod DBOS picks
for dispatch.
- `pod-death-dbos-replay` (⚠️ skipped, fully wired) — would validate
that SIGKILLing the run-owning pod still completes the run via DBOS
replay. Surfaced a real architectural finding: DBOS replay on
another pod fails `claimRunStart`'s strict CAS because the dead pod
still owns `run_owner_pod`; the heartbeat watcher that DOES handle
this (via `claimOrphanedRun`) never fires because the
`KV_POD_HEARTBEATS` JetStream bucket is missing — bucket creation
appears to fail silently at app.ts:867. Full repro + diagnostic
notes in the test's docstring; remove `.skip` to verify once one of
those is fixed.
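The following toy in-memory model makes the ownership failure concrete. Its semantics are assumptions inferred from the description above, not mesh's schema or SQL:

- The strict CAS succeeds only when the owner is null or is the claiming pod itself.
- The orphan takeover succeeds only when the owner's heartbeat is stale.

`ThreadRow` and the stale threshold are hypothetical.

```typescript
// Toy model only: these ownership rules are assumptions, not mesh's SQL.
interface ThreadRow {
  runOwnerPod: string | null;
}

// Strict CAS: succeeds only if nobody owns the run, or the claimant already does.
function claimRunStart(row: ThreadRow, pod: string): boolean {
  if (row.runOwnerPod !== null && row.runOwnerPod !== pod) return false;
  row.runOwnerPod = pod;
  return true;
}

// Orphan takeover: succeeds only if the current owner's last heartbeat is
// older than staleAfterMs (or was never recorded).
function claimOrphanedRun(
  row: ThreadRow,
  pod: string,
  heartbeats: Map<string, number>,
  nowMs: number,
  staleAfterMs: number,
): boolean {
  const owner = row.runOwnerPod;
  if (owner !== null && owner !== pod) {
    const last = heartbeats.get(owner);
    if (last !== undefined && nowMs - last < staleAfterMs) return false;
  }
  row.runOwnerPod = pod;
  return true;
}
```

With `mesh-1` dead, `claimRunStart(row, "mesh-2")` returns false, which is the replay failure. Only the orphan path can succeed, and in the scenario above nothing calls it because the heartbeat watcher never fires. In this model a restarted `mesh-1` does pass the strict CAS, which is consistent with recovery currently requiring the dead pod itself to come back.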
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 issue found across 7 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="tests/multi-pod/scenarios/attach-cross-pod.test.ts">
<violation number="1" location="tests/multi-pod/scenarios/attach-cross-pod.test.ts:121">
P2: Wait longer than the mock’s first chunk delay before attaching, otherwise this scenario can pass before any buffered content exists.</violation>
</file>
Three review-flagged issues:
- **dbQuery docstring/spawn cleanup**: psql was invoked with both
`-A -F "|"` and `--csv`. `--csv` wins, so output was actually CSV
(parser already split on `,`); the docstring's "pipe separator" warning
would have misled callers picking column lists. Drop the dead flags and
rewrite the warning around comma-safety.
- **attach-cross-pod header timing**: header claimed "300ms × 5 chunks ≈
1.5s", actual hint is `slow:5x500` (≈ 2.5s). Sync the header.
- **mcpCall structuredContent short-circuit**: `if (sc) return sc` returned
on `{}`, skipping the text-content fallback that some MCP tools rely
on. Tighten to require a non-empty object so callers get the real
payload (or a deliberate empty echo at the end of the chain) instead
of a phantom `{}`.
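Here is a sketch of the tightened check. `ToolResult` models only an assumed subset of an MCP tool result. The fallback order is an assumption about `mcpCall`, not its actual code:

1. non-empty `structuredContent`;
2. JSON in the first text block;
3. raw text;
4. an empty object.

```typescript
// Assumed subset of an MCP tool result, for illustration only.
interface ToolResult {
  structuredContent?: Record<string, unknown>;
  content?: Array<{ type: string; text?: string }>;
}

// True only for plain objects with at least one own key ({} is rejected).
function isNonEmptyObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v) && Object.keys(v).length > 0;
}

// Prefer non-empty structuredContent, then JSON in the first text block,
// then the raw text, and only at the end of the chain an empty object.
function extractPayload(result: ToolResult): unknown {
  if (isNonEmptyObject(result.structuredContent)) return result.structuredContent;
  const text = result.content?.find((c) => c.type === "text")?.text;
  if (text !== undefined) {
    try {
      return JSON.parse(text);
    } catch {
      return text;
    }
  }
  return result.structuredContent ?? {};
}
```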
Also revises the pod-death-dbos-replay docstring with the corrected
architectural diagnosis: original write-up blamed the heartbeat bucket
in isolation; deeper investigation shows three compounding gaps (no
cross-pod DBOS recovery scan because `executor_id="local"` for all
pods, unconditional `streamBuffer.purge()` on resume wiping survivor
buffers, and the heartbeat bucket reliability issue). Test stays
`.skip`ped pending the architectural fix.
Pulls in two helpers the pod-death scenario was already using:
- `lib/db.ts` for direct postgres inspection of `run_owner_pod`
- `lib/hooks.ts` auto-restores any pod left stopped by a previous
scenario, so kill-style tests don't poison subsequent runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 500ms sleep between POST and the cross-pod attach was right at the edge of the dispatch chain's typical latency. If DBOS dequeue + prepareRun + streamText init completed in under 500ms (the mock's per-chunk delay), the test's attaches would open BEFORE chunk-1 was published. The `deliverPolicy: "new"` bug path would then still deliver chunk-1 live, so the regression we claim to catch wouldn't manifest.

Fix: open a throwaway /attach on pod-1 first as a synchronization probe. Open the real test watchers only after the probe has actually seen chunk-1 in its stream. At that point chunk-1 is provably already in JetStream, so receiving it from a cross-pod attach requires the `deliverPolicy: "all"` fix to work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
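The probe reduces to "wait until a stream yields its first content chunk, with a timeout". Below is a generic sketch, assuming the attach response has already been decoded into an async iterable of events. `waitForFirstChunk` and the predicate are hypothetical names, not the harness's API.

```typescript
// Resolve with the first item matching isFirstChunk, or reject after
// timeoutMs. Used as a sync barrier before opening cross-pod watchers.
async function waitForFirstChunk<T>(
  stream: AsyncIterable<T>,
  isFirstChunk: (item: T) => boolean,
  timeoutMs: number,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`probe saw no chunk within ${timeoutMs}ms`)), timeoutMs);
  });
  const seen = (async () => {
    // Returning from inside for-await closes the underlying iterator.
    for await (const item of stream) {
      if (isFirstChunk(item)) return item;
    }
    throw new Error("probe stream ended before the first chunk");
  })();
  try {
    return await Promise.race([seen, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

The test would open its `/attach` watchers on pods 2 and 3 only after this resolves. A timeout here points at dispatch, not at the replay path under test.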
1 issue found across 5 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="tests/multi-pod/lib/hooks.ts">
<violation number="1" location="tests/multi-pod/lib/hooks.ts:33">
P2: Check the `docker compose ps -a` exit code here; otherwise compose failures are swallowed and the hook times out later with a less useful error.</violation>
</file>
Before, a failing `docker compose ps -a` (daemon down, compose file moved, permission issue) silently produced empty output. The hook treated that as "no pods need restoring" and handed control to waitReady. waitReady would then time out 2 minutes later with a misleading "mesh-1 not healthy", turning the real diagnosis into a needle in a haystack. Now we also read stderr and surface the actual compose error on a non-zero exit code, so the developer sees the root cause immediately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
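The fixed control flow is sketched below as a pure function over a command result, so it can run without Docker. The `<service> <state>` line format is an assumption (for example, output from a custom `--format` template). `stoppedServices` is a hypothetical name; the real hook may spawn and parse compose differently.

```typescript
// Result of running `docker compose ps -a`, however it was spawned.
interface CommandResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

// Assumed stdout format: one "<service> <state>" pair per line.
function stoppedServices(ps: CommandResult): string[] {
  if (ps.exitCode !== 0) {
    // Surface the compose error instead of reading empty stdout as "all up".
    throw new Error(`docker compose ps -a failed (exit ${ps.exitCode}): ${ps.stderr.trim()}`);
  }
  return ps.stdout
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => line.split(/\s+/))
    .filter(([, state]) => state !== "running")
    .map(([service]) => service);
}
```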
What is this contribution about?
Builds out the LLM-dependent half of the multi-pod test framework added in #3391. A ~140-LOC Bun mock-ai service speaks the OpenAI streaming wire protocol, so mesh's existing `openai-compatible` adapter can drive `streamText` against it with no real provider budget needed. Setup helpers gain `wireMockProvider`, `createTestAgent`, and `createTestThread` for the three calls every dispatch scenario needs.

The headline scenario, `attach-cross-pod`, directly validates the deliverPolicy fix from #3387: POST a slowed-down run on pod-1, attach on pods 2 and 3, and assert both see the buffered prefix. Passes locally.
The second scenario, `pod-death-dbos-replay`, is fully wired but `.skip`ped — it surfaced a real architectural finding (see below) that needs its own follow-up to fix.
Architectural finding (worth a separate issue)
The pod-death scenario times out at 120s. Two compounding gaps:
Net: permanent pod death is currently only recovered when the dead pod itself restarts. The PR #3387 architectural claim ("DBOS replay covers what /attach orphan-resume used to do") is incomplete — full repro + diagnostic notes are in the test's docstring.
How to Test
For iteration: `docker compose -f tests/multi-pod/docker-compose.yml up -d` once, then `bun test tests/multi-pod/scenarios/` repeatedly.
Migration Notes
None. Self-contained under `tests/multi-pod/` and the existing CI workflow picks up the new scenarios automatically.
Summary by cubic
Adds a mock OpenAI-compatible provider and a cross-pod /attach test to run the dispatch pipeline end-to-end across multiple pods without a real LLM, plus reliability improvements and clearer failure surfacing in the test harness.
New Features
- `mock-ai` service (OpenAI chat-completions streaming) at `http://mock-ai:9000/v1`; used by `openai-compatible`.
- `wireMockProvider`, `createTestAgent`, `createTestThread`; DB helper `getThreadRunOwnerPod`; compose `POD_NAME` mapping to target specific pods.
- `attach-cross-pod` validates buffered-prefix replay across pods; `pod-death-dbos-replay` remains skipped with an updated note on three gaps (no cross-pod DBOS recovery scan, resume purges JetStream, heartbeat bucket fragility).

Bug Fixes
- `mcpCall` now prefers non-empty `structuredContent` and correctly falls back to text payloads.
- `dbQuery` CLI/docstring synced to `--csv`; `attach-cross-pod` header timing corrected to `slow:5x500` (~2.5s).
- `attach-cross-pod` replaces a racy sleep with a JetStream-confirmation probe to deterministically test buffered replay.
- `hooks.ts` surfaces `docker compose ps` errors, avoiding misleading `waitReady` timeouts.

Written for commit d0b9e8a.