fix: surface bundle startup failures to workspace log, SSE, and health (#7)#23

Open
mgoldsborough wants to merge 1 commit into main from fix/issue-7-bundle-start-failed
Conversation

@mgoldsborough
Contributor

Summary

  • When `startBundleSource` threw inside `startWorkspaceBundles`, the error hit `process.stderr` and was dropped. The failed bundle vanished from workspace JSONL logs, from SSE clients, and from `/v1/health` — operators had to tail container logs to know a bundle was down.
  • Root cause: the catch block had no reference to the `EventSink` and no way to inform `HealthMonitor` (the failed bundle never became an `McpSource`, which is `HealthMonitor`'s only input).

Changes

  • New `bundle.start_failed` `EngineEventType`.
  • `startWorkspaceBundles` takes the `EventSink`, emits `bundle.start_failed` on catch, and returns a `BundleStartFailure[]`.
  • `Runtime.start` stores failures and exposes `getStartFailures()`; the API server constructs `HealthMonitor` with those records so `/v1/health` shows the bundle as `dead`.
  • Added `bundle.start_failed` to `WORKSPACE_EVENTS` (JSONL log) and to the SSE forwarding allow-list.
  • `HealthMonitor.getStatus()` merges live records with start-failure records; when a source with the same name later comes up, the live record hides the earlier failure so operators don't see stale dead entries.

Startup-continues-on-failure behavior is preserved — the workspace registry is still created, platform tools still work, and other bundles in the same workspace still start.

Test plan

  • New `workspace-runtime` tests: `bundle.start_failed` is emitted on catch and returned in `startFailures`; `registries` still contains the workspace; no event is emitted when everything succeeds.
  • New `health-monitor` tests: `startFailures` are reported as `dead` in `getStatus()`; they merge with live source records; a startFailure is suppressed when a live source with the same name exists.
  • `workspace-log-sink` test updated: `bundle.start_failed` is in the workspace events set.
  • `bun test test/unit/` — 1730 pass, 0 fail.
  • `bun run check` / `bun run lint` — clean.

Closes #7

fix: surface bundle startup failures to workspace log, SSE, and health (#7)

Previously, when startBundleSource threw inside startWorkspaceBundles,
the error was written to process.stderr and silently dropped — the
bundle simply vanished from the registry, from workspace JSONL logs,
from SSE clients, and from /v1/health. Operators had to tail
container logs to know a bundle was down; users saw "App X is not
available" with no context.

Root cause: the catch block had no reference to the EventSink and
no way to inform HealthMonitor (the failed bundle never became an
McpSource, which is HealthMonitor's only input).

Fix:

- Add a bundle.start_failed EngineEventType.
- startWorkspaceBundles now accepts the EventSink and, on catch,
  emits bundle.start_failed and returns a BundleStartFailure[] that
  the caller can forward to HealthMonitor.
- Runtime.start stores the failure list and exposes it via
  getStartFailures(); the API server constructs HealthMonitor with
  those records so /v1/health shows the bundle as `dead`.
- bundle.start_failed is added to WORKSPACE_EVENTS (JSONL log) and
  to the SSE forwarding list.
- HealthMonitor.getStatus() merges live records with start-failure
  records, suppressing a failure record when a source with the same
  name later came up (so a successful retry hides the earlier
  attempt).
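The `Runtime.start` side of this could look roughly as follows; this is a hedged sketch, with the starter passed in as a parameter to keep it self-contained (the real code calls `startWorkspaceBundles` directly):

```typescript
// Hypothetical sketch: start() keeps the failure list returned by the
// bundle-start step and exposes it via getStartFailures(), which the API
// server can pass to HealthMonitor when constructing /v1/health.
type BundleStartFailure = { bundle: string; error: string };

class Runtime {
  private startFailures: BundleStartFailure[] = [];

  async start(
    startBundles: () => Promise<BundleStartFailure[]>,
  ): Promise<void> {
    this.startFailures = await startBundles();
  }

  getStartFailures(): BundleStartFailure[] {
    return this.startFailures;
  }
}
```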

Startup-continues-on-failure behavior is preserved — the workspace
registry is still created, platform tools still work, and other
bundles still start.

Development

Successfully merging this pull request may close these issues.

Bundle startup failures not logged to workspace or surfaced to UI
