Crash: SIGBUS/SIGTRAP in onnxruntime-node ReleaseIoBinding during local embedding load (concurrent opencode sessions, macOS arm64) #21

@vinayakkulkarni

Description

Both opencode TUI sessions crash with a native SIGBUS / SIGTRAP inside onnxruntime-node while the magic-context plugin loads the local embedding model (Xenova/all-MiniLM-L6-v2). The crashes occurred in two concurrently running opencode processes (separate sessions, separate cwd) within seconds of each other, which strongly suggests a concurrency / double-free bug in the local embedding loader when two processes touch the same cached model at the same time.

Environment

  • Plugin: @cortexkit/opencode-magic-context@0.9.1
  • OpenCode: 1.4.7
  • OS: macOS 26.4.1 (build 25E253)
  • Arch: arm64 (Apple Silicon, M2 Max — Mac14,6)
  • Node: v24.15.0
  • onnxruntime-node: 1.21.0 (npm) — bundles libonnxruntime.1.14.0.dylib (native C++ lib)
  • Transitive dep via: @huggingface/transformers@~3.7.6

Configuration

~/.config/opencode/magic-context.jsonc:

{
  "$schema": "https://raw.githubusercontent.com/cortexkit/opencode-magic-context/master/assets/magic-context.schema.json",
  "enabled": true,
  "historian": {
    "model": "anthropic/claude-haiku-4-5",
    "fallback_models": ["opencode-go/glm-5"]
  },
  "dreamer": {
    "enabled": true,
    "model": "anthropic/claude-sonnet-4-6"
  },
  "sidekick": {
    "model": "anthropic/claude-haiku-4-5"
  },
  "experimental": {
    "user_memories": { "enabled": false },
    "pin_key_files": { "enabled": false }
  },
  "compaction_markers": false
}

No explicit embedding block, so the default (provider: "local", model Xenova/all-MiniLM-L6-v2) is used.

Reproduction

  1. macOS 26.4.1 arm64, opencode 1.4.7, magic-context 0.9.1 (default local embedding config).
  2. Cached model already present at ~/.local/share/opencode/storage/plugin/magic-context/models/Xenova/all-MiniLM-L6-v2/onnx/model.onnx (90 MB, md5 = 84f837de2a0f667784facf2ba0f36b22).
  3. Start two opencode sessions in separate terminals, in different project directories:
    • opencode -s ses_267e0c216ffeI6nQIAWBnt6G20
    • opencode -s ses_26616a332ffenMsmj6S59ueIH1
  4. Work in both simultaneously (both sessions presumably trigger embedding/ctx_search/dreamer init near the same time).
  5. Within ~5 minutes, both processes die silently, no TUI error, parent shell prints nothing.
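To rule out a corrupted cache before reproducing, the checksum quoted in step 2 can be recomputed. This is a minimal sketch using only Node's built-in modules; `md5OfFile` is a hypothetical helper name, not part of the plugin.

```typescript
import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

// Stream the file through MD5 so a ~90 MB model is never held in memory at once.
export function md5OfFile(path: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash("md5");
    createReadStream(path)
      .on("error", reject)
      .on("data", (chunk) => hash.update(chunk))
      .on("end", () => resolve(hash.digest("hex")));
  });
}

// Hypothetical usage: `modelPath` is the model.onnx path from step 2.
// const intact = (await md5OfFile(modelPath)) === "84f837de2a0f667784facf2ba0f36b22";
```

A mismatch would point at a damaged download rather than a concurrency problem.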

Three crash reports were generated within about 16 minutes (10:56:36, 11:12:07, 11:12:12), all with LoadModel on the native stack.

Crash signatures (from ~/Library/Logs/DiagnosticReports/opencode-*.ips)

Crash 1 — opencode-2026-04-17-105636.ips

exception  : EXC_BAD_ACCESS (SIGBUS)
subtype    : KERN_PROTECTION_FAILURE at 0x00000004138ed418
termination: Bus error: 10
faultingThread: "Worker"
frames (top):
  OrtApis::ReleaseIoBinding(OrtIoBinding*) + 24
  InferenceSessionWrap::LoadModel(Napi::CallbackInfo const&) + 3220
  Napi::InstanceWrap<InferenceSessionWrap>::InstanceMethodCallbackWrapper::lambda()
  Napi::InstanceWrap<InferenceSessionWrap>::InstanceMethodCallbackWrapper

Crash 2 — opencode-2026-04-17-111207.ips

exception  : EXC_BREAKPOINT (SIGTRAP)
termination: Trace/BPT trap: 5
faultingThread: "Worker"
frames (top):
  (native, unsymbolicated + _sigtramp)
  InferenceSessionWrap::LoadModel(Napi::CallbackInfo const&) + 3184
  Napi::InstanceWrap<InferenceSessionWrap>::InstanceMethodCallbackWrapper::lambda()
  Napi::InstanceWrap<InferenceSessionWrap>::InstanceMethodCallbackWrapper

Crash 3 — opencode-2026-04-17-111212.ips

exception  : EXC_BREAKPOINT (SIGTRAP)
termination: Trace/BPT trap: 5
faultingThread: "Worker"
frames (top):
  (native, unsymbolicated + _sigtramp)
  OrtApis::ReleaseIoBinding(OrtIoBinding*) + 36   ← double frame!
  OrtApis::ReleaseIoBinding(OrtIoBinding*) + 36   ← double-free signature
  InferenceSessionWrap::LoadModel(Napi::CallbackInfo const&) + 3220
  Napi::InstanceWrap<InferenceSessionWrap>::InstanceMethodCallbackWrapper::lambda()

Loaded native modules at crash time (from usedImages):

/…/onnxruntime_binding.node
/…/libonnxruntime.1.14.0.dylib
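For comparing several reports side by side, the fields quoted above can be extracted programmatically. The sketch below assumes the layout that recent macOS .ips reports appear to use: one line of JSON metadata, followed by a JSON payload with `exception`, `faultingThread`, and `threads[].frames[].symbol` fields. That layout is an assumption, and `summarizeIps` is a hypothetical helper name.

```typescript
interface IpsFrame { symbol?: string; symbolLocation?: number }
interface IpsPayload {
  exception?: { type?: string; signal?: string };
  faultingThread?: number;
  threads?: { name?: string; frames?: IpsFrame[] }[];
}
interface CrashSummary { exception: string; thread: string; topFrames: string[] }

// ASSUMPTION: line 1 is a JSON metadata header, the rest is one JSON document.
export function summarizeIps(text: string, maxFrames = 4): CrashSummary {
  const newline = text.indexOf("\n");
  if (newline < 0) throw new Error("expected a header line followed by a JSON payload");
  const payload = JSON.parse(text.slice(newline + 1)) as IpsPayload;

  const ex = payload.exception ?? {};
  const thread = payload.threads?.[payload.faultingThread ?? 0];
  const topFrames = (thread?.frames ?? [])
    .slice(0, maxFrames)
    // Unsymbolicated frames carry no `symbol`; mark them instead of dropping them.
    .map((f) => (f.symbol ? `${f.symbol} + ${f.symbolLocation ?? 0}` : "(unsymbolicated)"));

  return {
    exception: `${ex.type ?? "?"} (${ex.signal ?? "?"})`,
    thread: thread?.name ?? "(unnamed)",
    topFrames,
  };
}
```

Running this over each opencode-*.ips file would make it easy to confirm whether the same frames recur in future crashes.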

Root-cause hypothesis

All three crashes share:

  1. The same faulting thread name ("Worker") — likely a transformers.js / onnxruntime-node worker pool thread.
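  2. The same enclosing function, InferenceSessionWrap::LoadModel. In Crashes 1 and 3 the top symbolicated frame is OrtApis::ReleaseIoBinding; in Crash 2 the top frame is unsymbolicated.
  3. Crash 3 shows the ReleaseIoBinding frame twice, which is consistent with a double-free; Crash 1 shows KERN_PROTECTION_FAILURE, which is consistent with a use-after-free.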

The most plausible cause: when two opencode processes share the same on-disk ONNX model cache and both initialize InferenceSession at roughly the same wall-clock time, ONNX Runtime 1.14.0's session-initialization path releases/cleans up an IoBinding that the other process (or its own abort path) also releases: double-free → ORT_ENFORCE → brk 1 → SIGTRAP. The 1.14.0 native lib is quite old (Feb 2023) relative to macOS 26.4.1.
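One way to test this hypothesis would be to serialize model loading across processes and check whether the crashes stop. The sketch below is a minimal advisory lock built on exclusive file creation (Node's `open` with the `wx` flag fails with EEXIST if the file already exists). `withFileLock`, the lock path, and the timings are hypothetical and not part of magic-context; a stale lock left behind by a killed process is not handled.

```typescript
import { open, unlink } from "node:fs/promises";

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `task` while holding an exclusive lock file. Other processes using the
// same `lockPath` poll until the file has been removed.
export async function withFileLock<T>(
  lockPath: string,
  task: () => Promise<T>,
  { timeoutMs = 30000, pollMs = 50 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      // "wx" = create exclusively: rejects with EEXIST if the file exists.
      const handle = await open(lockPath, "wx");
      await handle.close();
      break;
    } catch (err) {
      if ((err as NodeJS.ErrnoException).code !== "EEXIST") throw err;
      if (Date.now() >= deadline) throw new Error(`timed out waiting for ${lockPath}`);
      await sleep(pollMs);
    }
  }
  try {
    return await task();
  } finally {
    // Best-effort release; ignore the error if the file is already gone.
    await unlink(lockPath).catch(() => undefined);
  }
}
```

If both sessions wrapped their model load in such a lock and the crashes persisted, the shared cache would become a less likely culprit than the in-process load path.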

Suggested fixes (in order of effort)

  1. Lazy + singleton-initialized embedding session — ensure only one InferenceSession per process, behind an async-locked init. If there's already a lock, it may not be covering the full LoadModel → ready path.
  2. Per-process model cache path (or file-lock the first-time extraction) — avoid two processes mmap-ing / opening the same .onnx file during a warm-start window.
  3. Bump @huggingface/transformers → newer versions pull onnxruntime-node ≥ 1.18 which carries native lib ≥ 1.18 with many post-1.14 stability fixes on Apple Silicon + newer macOS.
  4. Graceful fallback — wrap await pipeline(...) / InferenceSession.create in try/catch; on native crash signal or init error, auto-disable embeddings for this process instead of letting it SIGTRAP-kill the entire opencode TUI. (Currently there's zero user-visible error — the whole TUI just vanishes.)
  5. Document the workaround. Setting embedding: { provider: "off" } in magic-context.jsonc prevents the crash (ctx_search still works via FTS5 fallback). Worth noting in README until the root fix lands.
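Fix 1 can be sketched without touching the embedding library itself. The helper below memoizes the in-flight promise so that concurrent callers inside one process share a single initialization. `lazySingleton` and the `loadEmbedder` placeholder are hypothetical names, not the plugin's actual code, and the helper does nothing about two separate opencode processes.

```typescript
// Wraps an async factory so it runs at most once per process, even when
// several callers request the value before the first call has finished.
export function lazySingleton<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = factory().catch((err) => {
        // Clear the cached promise on failure so a later call can retry.
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}

// Hypothetical usage: `loadEmbedder` stands in for whatever creates the
// InferenceSession (for example a call to pipeline(...)).
// const getEmbedder = lazySingleton(loadEmbedder);
// const [a, b] = await Promise.all([getEmbedder(), getEmbedder()]); // one load
```

Caching the promise rather than the resolved value is what closes the window between "load started" and "load finished", which is where a lock that only guards the finished session would leak.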

Workaround applied locally

Setting embedding.provider: "off" in ~/.config/opencode/magic-context.jsonc bypasses the crash entirely. FTS5 full-text fallback keeps ctx_search functional; only semantic ranking is lost.
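The same degrade-to-FTS5 behavior could in principle be applied automatically when embedding initialization fails (fix 4). Below is a minimal sketch with hypothetical names (`createSearch`, `initSemantic`, `fullText`, `onDisabled`); none of them come from the plugin. One caveat: a JavaScript try/catch or promise rejection handler only sees errors that surface as exceptions. A native SIGBUS or SIGTRAP terminates the process before any handler runs, so guarding against the crashes reported here would likely also require loading the model in a separate child process.

```typescript
type SearchFn = (query: string) => Promise<string[]>;

// Builds a search function that prefers semantic search but falls back to
// full-text search if the embedding backend fails to initialize.
export function createSearch(
  initSemantic: () => Promise<SearchFn>,
  fullText: SearchFn,
  onDisabled: (reason: unknown) => void = () => {},
): SearchFn {
  let semantic: Promise<SearchFn | null> | undefined;
  return async (query) => {
    // Initialize once; a rejected init is converted to `null` (disabled).
    semantic ??= initSemantic().catch((err) => {
      onDisabled(err); // surface the reason instead of failing silently
      return null;
    });
    const search = await semantic;
    return search ? search(query) : fullText(query);
  };
}
```

The `onDisabled` callback is where a user-visible warning could be emitted, addressing the "TUI just vanishes" problem for the subset of failures that do reach JavaScript.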

Additional context

Happy to attach the full .ips files, run with any DEBUG=* env var, or test a patched build. Let me know if you want the full thread dumps — I can upload them as a gist.
