Skip to content

Include exe-path fingerprint in VM log sentinels#133

Merged
JAORMX merged 1 commit intomainfrom
sec/sentinel-fingerprint
Apr 17, 2026
Merged

Include exe-path fingerprint in VM log sentinels#133
JAORMX merged 1 commit intomainfrom
sec/sentinel-fingerprint

Conversation

@JAORMX
Copy link
Copy Markdown
Contributor

@JAORMX JAORMX commented Apr 17, 2026

Summary

Closes a MEDIUM correctness finding: CleanupStaleLogs used plain kill -0 liveness to decide whether to remove a per-VM log directory. That check cannot distinguish "bbox still alive at PID X" from "PID X recycled to an unrelated process". Two failure modes:

  • Crashed bbox whose PID is now held by some other process → log dir never cleaned up (cleanup assumes bbox is still alive).
  • Fresh bbox whose PID was previously held by something → incorrectly skips cleanup of its own stale dir from a prior boot.

Neither is a privilege boundary, but both cause reliability/hygiene problems — the same class go-microvm v0.0.32 closed for terminateStaleRunner via procutil.IsExpectedProcess.

Changes

  • Sentinel format extended from <pid> to <pid>\n<exe-path>. WriteSentinel uses os.Executable() to record the absolute bbox binary path.
  • New helper process.IsExpectedProcess(pid, exe):
    • Linux: reads /proc/<pid>/exe symlink and compares (strips " (deleted)" suffix). Handles permission-denied and ENOENT as "not a match" → dir removed.
    • Darwin: falls back to plain liveness (no /proc; avoiding cgo). Documented as a known limitation, mirrors go-microvm's darwin behavior.
  • CleanupStaleLogs branches on sentinel format: v2 (PID+exe) uses IsExpectedProcess, v1 (legacy PID-only) uses the old IsAlive fallback.
  • Backward compatible: existing v1 sentinels on disk continue to work transparently; new writes produce v2.
  • Sentinel size cap (4 KiB): a same-UID attacker cannot force bbox to OOM at startup by planting a giant .bbox-sentinel. Files over the cap are skipped (not nuked — the content is suspicious, leave it alone).
  • Structured reason on the stale-removal warn log: "pid-dead" vs "fingerprint-mismatch-or-dead" makes triage obvious.
  • Debug log distinguishes "sentinel missing" from "sentinel unreadable" (permission-denied on shared hosts).

Test plan

  • task fmt && task lint && task test — all green
  • process.IsExpectedProcess unit tests (Linux): self absolute-path, self base-name, wrong binary, non-existent PID, invalid inputs
  • CleanupStaleLogs tests: fingerprint match preserves, fingerprint mismatch removes (Linux-only skip), legacy PID-only still works, oversized sentinel left alone
  • Full e2e: bbox claude-code --no-mcp --exec /bin/true writes v2 sentinel (inspected), cleans up on next run

Follow-ups (not this PR)

  • Darwin could implement real fingerprinting via golang.org/x/sys/unix.Sysctl("kern.proc.pathname.<pid>") to close the PID-reuse gap on macOS. Currently degrade-gracefully.
  • Pre-existing quirk: BBOX_KEEP_VM_DATA=1 preserves debug data only until the next bbox run. Worth a separate discussion about scoping.

🤖 Generated with Claude Code

CleanupStaleLogs decided whether to remove a per-VM log directory
based on whether the PID in its sentinel file was still alive
(kill -0). That check cannot distinguish "bbox still running at PID X"
from "PID X was recycled onto a completely unrelated process". Two
failure modes:

- A bbox crash whose PID gets recycled to a live unrelated process
  leaves the log dir behind forever (cleanup skips it, assuming bbox
  is still alive).
- A fresh bbox invocation whose PID was previously held by another
  process incorrectly skips cleanup of its own stale-from-prior-boot
  log dir.

Neither is a privilege boundary, but both cause reliability/hygiene
problems — the same class go-microvm v0.0.32 closed for
`terminateStaleRunner` via `procutil.IsExpectedProcess`.

Mirror that pattern: sentinels now carry the bbox exe path as a
fingerprint on a second line:

    <pid>
    /absolute/path/to/bbox

CleanupStaleLogs reads the fingerprint and, via the new
`process.IsExpectedProcess(pid, exe)` helper, verifies both that
the PID is alive AND that /proc/<pid>/exe (Linux) matches the
recorded bbox path. A PID that's been recycled to another binary
is correctly treated as stale and the dir is removed.

Darwin falls back to plain liveness (no /proc); this matches
go-microvm's own procutil.IsExpectedProcess on darwin.

Backward compatible: legacy single-line PID sentinels continue to
use the old liveness check via parseSentinel's two-format parser.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JAORMX JAORMX merged commit 30250c6 into main Apr 17, 2026
8 checks passed
@JAORMX JAORMX deleted the sec/sentinel-fingerprint branch April 17, 2026 09:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant