Include exe-path fingerprint in VM log sentinels#133
Merged
Conversation
CleanupStaleLogs decided whether to remove a per-VM log directory
based on whether the PID in its sentinel file was still alive
(kill -0). That check cannot distinguish "bbox still running at PID X"
from "PID X was recycled onto a completely unrelated process". Two
failure modes:
- A bbox crash whose PID gets recycled to a live unrelated process
leaves the log dir behind forever (cleanup skips it, assuming bbox
is still alive).
- A fresh bbox invocation whose PID was previously held by another
process incorrectly skips cleanup of its own stale-from-prior-boot
log dir.
Neither is a privilege boundary, but both cause reliability/hygiene
problems — the same class go-microvm v0.0.32 closed for
`terminateStaleRunner` via `procutil.IsExpectedProcess`.
Mirror that pattern: sentinels now carry the bbox exe path as a
fingerprint on a second line:
<pid>
/absolute/path/to/bbox
CleanupStaleLogs reads the fingerprint and, via the new
`process.IsExpectedProcess(pid, exe)` helper, verifies both that
the PID is alive AND that /proc/<pid>/exe (Linux) matches the
recorded bbox path. A PID that's been recycled to another binary
is correctly treated as stale and the dir is removed.
Darwin falls back to plain liveness (no /proc); this matches
go-microvm's own procutil.IsExpectedProcess on darwin.
Backward compatible: legacy single-line PID sentinels continue to
use the old liveness check via parseSentinel's two-format parser.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes a MEDIUM correctness finding:
CleanupStaleLogsused plainkill -0liveness to decide whether to remove a per-VM log directory. That check cannot distinguish "bbox still alive at PID X" from "PID X recycled to an unrelated process". Two failure modes:Neither is a privilege boundary, but both cause reliability/hygiene problems — the same class go-microvm v0.0.32 closed for
terminateStaleRunnerviaprocutil.IsExpectedProcess.Changes
<pid>to<pid>\n<exe-path>.WriteSentinelusesos.Executable()to record the absolute bbox binary path.process.IsExpectedProcess(pid, exe):/proc/<pid>/exesymlink and compares (strips" (deleted)"suffix). Handles permission-denied and ENOENT as "not a match" → dir removed./proc; avoiding cgo). Documented as a known limitation, mirrors go-microvm's darwin behavior.CleanupStaleLogsbranches on sentinel format: v2 (PID+exe) usesIsExpectedProcess, v1 (legacy PID-only) uses the oldIsAlivefallback..bbox-sentinel. Files over the cap are skipped (not nuked — the content is suspicious, leave it alone).reasonon the stale-removal warn log:"pid-dead"vs"fingerprint-mismatch-or-dead"makes triage obvious.Test plan
task fmt && task lint && task test— all greenprocess.IsExpectedProcessunit tests (Linux): self absolute-path, self base-name, wrong binary, non-existent PID, invalid inputsCleanupStaleLogstests: fingerprint match preserves, fingerprint mismatch removes (Linux-only skip), legacy PID-only still works, oversized sentinel left alonebbox claude-code --no-mcp --exec /bin/truewrites v2 sentinel (inspected), cleans up on next runFollow-ups (not this PR)
golang.org/x/sys/unix.Sysctl("kern.proc.pathname.<pid>")to close the PID-reuse gap on macOS. Currently degrade-gracefully.BBOX_KEEP_VM_DATA=1preserves debug data only until the next bbox run. Worth a separate discussion about scoping.🤖 Generated with Claude Code