Close out the audit: mode refactor, ownership tokens, process identity, tmux spawning, extra_ports, docker e2e#25
Merged
Conversation
ModeHandler::bring_up took 14 positional parameters under a silenced too_many_arguments lint, and container/hybrid duplicated ~100 lines of docker startup (port maps, overlay generation, compose up, rollback registration, pair recording) that had already drifted apart once. bring_up now takes a BringUpRequest plus config/root/log. The shared logic lives in modes/mod.rs: start_docker_services (group loop, scoped or whole-file), ensure_worktree, native_ports_for_slot (filter-aware; hybrid honors --services, host keeps its historical allocate-all behavior), and spawn_native_services. The whole-compose-file and labeled-data fallbacks stay mode-private. No behavior change; the clippy allow is gone. Fixes #12 https://claude.ai/code/session_017UcuvzMKHVfyBCcq8ipAko
…th hardening
Audit follow-ups to the pending-session work:
Every operation that marks a session Pending now records a PendingOp
{id, since}; finalize paths write state only while they still own the
entry (State::still_owned). A finalize that lost ownership — the session
was removed or taken over by a concurrent down/shutdown/flush — stands
down and tears down the resources it created instead of resurrecting a
deleted entry. restore_session is a no-op for lost ownership; sync
refuses sessions that are mid-operation; ls warns about pending entries
older than 15 minutes (slot leak after a crashed command) with the
recovery hint.
The branch-name prompt in cmd_up now runs on a cloned snapshot with no
lock held — even a shared lock blocks exclusive acquirers, so a human at
the prompt stalled every concurrent up/down for their 10s timeout.
The TERM->KILL escalation existed twice (terminate_with_grace in main,
kill_process_group in process); both now share signal_with_grace.
check_processes_alive warns when a nohup pid file is missing entirely
instead of staying silent about a killed service.
https://claude.ai/code/session_017UcuvzMKHVfyBCcq8ipAko
… as real panes Service identification stops relying on fuzzy matching: PID files now record the process start token (ps lstart) next to the pid. Readers treat a live PID with a mismatched token as recycled: kill paths refuse to signal it (and its group), whose-pid refuses to attribute it, and health checks report the service as down. Legacy single-line pid files still parse. Containers are matched by the compose project label (exact) instead of name substrings, so session feat-a can no longer adopt feat-ab's containers or hand redis postgres's port. match_services keeps a claimed-PID set so two services with overlapping command tokens cannot both own one process. Resume health checks and status now use the session's own identity (token-verified pid files, tmux windows) instead of a depth-1 lsof scan that missed servers with a cwd in a subdirectory and then spawned duplicates. tmux services are spawned as the window's own process (no send-keys typing): pane PIDs are recorded as token-verified pid files, dead panes are kept via a session-scoped remain-on-exit hook, and a command that dies within the 1.5s grace window fails the spawn with its exit status and last output instead of reporting a ready session. Window 0 stays a plain shell for interactive attach. Env sourcing uses POSIX '.' so it works under dash. Fixes #10 Fixes #13 https://claude.ai/code/session_017UcuvzMKHVfyBCcq8ipAko
…obing; explicit publish_primary extra_ports were computed as base + slot everywhere while primary ports use base + slot*stride — with slot_stride configured, secondary ports ignored the spacing the stride exists to provide. They now share the primary spacing rule (ServiceConfig::extra_port_for_slot), in the env, the compose overlay, and the compose interpolation vars. build_env takes the stride and a single merged service-config slice. Extra ports are deterministic (no auto-bump), so an occupied one used to surface only as a raw bind failure from the service or container. bring_up now probes them up front: strict_port makes it a hard error with the holding PID; otherwise it warns before starting. Skipped (already-running) services are exempt — they hold their own ports. The implicit rule "an extra_port with container_port suppresses the primary publish" gains an explicit escape hatch: publish_primary on the service wins when set, and ecluse validate warns when the implicit rule is being relied on (deprecated). Fixes #14 https://claude.ai/code/session_017UcuvzMKHVfyBCcq8ipAko
…eprecated hooks, changelog staged tests/docker_e2e.rs runs real docker compose lifecycles where a daemon is available (CI ubuntu, dev machines) and skips elsewhere: hybrid up/down, rollback after a failed post_up leaves no containers/worktree/state, the hyphenated-slug multi-compose teardown that motivated compose_overlays, and container mode whole-file startup. Images come from the ECR public mirror to dodge Docker Hub's unauthenticated rate limits on shared CI IPs. SKILL.md and docs/limits.md document the agent-visible behavior added in this wave (pending sessions and their errors, stale-pending recovery, identity-based kills, tmux instant-failure detection, --force ownership warnings). All examples migrate off the deprecated on_up/on_down hook names — the migration-style hooks they demonstrate need session env, so they now use post_up/pre_down. CHANGELOG gains the Unreleased section covering everything since 0.2.17, as the release process requires. Fixes #16 https://claude.ai/code/session_017UcuvzMKHVfyBCcq8ipAko
skip_serializing_if dropped the empty vec on save while the missing field deserialized to the default — an explicit opt-out silently came back as ['.env', '.env.local'] after any config rewrite. inherit_env is now always serialized. https://claude.ai/code/session_017UcuvzMKHVfyBCcq8ipAko
Four tests pulling the image in parallel tripped the registry's per-IP rate limit on shared CI runners (even on the ECR mirror). The first caller pulls with retries behind a OnceLock; an unobtainable image skips the suite like an absent daemon instead of failing on registry weather. https://claude.ai/code/session_017UcuvzMKHVfyBCcq8ipAko
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #10
Fixes #12
Fixes #13
Fixes #14
Fixes #16
Closes out every remaining open issue plus the five findings from the post-merge audit, as five reviewable commits:
1.
refactor(modes)— #12ModeHandler::bring_uptakes aBringUpRequest(was 14 positional params under a silenced clippy lint, now zero allows in the tree). The duplicated docker-startup/worktree/port-allocation/spawn blocks move to shared building blocks inmodes/mod.rs; container/hybrid keep only their genuine differences (whole-file vs label-split fallbacks). Net −117 lines, no behavior change.2.
fix(state)— audit findings 1–5PendingOp {id, since}): every finalize/restore path writes state only while it still owns the entry. Adown/flushracing a liveupnow makes the loser tear down its own resources and exit non-zero instead of resurrecting a deleted session.syncrefuses mid-operation sessions.lswarns about sessions pending >15 min (crashed command ⇒ leaked slot) with thedown <slug>recovery hint.signal_with_grace), not two copies; missing pid files produce a warning instead of silence.3.
fix(process,sync)— #10 + #13ps lstart); recycled PIDs are never signaled, never attributed bywhose-pid, and report as down. Legacy files still parse.match_serviceskeeps a claimed-PID set; resume/status health is identity-based (no more depth-1 lsof scan → no more duplicate spawns).send-keystyping. Pane PIDs are recorded as token-verified pid files; a command dying within 1.5 s failsupwith its exit status and last output (verified by tmux-gated tests, run against tmux 3.4); dead panes are kept for inspection; windowshellstays interactive. Env sourcing uses POSIX..HOME— they were silently reading the developer's real~/.config/ecluse/config.toml(this surfaced as real failures the moment tmux behavior changed).4.
fix(config)— #14extra_portsuse the samebase + slot*striderule as primaries (unchanged for the default stride 1), and are probed up front: occupied → hard error understrict_port, warning otherwise (they don't auto-bump, so the alternative was a raw bind failure). Newpublish_primaryfield replaces the implicit "extra_port with container_port suppresses the primary" rule, whichvalidatenow flags as deprecated.5.
test(e2e)+ docs — #16 + remaining process debttests/docker_e2e.rs: hybrid lifecycle, rollback leaves no containers/worktree/state, the hyphenated-slug multi-compose teardown (Overlay→compose mapping is reconstructed by filename parsing at teardown — ambiguous with hyphenated slugs; store the pairs in state #9's scenario), container mode — all verified against a live daemon here; they skip cleanly where docker is absent (macOS runners). Images via the ECR public mirror (Docker Hub rate limits hit shared CI IPs).--forceownership warnings).examples/andskills/examples/migrate off deprecatedon_up/on_down(their migration hooks need session env — nowpost_up/pre_down).[Unreleased]section covering everything since 0.2.17.Verification
Locally:
cargo fmt --check,cargo clippy -- -D warnings, 428 unit + 25 integration + 4 docker-e2e + 2 tmux-gated tests, all green, plus live-daemon docker runs. CI gates the merge.https://claude.ai/code/session_017UcuvzMKHVfyBCcq8ipAko
Generated by Claude Code