
fix(modes): enforce rollback-on-failure with an undo-stack guard#19

Merged
hefgi merged 1 commit into main from claude/magical-einstein-7m4g40-rollback
Jun 11, 2026


hefgi (Owner) commented Jun 11, 2026

Fixes #2

Problem

"Rollback on failure" was enforced by hand-rolled cleanup blocks copy-pasted at each failure site in the three bring_up implementations, and the sites had drifted apart:

  • spawn_services failure: no rollback at all (bare ?) — containers stayed up, worktree stayed, nothing in state to find them with
  • compose group N failure (multi-compose): earlier groups' containers kept running while their overlay files — the artifacts teardown needs — were deleted
  • pre_spawn hook failure: full rollback in container mode, none in hybrid/host
  • post_up failure on a resumed session: rollback ran compose down with remove_volumes=true, deleting the session's data volumes
  • symlink_env_files / write_env_file / native port-allocation failures after containers were up: no rollback
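The failure modes above share one shape: a step succeeds, a later step returns early, and nothing undoes the first. A minimal sketch of the bare `?` case follows. `create_worktree`, the `log` parameter, and both signatures are hypothetical stand-ins for illustration, not the project's actual code.

```rust
// Hypothetical stand-ins: the real functions' signatures are not shown in this PR.
fn create_worktree(log: &mut Vec<&'static str>) -> Result<(), String> {
    log.push("worktree created");
    Ok(())
}

fn spawn_services(_log: &mut Vec<&'static str>) -> Result<(), String> {
    // Simulate the failing step.
    Err("spawn failed".to_string())
}

/// Pre-fix shape: nothing undoes `create_worktree` when a later step fails.
pub fn bring_up_without_guard(log: &mut Vec<&'static str>) -> Result<(), String> {
    create_worktree(log)?;
    spawn_services(log)?; // bare `?`: returns early, the worktree stays behind
    log.push("session up");
    Ok(())
}
```

After the call fails, the log holds only the creation entry; no removal step ever ran.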

Fix

New rollback::Rollback drop-guard: every provisioning step pushes its inverse immediately after succeeding; any early return (?) unwinds the stack in reverse order on drop. Once the session is fully up, disarm() is called so nothing is rolled back.
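For readers unfamiliar with the pattern, here is a minimal sketch of an undo-stack drop guard. It is an illustration under assumptions, not the code merged in this PR: the field names, the closure type, the `bring_up` signature, and the choice to ignore pushes made after `disarm()` are all hypothetical.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical sketch of an undo-stack drop guard; not the PR's actual code.
pub struct Rollback {
    undo: Vec<Box<dyn FnOnce()>>,
    armed: bool,
}

impl Rollback {
    pub fn new() -> Self {
        Rollback { undo: Vec::new(), armed: true }
    }

    /// Register the inverse of a step that just succeeded.
    /// Assumption: pushes after `disarm()` are ignored.
    pub fn push(&mut self, f: impl FnOnce() + 'static) {
        if self.armed {
            self.undo.push(Box::new(f));
        }
    }

    /// Keep everything: discard the registered undos without running them.
    pub fn disarm(&mut self) {
        self.armed = false;
        self.undo.clear();
    }
}

impl Drop for Rollback {
    fn drop(&mut self) {
        // Pop from the back so the most recent step is undone first (LIFO).
        while let Some(f) = self.undo.pop() {
            f();
        }
    }
}

/// Hypothetical bring-up that records which undo steps ran.
pub fn bring_up(log: Rc<RefCell<Vec<&'static str>>>, fail: bool) -> Result<(), String> {
    let mut rb = Rollback::new();

    let l = log.clone();
    rb.push(move || l.borrow_mut().push("remove worktree"));

    let l = log.clone();
    rb.push(move || l.borrow_mut().push("stop services"));

    if fail {
        // `rb` is dropped on this early return, so both undos run.
        return Err("post_up hook failed".to_string());
    }
    rb.disarm();
    Ok(())
}
```

Because the cleanup lives in `Drop`, a new failure site added later gets rollback without any extra code at that site.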

Ordering and data-safety details:

  • compose-down undo is pushed after the overlay-file-removal undo, so LIFO unwinding stops containers before deleting the overlay they're torn down with
  • rollback_volumes = !reuse_worktree: rollback only removes volumes for sessions this up created — a failed re-up of an existing session keeps its data
  • reused worktrees are never deleted by rollback (only the freshly-created-worktree path pushes a worktree-removal undo)
  • the duplicated if/else around spawn_services (both arms made the identical call; only logging differed) is collapsed in host/hybrid
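The ordering and data-safety rules above can be sketched as a small model of which undo steps get pushed and the order in which they unwind. The `Undo` enum and both function names are hypothetical, and the position of the worktree undo relative to the compose undos is an assumption.

```rust
/// Hypothetical undo actions; the names are illustrative, not the PR's types.
#[derive(Debug, Clone, PartialEq)]
pub enum Undo {
    RemoveWorktree,
    RemoveOverlayFile,
    ComposeDown { remove_volumes: bool },
}

/// Returns the undo steps in the order they would be pushed during bring-up.
pub fn pushed_undos(reuse_worktree: bool) -> Vec<Undo> {
    let mut stack = Vec::new();
    // Only a worktree created by this run gets a removal undo.
    if !reuse_worktree {
        stack.push(Undo::RemoveWorktree);
    }
    // Overlay removal is pushed first...
    stack.push(Undo::RemoveOverlayFile);
    // ...and compose-down second, so it runs first when unwinding.
    let rollback_volumes = !reuse_worktree;
    stack.push(Undo::ComposeDown { remove_volumes: rollback_volumes });
    stack
}

/// LIFO unwinding: the last step pushed is undone first.
pub fn unwind_order(reuse_worktree: bool) -> Vec<Undo> {
    pushed_undos(reuse_worktree).into_iter().rev().collect()
}
```

For a fresh session, unwinding runs compose-down with volume removal, then deletes the overlay file, then removes the worktree. For a resumed session, compose-down keeps the volumes and no worktree removal is registered.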

All ~6 manual cleanup blocks are deleted; every failure path now goes through the same guard.

Tests

  • rollback.rs: LIFO order, disarm semantics, push-after-disarm, empty guard
  • host.rs (end-to-end through real bring_up with failing hooks):
    • failed_post_up_hook_rolls_back_fresh_worktree
    • failed_pre_spawn_hook_rolls_back_fresh_worktree — this path previously had no cleanup in host mode
    • failed_post_up_hook_keeps_reused_worktree
    • successful_bring_up_keeps_worktree

Docker-dependent rollback paths (container/hybrid compose groups) use the identical guard mechanism; end-to-end docker coverage is tracked separately in #16.

cargo fmt --check, cargo clippy -- -D warnings (CI invocation), cargo test (377 + 18) green. Note: clippy --all-targets reports 5 pre-existing items_after_test_module errors that exist on main as well — untouched here, related to the test-layout note in #16.

https://claude.ai/code/session_017UcuvzMKHVfyBCcq8ipAko


Generated by Claude Code

bring_up cleaned up after failures with hand-rolled blocks repeated at
each failure site, and the sites had drifted: spawn_services failures
rolled back nothing, a failed compose group left earlier groups'
containers running while deleting the overlay files teardown needs,
pre_spawn failures rolled back in container mode but not hybrid/host,
and post_up failures on a resumed session deleted the session's data
volumes.

Each provisioning step now registers its inverse with a Rollback guard
immediately after it succeeds; any early return unwinds the stack in
reverse order (stop containers before deleting their overlay, kill
spawned services, remove a freshly created worktree). disarm() keeps
everything once the session is fully up. Rollback only removes volumes
for sessions it created (not on resume), and reused worktrees are never
deleted by rollback.

Fixes #2

https://claude.ai/code/session_017UcuvzMKHVfyBCcq8ipAko
hefgi merged commit ba558da into main on Jun 11, 2026
2 checks passed

Development

Successfully merging this pull request may close these issues.

"Rollback on failure" is inconsistently enforced — several bring_up failure paths orphan containers, worktrees, or delete data volumes