fix: harden container shutdown cleanup #408

Open
cristibleotiu wants to merge 5 commits into develop from fix/shutdown-cleanup-hardening

cristibleotiu (Contributor) commented May 14, 2026

What changed: abort container restarts when runtime cleanup fails; preserve process, thread, and fixed-volume handles for retry; stop Docker log readers after container stop to avoid false cleanup failures.

Why: prevent stuck or duplicate runtimes during restart and shutdown paths.

PR summary:

  • Made container shutdown failures retryable instead of getting stuck forever after one failed cleanup.
  • Made manual STOP safer: it only persists “paused” after cleanup actually succeeds.
  • Added explicit RESTART behavior that cancels a pending failed manual stop.
  • Prevented config changes from accidentally overriding a pending manual stop.
  • Hardened Docker cleanup so failed stop/remove keeps the container handle for retry.
  • Hardened fixed-volume cleanup so mounted loop devices are not leaked when metadata is missing, malformed, or stale.
  • Added compatibility fallback code so edge can still clean tunnel subprocess trees even if deployed before the
    matching core change.
  • Increased the plugin stop timeout for container apps because Docker, tunnels, log readers, and fixed volumes can
    legitimately need longer cleanup time.
  • Resolved the develop conflict by preserving the newer volume-sync lifecycle behavior and integrating it with the
    shutdown hardening.
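The retry behavior described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `RuntimeHandles`, `manual_stop`, and the `state` keys `pending_stop` and `paused` are hypothetical names chosen for the example. It shows two of the ideas above: a handle is dropped only after its cleanup succeeds, and "paused" is persisted only once nothing is left to clean up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RuntimeHandles:
    """Hypothetical registry of live resources (container, threads, volumes)."""
    cleanups: Dict[str, Callable[[], None]] = field(default_factory=dict)

    def cleanup(self) -> List[str]:
        """Try every cleanup; forget a handle only once its cleanup succeeds.

        Returns the names that failed and are still held for the next attempt.
        """
        failed = []
        for name, fn in list(self.cleanups.items()):
            try:
                fn()
            except Exception:
                failed.append(name)      # keep the handle so a retry is possible
            else:
                del self.cleanups[name]  # released, safe to forget
        return failed


def manual_stop(handles: RuntimeHandles, state: dict) -> bool:
    """Persist "paused" only after cleanup has actually succeeded."""
    if handles.cleanup():
        state["pending_stop"] = True     # remembered so a later call can retry
        return False
    state.pop("pending_stop", None)
    state["paused"] = True
    return True
```

Calling `manual_stop` again after a failure retries only the handles that are still registered, so a single failed Docker remove does not leave the runtime stuck.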

What changed:
- keep failed container cleanup candidates retryable instead of dropping handles
- make manual STOP/RESTART/config handling preserve cleanup state safely
- restore sync support files from develop and add lifecycle/fixed-volume/sync regression coverage

Why:
- avoid leaked container subprocesses and preserve existing sync behavior while resolving the PR branch against develop

What changed:
- merge origin/develop into the shutdown cleanup branch
- keep hardened runtime cleanup/restart behavior while accepting develop sync updates
- make the lifecycle timeout assertion robust to deadline-based joins
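For context on the last item, a deadline-based join can look like the sketch below. `join_all` is a hypothetical helper, not a function from this repository. Because every thread is joined against one shared deadline, each `join` call receives a shrinking timeout rather than a fixed value, which is why a test that asserts an exact per-join timeout is brittle and a bound on the total is more robust.

```python
import threading
import time
from typing import Iterable, List


def join_all(threads: Iterable[threading.Thread], timeout: float) -> List[threading.Thread]:
    """Join threads against one shared deadline; return those still alive."""
    pending = list(threads)
    deadline = time.monotonic() + timeout
    for t in pending:
        # Each join gets only the time left in the overall budget.
        remaining = max(0.0, deadline - time.monotonic())
        t.join(remaining)
    return [t for t in pending if t.is_alive()]
```

The total wait is bounded by `timeout` regardless of how many threads are joined, so a caller can treat a non-empty return value as a cleanup failure to retry.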

Why:
- clear the PR merge conflict against develop without weakening cleanup safeguards

What changed:
- log extra tunnel cleanup success only when every tunnel stopped
- add coverage for failed extra tunnel cleanup logging

Why:
- avoid misleading success logs in hardened cleanup paths
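
A minimal sketch of the logging rule in this commit, under the assumption that each tunnel exposes a stop callable returning `True` on success. `stop_extra_tunnels` and the logger name are hypothetical and only illustrate the pattern: collect failures first, and emit the success message only when none occurred.

```python
import logging
from typing import Callable, Dict

log = logging.getLogger("cleanup_sketch")


def stop_extra_tunnels(stoppers: Dict[str, Callable[[], bool]]) -> bool:
    """Stop each tunnel; report success only if all of them stopped."""
    failed = []
    for name, stop in stoppers.items():
        try:
            ok = stop()
        except Exception:
            ok = False  # an exception counts as a failed stop
        if not ok:
            failed.append(name)
    if failed:
        log.warning("extra tunnel cleanup incomplete: %s", ", ".join(failed))
        return False
    log.info("extra tunnels stopped: %d", len(stoppers))
    return True
```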