Skip to content

fix: resolve race conditions and goroutine leaks in tea.Cmd closures#232

Open
darksworm wants to merge 5 commits into
mainfrom
fix/race-conditions-and-goroutine-leaks
Open

fix: resolve race conditions and goroutine leaks in tea.Cmd closures#232
darksworm wants to merge 5 commits into
mainfrom
fix/race-conditions-and-goroutine-leaks

Conversation

@darksworm
Copy link
Copy Markdown
Owner

@darksworm darksworm commented Mar 29, 2026

Summary

  • Fix races (HIGH): Capture m.state.Server at call time in all tea.Cmd closures across api_integration.go and model_init.go — eliminates data races between Bubble Tea goroutines and concurrent SetServerMsg/context-switch writes
  • Fix races (HIGH): Remove m.state.Diff mutations from startDiffSession and startResourceDiffSession goroutine closures; clear Diff.Loading in the SetModeMsg handler instead (safe, runs in single-threaded Update)
  • Fix goroutine leak (MED): Add treeStreamDone escape channel, close+recreate it in cleanupTreeWatchers, capture both channels at call time in consumeTreeEvent — goroutines now unblock when leaving tree view or switching context
  • Remove dead code (MED): Drop replaceOldWatch field from watchStartedMsg and its parameter — the old stream is already cancelled via cleanupAppWatcher
  • Fix double cleanup (LOW): Remove redundant m.watchCleanup() call in context switch — cleanupAppWatcher already calls upstreamCleanup
  • Fix index out of range (LOW): Guard rollback "bottom" navigation against empty Rows to prevent SelectedIdx = -1

Test Plan

  • go test -race ./... -count=1 — all 16 packages pass, zero DATA RACE lines
  • go test -tags e2e ./e2e/... -count=3 — 3/3 runs pass
  • New test files: race_conditions_test.go, goroutine_leak_test.go cover the race and leak fixes
  • New/updated tests in context_switch_test.go, rollback_namespace_test.go cover the remaining fixes

Summary by CodeRabbit

  • Bug Fixes

    • Prevented duplicate cleanup during context switches.
    • Fixed rollback navigation to avoid negative index when no rows exist.
    • Improved diff loading/state and resource-stream shutdown to avoid hung operations.
  • Tests

    • Added tests for race conditions, goroutine termination/leaks, rollback navigation, and context-switch cleanup.
  • Chores

    • Updated .gitignore to ignore worktree artifacts.

- Capture m.state.Server at call time in all tea.Cmd closures across
  api_integration.go and model_init.go to eliminate data races between
  Bubble Tea goroutines and concurrent SetServerMsg/context switch writes
- Remove m.state.Diff mutations from startDiffSession and
  startResourceDiffSession goroutine closures; clear Diff.Loading in
  the SetModeMsg handler instead (safe, single-threaded Update)
- Remove dead replaceOldWatch field from watchStartedMsg and its
  parameter from startWatchingApplicationsWithConfig — the old stream
  is already cancelled correctly through cleanupAppWatcher
- Remove double cleanup in context switch: cleanupAppWatcher already
  calls upstreamCleanup via appWatchCleanup, so the explicit
  m.watchCleanup() call below it was redundant
- Fix goroutine leak in consumeTreeEvent: add treeStreamDone escape
  channel, close and re-create it in cleanupTreeWatchers, capture both
  channels at call time so goroutines unblock on view/context changes
- Guard rollback bottom navigation against empty Rows to prevent
  SelectedIdx being set to -1

Tests added for all fixes; full suite passes with -race.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 731d3a86-57b5-43c2-a55a-ede138cdd149

📥 Commits

Reviewing files that changed from the base of the PR and between 9276800 and b0edb4b.

📒 Files selected for processing (2)
  • cmd/app/model_init.go
  • cmd/app/model_tree_stream.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • cmd/app/model_init.go
  • cmd/app/model_tree_stream.go

Walkthrough

Capture server state in command closures, add a tree-stream done signal and adjust watcher cleanup, remove a context-switch upstream watch cleanup call, move diff-loading clears into mode transitions, introduce rollback diff message, and add concurrency and edge-case tests.

Changes

Cohort / File(s) Summary
Git configuration
/.gitignore
Added .worktrees/ to ignore Git worktree artifacts.
Command closures & server capture
cmd/app/api_integration.go, cmd/app/model_init.go
Snapshot m.state.Server (and some mode/epoch values) into local variables when constructing tea.Cmd closures; use captured server for API client creation and calls.
Watch/watchers & tree stream lifecycle
cmd/app/api_integration.go, cmd/app/model_tree_stream.go, cmd/app/model_watchers.go, cmd/app/goroutine_leak_test.go
Introduce treeStreamDone chan struct{}; consumeTreeEvent now selects between stream events and done; cleanupTreeWatchers closes/recreates done channel; adjust resource-tree watch goroutine to use select over event/done and terminate cleanly; add tests ensuring goroutine termination and done-channel behavior.
Context switch cleanup
cmd/app/context_switch.go, cmd/app/context_switch_test.go
Removed the conditional m.watchCleanup() call from context-switch teardown; added test to assert upstream cleanup invoked exactly once.
Diff / rollback flow changes
cmd/app/api_integration.go, cmd/app/model.go
Refactor diff session commands to capture server/epoch up-front; stop mutating m.state.Diff.Loading from async closures; add rollbackDiffReadyMsg to carry computed rollback diffs and transition to ModeDiff via message.
Delete flow adjustments
cmd/app/api_integration.go
deleteApplicationHelper now accepts an explicit server *model.Server; call sites pass captured server.
Rollback navigation edge case
cmd/app/model.go, cmd/app/rollback_namespace_test.go
Guard bottom-navigation against empty rollback rows to avoid negative index; added test to verify behavior.
Race-condition tests
cmd/app/race_conditions_test.go
Added tests that intentionally exercise concurrent reads/writes of m.state.Server from goroutines to surface races when closures read state lazily.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 70.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Tests ⚠️ Warning Race condition tests lack assertions to verify fixes; missing tests for rollbackDiffReadyMsg and SetModeMsg handler behavior. Add assertions verifying captured server usage, test rollbackDiffReadyMsg and SetModeMsg handlers, and verify refactored Diff.Loading cleanup.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the main objective of the PR: fixing race conditions and goroutine leaks in tea.Cmd closures through data capture and lifecycle management improvements.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/race-conditions-and-goroutine-leaks

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
cmd/app/api_integration.go (3)

1213-1218: ⚠️ Potential issue | 🔴 Critical

Data race: Closure directly mutates m.state.Diff from goroutine.

Lines 1213-1218 write directly to m.state.Diff from within the closure, which executes in a Bubble Tea goroutine. This creates a data race with the main Update goroutine that may be reading or writing m.state concurrently.

Other diff session functions (startDiffSession, startResourceDiffSession) avoid this by returning messages that let the single-threaded Update handler mutate state. Refactor this function to follow the same pattern: return a message containing the diff data and handle the state mutation in Update.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/app/api_integration.go` around lines 1213 - 1218, The closure currently
mutates m.state.Diff (setting model.DiffState with Title/Content/Offset/Loading)
directly inside the Bubble Tea goroutine, causing a data race; refactor to
instead create and return a message struct (e.g., DiffReadyMsg or reuse an
existing diff message type) from the closure carrying the Title, lines
(Content), Offset and Loading flag, then update m.state.Diff inside the
single-threaded Update handler when it receives that message—follow the pattern
used by startDiffSession/startResourceDiffSession where the goroutine emits a
message and Update performs m.state mutations on receipt.

422-504: ⚠️ Potential issue | 🔴 Critical

Diff.Loading remains stuck as true after successful diff pager exit.

The pagerDoneMsg handler (model.go:703-725) on success directly sets m.state.Mode = model.ModeNormal and returns without clearing Diff.Loading. The clearing only happens in the SetModeMsg handler (line 325), which is never triggered on the success path. When the pager closes without error, Diff.Loading is left true for subsequent operations.

The fix should have pagerDoneMsg return SetModeMsg{Mode: model.ModeNormal} on success to match the error path logic, or explicitly clear Diff.Loading before returning.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/app/api_integration.go` around lines 422 - 504, pagerDoneMsg currently
mutates m.state.Mode directly on success and never clears Diff.Loading; change
the success path in pagerDoneMsg to return model.SetModeMsg{Mode:
model.ModeNormal} (instead of setting m.state.Mode = model.ModeNormal and
returning) so the existing SetModeMsg handler clears Diff.Loading, matching the
error path; locate pagerDoneMsg in model.go and replace the direct state
mutation with a returned SetModeMsg to ensure Diff.Loading is reset.

688-700: ⚠️ Potential issue | 🟠 Major

Add treeStreamDone signal to goroutine in startWatchingResourceTree to prevent goroutine leaks.

The goroutine at line 688 reads from ch without a secondary exit mechanism. While the cleanup function should close the channel via context cancellation, the goroutine lacks a direct listener on treeStreamDone for consistency and robustness. This pattern is used elsewhere (e.g., consumeTreeEvent); apply it here:

Suggested fix
 go func() {
   eventCount := 0
-  for t := range ch {
+  for {
+    select {
+    case t, ok := <-ch:
+      if !ok {
+        cblog.With("component", "ui").Info("Tree watch channel closed", "app", app.Name, "events", eventCount)
+        return
+      }
       if t == nil {
         continue
       }
       eventCount++
       cblog.With("component", "ui").Debug("Received tree event", "app", app.Name, "event", eventCount)
       data, _ := json.Marshal(t)
       m.watchTreeDeliver(model.ResourceTreeStreamMsg{AppName: app.Name, TreeJSON: data})
+    case <-m.treeStreamDone:
+      cblog.With("component", "ui").Info("Tree watch cancelled via done signal", "app", app.Name)
+      return
+    }
   }
-  cblog.With("component", "ui").Info("Tree watch channel closed", "app", app.Name, "events", eventCount)
 }()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/app/api_integration.go` around lines 688 - 700, The goroutine that
consumes from channel ch in startWatchingResourceTree can leak because it only
ranges ch; modify it to also select on the treeStreamDone signal so it exits
promptly when that done channel is closed: change the loop to a select that
listens for either a receive from ch (handle nil as before, increment
eventCount, json.Marshal and call m.watchTreeDeliver) or a receive from
treeStreamDone (break/return to stop the goroutine), and keep the final Info log
using eventCount; reference startWatchingResourceTree, ch, treeStreamDone,
eventCount, and m.watchTreeDeliver when making the change.
🧹 Nitpick comments (3)
cmd/app/race_conditions_test.go (1)

20-29: Race trigger timing is nondeterministic in current form.

started only confirms goroutine start, not that the read of m.state.Server is in-flight. These tests can false-pass; consider adding a deterministic synchronization point around the read path (or repeating the pattern in a bounded loop) to harden race detection.

Also applies to: 44-52

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/app/race_conditions_test.go` around lines 20 - 29, The test's
synchronization only ensures the goroutine started, not that the read of
m.state.Server inside cmd() is concurrently in-flight, making the race trigger
nondeterministic; modify the test to add a deterministic handshake channel
(e.g., a readReady/readStart channel closed or sent from inside the goroutine
immediately before the line where cmd() reads m.state.Server) and have the main
test wait on that signal before performing the concurrent write to
m.state.Server (or alternatively repeat the start/write pattern in a bounded
loop) so the write definitely races with the in-goroutine read; apply the same
pattern to the other occurrence around lines 44-52.
cmd/app/goroutine_leak_test.go (1)

31-33: Consider loosening the hard timeout to reduce CI flakiness.

A fixed 300ms can fail under noisy runners. Using a slightly larger bound (or retry loop with deadline) will keep this leak test stable without reducing signal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/app/goroutine_leak_test.go` around lines 31 - 33, The hard 300ms timeout
in the goroutine leak assertion is too tight; update the wait logic in
goroutine_leak_test.go (where the test waits for consumeTreeEvent to exit after
calling cleanupTreeWatchers) to use a looser bound or retry-until-deadline
pattern — e.g., replace the fixed time.After(300 * time.Millisecond) check with
a larger duration (e.g., 1s) or a loop that polls for goroutine exit until a
deadline to reduce CI flakiness while keeping the same failure semantics.
cmd/app/model.go (1)

323-326: Diff.Loading reset is too broad on mode transitions.

This unconditionally clears m.state.Diff.Loading for all mode changes, including transitions into diff mode. The diff state should be preserved when entering ModeDiff, not blanked on arrival. Consider narrowing to non-diff transitions:

Proposed tightening
-		// Clear diff loading whenever mode changes — the closure no longer does it.
-		if m.state.Diff != nil {
+		// Clear diff loading only when moving away from diff mode.
+		if m.state.Diff != nil && msg.Mode != model.ModeDiff {
 			m.state.Diff.Loading = false
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/app/model.go` around lines 323 - 326, The code unconditionally clears
m.state.Diff.Loading on any mode change, which erases the diff-loading state
when transitioning into ModeDiff; update the mode-transition logic so you only
set m.state.Diff.Loading = false when the new mode is NOT ModeDiff (e.g., change
the snippet to check the target mode before clearing: if m.state.Diff != nil &&
newMode != ModeDiff { m.state.Diff.Loading = false }), keeping Diff.Loading
intact when entering ModeDiff.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/app/model_init.go`:
- Around line 210-213: The closure in validateAuthentication reads m.state.Mode
directly causing a race; capture the mode into a local variable at closure
construction time (similar to how epoch and server are captured) and use that
local (e.g., capturedMode) inside the closure instead of m.state.Mode to avoid
concurrent reads while Model.Update() may modify state.

In `@cmd/app/model_tree_stream.go`:
- Around line 27-36: Add a non-blocking check for the done channel before the
main select so teardown always wins: before the select that reads from ch,
insert a quick non-blocking receive on done (e.g., select { case <-done: return
nil default: }) to ensure when both ch and done are ready the done path is
taken; update the block that currently reads from ch (the select handling ch, ev
:= <-ch and case <-done) to rely on that pre-check so the consumer in this
function (the section logging "Consumed tree event" and returning ev) cannot be
re-armed after cleanup.

---

Outside diff comments:
In `@cmd/app/api_integration.go`:
- Around line 1213-1218: The closure currently mutates m.state.Diff (setting
model.DiffState with Title/Content/Offset/Loading) directly inside the Bubble
Tea goroutine, causing a data race; refactor to instead create and return a
message struct (e.g., DiffReadyMsg or reuse an existing diff message type) from
the closure carrying the Title, lines (Content), Offset and Loading flag, then
update m.state.Diff inside the single-threaded Update handler when it receives
that message—follow the pattern used by
startDiffSession/startResourceDiffSession where the goroutine emits a message
and Update performs m.state mutations on receipt.
- Around line 422-504: pagerDoneMsg currently mutates m.state.Mode directly on
success and never clears Diff.Loading; change the success path in pagerDoneMsg
to return model.SetModeMsg{Mode: model.ModeNormal} (instead of setting
m.state.Mode = model.ModeNormal and returning) so the existing SetModeMsg
handler clears Diff.Loading, matching the error path; locate pagerDoneMsg in
model.go and replace the direct state mutation with a returned SetModeMsg to
ensure Diff.Loading is reset.
- Around line 688-700: The goroutine that consumes from channel ch in
startWatchingResourceTree can leak because it only ranges ch; modify it to also
select on the treeStreamDone signal so it exits promptly when that done channel
is closed: change the loop to a select that listens for either a receive from ch
(handle nil as before, increment eventCount, json.Marshal and call
m.watchTreeDeliver) or a receive from treeStreamDone (break/return to stop the
goroutine), and keep the final Info log using eventCount; reference
startWatchingResourceTree, ch, treeStreamDone, eventCount, and
m.watchTreeDeliver when making the change.

---

Nitpick comments:
In `@cmd/app/goroutine_leak_test.go`:
- Around line 31-33: The hard 300ms timeout in the goroutine leak assertion is
too tight; update the wait logic in goroutine_leak_test.go (where the test waits
for consumeTreeEvent to exit after calling cleanupTreeWatchers) to use a looser
bound or retry-until-deadline pattern — e.g., replace the fixed time.After(300 *
time.Millisecond) check with a larger duration (e.g., 1s) or a loop that polls
for goroutine exit until a deadline to reduce CI flakiness while keeping the
same failure semantics.

In `@cmd/app/model.go`:
- Around line 323-326: The code unconditionally clears m.state.Diff.Loading on
any mode change, which erases the diff-loading state when transitioning into
ModeDiff; update the mode-transition logic so you only set m.state.Diff.Loading
= false when the new mode is NOT ModeDiff (e.g., change the snippet to check the
target mode before clearing: if m.state.Diff != nil && newMode != ModeDiff {
m.state.Diff.Loading = false }), keeping Diff.Loading intact when entering
ModeDiff.

In `@cmd/app/race_conditions_test.go`:
- Around line 20-29: The test's synchronization only ensures the goroutine
started, not that the read of m.state.Server inside cmd() is concurrently
in-flight, making the race trigger nondeterministic; modify the test to add a
deterministic handshake channel (e.g., a readReady/readStart channel closed or
sent from inside the goroutine immediately before the line where cmd() reads
m.state.Server) and have the main test wait on that signal before performing the
concurrent write to m.state.Server (or alternatively repeat the start/write
pattern in a bounded loop) so the write definitely races with the in-goroutine
read; apply the same pattern to the other occurrence around lines 44-52.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 132336a2-f76d-42f1-a1ac-ee34e69a3250

📥 Commits

Reviewing files that changed from the base of the PR and between 9c3f332 and 671dd6f.

📒 Files selected for processing (11)
  • .gitignore
  • cmd/app/api_integration.go
  • cmd/app/context_switch.go
  • cmd/app/context_switch_test.go
  • cmd/app/goroutine_leak_test.go
  • cmd/app/model.go
  • cmd/app/model_init.go
  • cmd/app/model_tree_stream.go
  • cmd/app/model_watchers.go
  • cmd/app/race_conditions_test.go
  • cmd/app/rollback_namespace_test.go
💤 Files with no reviewable changes (1)
  • cmd/app/context_switch.go

Comment thread cmd/app/model_init.go
Comment thread cmd/app/model_tree_stream.go
- startRollbackDiffSession: replace direct m.state.Diff mutation in
  goroutine with rollbackDiffReadyMsg; state is now set in Update
  (single-threaded) eliminating the data race
- pagerDoneMsg success path: clear Diff.Loading explicitly since this
  path bypasses SetModeMsg and Diff.Loading = true was left behind
- startWatchingResourceTree: add treeStreamDone select to inner goroutine
  so it exits promptly on cleanupTreeWatchers, matching consumeTreeEvent
- goroutine_leak_test: bump timeout 300ms → 1s to reduce CI flakiness
…oritize done in consumeTreeEvent

- validateAuthentication: capture m.state.Mode into currentMode at
  closure construction time to avoid concurrent read while Update
  may be writing m.state.Mode
- consumeTreeEvent: add non-blocking pre-check on done before the main
  select so teardown always wins when both ch and done are simultaneously
  ready (Go select chooses pseudo-randomly without the guard)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant