fix: resolve race conditions and goroutine leaks in tea.Cmd closures #232
darksworm wants to merge 5 commits into
Conversation
- Capture m.state.Server at call time in all tea.Cmd closures across api_integration.go and model_init.go to eliminate data races between Bubble Tea goroutines and concurrent SetServerMsg/context switch writes
- Remove m.state.Diff mutations from startDiffSession and startResourceDiffSession goroutine closures; clear Diff.Loading in the SetModeMsg handler instead (safe, single-threaded Update)
- Remove dead replaceOldWatch field from watchStartedMsg and its parameter from startWatchingApplicationsWithConfig — the old stream is already cancelled correctly through cleanupAppWatcher
- Remove double cleanup in context switch: cleanupAppWatcher already calls upstreamCleanup via appWatchCleanup, so the explicit m.watchCleanup() call below it was redundant
- Fix goroutine leak in consumeTreeEvent: add treeStreamDone escape channel, close and re-create it in cleanupTreeWatchers, capture both channels at call time so goroutines unblock on view/context changes
- Guard rollback bottom navigation against empty Rows to prevent SelectedIdx being set to -1

Tests added for all fixes; full suite passes with -race.
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info: Configuration used: Organization UI | Review profile: CHILL | Plan: Pro
📒 Files selected for processing (2); 🚧 2 files skipped from review as they are similar to previous changes
Walkthrough
Capture server state in command closures, add a tree-stream done signal and adjust watcher cleanup, remove a context-switch upstream watch cleanup call, move diff-loading clears into mode transitions, introduce a rollback diff message, and add concurrency and edge-case tests.
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
cmd/app/api_integration.go (3)
1213-1218: ⚠️ Potential issue | 🔴 Critical
Data race: Closure directly mutates `m.state.Diff` from goroutine.
Lines 1213-1218 write directly to `m.state.Diff` from within the closure, which executes in a Bubble Tea goroutine. This creates a data race with the main `Update` goroutine, which may be reading or writing `m.state` concurrently.
Other diff session functions (`startDiffSession`, `startResourceDiffSession`) avoid this by returning messages that let the single-threaded `Update` handler mutate state. Refactor this function to follow the same pattern: return a message containing the diff data and handle the state mutation in `Update`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/app/api_integration.go` around lines 1213 - 1218, The closure currently mutates m.state.Diff (setting model.DiffState with Title/Content/Offset/Loading) directly inside the Bubble Tea goroutine, causing a data race; refactor to instead create and return a message struct (e.g., DiffReadyMsg or reuse an existing diff message type) from the closure carrying the Title, lines (Content), Offset and Loading flag, then update m.state.Diff inside the single-threaded Update handler when it receives that message—follow the pattern used by startDiffSession/startResourceDiffSession where the goroutine emits a message and Update performs m.state mutations on receipt.
422-504: ⚠️ Potential issue | 🔴 Critical
`Diff.Loading` remains stuck as `true` after successful diff pager exit.
The `pagerDoneMsg` handler (model.go:703-725) on success directly sets `m.state.Mode = model.ModeNormal` and returns without clearing `Diff.Loading`. The clearing only happens in the `SetModeMsg` handler (line 325), which is never triggered on the success path. When the pager closes without error, `Diff.Loading` is left `true` for subsequent operations.
The fix should have `pagerDoneMsg` return `SetModeMsg{Mode: model.ModeNormal}` on success to match the error-path logic, or explicitly clear `Diff.Loading` before returning.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/app/api_integration.go` around lines 422 - 504, pagerDoneMsg currently mutates m.state.Mode directly on success and never clears Diff.Loading; change the success path in pagerDoneMsg to return model.SetModeMsg{Mode: model.ModeNormal} (instead of setting m.state.Mode = model.ModeNormal and returning) so the existing SetModeMsg handler clears Diff.Loading, matching the error path; locate pagerDoneMsg in model.go and replace the direct state mutation with a returned SetModeMsg to ensure Diff.Loading is reset.
688-700: ⚠️ Potential issue | 🟠 Major
Add `treeStreamDone` signal to the goroutine in `startWatchingResourceTree` to prevent goroutine leaks.
The goroutine at line 688 reads from `ch` without a secondary exit mechanism. While the cleanup function should close the channel via context cancellation, the goroutine lacks a direct listener on `treeStreamDone` for consistency and robustness. This pattern is used elsewhere (e.g., `consumeTreeEvent`); apply it here:
Suggested fix
```diff
 go func() {
 	eventCount := 0
-	for t := range ch {
+	for {
+		select {
+		case t, ok := <-ch:
+			if !ok {
+				cblog.With("component", "ui").Info("Tree watch channel closed", "app", app.Name, "events", eventCount)
+				return
+			}
 			if t == nil {
 				continue
 			}
 			eventCount++
 			cblog.With("component", "ui").Debug("Received tree event", "app", app.Name, "event", eventCount)
 			data, _ := json.Marshal(t)
 			m.watchTreeDeliver(model.ResourceTreeStreamMsg{AppName: app.Name, TreeJSON: data})
+		case <-m.treeStreamDone:
+			cblog.With("component", "ui").Info("Tree watch cancelled via done signal", "app", app.Name)
+			return
+		}
 	}
-	cblog.With("component", "ui").Info("Tree watch channel closed", "app", app.Name, "events", eventCount)
 }()
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/app/api_integration.go` around lines 688 - 700, The goroutine that consumes from channel ch in startWatchingResourceTree can leak because it only ranges ch; modify it to also select on the treeStreamDone signal so it exits promptly when that done channel is closed: change the loop to a select that listens for either a receive from ch (handle nil as before, increment eventCount, json.Marshal and call m.watchTreeDeliver) or a receive from treeStreamDone (break/return to stop the goroutine), and keep the final Info log using eventCount; reference startWatchingResourceTree, ch, treeStreamDone, eventCount, and m.watchTreeDeliver when making the change.
🧹 Nitpick comments (3)
cmd/app/race_conditions_test.go (1)
20-29: Race trigger timing is nondeterministic in current form.
`started` only confirms goroutine start, not that the read of `m.state.Server` is in-flight. These tests can false-pass; consider adding a deterministic synchronization point around the read path (or repeating the pattern in a bounded loop) to harden race detection.
Also applies to: 44-52
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/app/race_conditions_test.go` around lines 20 - 29, The test's synchronization only ensures the goroutine started, not that the read of m.state.Server inside cmd() is concurrently in-flight, making the race trigger nondeterministic; modify the test to add a deterministic handshake channel (e.g., a readReady/readStart channel closed or sent from inside the goroutine immediately before the line where cmd() reads m.state.Server) and have the main test wait on that signal before performing the concurrent write to m.state.Server (or alternatively repeat the start/write pattern in a bounded loop) so the write definitely races with the in-goroutine read; apply the same pattern to the other occurrence around lines 44-52.
cmd/app/goroutine_leak_test.go (1)
31-33: Consider loosening the hard timeout to reduce CI flakiness.
A fixed 300ms can fail under noisy runners. Using a slightly larger bound (or a retry loop with a deadline) will keep this leak test stable without reducing signal.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/app/goroutine_leak_test.go` around lines 31 - 33, The hard 300ms timeout in the goroutine leak assertion is too tight; update the wait logic in goroutine_leak_test.go (where the test waits for consumeTreeEvent to exit after calling cleanupTreeWatchers) to use a looser bound or retry-until-deadline pattern — e.g., replace the fixed time.After(300 * time.Millisecond) check with a larger duration (e.g., 1s) or a loop that polls for goroutine exit until a deadline to reduce CI flakiness while keeping the same failure semantics.
cmd/app/model.go (1)
323-326: Diff.Loading reset is too broad on mode transitions.
This unconditionally clears `m.state.Diff.Loading` for all mode changes, including transitions into diff mode. The diff state should be preserved when entering `ModeDiff`, not blanked on arrival. Consider narrowing to non-diff transitions:
Proposed tightening
```diff
-	// Clear diff loading whenever mode changes — the closure no longer does it.
-	if m.state.Diff != nil {
+	// Clear diff loading only when moving away from diff mode.
+	if m.state.Diff != nil && msg.Mode != model.ModeDiff {
 		m.state.Diff.Loading = false
 	}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/app/model.go` around lines 323 - 326, The code unconditionally clears m.state.Diff.Loading on any mode change, which erases the diff-loading state when transitioning into ModeDiff; update the mode-transition logic so you only set m.state.Diff.Loading = false when the new mode is NOT ModeDiff (e.g., change the snippet to check the target mode before clearing: if m.state.Diff != nil && newMode != ModeDiff { m.state.Diff.Loading = false }), keeping Diff.Loading intact when entering ModeDiff.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/app/model_init.go`:
- Around line 210-213: The closure in validateAuthentication reads m.state.Mode
directly causing a race; capture the mode into a local variable at closure
construction time (similar to how epoch and server are captured) and use that
local (e.g., capturedMode) inside the closure instead of m.state.Mode to avoid
concurrent reads while Model.Update() may modify state.
In `@cmd/app/model_tree_stream.go`:
- Around line 27-36: Add a non-blocking check for the done channel before the
main select so teardown always wins: before the select that reads from ch,
insert a quick non-blocking receive on done (e.g., select { case <-done: return
nil default: }) to ensure when both ch and done are ready the done path is
taken; update the block that currently reads from ch (the select handling ch, ev
:= <-ch and case <-done) to rely on that pre-check so the consumer in this
function (the section logging "Consumed tree event" and returning ev) cannot be
re-armed after cleanup.
---
Outside diff comments:
In `@cmd/app/api_integration.go`:
- Around line 1213-1218: The closure currently mutates m.state.Diff (setting
model.DiffState with Title/Content/Offset/Loading) directly inside the Bubble
Tea goroutine, causing a data race; refactor to instead create and return a
message struct (e.g., DiffReadyMsg or reuse an existing diff message type) from
the closure carrying the Title, lines (Content), Offset and Loading flag, then
update m.state.Diff inside the single-threaded Update handler when it receives
that message—follow the pattern used by
startDiffSession/startResourceDiffSession where the goroutine emits a message
and Update performs m.state mutations on receipt.
- Around line 422-504: pagerDoneMsg currently mutates m.state.Mode directly on
success and never clears Diff.Loading; change the success path in pagerDoneMsg
to return model.SetModeMsg{Mode: model.ModeNormal} (instead of setting
m.state.Mode = model.ModeNormal and returning) so the existing SetModeMsg
handler clears Diff.Loading, matching the error path; locate pagerDoneMsg in
model.go and replace the direct state mutation with a returned SetModeMsg to
ensure Diff.Loading is reset.
- Around line 688-700: The goroutine that consumes from channel ch in
startWatchingResourceTree can leak because it only ranges ch; modify it to also
select on the treeStreamDone signal so it exits promptly when that done channel
is closed: change the loop to a select that listens for either a receive from ch
(handle nil as before, increment eventCount, json.Marshal and call
m.watchTreeDeliver) or a receive from treeStreamDone (break/return to stop the
goroutine), and keep the final Info log using eventCount; reference
startWatchingResourceTree, ch, treeStreamDone, eventCount, and
m.watchTreeDeliver when making the change.
---
Nitpick comments:
In `@cmd/app/goroutine_leak_test.go`:
- Around line 31-33: The hard 300ms timeout in the goroutine leak assertion is
too tight; update the wait logic in goroutine_leak_test.go (where the test waits
for consumeTreeEvent to exit after calling cleanupTreeWatchers) to use a looser
bound or retry-until-deadline pattern — e.g., replace the fixed time.After(300 *
time.Millisecond) check with a larger duration (e.g., 1s) or a loop that polls
for goroutine exit until a deadline to reduce CI flakiness while keeping the
same failure semantics.
In `@cmd/app/model.go`:
- Around line 323-326: The code unconditionally clears m.state.Diff.Loading on
any mode change, which erases the diff-loading state when transitioning into
ModeDiff; update the mode-transition logic so you only set m.state.Diff.Loading
= false when the new mode is NOT ModeDiff (e.g., change the snippet to check the
target mode before clearing: if m.state.Diff != nil && newMode != ModeDiff {
m.state.Diff.Loading = false }), keeping Diff.Loading intact when entering
ModeDiff.
In `@cmd/app/race_conditions_test.go`:
- Around line 20-29: The test's synchronization only ensures the goroutine
started, not that the read of m.state.Server inside cmd() is concurrently
in-flight, making the race trigger nondeterministic; modify the test to add a
deterministic handshake channel (e.g., a readReady/readStart channel closed or
sent from inside the goroutine immediately before the line where cmd() reads
m.state.Server) and have the main test wait on that signal before performing the
concurrent write to m.state.Server (or alternatively repeat the start/write
pattern in a bounded loop) so the write definitely races with the in-goroutine
read; apply the same pattern to the other occurrence around lines 44-52.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 132336a2-f76d-42f1-a1ac-ee34e69a3250
📒 Files selected for processing (11)
- .gitignore
- cmd/app/api_integration.go
- cmd/app/context_switch.go
- cmd/app/context_switch_test.go
- cmd/app/goroutine_leak_test.go
- cmd/app/model.go
- cmd/app/model_init.go
- cmd/app/model_tree_stream.go
- cmd/app/model_watchers.go
- cmd/app/race_conditions_test.go
- cmd/app/rollback_namespace_test.go
💤 Files with no reviewable changes (1)
- cmd/app/context_switch.go
- startRollbackDiffSession: replace direct m.state.Diff mutation in goroutine with rollbackDiffReadyMsg; state is now set in Update (single-threaded), eliminating the data race
- pagerDoneMsg success path: clear Diff.Loading explicitly since this path bypasses SetModeMsg and Diff.Loading = true was left behind
- startWatchingResourceTree: add treeStreamDone select to inner goroutine so it exits promptly on cleanupTreeWatchers, matching consumeTreeEvent
- goroutine_leak_test: bump timeout 300ms → 1s to reduce CI flakiness
…oritize done in consumeTreeEvent
- validateAuthentication: capture m.state.Mode into currentMode at closure construction time to avoid a concurrent read while Update may be writing m.state.Mode
- consumeTreeEvent: add non-blocking pre-check on done before the main select so teardown always wins when both ch and done are simultaneously ready (Go select chooses pseudo-randomly without the guard)
Summary
- Capture `m.state.Server` at call time in all `tea.Cmd` closures across `api_integration.go` and `model_init.go` — eliminates data races between Bubble Tea goroutines and concurrent `SetServerMsg`/context-switch writes
- Remove `m.state.Diff` mutations from `startDiffSession` and `startResourceDiffSession` goroutine closures; clear `Diff.Loading` in the `SetModeMsg` handler instead (safe, runs in single-threaded `Update`)
- Add `treeStreamDone` escape channel, close+recreate it in `cleanupTreeWatchers`, capture both channels at call time in `consumeTreeEvent` — goroutines now unblock when leaving tree view or switching context
- Remove dead `replaceOldWatch` field from `watchStartedMsg` and its parameter — the old stream is already cancelled via `cleanupAppWatcher`
- Remove redundant `m.watchCleanup()` call in context switch — `cleanupAppWatcher` already calls `upstreamCleanup`
- Guard rollback bottom navigation against empty `Rows` to prevent `SelectedIdx = -1`

Test Plan
- `go test -race ./... -count=1` — all 16 packages pass, zero DATA RACE lines
- `go test -tags e2e ./e2e/... -count=3` — 3/3 runs pass
- `race_conditions_test.go`, `goroutine_leak_test.go` cover the race and leak fixes
- `context_switch_test.go`, `rollback_namespace_test.go` cover the remaining fixes

Summary by CodeRabbit
Bug Fixes
Tests
Chores