Skip to content

fix(orders): synchronize m.stderr writes from parallel dispatch goroutines#1776

Open
eric-jones wants to merge 1 commit intogastownhall:mainfrom
eric-jones:fix/order-dispatch-stderr-race
Open

fix(orders): synchronize m.stderr writes from parallel dispatch goroutines#1776
eric-jones wants to merge 1 commit intogastownhall:mainfrom
eric-jones:fix/order-dispatch-stderr-race

Conversation

@eric-jones
Copy link
Copy Markdown
Contributor

@eric-jones eric-jones commented May 7, 2026

Summary

memoryOrderDispatcher.dispatch spawns one dispatchOne goroutine per due order. Each goroutine may call logDispatchError(m.stderr, …) if its order errors out — but m.stderr is unsynchronized. go test -race flags concurrent writes on bytes.Buffer.Write (test fakes); production writes to os.Stderr are also undefined per Go docs (atomic per-write(2) for small payloads, unspecified inter-call ordering).

Wrap stderr in a small mutex-guarded writer at construction time so the call sites stay unchanged. Both construction paths get the wrap:

  • buildOrderDispatcher (production)
  • buildOrderDispatcherFromListExec (test helper that constructs memoryOrderDispatcher directly)

lockedStderr returns nil for nil input so logDispatchError's existing nil-guard keeps its semantics.

Testing

  • make check — 5 pre-existing failures, all documented EXPECT-FAIL on macOS, none introduced by this PR (verified by stashing the change and re-running the same tests in isolation):
  • go test -race -count=3 -run '^TestCityRuntimeManualReloadPanicAfterReloadKeepsReloadReplyAndClears$' ./cmd/gc/ — race gone (was flagged by the race detector before this change)
  • go test -race -count=1 -run '^TestLockedWriter|^TestLockedStderr' ./cmd/gc/ — green
  • go test -race -count=1 -run 'OrderDispatch|CityRuntimeShutdown|TestSweep|CityRuntimeManualReload|TestLockedWriter|TestLockedStderr' -skip '^TestOrderDispatchRejectsAmbiguousPackPool$' ./cmd/gc/ — green
  • make check-docs — no docs changed, skipped
  • make test-integration — not run; past attempts have hit the 30-min timeout with no saved partial output

Note: separate pre-existing race observed

TestOrderDispatchRejectsAmbiguousPackPool trips a different race under -race: the test fake memRecorder is read from the test goroutine while dispatchOne is still writing to it. Different surface (test fake field, not m.stderr), pre-existing on origin/main, out of scope for this PR. Skipped via -skip in the broader race check above.

Checklist

  • Linked an issue, or explained why one is not needed — internal stg-fxm; this is the externally-tracked half
  • Added or updated tests for behavior changes
  • Updated docs for user-facing changes — none, internal synchronization
  • Called out breaking changes or migration notes — none

View in Codesmith
Need help on this PR? Tag @codesmith with what you need.

  • Let Codesmith autofix CI failures and bot reviews

@eric-jones eric-jones requested a review from julianknutsen as a code owner May 7, 2026 01:52
@github-actions github-actions Bot added the status/needs-triage Inbox — we haven't looked at it yet label May 7, 2026
@randy-release-manager randy-release-manager Bot added kind/bug Broken behavior priority/p2 Medium — real problem, workaround exists and removed status/needs-triage Inbox — we haven't looked at it yet labels May 7, 2026
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

…tines

memoryOrderDispatcher.dispatch spawns one dispatchOne goroutine per due
order; each may call logDispatchError(m.stderr, ...) on the shared
writer when its order errors out. Without synchronization, concurrent
log lines race on bytes.Buffer.Write (test fakes) or interleave bytes
on os.Stderr (production), and `go test -race` flags every concurrent
write.

Wrap stderr in a mutex-guarded writer at construction time. Production
goes through buildOrderDispatcher; the test helper
buildOrderDispatcherFromListExec also constructs memoryOrderDispatcher
directly, so it wraps too. lockedStderr preserves nil-input semantics
so logDispatchError's nil-guard keeps working.

Adds direct unit tests for the lockedWriter (16 goroutines x 100 writes
under -race; asserts no torn lines and no lost bytes) and for the
nil/non-nil construction paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eric-jones eric-jones force-pushed the fix/order-dispatch-stderr-race branch from cb92e8a to 0822061 Compare May 8, 2026 01:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Broken behavior priority/p2 Medium — real problem, workaround exists

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants