Skip to content

#1602 Fix thread-safety of _llmInvocations and _history in AbstractAgentProcess#1620

Merged
igordayen merged 1 commit intoembabel:mainfrom
azanux:fix/thread-safe-ll-invocations-concurent
Apr 20, 2026
Merged

#1602 Fix thread-safety of _llmInvocations and _history in AbstractAgentProcess#1620
igordayen merged 1 commit intoembabel:mainfrom
azanux:fix/thread-safe-ll-invocations-concurent

Conversation

@azanux
Copy link
Copy Markdown
Collaborator

@azanux azanux commented Apr 20, 2026

Problem

#1602 , With process-type=CONCURRENT, LLM calls run in parallel coroutines. Each call ends in AbstractAgentProcess._llmInvocations.add(...) via ToolLoopLlmOperations.recordUsage.

Both _llmInvocations and _history were backed by ArrayList (via mutableListOf()), which is not thread-safe.

Concurrent add silently loses entries → cost(), usage(), modelsUsed() under-report.
Measured: ~40% loss on 16 threads × 2,000 adds.. with AbstractAgentProcessThreadSafetyTest


Fix

Backed both fields with CopyOnWriteArrayList:


Test Plan

  • New AbstractAgentProcessThreadSafetyTest — fails on ArrayList, passes on CopyOnWriteArrayList
  • 66 existing tests green (Simple / Concurrent / CostAggregation / LlmInvocationHistory)

Follow-up

A distinct race condition on _terminationRequest (read-modify-write in DefaultToolLoop.checkForActionTerminationSignal) was discovered during this investigation.

Not fixed here to keep the PR focused. Two @Disabled tests (AbstractAgentProcessTerminationRaceTest) reproduce it deterministically - a tracking issue will be opened.

@azanux azanux requested review from alexheifetz and igordayen April 20, 2026 11:20
…tractAgentProcess

Signed-off-by: Charles <azanux@gmail.com>
@azanux azanux force-pushed the fix/thread-safe-ll-invocations-concurent branch from 85ceba6 to 730d381 Compare April 20, 2026 11:21
Copy link
Copy Markdown
Contributor

@igordayen igordayen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. good to have it somehow captured in ThreadSafetyRulesTest.kt too,

* consumer unconditionally resets → B is silently dropped.
*/
@Test
@Disabled("Outstanding race on _terminationRequest. Verified failure: expected <signal-B> but was <null>.")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great diiscovery

@igordayen igordayen merged commit b4d8c80 into embabel:main Apr 20, 2026
7 checks passed
@azanux azanux deleted the fix/thread-safe-ll-invocations-concurent branch April 20, 2026 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants