Fix race in test_describe_fleet_function_count_matches_actual#22
Merged
atsyplikhin merged 3 commits intomainfrom Apr 24, 2026
Merged
Fix race in test_describe_fleet_function_count_matches_actual#22atsyplikhin merged 3 commits intomainfrom
atsyplikhin merged 3 commits intomainfrom
Conversation
aggregate_fleet() counted raw len(functions) while get_device_functions() normalizes via _normalize_functions(), which filters out entries with no name. The two paths could diverge when the registry carries malformed or nameless function entries, causing test_describe_fleet_function_count_matches_actual to fail intermittently on main (e.g. run 24867127751: individual=19 vs fleet=23). Apply the same nameless-entry filter inside aggregate_fleet so both paths agree. Add a unit test pinning the invariant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The test took a fleet snapshot, then re-queried each device via
get_device_functions() and summed the results. If any device in the
snapshot was evicted (TTL expiry, disconnect, test-teardown from a
sibling test running in the same discovery provider) between the two
queries, get_device_functions() returned {"error": ...} and the loop
silently skipped it:
for dev in fleet["devices"]:
funcs = await asyncio.to_thread(get_device_functions, dev["device_id"])
if "error" not in funcs:
individual_total += len(funcs["functions"])
That's what caused run 24867127751 on main to see fleet=23 / per-device
sum=19. The two numbers can only diverge when a device disappears mid-
test; the aggregate_fleet / full_device code paths themselves cannot.
Fix by comparing total_functions against the snapshot's own devices
list (which was already normalized by full_device() under the patched
SMALL_FLEET_THRESHOLD). This tests the actual invariant we care about —
that describe_fleet's returned structure is internally consistent —
without opening a race window.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ions" This reverts commit 8034cc4.
There was a problem hiding this comment.
Pull request overview
This PR fixes a flaky integration test caused by a time-of-check/time-of-use race: the test previously took a fleet snapshot and then re-queried each device’s functions, allowing device eviction/TTL expiry to skew the “actual” count.
Changes:
- Remove the per-device
get_device_functions()re-query fromtest_describe_fleet_function_count_matches_actual. - Compute the per-device function total directly from the
describe_fleet()snapshot (fleet["devices"]) and compare it tofleet["total_functions"].
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Root cause
The post-merge failure of
test_describe_fleet_function_count_matches_actual[nats](run 24867127751:fleet=23vsper-device=19, diff=4) was a time-of-check/time-of-use race in the test itself.The test took a fleet snapshot, then re-queried every device via
get_device_functions(). If any device was evicted between the two queries (TTL expiry, disconnect, leaked registration from a sibling test in the same session), the re-query returned{"error": ...}and the test silently dropped that device's contribution:One missing device carrying 4 functions explains the exact diff.
Function entries are guaranteed a name by the
@rpcdecorator (base.py:379setsfunc_name = name or func.__name__), andaggregate_fleet/full_deviceboth read from the samecaps["functions"]list viaflatten_device. The two code paths can't diverge on a single snapshot.Fix
Compare
total_functionsagainst the snapshot's owndeviceslist (already normalized byfull_device()under the patchedSMALL_FLEET_THRESHOLD) instead of re-querying. Tests the real invariant — internal consistency ofdescribe_fleet's return — without a race window.(An earlier commit added defensive nameless-entry filtering inside
aggregate_fleet; reverted because the scenario can't occur by construction and keeping it obscured the actual diagnosis.)Test plan
test_describe_fleet_function_count_matches_actual[nats]passes post-merge, no longer flaky🤖 Generated with Claude Code