Skip to content

Fix race in test_describe_fleet_function_count_matches_actual#22

Merged
atsyplikhin merged 3 commits intomainfrom
fix/aggregate-fleet-function-count
Apr 24, 2026
Merged

Fix race in test_describe_fleet_function_count_matches_actual#22
atsyplikhin merged 3 commits intomainfrom
fix/aggregate-fleet-function-count

Conversation

@atsyplikhin
Copy link
Copy Markdown
Collaborator

@atsyplikhin atsyplikhin commented Apr 24, 2026

Summary

Root cause

The post-merge failure of test_describe_fleet_function_count_matches_actual[nats] (run 24867127751: fleet=23 vs per-device=19, diff=4) was a time-of-check/time-of-use race in the test itself.

The test took a fleet snapshot, then re-queried every device via get_device_functions(). If any device was evicted between the two queries (TTL expiry, disconnect, leaked registration from a sibling test in the same session), the re-query returned {"error": ...} and the test silently dropped that device's contribution:

for dev in fleet["devices"]:
    funcs = await asyncio.to_thread(get_device_functions, dev["device_id"])
    if "error" not in funcs:          # silent skip on eviction
        individual_total += len(funcs["functions"])

One missing device carrying 4 functions explains the exact diff.

Function entries are guaranteed a name by the @rpc decorator (base.py:379 sets func_name = name or func.__name__), and aggregate_fleet / full_device both read from the same caps["functions"] list via flatten_device. The two code paths can't diverge on a single snapshot.

Fix

Compare total_functions against the snapshot's own devices list (already normalized by full_device() under the patched SMALL_FLEET_THRESHOLD) instead of re-querying. Tests the real invariant — internal consistency of describe_fleet's return — without a race window.

(An earlier commit added defensive nameless-entry filtering inside aggregate_fleet; reverted because the scenario can't occur by construction and keeping it obscured the actual diagnosis.)

Test plan

  • Python syntax checks pass
  • Net diff: single test file, no production code changes
  • CI: test_describe_fleet_function_count_matches_actual[nats] passes post-merge, no longer flaky

🤖 Generated with Claude Code

atsyplikhin and others added 2 commits April 23, 2026 18:24
aggregate_fleet() counted raw len(functions) while get_device_functions()
normalizes via _normalize_functions(), which filters out entries with no
name. The two paths could diverge when the registry carries malformed or
nameless function entries, causing
test_describe_fleet_function_count_matches_actual to fail intermittently
on main (e.g. run 24867127751: individual=19 vs fleet=23).

Apply the same nameless-entry filter inside aggregate_fleet so both
paths agree. Add a unit test pinning the invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The test took a fleet snapshot, then re-queried each device via
get_device_functions() and summed the results. If any device in the
snapshot was evicted (TTL expiry, disconnect, test-teardown from a
sibling test running in the same discovery provider) between the two
queries, get_device_functions() returned {"error": ...} and the loop
silently skipped it:

    for dev in fleet["devices"]:
        funcs = await asyncio.to_thread(get_device_functions, dev["device_id"])
        if "error" not in funcs:
            individual_total += len(funcs["functions"])

That's what caused run 24867127751 on main to see fleet=23 / per-device
sum=19. The two numbers can only diverge when a device disappears mid-
test; the aggregate_fleet / full_device code paths themselves cannot.

Fix by comparing total_functions against the snapshot's own devices
list (which was already normalized by full_device() under the patched
SMALL_FLEET_THRESHOLD). This tests the actual invariant we care about —
that describe_fleet's returned structure is internally consistent —
without opening a race window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@atsyplikhin atsyplikhin changed the title Fix aggregate_fleet total_functions to match get_device_functions Fix race in test_describe_fleet_function_count_matches_actual Apr 24, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a flaky integration test caused by a time-of-check/time-of-use race: the test previously took a fleet snapshot and then re-queried each device’s functions, allowing device eviction/TTL expiry to skew the “actual” count.

Changes:

  • Remove the per-device get_device_functions() re-query from test_describe_fleet_function_count_matches_actual.
  • Compute the per-device function total directly from the describe_fleet() snapshot (fleet["devices"]) and compare it to fleet["total_functions"].

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@atsyplikhin atsyplikhin merged commit 30c15d9 into main Apr 24, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants