Fix race in test_describe_fleet_function_count_matches_actual by atsyplikhin · Pull Request #22 · arm/device-connect

atsyplikhin · 2026-04-24T01:24:17Z

Summary

Root cause

The post-merge failure of test_describe_fleet_function_count_matches_actual[nats] (run 24867127751: fleet=23 vs per-device=19, diff=4) was a time-of-check/time-of-use race in the test itself.

The test took a fleet snapshot, then re-queried every device via get_device_functions(). If any device was evicted between the two queries (TTL expiry, disconnect, leaked registration from a sibling test in the same session), the re-query returned {"error": ...} and the test silently dropped that device's contribution:

for dev in fleet["devices"]:
    funcs = await asyncio.to_thread(get_device_functions, dev["device_id"])
    if "error" not in funcs:          # silent skip on eviction
        individual_total += len(funcs["functions"])

One missing device carrying 4 functions explains the exact diff.

Function entries are guaranteed a name by the @rpc decorator (base.py:379 sets func_name = name or func.__name__), and aggregate_fleet / full_device both read from the same caps["functions"] list via flatten_device. The two code paths can't diverge on a single snapshot.

Fix

Compare total_functions against the snapshot's own devices list (already normalized by full_device() under the patched SMALL_FLEET_THRESHOLD) instead of re-querying. Tests the real invariant — internal consistency of describe_fleet's return — without a race window.

(An earlier commit added defensive nameless-entry filtering inside aggregate_fleet; reverted because the scenario can't occur by construction and keeping it obscured the actual diagnosis.)

Test plan

Python syntax checks pass
Net diff: single test file, no production code changes
CI: test_describe_fleet_function_count_matches_actual[nats] passes post-merge, no longer flaky

🤖 Generated with Claude Code

aggregate_fleet() counted raw len(functions) while get_device_functions() normalizes via _normalize_functions(), which filters out entries with no name. The two paths could diverge when the registry carries malformed or nameless function entries, causing test_describe_fleet_function_count_matches_actual to fail intermittently on main (e.g. run 24867127751: individual=19 vs fleet=23). Apply the same nameless-entry filter inside aggregate_fleet so both paths agree. Add a unit test pinning the invariant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The test took a fleet snapshot, then re-queried each device via get_device_functions() and summed the results. If any device in the snapshot was evicted (TTL expiry, disconnect, test-teardown from a sibling test running in the same discovery provider) between the two queries, get_device_functions() returned {"error": ...} and the loop silently skipped it: for dev in fleet["devices"]: funcs = await asyncio.to_thread(get_device_functions, dev["device_id"]) if "error" not in funcs: individual_total += len(funcs["functions"]) That's what caused run 24867127751 on main to see fleet=23 / per-device sum=19. The two numbers can only diverge when a device disappears mid- test; the aggregate_fleet / full_device code paths themselves cannot. Fix by comparing total_functions against the snapshot's own devices list (which was already normalized by full_device() under the patched SMALL_FLEET_THRESHOLD). This tests the actual invariant we care about — that describe_fleet's returned structure is internally consistent — without opening a race window. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ions" This reverts commit 8034cc4.

Copilot

Pull request overview

This PR fixes a flaky integration test caused by a time-of-check/time-of-use race: the test previously took a fleet snapshot and then re-queried each device’s functions, allowing device eviction/TTL expiry to skew the “actual” count.

Changes:

Remove the per-device get_device_functions() re-query from test_describe_fleet_function_count_matches_actual.
Compute the per-device function total directly from the describe_fleet() snapshot (fleet["devices"]) and compare it to fleet["total_functions"].

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

atsyplikhin and others added 2 commits April 23, 2026 18:24

atsyplikhin changed the title ~~Fix aggregate_fleet total_functions to match get_device_functions~~ Fix race in test_describe_fleet_function_count_matches_actual Apr 24, 2026

Revert "Fix aggregate_fleet total_functions to match get_device_funct…

d64d774

…ions" This reverts commit 8034cc4.

atsyplikhin requested a review from Copilot April 24, 2026 01:39

Copilot started reviewing on behalf of atsyplikhin April 24, 2026 01:40 View session

Copilot AI reviewed Apr 24, 2026

View reviewed changes

atsyplikhin merged commit 30c15d9 into main Apr 24, 2026
13 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix race in test_describe_fleet_function_count_matches_actual#22

Fix race in test_describe_fleet_function_count_matches_actual#22
atsyplikhin merged 3 commits intomainfrom
fix/aggregate-fleet-function-count

atsyplikhin commented Apr 24, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

atsyplikhin commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

Fix

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

atsyplikhin commented Apr 24, 2026 •

edited

Loading