VOK-139 Relay task claims can strand deliveries after route ownership changes #148

Open
vkforeman[bot] wants to merge 1 commit into main from lb/foreman/019dc1c9-04ba-7263-954e-b81e56094cc9

vkforeman[bot] commented Apr 24, 2026

Request

Prompt context:

Title: Relay task claims can strand deliveries after route ownership changes

Problem

Commit 34c68f8595fba84407529542ade3a0d97b6d221a (Prevent duplicate proxy task pulls, April 24, 2026) introduces durable relay_task_claims ownership for an (owner_user_id, provider, claim_key) tuple, but the route-claim replacement path never clears or reassigns that ownership when a node stops claiming the route.

As a result, a task identity can stay pinned to a node that no longer owns the route. New deliveries for the replacement node are inserted, but they are filtered out during leasing because relay_task_claim_allows_node still finds the old node id stored as claimed_node_id.

Concrete repo evidence

  • /Users/lemi/code/foreman/foreman-proxy/src/db/relay.rs
    • Lines 371-418 insert a relay_task_claims row on first lease and then permanently gate future leases on the stored claimed_node_id matching the current node.
    • If the stored claimed_node_id differs, lines 406-407 return false, which skips leasing entirely for the new node.
  • /Users/lemi/code/foreman/foreman-proxy/src/db/nodes.rs
    • Lines 359-362 delete all existing node_route_claims for a node before writing the replacement set.
    • That path does not touch relay_task_claims, so historical task ownership survives even when the node drops the route.
  • /Users/lemi/code/foreman/foreman-proxy/src/api/tests.rs
    • Lines 520-760 cover the new sticky-ownership behavior across two nodes and prove that a claim_key stays attached to the first claimant for subsequent deliveries.
    • There is no companion test for the route-reassignment case where the original claimant removes its route claim and another node becomes the sole current claimant.
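
To make the gate described above concrete, here is a minimal in-memory sketch. It is not the code in relay.rs: the RelayTaskClaims struct, the allows_node method, the task_key helper, and the sample ids are all hypothetical stand-ins, and the real implementation works against database rows. The sketch only models the behavior the evidence describes: the first lease records claimed_node_id, and later leases are allowed only for that node.

```rust
use std::collections::HashMap;

/// Task identity from the issue: (owner_user_id, provider, claim_key).
type TaskKey = (String, String, String);

/// Hypothetical in-memory stand-in for the relay_task_claims table.
/// Maps a task identity to its stored claimed_node_id.
#[derive(Default)]
struct RelayTaskClaims {
    claimed_node_id: HashMap<TaskKey, String>,
}

impl RelayTaskClaims {
    /// Models the gate: the first node to lease a task identity is recorded
    /// as its owner, and every later lease is allowed only for that node.
    /// Nothing in this model ever removes or reassigns an entry.
    fn allows_node(&mut self, key: &TaskKey, node_id: &str) -> bool {
        let owner = self
            .claimed_node_id
            .entry(key.clone())
            .or_insert_with(|| node_id.to_string());
        owner.as_str() == node_id
    }
}

/// Hypothetical helper that builds a task identity from string slices.
fn task_key(owner_user_id: &str, provider: &str, claim_key: &str) -> TaskKey {
    (
        owner_user_id.to_string(),
        provider.to_string(),
        claim_key.to_string(),
    )
}
```

In this model, once node-a has leased a key, allows_node returns false for node-b indefinitely, even if node-a no longer claims the route. That is the stranding condition reported here.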

Why this is a bug

The new claim table is intended to suppress duplicate task pulls, but without any cleanup or reassignment hook it can outlive the routing truth in node_route_claims. Once that happens, deliveries for the surviving route owner remain pending forever because no current claimant is allowed to lease them.

Minimal safe fix

  1. Clear or reassign relay_task_claims entries that point at a node when replace_node_route_claims removes the corresponding route from that node.
  2. Keep the durable-ownership behavior for still-valid route owners.
  3. Add a regression test that reproduces: node A claims a route, receives the first keyed delivery, drops the route, node B claims the route, and node B can lease the next keyed delivery.
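
The three steps above can be sketched against an in-memory model. This is an illustration under stated assumptions, not the change in this pull request: RelayDb, try_lease, and the Route type are hypothetical, and the sketch assumes a route is identified by (owner_user_id, provider), which the issue does not confirm. Only replace_node_route_claims, node_route_claims, and relay_task_claims are names taken from the issue text.

```rust
use std::collections::{HashMap, HashSet};

/// ASSUMPTION: a route is identified by (owner_user_id, provider).
type Route = (String, String);
/// Task identity from the issue: (owner_user_id, provider, claim_key).
type TaskKey = (String, String, String);

/// Hypothetical in-memory stand-in for the two tables involved.
#[derive(Default)]
struct RelayDb {
    /// node_id -> routes currently claimed (models node_route_claims).
    node_route_claims: HashMap<String, HashSet<Route>>,
    /// task identity -> claimed_node_id (models relay_task_claims).
    relay_task_claims: HashMap<TaskKey, String>,
}

impl RelayDb {
    /// Replaces a node's route claims and clears task claims that point at
    /// this node for any route it no longer claims (step 1). Claims on
    /// routes the node keeps are left untouched (step 2).
    fn replace_node_route_claims(&mut self, node_id: &str, routes: &[Route]) {
        let kept: HashSet<Route> = routes.iter().cloned().collect();
        self.relay_task_claims.retain(|(owner, provider, _), claimed| {
            claimed.as_str() != node_id
                || kept.contains(&(owner.clone(), provider.clone()))
        });
        self.node_route_claims.insert(node_id.to_string(), kept);
    }

    /// Leases a keyed delivery only if the node currently claims the route
    /// and the task claim (recorded on first lease) names this node.
    fn try_lease(&mut self, node_id: &str, task: &TaskKey) -> bool {
        let route = (task.0.clone(), task.1.clone());
        let claims_route = self
            .node_route_claims
            .get(node_id)
            .map_or(false, |routes| routes.contains(&route));
        if !claims_route {
            return false;
        }
        let owner = self
            .relay_task_claims
            .entry(task.clone())
            .or_insert_with(|| node_id.to_string());
        owner.as_str() == node_id
    }
}
```

A regression scenario for step 3 in this model: node A claims the route and leases the first keyed delivery, A replaces its claims with an empty set, node B claims the route, and try_lease for B on the same task identity succeeds. Clearing the claim rather than reassigning it lets the next lease pick the new owner; reassigning eagerly would be an alternative when exactly one claimant remains.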

Duplicate check

Searched Linear for route claims, relay task claims, relay claim, and related Foreman bug terms; no duplicate issue was returned.


Bug

Linear issue VOK-139: Relay task claims can strand deliveries after route ownership changes
Issue URL: https://linear.app/vokality/issue/VOK-139/relay-task-claims-can-strand-deliveries-after-route-ownership-changes
Team: Vokality
State: Todo
Labels: Bug

Session comment:
This thread is for an agent session with foreman.

Summary

Cleared stale relay task claims on route replacement and added route-handoff regression tests.

Checks

  • foreman-server-format: passed (exit status 0)
  • foreman-server-checks: passed (exit status 0)
  • foreman-server-tests: passed (exit status 0)

Foreman Metadata

  • Task ID: 019dc1c9-04ba-7263-954e-b81e56094cc9
  • Delivery ID: 019dc1ca-f73f-79b2-bfde-eb2f42361108
  • Run ID: 019dc1ca-f614-7c63-b280-a0510ffde834
  • Attempt: 1
  • Branch: lb/foreman/019dc1c9-04ba-7263-954e-b81e56094cc9

vkforeman[bot] requested review from foreman-vk and rhymiz as code owners on April 24, 2026 at 23:35