Validators on lagging nodes keep signing duties against stale heads. Their attestations land in fork choice as weight on the wrong subtree, pulling the network away from the canonical head. This issue adds a sync-lag gate on validator duties: skip attestation and proposal when the local head trails wall clock by more than SYNC_LAG_THRESHOLD slots.
Companion to #688 (BlocksByRange), which addresses the catch-up speed itself. The two are independent and can land in either order, but together they unblock devnet stalls.
Problem
src/lean_spec/subspecs/validator/service.py has no sync-lag check. A node 800 slots behind continues to attest and propose against its stale head.
Consequences:
Stale-head attestations deposit LMD-GHOST weight on the wrong subtree, slowing convergence on the canonical head.
This bites lean harder than beacon. Faster finality means fork choice has less time to absorb noise.
Lagging validators waste their own attestations and accrue inclusion-distance / wrong-head penalties.
A node can be SyncState.SYNCED per the state machine but still many slots behind wall clock during a brief network hiccup, validator restart, or partition. The right signal is wall-clock lag against the local head, not the binary sync-state flag.
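For concreteness, the wall-clock side of the comparison is just the number of slots elapsed since genesis according to the local clock. A minimal sketch, assuming a genesis_time value and a SECONDS_PER_SLOT constant (both stand-in names, not confirmed by this issue):

```python
import time

SECONDS_PER_SLOT = 4  # placeholder value for illustration only


def current_wall_clock_slot(genesis_time: int) -> int:
    """Slots elapsed since genesis according to the local system clock."""
    return max(0, int(time.time() - genesis_time) // SECONDS_PER_SLOT)
```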
Proposed gate
Skip both attestation and proposal duties when local head lags wall clock by more than SYNC_LAG_THRESHOLD = 4 slots.
```python
# src/lean_spec/subspecs/validator/service.py
from typing import Final

SYNC_LAG_THRESHOLD: Final[int] = 4


def is_synced_for_duties(store: Store, wall_clock_slot: Slot) -> bool:
    """Return False if the node is too far behind to safely sign duties."""
    head_slot = store.blocks[store.head].slot
    if wall_clock_slot <= head_slot:
        # Wall clock at or behind the head (clock skew): never gate.
        return True
    return (wall_clock_slot - head_slot) <= SYNC_LAG_THRESHOLD
```
Applies to attestation and proposal. Both pollute fork choice or chain history when produced from a stale view.
Decision uses store.head, not justified or finalized. The gate is about whether the validator's view of current head is fresh.
wall_clock_slot < head_slot (clock skew) does NOT gate. Trust the chain over the wall clock in that direction.
Threshold of 4 slots is a starting value. Generous enough to absorb normal gossip jitter, tight enough to silence validators on materially-stale nodes.
On skip, emit a structured log distinguishing "skipped, unsynced" from "no duty this slot" so operators can attribute missed duties correctly.
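The points above pin the boundary semantics. A standalone sanity check, inlining the same decision on bare ints so it runs without the Store type:

```python
SYNC_LAG_THRESHOLD = 4


def gate(head_slot: int, wall_clock_slot: int) -> bool:
    """Same decision as is_synced_for_duties, on bare ints."""
    if wall_clock_slot <= head_slot:
        return True  # clock skew: trust the chain over the wall clock
    return (wall_clock_slot - head_slot) <= SYNC_LAG_THRESHOLD


assert gate(100, 104)       # lag == 4: still signs duties
assert not gate(100, 105)   # lag == 5: duties skipped
assert gate(100, 99)        # wall clock behind head: never gated
```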
Implementation checklist
Stage 1 — Helper and constant
Add SYNC_LAG_THRESHOLD: Final[int] = 4 to validator/config.py (create if missing) or validator/registry.py.
Add is_synced_for_duties(store, wall_clock_slot) helper in validator/service.py.
Stage 2 — Gate the duties
Gate attestation duty entry point with early return on not is_synced_for_duties(...).
Gate proposal duty entry point with the same check.
Structured log on each skip: include head_slot, wall_clock_slot, lag.
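A sketch of what a gated entry point could look like, building on is_synced_for_duties above; the function name on_attestation_duty and its signature are illustrative, not the actual service.py API:

```python
import logging

logger = logging.getLogger("lean_spec.validator")


def on_attestation_duty(store, wall_clock_slot):
    """Hypothetical duty entry point with the sync-lag gate applied."""
    if not is_synced_for_duties(store, wall_clock_slot):
        head_slot = store.blocks[store.head].slot
        # "skipped, unsynced" stays distinguishable from "no duty this slot".
        logger.warning(
            "duty skipped: node unsynced",
            extra={
                "duty": "attestation",
                "head_slot": int(head_slot),
                "wall_clock_slot": int(wall_clock_slot),
                "lag": int(wall_clock_slot - head_slot),
            },
        )
        return
    ...  # produce and sign the attestation as before
```

The proposal entry point gets the identical check, with duty set to "proposal" in the log.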
Stage 3 — Operator visibility
Counter for skipped-due-to-lag attestations and proposals (separate from "no duty this slot").
Surface the counter in the existing observability subspec.
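How the observability subspec exposes metrics is outside this issue's text, so a plain Counter stands in for a minimal Stage 3 sketch:

```python
from collections import Counter

# Keyed separately from any "no duty this slot" accounting.
duty_skips_unsynced: Counter[str] = Counter()


def record_unsynced_skip(duty: str) -> None:
    """Count a duty skipped by the sync-lag gate ('attestation' or 'proposal')."""
    duty_skips_unsynced[duty] += 1
```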
Test plan
tests/lean_spec/subspecs/validator/test_service.py
- is_synced_for_duties returns True when wall_clock_slot - head_slot <= 4
- is_synced_for_duties returns False when wall_clock_slot - head_slot > 4
- is_synced_for_duties returns True when wall_clock_slot < head_slot (clock skew edge case)
- is_synced_for_duties boundary: lag == 4 is allowed; lag == 5 is gated
- attestation duty entry point skips when the gate returns False
- proposal duty entry point skips when the gate returns False
- skip log includes head_slot, wall_clock_slot, and lag

tests/consensus/ (spec fixtures)
- fork_choice_test: a validator under the gate produces no attestation; the canonical head is unchanged by its absence
- fork_choice_test: the same scenario with the gate disabled would have produced a stale-head attestation (negative control documenting the bug this fixes)
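The unit cases above reduce to a small parametrized table. A sketch, assuming a make_store fixture (hypothetical) that builds a minimal Store with its head at a given slot:

```python
import pytest

from lean_spec.subspecs.validator.service import is_synced_for_duties


@pytest.mark.parametrize(
    ("head_slot", "wall_clock_slot", "expected"),
    [
        (100, 104, True),   # lag == 4: boundary, allowed
        (100, 105, False),  # lag == 5: gated
        (100, 99, True),    # clock skew: wall clock behind head
    ],
)
def test_is_synced_for_duties(make_store, head_slot, wall_clock_slot, expected):
    store = make_store(head_slot=head_slot)
    assert is_synced_for_duties(store, wall_clock_slot) is expected
```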
Out of scope
- Changing SyncState semantics. The gate is independent of the state machine.
- Catch-up speed itself; that is #688 (BlocksByRange).

Open questions
- Should SYNC_LAG_THRESHOLD stay a fixed constant, or be derived from SECONDS_PER_SLOT and finality cadence?

References
- Validator duty logic: src/lean_spec/subspecs/validator/service.py
- Sync state machine: src/lean_spec/subspecs/sync/states.py
- Prior art: is_syncing checks before attesting (Lighthouse, Prysm, Teku) — informal, not in the consensus spec itself.