
Add BlocksByRange req/resp protocol #688

@tcoratger


Lean nodes that fall behind cannot recover. The only block-fetch primitive is BlocksByRoot, so closing an 872-slot gap takes ~872 sequential round trips while gossip orphans pile up faster than they resolve. This issue adds a chunked BlocksByRange protocol, mirroring the established beacon design.

Problem

src/lean_spec/subspecs/networking/reqresp/message.py defines only BlocksByRootRequest. When a gossip block arrives with a missing parent, the sync layer queues a single-root fetch for that parent. When the parent arrives, its own parent is also missing, so another fetch is queued, and so on: one round trip per ancestor.

src/lean_spec/subspecs/sync/backfill_sync.py already batches per call (MAX_BLOCKS_PER_REQUEST = 10), but callers feed it one orphan root at a time, so the batching never engages.

With ranges, an 872-slot gap closes in ceil(872 / 1024) = 1 round trip.
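
The round-trip arithmetic can be checked in a few lines. This is a sketch that assumes the worst case for BlocksByRoot (every slot in the gap holds a block whose parent is only discovered after the child arrives); the helper names are hypothetical.

```python
MAX_REQUEST_BLOCKS = 1024  # existing req/resp limit, reused by BlocksByRange


def range_round_trips(gap_slots: int, max_per_request: int = MAX_REQUEST_BLOCKS) -> int:
    """BlocksByRange RPCs needed to cover a contiguous gap of `gap_slots` slots."""
    if gap_slots <= 0:
        return 0
    # Integer ceil(gap_slots / max_per_request), avoiding float rounding.
    return (gap_slots + max_per_request - 1) // max_per_request


def root_round_trips(gap_slots: int) -> int:
    """Sequential BlocksByRoot fetches when each parent is only discovered
    after its child arrives (worst case: one block per slot)."""
    return max(gap_slots, 0)


print(range_round_trips(872))  # 1
print(root_round_trips(872))   # 872
```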

The caller pattern (the gossip handler enqueues one orphan at a time) is partly to blame and worth fixing in passing, but the structural fix is ranges.

Proposed protocol

Mirror beacon BeaconBlocksByRange under the lean namespace. New protocol added alongside BlocksByRoot, not replacing it.

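```python
# src/lean_spec/subspecs/networking/reqresp/message.py
from typing import Final

# Container, Slot, Uint64, and ProtocolId are assumed to be the same types
# message.py already uses for BlocksByRootRequest.

# Protocol ID for the new range request, registered alongside BlocksByRoot.
BLOCKS_BY_RANGE_PROTOCOL_V1: Final = ProtocolId(
    "/leanconsensus/req/blocks_by_range/1/ssz_snappy"
)


class BlocksByRangeRequest(Container):
    """Request blocks for `count` slots starting at `start_slot` (no step field)."""

    start_slot: Slot  # first slot of the requested range
    count: Uint64     # number of slots to cover, 1..MAX_REQUEST_BLOCKS inclusive
```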
  • No step field. Deprecated in beacon v2; no validator client used it. If a use case appears, bump to /2/.
  • Response framing: chunked ssz_snappy, one SignedBlock per chunk, identical to BlocksByRoot.
  • Reuse existing limits: MAX_REQUEST_BLOCKS = 1024, MAX_PAYLOAD_SIZE = 10 MiB, TTFB_TIMEOUT = 5.0s, RESP_TIMEOUT = 10.0s. No new constants.
  • Status codes: 0 = SUCCESS, 1 = INVALID_REQUEST, 2 = SERVER_ERROR, 3 = RESOURCE_UNAVAILABLE.
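
A plain-Python stand-in for the request container shows the wire layout and the count bounds. It assumes standard SSZ rules (a container of fixed-size fields serializes as the concatenation of its fields, and a uint64 is 8 bytes little-endian) and models the status codes above as an IntEnum. `RangeRequest` and `check_request` are illustrative names, not lean_spec's Container machinery.

```python
import struct
from dataclasses import dataclass
from enum import IntEnum

MAX_REQUEST_BLOCKS = 1024
SSZ_FORMAT = "<QQ"                      # two little-endian uint64 fields
SSZ_SIZE = struct.calcsize(SSZ_FORMAT)  # 16 bytes, no variable-size offsets


class ResponseCode(IntEnum):
    SUCCESS = 0
    INVALID_REQUEST = 1
    SERVER_ERROR = 2
    RESOURCE_UNAVAILABLE = 3


@dataclass(frozen=True)
class RangeRequest:
    """Stand-in for BlocksByRangeRequest(start_slot: Slot, count: Uint64)."""

    start_slot: int
    count: int

    def encode(self) -> bytes:
        return struct.pack(SSZ_FORMAT, self.start_slot, self.count)

    @classmethod
    def decode(cls, data: bytes) -> "RangeRequest":
        # A fixed-size SSZ container has an exact length: reject truncated
        # or oversized payloads before unpacking.
        if len(data) != SSZ_SIZE:
            raise ValueError(f"expected {SSZ_SIZE} bytes, got {len(data)}")
        start_slot, count = struct.unpack(SSZ_FORMAT, data)
        return cls(start_slot, count)


def check_request(request: RangeRequest) -> ResponseCode:
    """INVALID_REQUEST unless 1 <= count <= MAX_REQUEST_BLOCKS."""
    if request.count == 0 or request.count > MAX_REQUEST_BLOCKS:
        return ResponseCode.INVALID_REQUEST
    return ResponseCode.SUCCESS
```

Because both fields are fixed-size, the encoded request is always exactly 16 bytes, which makes the truncated/oversized decode cases in the test plan a simple length check.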

Responder rules

  • MUST serve blocks canonical on the responder's current fork choice.
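  • MUST return blocks in strictly increasing slot order, all within [start_slot, start_slot + count); empty slots are omitted.
  • MUST ensure each returned block after the first has a parent_root equal to the root of the previously returned block; the first block's parent_root links to a known ancestor.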
  • MUST return INVALID_REQUEST if count == 0 or count > MAX_REQUEST_BLOCKS.
  • MUST return RESOURCE_UNAVAILABLE if start_slot predates the retained history window.
  • For slots <= finalized.slot from the most recent Status, returned blocks MUST lie on the chain leading to that finalized root.
  • MAY stop early on fork-choice change, load shedding, or reaching head.
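
The responder rules can be sketched over an in-memory index of the canonical chain. This assumes a hypothetical `Block` record with `slot`, `root`, and `parent_root`, a dict from slot to canonical block, and a known oldest retained slot. It stops at head (one of the permitted early stops) and does not model fork-choice changes or load shedding.

```python
from dataclasses import dataclass
from enum import IntEnum

MAX_REQUEST_BLOCKS = 1024


class ResponseCode(IntEnum):
    SUCCESS = 0
    INVALID_REQUEST = 1
    SERVER_ERROR = 2
    RESOURCE_UNAVAILABLE = 3


@dataclass(frozen=True)
class Block:
    slot: int
    root: bytes         # stands in for hash_tree_root(block)
    parent_root: bytes


def serve_blocks_by_range(
    start_slot: int,
    count: int,
    canonical_by_slot: dict[int, Block],
    oldest_retained_slot: int,
    head_slot: int,
) -> tuple[ResponseCode, list[Block]]:
    if count == 0 or count > MAX_REQUEST_BLOCKS:
        return ResponseCode.INVALID_REQUEST, []
    if start_slot < oldest_retained_slot:
        return ResponseCode.RESOURCE_UNAVAILABLE, []

    blocks: list[Block] = []
    # Never walk past head: a range that overruns head returns fewer blocks,
    # and a range entirely after head returns an empty SUCCESS response.
    end_slot = min(start_slot + count, head_slot + 1)
    for slot in range(start_slot, end_slot):
        block = canonical_by_slot.get(slot)
        if block is None:
            continue  # empty slot: omitted, not an error
        if blocks and block.parent_root != blocks[-1].root:
            # Index is inconsistent (e.g. mid-response reorg): stop early
            # with the well-formed prefix collected so far.
            break
        blocks.append(block)
    return ResponseCode.SUCCESS, blocks
```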

Requester rules

  • Verify that slots are strictly increasing and that block.parent_root == prev_block.hash_tree_root() before importing.
  • On parent_root or slot-monotonicity violation: drop the response, downscore the peer.
  • During initial sync, request overlapping ranges from at least two peers; cross-check hash_tree_root agreement at overlap slots.
  • Treat RESOURCE_UNAVAILABLE as non-punitive.
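
On the requester side, the continuity checks and the initial-sync cross-check can be sketched as pure functions over the decoded chunks. `Block`, `RangeResponseError`, and both function names are hypothetical; `root` stands in for `hash_tree_root()`, and the caller is assumed to drop the response and downscore the peer whenever `RangeResponseError` is raised.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    slot: int
    root: bytes         # stands in for block.hash_tree_root()
    parent_root: bytes


class RangeResponseError(Exception):
    """Response violates range or continuity rules: drop it, downscore the peer."""


def verify_range_response(start_slot: int, count: int, blocks: list[Block]) -> None:
    end_slot = start_slot + count  # requested range is [start_slot, end_slot)
    prev: Optional[Block] = None
    for block in blocks:
        if not start_slot <= block.slot < end_slot:
            raise RangeResponseError(f"slot {block.slot} outside requested range")
        if prev is not None:
            # Strictly increasing slots also rule out duplicates and out-of-order chunks.
            if block.slot <= prev.slot:
                raise RangeResponseError(f"slot {block.slot} not after {prev.slot}")
            if block.parent_root != prev.root:
                raise RangeResponseError(f"parent_root mismatch at slot {block.slot}")
        prev = block


def overlap_disagreements(
    a: list[Block], b: list[Block], overlap_start: int, overlap_end: int
) -> list[int]:
    """Slots in [overlap_start, overlap_end) where two peers' responses differ,
    including a slot one peer filled and the other left empty."""
    roots_a = {blk.slot: blk.root for blk in a if overlap_start <= blk.slot < overlap_end}
    roots_b = {blk.slot: blk.root for blk in b if overlap_start <= blk.slot < overlap_end}
    return sorted(s for s in roots_a.keys() | roots_b.keys() if roots_a.get(s) != roots_b.get(s))
```

An empty list from `overlap_disagreements` means the two peers agree on every overlap slot; any returned slot marks a fork or a faulty peer that needs further resolution.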

Implementation checklist

Stage 1 — Protocol surface

  • Add BLOCKS_BY_RANGE_PROTOCOL_V1 and BlocksByRangeRequest to networking/reqresp/message.py.
  • Implement responder in networking/reqresp/handler.py, mirroring the existing single-root handler.
  • Implement requester in networking/client/reqresp_client.py, mirroring request_blocks_by_root.
  • Verify parent_root and slot continuity across the chunked response stream.

Stage 2 — Sync integration

  • BackfillSync uses ranges first; falls back to BlocksByRoot only for residual missing roots.
  • Sync caller pattern: enqueue all current orphan roots per fetch, not one per gossip arrival.
  • Multi-peer cross-check on initial sync.
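
The Stage 2 flow can be sketched as small helpers: one splits a gap into non-overlapping range requests, one drains every pending orphan root at once instead of one per gossip arrival, and one computes the residual roots left for BlocksByRoot. `plan_ranges`, `OrphanQueue`, and `residual_roots` are hypothetical names, not existing BackfillSync APIs.

```python
MAX_REQUEST_BLOCKS = 1024


def plan_ranges(
    local_head_slot: int, target_slot: int, max_per_request: int = MAX_REQUEST_BLOCKS
) -> list[tuple[int, int]]:
    """Split slots (local_head_slot, target_slot] into (start_slot, count) requests.
    Ranges never overlap, so pipelining them across peers does not double-fetch."""
    ranges = []
    start = local_head_slot + 1
    while start <= target_slot:
        count = min(max_per_request, target_slot - start + 1)
        ranges.append((start, count))
        start += count
    return ranges


class OrphanQueue:
    """Collects unknown parent roots from gossip; the fetcher drains them in one batch."""

    def __init__(self) -> None:
        self._pending: set[bytes] = set()

    def add(self, root: bytes) -> None:
        self._pending.add(root)  # repeated roots collapse into one entry

    def drain(self) -> set[bytes]:
        batch, self._pending = self._pending, set()
        return batch


def residual_roots(wanted: set[bytes], received: set[bytes]) -> set[bytes]:
    """Roots still missing after the range fetches; these fall back to BlocksByRoot."""
    return wanted - received
```

For example, `plan_ranges(0, 872)` yields a single `(1, 872)` request, matching the one-round-trip figure above.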

Stage 3 — Peer scoring hooks

  • Downscore on parent_root mismatch.
  • Downscore on slot non-monotonicity, duplicate slots, out-of-order chunks.
  • No penalty for RESOURCE_UNAVAILABLE or empty post-head ranges.
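
The scoring hooks reduce to a table from response outcome to penalty. The outcome names and penalty weights below are hypothetical placeholders (this issue does not fix any numbers); the only property taken from the checklist is that RESOURCE_UNAVAILABLE and empty post-head ranges cost nothing.

```python
from enum import Enum, auto


class RangeOutcome(Enum):
    OK = auto()
    EMPTY_POST_HEAD = auto()
    RESOURCE_UNAVAILABLE = auto()
    PARENT_ROOT_MISMATCH = auto()
    NON_MONOTONIC_SLOT = auto()
    DUPLICATE_SLOT = auto()
    OUT_OF_ORDER_CHUNK = auto()


# Hypothetical weights; outcomes absent from the table carry no penalty.
PENALTIES: dict[RangeOutcome, int] = {
    RangeOutcome.PARENT_ROOT_MISMATCH: 10,
    RangeOutcome.NON_MONOTONIC_SLOT: 10,
    RangeOutcome.DUPLICATE_SLOT: 10,
    RangeOutcome.OUT_OF_ORDER_CHUNK: 10,
}


def apply_outcome(scores: dict[str, int], peer_id: str, outcome: RangeOutcome) -> int:
    """Subtract the outcome's penalty (0 if none) from the peer's score and return it."""
    scores[peer_id] = scores.get(peer_id, 0) - PENALTIES.get(outcome, 0)
    return scores[peer_id]
```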

Test plan

tests/lean_spec/subspecs/networking/reqresp/test_message.py

  • BlocksByRangeRequest SSZ round-trip: encode then decode yields identical container
  • hash_tree_root stable across re-encodings of equal requests
  • Reject count == 0 with INVALID_REQUEST
  • Reject count > MAX_REQUEST_BLOCKS (boundary: MAX, MAX+1)
  • start_slot at Slot(0) and at Uint64.MAX decode cleanly
  • Truncated and oversized payload bytes rejected at decode

tests/lean_spec/subspecs/networking/reqresp/test_handler.py

  • Returns exactly count consecutive blocks from start_slot when every slot in the range is retained and non-empty
  • Returns fewer than count when range overruns head, no error
  • Skips empty slots, preserves slot monotonicity
  • RESOURCE_UNAVAILABLE when start_slot predates retained history
  • INVALID_REQUEST on count == 0, count > MAX_REQUEST_BLOCKS, malformed SSZ
  • Range spanning finalization boundary returns canonical chain only
  • MAY terminate stream early on mid-response reorg; partial chunks well-formed

tests/lean_spec/subspecs/networking/client/test_reqresp_client.py

  • TTFB timeout and total RESP timeout surface distinct errors
  • Empty response accepted when range is post-head
  • Rejects non-monotonic slots; downscores peer
  • Rejects mismatched parent_root against previous chunk; downscores
  • Rejects out-of-order chunks and duplicate slots; downscores
  • Rejects chunks outside [start_slot, start_slot + count)

tests/lean_spec/subspecs/sync/test_backfill_sync.py

  • Node 872 slots behind closes gap in ceil(872 / MAX_REQUEST_BLOCKS) RPCs
  • Orphan flood (N unknown roots in one slot) resolves in O(1) range RPCs, not O(N) single-root fetches
  • Range used first; single-root fallback only for residual missing roots
  • Pipelined ranges across peers do not double-fetch overlapping slots

tests/consensus/ (spec fixtures)

  • state_transition_test: range-based catch-up applies a contiguous block batch and yields expected post-state

Out of scope

  • No V2 with skip-slot semantics or step field.
  • No blob-range framing (lean has no blobs).
  • No peer-scoring overhaul beyond the localized hooks above.
  • No changes to BlocksByRoot; it remains for targeted recovery.
  • No backward-compatibility shims (per repo policy).
  • Validator-side mitigations (sync-lag duty gate, fork-choice attestation filter) are tracked in a separate issue.

Open questions

  • Should the request or each response chunk carry an explicit head_root to disambiguate which fork the responder is serving, or is Status sufficient?
  • For non-finalized slots, MUST the responder serve only its canonical fork, or MAY it serve any-fork on explicit requester opt-in?
  • Rate-limit policy (per-peer requests/sec, max in-flight)? Beacon leaves this client-defined; should lean spec a floor?
  • During catch-up, should an empty range count against peer score, or is it expected near the responder's head?

References

  • Beacon BeaconBlocksByRange: /eth2/beacon_chain/req/beacon_blocks_by_range/1/ in consensus-specs/specs/phase0/p2p-interface.md.
  • Existing BlocksByRoot: src/lean_spec/subspecs/networking/reqresp/message.py.
  • Networking constants: src/lean_spec/subspecs/networking/config.py.
  • Sync layer: src/lean_spec/subspecs/sync/backfill_sync.py, src/lean_spec/subspecs/sync/service.py.
