
feat: AL-direct prefetch from EIP-2930 access lists at block build time #97

Open
defistar wants to merge 13 commits into dev from feature/al-prefetch-only

defistar commented Mar 14, 2026

Background

Reth's block execution bottleneck is MDBX I/O — when the EVM executes a transaction, it reads account state and storage slots from disk on demand. These reads happen serially on the critical path, directly adding to block build latency and capping TPS.

The existing solution (feature/txn-execution-cache-warming) uses simulation workers: background rayon threads continuously simulate pending transactions to discover which accounts/slots they will touch and store those keys in a PreWarmedCache. Then, at block build time, the values are parallel-loaded from MDBX into CachedReads before the EVM runs. This achieves ~99% cache hit rates but introduces CPU competition between the simulation workers and block execution.

What & Why

This branch introduces a simpler, zero-worker alternative: instead of simulating transactions to discover keys, we read them directly from EIP-2930 access lists.

EIP-2930 (type 0x01) transactions carry an explicit AccessList field: a list of (address, [storage_slots]) pairs that the transaction declares it will touch. This is exactly the prefetch oracle we need, already embedded in the transaction itself. No simulation required.

At block build time, before handing the state DB to the EVM:

  1. Iterate all pending pool transactions
  2. For each transaction with a non-empty access list, extract declared addresses and slots
  3. Pre-load them from MDBX into CachedReads via CachedReadsDbMut
  4. EVM executes against the warm cache — near 100% hits on declared keys
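The four steps above can be sketched as a single pass over the pending set. This is a minimal, std-only sketch: `PendingTx`, `AccessListItem`, and `ReadCache` are hypothetical stand-ins (the real code uses reth's pool types and CachedReadsDbMut over MDBX), addresses and slots are plain byte arrays, and `ReadCache` only counts calls and simulated backend reads instead of returning account data.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for a 20-byte address and a 32-byte storage key.
pub type Address = [u8; 20];
pub type Slot = [u8; 32];

/// One access-list entry: an address plus the storage keys it declares.
pub struct AccessListItem {
    pub address: Address,
    pub storage_keys: Vec<Slot>,
}

/// Hypothetical pending transaction; `None` means no access list.
pub struct PendingTx {
    pub access_list: Option<Vec<AccessListItem>>,
}

/// Read-through cache standing in for CachedReadsDbMut over MDBX.
#[derive(Default)]
pub struct ReadCache {
    cached_accounts: HashSet<Address>,
    cached_slots: HashSet<(Address, Slot)>,
    pub calls: usize,         // every basic()/storage() call
    pub backend_reads: usize, // simulated MDBX round-trips (cache misses)
}

impl ReadCache {
    pub fn basic(&mut self, address: Address) {
        self.calls += 1;
        // `insert` returns true only the first time: that is the simulated disk read.
        if self.cached_accounts.insert(address) {
            self.backend_reads += 1;
        }
    }

    pub fn storage(&mut self, address: Address, slot: Slot) {
        self.calls += 1;
        if self.cached_slots.insert((address, slot)) {
            self.backend_reads += 1;
        }
    }
}

/// Steps 1-3: walk every pending tx and warm each declared account and slot.
/// Returns how many transactions carried a non-empty access list.
pub fn prefetch_single_pass(txs: &[PendingTx], cache: &mut ReadCache) -> usize {
    let mut tx_with_al = 0;
    for tx in txs {
        let al = match &tx.access_list {
            Some(al) if !al.is_empty() => al,
            _ => continue, // no declared keys, nothing to prefetch
        };
        tx_with_al += 1;
        for item in al {
            cache.basic(item.address);
            for slot in &item.storage_keys {
                cache.storage(item.address, *slot);
            }
        }
    }
    tx_with_al
}
```

With two transactions declaring the same USDC slot, this makes four cache calls but only two simulated MDBX reads; the repeated calls are the overhead the deduplication pass removes.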

Trade-offs vs simulation approach:

┌────────────────────┬──────────────────────────────────────────────┬───────────────────────────────────────────┐
│                    │              Simulation workers              │                AL prefetch                │
├────────────────────┼──────────────────────────────────────────────┼───────────────────────────────────────────┤
│ Key discovery      │ EVM simulation (discovers all touched state) │ EIP-2930 access list (only declared keys) │
├────────────────────┼──────────────────────────────────────────────┼───────────────────────────────────────────┤
│ Background workers │ Yes (rayon pool, CPU competition)            │ None                                      │
├────────────────────┼──────────────────────────────────────────────┼───────────────────────────────────────────┤
│ Coverage           │ All transactions                             │ Only EIP-2930 transactions                │
├────────────────────┼──────────────────────────────────────────────┼───────────────────────────────────────────┤
│ Complexity         │ High                                         │ Minimal (~160 lines)                      │
├────────────────────┼──────────────────────────────────────────────┼───────────────────────────────────────────┤
│ CPU overhead       │ Continuous (competes with block exec)        │ Zero between blocks                       │
└────────────────────┴──────────────────────────────────────────────┴───────────────────────────────────────────┘

This is the right approach when the workload uses EIP-2930 transactions (as the adventure benchmark does with ADVENTURE_USE_ACCESS_LIST=true).

Deduplication

A critical optimization in the implementation: keys are deduplicated before any MDBX reads.

In a busy mempool, hot contracts (USDC, WETH, DEX routers) appear in thousands of pending transactions. Without deduplication, a naïve implementation would call basic(usdc_address) once per transaction that touches USDC: with 5,000 such transactions, that is 5,000 calls. Only the first is an MDBX round-trip; the rest are cache hits inside CachedReadsDbMut, but thousands of redundant function calls still add measurable overhead at full mempool scale.

The fix is a two-pass approach:

  • Pass 1 (no I/O): iterate all pending transactions, insert every declared address and (address, slot) pair into HashSets
  • Pass 2 (MDBX reads): iterate the deduplicated sets — exactly one basic() per unique address, one storage() per unique (address, slot) pair

Result: MDBX reads scale with unique key count, not with mempool size × access list size.
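A minimal sketch of the two passes, using hypothetical byte-array `Address`/`Slot` types and closures in place of the real basic()/storage() calls. Pass 1 touches only in-memory HashSets; pass 2 issues exactly one read per unique key.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for a 20-byte address and a 32-byte storage key.
pub type Address = [u8; 20];
pub type Slot = [u8; 32];

pub struct AccessListItem {
    pub address: Address,
    pub storage_keys: Vec<Slot>,
}

/// Pass 1 (no I/O): fold every access list into unique addresses and (address, slot) pairs.
pub fn collect_unique_keys<'a, I>(access_lists: I) -> (HashSet<Address>, HashSet<(Address, Slot)>)
where
    I: IntoIterator<Item = &'a [AccessListItem]>,
{
    let mut accounts = HashSet::new();
    let mut slots = HashSet::new();
    for al in access_lists {
        for item in al {
            accounts.insert(item.address);
            for key in &item.storage_keys {
                slots.insert((item.address, *key));
            }
        }
    }
    (accounts, slots)
}

/// Pass 2 (DB reads): one `read_account` per unique address and one
/// `read_slot` per unique pair. The callbacks are hypothetical DB hooks.
pub fn prefetch_unique(
    accounts: &HashSet<Address>,
    slots: &HashSet<(Address, Slot)>,
    mut read_account: impl FnMut(Address),
    mut read_slot: impl FnMut(Address, Slot),
) {
    for addr in accounts {
        read_account(*addr);
    }
    for (addr, slot) in slots {
        read_slot(*addr, *slot);
    }
}
```

Fed 5,000 copies of the same USDC entry plus one WETH entry, pass 1 yields two addresses and one slot, so pass 2 performs two account reads and one slot read regardless of mempool size.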

Implementation

  • crates/optimism/payload/src/al_prefetch.rs — new module, prefetch_from_pool() function
  • crates/optimism/payload/src/builder.rs — 8-line hook in build_payload(), guarded by al_prefetch::is_enabled()
  • Controlled by TXPOOL_AL_PREFETCH_ONLY=1 env var (runtime, no recompile needed)
  • Emits 3 Prometheus metrics: reth_al_prefetch_tx_with_access_list_total, reth_al_prefetch_keys_extracted_total, reth_al_prefetch_duration_seconds
  • No new cargo feature flag — the module is always compiled, dormant unless env var is set

defistar and others added 3 commits March 14, 2026 18:46
At block build time (inside build_payload), iterate all pending pool transactions,
extract every EIP-2930 access list entry, and pre-load the referenced accounts
and storage slots into CachedReads before the EVM starts executing.

The EVM then gets near-100% cache hits for any TX that declared its state
access pattern via an access list — with zero background workers and zero
per-transaction simulation cost.

Enable with: TXPOOL_AL_PREFETCH_ONLY=1

New module: crates/optimism/payload/src/al_prefetch.rs
  - is_enabled()          — env-var check, cached in OnceLock
  - prefetch_from_pool()  — iterates pool, loads state into CachedReads
  - AlPrefetchStats       — tx_with_al, accounts_loaded, slots_loaded, elapsed_us

Prometheus metrics emitted per build call:
  reth_al_prefetch_tx_with_access_list_total
  reth_al_prefetch_keys_extracted_total
  reth_al_prefetch_duration_seconds

Spec: AL_PREFETCH_SPEC.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
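The is_enabled() entry in the commit above (env-var check, cached in OnceLock) can be sketched with `std::sync::OnceLock`, so the variable is read at most once per process. Assumptions: only the literal value `1` enables the feature (the PR only documents TXPOOL_AL_PREFETCH_ONLY=1), `parse_flag` is a hypothetical helper, and the `eprintln!` in the init closure stands in for the one-time diagnostic log a later commit in this PR adds; a node would use its logging framework instead.

```rust
use std::sync::OnceLock;

/// Env var name as given in the PR description.
pub const ENV_VAR: &str = "TXPOOL_AL_PREFETCH_ONLY";

/// Hypothetical helper. Assumption: only the exact value "1" turns the feature on.
pub fn parse_flag(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Reads the env var on the first call only; later calls return the cached bool.
pub fn is_enabled() -> bool {
    static ENABLED: OnceLock<bool> = OnceLock::new();
    *ENABLED.get_or_init(|| {
        let raw = std::env::var(ENV_VAR).ok();
        let enabled = parse_flag(raw.as_deref());
        // Runs exactly once per process: shows what value the binary actually saw.
        eprintln!("{}={:?} -> al_prefetch enabled={}", ENV_VAR, raw, enabled);
        enabled
    })
}
```

Caching means a change to the variable after the first call has no effect until restart, which matches the "runtime, no recompile needed" wording only at process start.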
*_SPEC.md files are local working documents, not part of the codebase.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hot contracts (USDC, WETH, pool routers) appear in thousands of pending
transactions. Without deduplication, prefetch_from_pool called basic(addr)
once per TX appearance — only the first is an MDBX read but thousands of
redundant cache-hit calls still add up under a full mempool.

Two-pass approach:
- Pass 1 (no I/O): collect unique addresses and (address, slot) pairs into
  HashSets across all pending TXs with access lists.
- Pass 2 (MDBX): read each unique key exactly once via CachedReadsDbMut.

accounts_loaded / slots_loaded in metrics now reflect unique key counts,
which is the meaningful number (not raw TX × AL size).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
defistar self-assigned this Mar 14, 2026
defistar requested a review from cliff0412 March 14, 2026 11:30
let mut unique_accounts: HashSet<Address> = HashSet::default();
let mut unique_slots: HashSet<(Address, U256)> = HashSet::default();

for valid_tx in pool.pending_transactions() {


The prefetch reads the same MDBX data on the same thread, sequentially. There's no parallelism gain.

We only need to prefetch for txs included in a block.

continue;
}
tx_with_al += 1;
for item in &al.0 {


The prefetch reads the same MDBX data on the same thread, sequentially

defistar and others added 10 commits March 14, 2026 19:57
…execution

Previously prefetch_from_pool() ran in build_payload() using
pending_transactions() — before transaction selection, with no base-fee
filter, on the wrong transaction set.

The prefetch has no business selecting transactions. Its only job is:
given the already-selected best transactions, prefetch their access-list
keys before the EVM executes them.

Changes:
- Change `best: FnOnce` → `Fn` so it can be called twice: once to create
  the prefetch iterator, once for the execution iterator (pool.clone() is
  cheap — Pool is Arc-backed)
- Remove prefetch from build_payload(); add it inside build() between
  step 3.1 (best_transactions_with_attrs selection) and step 3.2 (execute)
- Replace prefetch_from_pool() with prefetch_from_best_txs() which accepts
  `impl PayloadTransactions` + `&mut impl revm::Database`
- DB reads go through builder.evm_mut().db_mut() (State<CachedReadsDbMut>);
  each read falls through State journal → CachedReadsDbMut → MDBX and
  populates CachedReads automatically — no separate cache structure needed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
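Why the commit above changes `best` from `FnOnce` to `Fn` can be shown with a small sketch: an `Fn` closure can be called twice, yielding one iterator for the prefetch pass and a fresh one for execution, while a `FnOnce` bound would reject the second call at compile time. `Pool`, `best_transactions`, and `build` here are simplified, hypothetical stand-ins; "prefetching" just collects transaction ids instead of reading keys through builder.evm_mut().db_mut().

```rust
use std::sync::Arc;

/// Hypothetical Arc-backed pool: `clone()` only bumps a reference count.
#[derive(Clone)]
pub struct Pool {
    txs: Arc<Vec<u64>>,
}

impl Pool {
    pub fn new(txs: Vec<u64>) -> Self {
        Self { txs: Arc::new(txs) }
    }

    /// Stand-in for best-transaction selection: highest id first.
    pub fn best_transactions(&self) -> std::vec::IntoIter<u64> {
        let mut ordered = (*self.txs).clone();
        ordered.sort_unstable_by(|a, b| b.cmp(a));
        ordered.into_iter()
    }
}

/// `best: Fn` can be invoked twice; with `FnOnce` the second call would not compile.
pub fn build<F, I>(best: F) -> (Vec<u64>, Vec<u64>)
where
    F: Fn() -> I,
    I: Iterator<Item = u64>,
{
    // Prefetch pass over the selected transactions.
    let prefetched: Vec<u64> = best().collect();
    // Execution pass: a fresh iterator over the same selection.
    let executed: Vec<u64> = best().collect();
    (prefetched, executed)
}
```

The sketch's pool never changes, so both calls see the same selection; with a live pool, the two calls are only guaranteed to match if nothing is inserted or evicted between them.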
Operating on already-selected best_txs (bounded set), so HashSet
dedup overhead isn't needed. State/CachedReads handles repeated keys
naturally. Single pass, no intermediate collections.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without dedup, hot contracts appearing in many transactions caused
redundant db.basic()/storage() calls for the same key. Two-pass approach:
collect unique addresses/slots first, then read each exactly once.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Increments unconditionally every time prefetch_from_best_txs runs.
Distinguishes two previously indistinguishable zero-metric cases:
  calls=0 → is_enabled() returned false (env var not reaching binary)
  calls>0, tx_with_access_list=0 → function runs but txns have no access lists

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
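The two zero-metric cases the commit above separates can be sketched with a pair of atomic counters and a small classifier. `PrefetchCounters`, `record_run`, and `diagnose` are hypothetical names; the point is that the call counter increments unconditionally, before any access-list check.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counters mirroring the calls / tx_with_access_list metrics.
#[derive(Default)]
pub struct PrefetchCounters {
    pub calls_total: AtomicU64,
    pub tx_with_access_list_total: AtomicU64,
}

impl PrefetchCounters {
    /// Called on every prefetch run: `calls_total` always increments,
    /// even when none of the transactions carries an access list.
    pub fn record_run(&self, tx_with_al: u64) {
        self.calls_total.fetch_add(1, Ordering::Relaxed);
        self.tx_with_access_list_total.fetch_add(tx_with_al, Ordering::Relaxed);
    }

    /// Maps the counter pair onto the cases listed in the commit message.
    pub fn diagnose(&self) -> &'static str {
        match (
            self.calls_total.load(Ordering::Relaxed),
            self.tx_with_access_list_total.load(Ordering::Relaxed),
        ) {
            (0, _) => "never called: is_enabled() returned false (env var not reaching binary?)",
            (_, 0) => "called, but no transaction had an access list",
            _ => "called and seeing access lists",
        }
    }
}
```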
Fires exactly once on first is_enabled() call — confirms whether
TXPOOL_AL_PREFETCH_ONLY reaches the binary and what value it sees.
Needed because calls_total=0 despite env var being set in container.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BlockOrPayload::block_access_list() was a stub returning None, so
PrewarmMode::BlockAccessList was never activated and all BAL metrics
stayed at zero.

Fix: implement block_access_list() to collect EIP-2930 access list
entries from the block's transactions and convert them into a
BlockAccessList (EIP-7928 format) for the prefetcher.

The access list data travels intact from sender → pool → block →
engine payload, but was never read. Now:

- Block variant: iterates decoded transactions, calls tx.access_list()
- Payload variant: decodes raw RLP bytes via TxEnvelope::decode_2718,
  extracts access lists. Deposit txs (OP type 0x7e) silently skipped.

Each AccessListItem's storage_keys become storage_reads in AccountChanges,
so the BAL prewarm worker prefetches those slots before execution.

Also adds ExecutionPayload::encoded_transactions() with default empty
slice; implemented for OpExecutionData to expose the raw tx bytes from
the v1 payload.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
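The Block-variant conversion in the commit above (access-list storage keys become storage_reads) can be sketched as follows. `DecodedTx` and `AccountChanges` are simplified, hypothetical types: the real EIP-7928 AccountChanges has more fields (storage changes, balance/nonce/code changes), and only storage reads are filled here. Merging entries per address and keeping keys sorted via `BTreeMap`/`BTreeSet` is this sketch's choice; the commit does not say whether duplicate addresses are merged. Returning `None` mirrors the "no access lists, BAL prefetch skipped" path.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-ins for a 20-byte address and a 32-byte storage key.
pub type Address = [u8; 20];
pub type Slot = [u8; 32];

/// OP Stack deposit transaction type byte.
pub const DEPOSIT_TX_TYPE: u8 = 0x7e;

pub struct AccessListItem {
    pub address: Address,
    pub storage_keys: Vec<Slot>,
}

/// Hypothetical decoded transaction: EIP-2718 type byte plus optional access list.
pub struct DecodedTx {
    pub tx_type: u8,
    pub access_list: Option<Vec<AccessListItem>>,
}

/// Simplified AccountChanges: only the storage reads derived from access lists.
#[derive(Debug, PartialEq)]
pub struct AccountChanges {
    pub address: Address,
    pub storage_reads: Vec<Slot>,
}

pub fn block_access_list(txs: &[DecodedTx]) -> Option<Vec<AccountChanges>> {
    let mut reads: BTreeMap<Address, BTreeSet<Slot>> = BTreeMap::new();
    for tx in txs {
        if tx.tx_type == DEPOSIT_TX_TYPE {
            continue; // deposits are skipped
        }
        let Some(al) = &tx.access_list else { continue };
        for item in al {
            // An address with no keys still gets an (empty) entry.
            reads.entry(item.address).or_default().extend(item.storage_keys.iter().copied());
        }
    }
    if reads.is_empty() {
        return None; // nothing declared: BAL prefetch is skipped
    }
    Some(
        reads
            .into_iter()
            .map(|(address, slots)| AccountChanges { address, storage_reads: slots.into_iter().collect() })
            .collect(),
    )
}
```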
Three additions to make the AL prefetch path observable:

1. payload_validator.rs: debug log when BAL is built from EIP-2930
   access lists (block hash, entry count, total slot count), or trace
   log when no access lists are found and BAL prefetch is skipped.
   Answers: "are transactions actually carrying access lists?"

2. prewarm.rs: upgrade run_bal_prewarm start/complete logs from trace
   to debug, adding total_entries and total_slots fields so they appear
   in normal debug output without enabling trace.

3. prewarm.rs: new sync.prewarm.bal_slots_prefetched counter incremented
   with total_slots after all BAL workers complete. Previously only
   bal_slot_iteration_duration was recorded; the counter lets you track
   cumulative slots prefetched over time in Prometheus.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
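Item 3 above (bump a cumulative counter by total_slots once all BAL workers complete) can be sketched with scoped threads and a global atomic. `BAL_SLOTS_PREFETCHED` and `run_bal_prewarm` are hypothetical stand-ins for the sync.prewarm.bal_slots_prefetched counter and the prewarm routine; the workers here only sum their chunk instead of reading state.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Stand-in for the monotonic sync.prewarm.bal_slots_prefetched counter.
pub static BAL_SLOTS_PREFETCHED: AtomicU64 = AtomicU64::new(0);

/// `slots_per_entry[i]` is the number of slots declared for BAL entry i.
pub fn run_bal_prewarm(slots_per_entry: &[usize], workers: usize) -> u64 {
    let total_slots: u64 = slots_per_entry.iter().map(|&n| n as u64).sum();
    // Split entries across workers; `max(1)` avoids a zero chunk size.
    let chunk = slots_per_entry.len().div_ceil(workers.max(1)).max(1);
    thread::scope(|s| {
        for part in slots_per_entry.chunks(chunk) {
            s.spawn(move || {
                // A real worker would read each slot through the state provider here.
                let _warmed: usize = part.iter().sum();
            });
        }
    }); // the scope joins every worker before returning
    // Incremented once, after all workers are done.
    BAL_SLOTS_PREFETCHED.fetch_add(total_slots, Ordering::Relaxed);
    total_slots
}
```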

Labels

None yet

Projects

None yet


2 participants