feat: AL-direct prefetch from EIP-2930 access lists at block build time #97
Open
At block build time (inside build_payload), iterate all pending pool transactions, extract every EIP-2930 access list entry, and pre-load the referenced accounts and storage slots into CachedReads before the EVM starts executing. The EVM then gets near-100% cache hits on the state each TX declared in its access list, with zero background workers and zero per-transaction simulation cost.

Enable with: TXPOOL_AL_PREFETCH_ONLY=1

New module: crates/optimism/payload/src/al_prefetch.rs
- is_enabled(): env-var check, cached in OnceLock
- prefetch_from_pool(): iterates pool, loads state into CachedReads
- AlPrefetchStats: tx_with_al, accounts_loaded, slots_loaded, elapsed_us

Prometheus metrics emitted per build call:
- reth_al_prefetch_tx_with_access_list_total
- reth_al_prefetch_keys_extracted_total
- reth_al_prefetch_duration_seconds

Spec: AL_PREFETCH_SPEC.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
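A minimal sketch of the `is_enabled()` check, assuming that only `1` and `true` (any case) count as enabled; the commit itself only documents `TXPOOL_AL_PREFETCH_ONLY=1`, and the `parse_flag` helper is hypothetical. `std::sync::OnceLock::get_or_init` runs its initializer at most once, so the environment is read on the first call and every later call returns the cached value.

```rust
use std::env;
use std::sync::OnceLock;

/// Environment variable named in the commit message.
const ENV_VAR: &str = "TXPOOL_AL_PREFETCH_ONLY";

/// Hypothetical parsing rule (assumption): "1" or "true" in any case enables the prefetch.
fn parse_flag(value: Option<&str>) -> bool {
    match value {
        Some(v) => {
            let v = v.trim();
            v == "1" || v.eq_ignore_ascii_case("true")
        }
        None => false,
    }
}

/// Reads the env var once and caches the result for the life of the process.
pub fn is_enabled() -> bool {
    static ENABLED: OnceLock<bool> = OnceLock::new();
    *ENABLED.get_or_init(|| {
        // This closure runs at most once, which makes it a natural place for a one-shot log.
        let raw = env::var(ENV_VAR).ok();
        parse_flag(raw.as_deref())
    })
}
```

Because the value is cached, changing the variable after the first call has no effect until the process restarts.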
*_SPEC.md files are local working documents, not part of the codebase. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hot contracts (USDC, WETH, pool routers) appear in thousands of pending transactions. Without deduplication, prefetch_from_pool called basic(addr) once per TX appearance; only the first call is an MDBX read, but thousands of redundant cache-hit calls still add up under a full mempool.

Two-pass approach:
- Pass 1 (no I/O): collect unique addresses and (address, slot) pairs into HashSets across all pending TXs with access lists.
- Pass 2 (MDBX): read each unique key exactly once via CachedReadsDbMut.

accounts_loaded / slots_loaded in metrics now reflect unique key counts, which is the meaningful number (not raw TX × AL size).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
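The two-pass approach can be sketched with only the standard library. `Address`, `StorageKey`, `AccessListItem`, `StateReader`, and `PrefetchStats` here are hypothetical stand-ins, not the alloy or revm types; in reth the second pass would read through `CachedReadsDbMut`. `CountingReader` exists only to show how many reads pass 2 issues.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for alloy's Address / B256 and an EIP-2930 access list item.
type Address = [u8; 20];
type StorageKey = [u8; 32];

pub struct AccessListItem {
    pub address: Address,
    pub storage_keys: Vec<StorageKey>,
}

/// Hypothetical read interface standing in for the state DB.
pub trait StateReader {
    fn basic(&mut self, address: Address);
    fn storage(&mut self, address: Address, slot: StorageKey);
}

#[derive(Debug, Default, PartialEq)]
pub struct PrefetchStats {
    pub tx_with_al: usize,
    pub accounts_loaded: usize,
    pub slots_loaded: usize,
}

/// Pass 1 collects unique keys without I/O; pass 2 reads each unique key once.
pub fn prefetch_dedup<'a, R: StateReader>(
    access_lists: impl IntoIterator<Item = Option<&'a [AccessListItem]>>,
    db: &mut R,
) -> PrefetchStats {
    let mut stats = PrefetchStats::default();
    let mut unique_accounts: HashSet<Address> = HashSet::new();
    let mut unique_slots: HashSet<(Address, StorageKey)> = HashSet::new();

    // Pass 1: set inserts only, no database access.
    for al in access_lists {
        let Some(al) = al else { continue };
        if al.is_empty() {
            continue;
        }
        stats.tx_with_al += 1;
        for item in al {
            unique_accounts.insert(item.address);
            for key in &item.storage_keys {
                unique_slots.insert((item.address, *key));
            }
        }
    }

    // Pass 2: exactly one read per unique account and per unique slot.
    for address in &unique_accounts {
        db.basic(*address);
    }
    for (address, slot) in &unique_slots {
        db.storage(*address, *slot);
    }

    stats.accounts_loaded = unique_accounts.len();
    stats.slots_loaded = unique_slots.len();
    stats
}

/// Illustration only: counts calls instead of touching a database.
#[derive(Default)]
pub struct CountingReader {
    pub basic_calls: usize,
    pub storage_calls: usize,
}

impl StateReader for CountingReader {
    fn basic(&mut self, _address: Address) {
        self.basic_calls += 1;
    }
    fn storage(&mut self, _address: Address, _slot: StorageKey) {
        self.storage_calls += 1;
    }
}
```

Feeding two transactions that both list the same USDC slot yields one `basic` and one `storage` call for that key, however many transactions repeat it.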
cliff0412 reviewed Mar 14, 2026
let mut unique_accounts: HashSet<Address> = HashSet::default();
let mut unique_slots: HashSet<(Address, U256)> = HashSet::default();

for valid_tx in pool.pending_transactions() {
The prefetch reads the same MDBX data on the same thread, sequentially, so there's no parallelism gain.
We only need to prefetch for txs included in a block.
cliff0412 reviewed Mar 14, 2026
    continue;
}
tx_with_al += 1;
for item in &al.0 {
The prefetch reads the same MDBX data on the same thread, sequentially.
…execution

Previously prefetch_from_pool() ran in build_payload() using pending_transactions(): before transaction selection, with no base-fee filter, on the wrong transaction set. The prefetch has no business selecting transactions. Its only job is: given the already-selected best transactions, prefetch their access-list keys before the EVM executes them.

Changes:
- Change `best: FnOnce` → `Fn` so it can be called twice: once to create the prefetch iterator, once for the execution iterator (pool.clone() is cheap because Pool is Arc-backed)
- Remove prefetch from build_payload(); add it inside build() between step 3.1 (best_transactions_with_attrs selection) and step 3.2 (execute)
- Replace prefetch_from_pool() with prefetch_from_best_txs(), which accepts `impl PayloadTransactions` + `&mut impl revm::Database`
- DB reads go through builder.evm_mut().db_mut() (State<CachedReadsDbMut>); each read falls through State journal → CachedReadsDbMut → MDBX and populates CachedReads automatically, so no separate cache structure is needed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
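The `FnOnce` → `Fn` change can be illustrated with a toy builder. `Tx`, `Pool`, `BestTxs`, and `build` are hypothetical simplifications, not reth's types. The point is only that an `Fn` closure can be invoked twice to get two independent iterators over the selected transactions, one consumed by the prefetch and one by execution, and that cloning an `Arc`-backed pool bumps a reference count rather than copying transactions.

```rust
use std::sync::Arc;

#[derive(Clone, Debug)]
pub struct Tx {
    pub id: u32,
    /// Hypothetical flattened access-list keys for this transaction.
    pub access_list: Vec<u64>,
}

/// Arc-backed pool: `clone()` increments a reference count, it does not copy txs.
#[derive(Clone)]
pub struct Pool {
    txs: Arc<Vec<Tx>>,
}

/// Iterator over the pool's (already ordered) best transactions.
pub struct BestTxs {
    txs: Arc<Vec<Tx>>,
    pos: usize,
}

impl Iterator for BestTxs {
    type Item = Tx;
    fn next(&mut self) -> Option<Tx> {
        let tx = self.txs.get(self.pos)?.clone();
        self.pos += 1;
        Some(tx)
    }
}

impl Pool {
    pub fn new(txs: Vec<Tx>) -> Self {
        Pool { txs: Arc::new(txs) }
    }
    pub fn best_transactions(&self) -> BestTxs {
        BestTxs { txs: Arc::clone(&self.txs), pos: 0 }
    }
}

/// `best` has to be `Fn`, not `FnOnce`, because it is called twice.
pub fn build<F, I>(best: F, prefetched: &mut Vec<u64>, executed: &mut Vec<u32>)
where
    F: Fn() -> I,
    I: Iterator<Item = Tx>,
{
    // Prefetch pass: walk the selected txs and record their access-list keys.
    for tx in best() {
        prefetched.extend(tx.access_list.iter().copied());
    }
    // Execution pass: a fresh iterator over the same selection.
    for tx in best() {
        executed.push(tx.id);
    }
}
```

Usage: `build(|| pool.clone().best_transactions(), &mut prefetched, &mut executed)`. In this toy the pool never changes, so both calls see identical transactions; with a live pool it is worth confirming whether two separate calls can observe different contents, since any divergence makes the prefetch set an approximation of what executes.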
Operating on already-selected best_txs (bounded set), so HashSet dedup overhead isn't needed. State/CachedReads handles repeated keys naturally. Single pass, no intermediate collections. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without dedup, hot contracts appearing in many transactions caused redundant db.basic()/storage() calls for the same key. Two-pass approach: collect unique addresses/slots first, then read each exactly once. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Increments unconditionally every time prefetch_from_best_txs runs. Distinguishes two previously indistinguishable zero-metric cases:
- calls=0 → is_enabled() returned false (env var not reaching binary)
- calls>0, tx_with_access_list=0 → function runs but txns have no access lists

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
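The reasoning behind the new counter reduces to a small decision table. This sketch is hypothetical: plain `AtomicU64`s stand in for the real Prometheus metrics, and it only shows how reading both counters together separates the two zero-metric cases.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical in-process stand-ins for the Prometheus counters.
pub struct PrefetchCounters {
    pub calls_total: AtomicU64,
    pub tx_with_access_list_total: AtomicU64,
}

#[derive(Debug, PartialEq)]
pub enum Diagnosis {
    /// calls == 0: is_enabled() returned false (env var not reaching the binary).
    Disabled,
    /// calls > 0 but no transaction carried an access list.
    NoAccessLists,
    /// The prefetch ran and saw access lists.
    Active,
}

impl PrefetchCounters {
    pub const fn new() -> Self {
        PrefetchCounters {
            calls_total: AtomicU64::new(0),
            tx_with_access_list_total: AtomicU64::new(0),
        }
    }

    /// Records one prefetch run; calls_total is bumped even when tx_with_al is 0.
    pub fn record_run(&self, tx_with_al: u64) {
        self.calls_total.fetch_add(1, Ordering::Relaxed);
        self.tx_with_access_list_total.fetch_add(tx_with_al, Ordering::Relaxed);
    }

    pub fn diagnose(&self) -> Diagnosis {
        let calls = self.calls_total.load(Ordering::Relaxed);
        let with_al = self.tx_with_access_list_total.load(Ordering::Relaxed);
        match (calls, with_al) {
            (0, _) => Diagnosis::Disabled,
            (_, 0) => Diagnosis::NoAccessLists,
            _ => Diagnosis::Active,
        }
    }
}
```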
Fires exactly once on first is_enabled() call — confirms whether TXPOOL_AL_PREFETCH_ONLY reaches the binary and what value it sees. Needed because calls_total=0 despite env var being set in container. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BlockOrPayload::block_access_list() was a stub returning None, so PrewarmMode::BlockAccessList was never activated and all BAL metrics stayed at zero.

Fix: implement block_access_list() to collect EIP-2930 access list entries from the block's transactions and convert them into a BlockAccessList (EIP-7928 format) for the prefetcher. The access list data travels intact from sender → pool → block → engine payload, but was never read. Now:
- Block variant: iterates decoded transactions, calls tx.access_list()
- Payload variant: decodes raw RLP bytes via TxEnvelope::decode_2718, extracts access lists. Deposit txs (OP type 0x7e) are silently skipped.

Each AccessListItem's storage_keys become storage_reads in AccountChanges, so the BAL prewarm worker prefetches those slots before execution.

Also adds ExecutionPayload::encoded_transactions() with a default empty slice; implemented for OpExecutionData to expose the raw tx bytes from the v1 payload.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
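A sketch of the access-list-to-BAL conversion under stated assumptions: `DecodedTx`, `AccessListItem`, and `AccountChanges` are simplified, hypothetical types rather than the alloy/reth definitions, and decoding from EIP-2718 bytes is left out. The commit does not say whether entries for the same address are merged; this sketch chooses to merge them into one sorted entry per address using a `BTreeMap`, and returns `None` when nothing was collected so the caller can skip BAL prewarming.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Address = [u8; 20];
type StorageKey = [u8; 32];

/// OP Stack deposit transaction type, skipped as described in the commit.
const DEPOSIT_TX_TYPE: u8 = 0x7e;

pub struct AccessListItem {
    pub address: Address,
    pub storage_keys: Vec<StorageKey>,
}

/// Hypothetical decoded transaction: type byte plus optional access list.
pub struct DecodedTx {
    pub tx_type: u8,
    pub access_list: Option<Vec<AccessListItem>>,
}

/// Simplified AccountChanges carrying only storage reads.
#[derive(Debug, PartialEq)]
pub struct AccountChanges {
    pub address: Address,
    pub storage_reads: Vec<StorageKey>,
}

pub fn block_access_list(txs: &[DecodedTx]) -> Option<Vec<AccountChanges>> {
    // BTreeMap/BTreeSet give one entry per address, sorted, with unique slots.
    let mut by_address: BTreeMap<Address, BTreeSet<StorageKey>> = BTreeMap::new();
    for tx in txs {
        if tx.tx_type == DEPOSIT_TX_TYPE {
            continue;
        }
        let Some(al) = &tx.access_list else { continue };
        for item in al {
            by_address
                .entry(item.address)
                .or_default()
                .extend(item.storage_keys.iter().copied());
        }
    }
    if by_address.is_empty() {
        return None; // no access lists: the caller skips BAL prewarm
    }
    Some(
        by_address
            .into_iter()
            .map(|(address, keys)| AccountChanges {
                address,
                storage_reads: keys.into_iter().collect(),
            })
            .collect(),
    )
}
```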
Three additions to make the AL prefetch path observable:

1. payload_validator.rs: debug log when BAL is built from EIP-2930 access lists (block hash, entry count, total slot count), or trace log when no access lists are found and BAL prefetch is skipped. Answers: "are transactions actually carrying access lists?"
2. prewarm.rs: upgrade run_bal_prewarm start/complete logs from trace to debug, adding total_entries and total_slots fields so they appear in normal debug output without enabling trace.
3. prewarm.rs: new sync.prewarm.bal_slots_prefetched counter incremented with total_slots after all BAL workers complete. Previously only bal_slot_iteration_duration was recorded; the counter lets you track cumulative slots prefetched over time in Prometheus.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
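The entry and slot totals in the logs, and the value fed to the new counter, are simple aggregates over the BAL. This hypothetical sketch uses a simplified `AccountChanges` and an `AtomicU64` in place of the real metrics handle; it assumes "entries" means one per `AccountChanges` and "slots" is the sum of their `storage_reads`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

type Address = [u8; 20];
type StorageKey = [u8; 32];

/// Simplified AccountChanges carrying only storage reads.
pub struct AccountChanges {
    pub address: Address,
    pub storage_reads: Vec<StorageKey>,
}

/// Returns (total_entries, total_slots) as they would appear in the debug logs.
pub fn bal_summary(bal: &[AccountChanges]) -> (usize, usize) {
    let total_entries = bal.len();
    let total_slots = bal.iter().map(|a| a.storage_reads.len()).sum();
    (total_entries, total_slots)
}

/// Hypothetical stand-in for the sync.prewarm.bal_slots_prefetched counter.
pub struct BalSlotsCounter(AtomicU64);

impl BalSlotsCounter {
    pub const fn new() -> Self {
        Self(AtomicU64::new(0))
    }

    /// Called once per block, after all BAL workers complete.
    pub fn record(&self, total_slots: usize) {
        self.0.fetch_add(total_slots as u64, Ordering::Relaxed);
    }

    pub fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}
```

Unlike a per-run duration histogram, the counter only ever grows, so its rate over time gives slots prefetched per second.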
Background
Reth's block execution bottleneck is MDBX I/O — when the EVM executes a transaction, it reads account state and storage slots from disk on demand. These reads happen serially on the critical path, directly adding to block build latency and capping TPS.
The existing solution (feature/txn-execution-cache-warming) uses simulation workers: background rayon threads continuously simulate pending transactions to discover which accounts/slots they'll touch, store those keys in a PreWarmedCache, and then at block build time parallel-load the values from MDBX into CachedReads before the EVM runs. This achieves ~99% cache hit rates but introduces CPU competition between simulation workers and block execution.
What & Why
This branch introduces a simpler, zero-worker alternative: instead of simulating transactions to discover keys, we read them directly from EIP-2930 access lists.
EIP-2930 (type 0x01) transactions carry an explicit AccessList field (EIP-1559 type 0x02 transactions carry the same field): a list of (address, [storage_slots]) pairs that the transaction declares it will touch. This is exactly the prefetch oracle we need, already embedded in the transaction itself. No simulation required.
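To make the "oracle" concrete, here is a sketch of turning one transaction's access list into the keys a prefetcher would load, with no execution involved. `AccessListItem` and `PrefetchKey` are hypothetical simplified types (alloy has its own access list types), and loading the account for every listed address, even one with no slots, is an assumption of this sketch.

```rust
type Address = [u8; 20];
type StorageKey = [u8; 32];

/// Hypothetical simplified EIP-2930 entry: (address, [storage_slots]).
pub struct AccessListItem {
    pub address: Address,
    pub storage_keys: Vec<StorageKey>,
}

#[derive(Debug, PartialEq)]
pub enum PrefetchKey {
    Account(Address),
    Slot(Address, StorageKey),
}

/// Each entry yields an account key, and each listed slot yields a storage key.
pub fn prefetch_keys(access_list: &[AccessListItem]) -> Vec<PrefetchKey> {
    let mut keys = Vec::new();
    for item in access_list {
        keys.push(PrefetchKey::Account(item.address));
        for slot in &item.storage_keys {
            keys.push(PrefetchKey::Slot(item.address, *slot));
        }
    }
    keys
}
```

Keys outside the declared list are not covered; any state a transaction touches without declaring it still falls through to a normal read during execution.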
At block build time, before handing the state DB to the EVM:
Trade-offs vs simulation approach:
This is the right approach when the workload uses EIP-2930 transactions (as the adventure benchmark does with ADVENTURE_USE_ACCESS_LIST=true).
Deduplication
A critical optimization in the implementation: keys are deduplicated before any MDBX reads.
In a busy mempool, hot contracts (USDC, WETH, DEX routers) appear in thousands of pending transactions. Without deduplication, a naïve implementation would call basic(usdc_address) once per transaction that touches USDC; if that's 5,000 transactions, that's 5,000 calls. Only the first is an MDBX round-trip; the rest are cache hits inside CachedReadsDbMut, but thousands of redundant function calls still add measurable overhead at full mempool scale.
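The fix is a two-pass approach:
- Pass 1 (no I/O): collect unique addresses and (address, slot) pairs into HashSets across all transactions with access lists.
- Pass 2 (MDBX): read each unique key exactly once via CachedReadsDbMut.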
Result: MDBX reads scale with unique key count, not with mempool size × access list size.
Implementation