Summary
tx-submitter panics during startup after a restart and enters a crash loop. The service cannot recover without manual intervention (deleting the local LevelDB data).
Symptom
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x8 pc=...]
morph-l2/common/batch.(*BatchStorage).LoadAllSealedBatchesAndHeader(...)
common/batch/batch_storage.go:146
The panic is accompanied by "Batch not found in cache batch_index=..." logs from the commit loop, indicating that the locally persisted sealed batch data is no longer contiguous.
Root cause
LoadAllSealedBatchesAndHeader looks up the parent batch with batches[idx-1], assuming the persisted sealed_batch_indices list is strictly contiguous. When idx-1 is missing, the map lookup returns nil, and parentBatch.Hash dereferences that nil pointer. The fault address addr=0x8 matches the offset of the Hash field in eth.RPCRollupBatch (Version uint occupies offsets 0–7 on a 64-bit platform).
- The holes come from finalize cleanup: after a finalize tx confirms, handleConfirmedTx deletes only the single index batchIndex-1. The finalize target jumps when other submitters advance lastFinalizedBatchIndex on L1, and batches finalized by other submitters are never deleted locally (their txs are not in this node's pending pool). The next local delete then removes a middle index, persisting a holed indices snapshot.
- The hole is invisible at runtime because all queries are memory-first and never iterate the indices list. Only the startup path (InitAndSyncFromDatabase → LoadAllSealedBatchesAndHeader) iterates it, so the node runs fine until the next restart, then crash-loops. The startup retry loop only handles returned errors, not panics, and the existing self-heal logic sits after the load, so it never gets a chance to run.
The bug is more likely to occur when multiple submitters run concurrently or when several batches are finalized in quick succession.
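The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the tx-submitter code: sealedBatch is a hypothetical stand-in for eth.RPCRollupBatch, and parentHash and hashOffset are illustrative helpers. It shows that looking up a deleted index in a map of pointers yields nil, and that reading Hash through that nil pointer panics.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sealedBatch is a hypothetical stand-in for eth.RPCRollupBatch.
// Only the two fields relevant to the crash are modelled.
type sealedBatch struct {
	Version uint
	Hash    [32]byte
}

// hashOffset reports the byte offset of Hash inside sealedBatch.
// Where uint is 8 bytes wide (64-bit platforms) this is 8, which is
// why a nil dereference of Hash faults at address 0x8.
func hashOffset() uintptr {
	var b sealedBatch
	return unsafe.Offsetof(b.Hash)
}

// parentHash mimics the unguarded lookup batches[idx-1].Hash.
// The recover exists only so the demo can report the panic as an error.
func parentHash(batches map[uint64]*sealedBatch, idx uint64) (hash [32]byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	parent := batches[idx-1] // nil when idx-1 was deleted earlier
	return parent.Hash, nil  // panics if parent is nil
}

func main() {
	batches := map[uint64]*sealedBatch{}
	for i := uint64(5); i <= 9; i++ {
		batches[i] = &sealedBatch{Version: 1}
	}

	// A single-index delete in the middle of the window leaves a hole.
	delete(batches, 7)

	_, err := parentHash(batches, 8)
	fmt.Println("parent lookup for batch 8:", err)
	fmt.Println("offset of Hash:", hashOffset())
}
```

In the real service nothing recovers the panic on the startup path, which is why the process exits and restarts instead of returning an error to the retry loop.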
Impact
- Startup crash loop; the service cannot come back up on its own.
- No data/funds safety impact (local cache only; all data can be rebuilt from the rollup contract).
Fix
See #991:
- Load path: sort indices, verify contiguity, nil-check the parent batch, and return an error so the existing DeleteBatchStorageAndInitFromRollup self-heal rebuilds from the rollup contract instead of panicking.
- Finalize cleanup: range-based DeleteUntil(batchIndex-1) keeps the surviving indices a contiguous window (and also reclaims previously leaked header keys).
- Storage hardening: batch data + header + indices are persisted in one atomic WriteBatch; indices update errors are no longer swallowed.
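The first two items of the fix can be illustrated with a small sketch. This is not the code from #991: sealedBatch, validateSealed, deleteUntil, and errNonContiguous are hypothetical names, and an in-memory map stands in for LevelDB. The idea is that the load path returns an error on any gap or missing batch, and that cleanup removes a whole prefix rather than one index.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// sealedBatch is a hypothetical placeholder for the persisted batch type.
type sealedBatch struct {
	Hash [32]byte
}

// errNonContiguous is an illustrative sentinel; the caller would treat it
// as a signal to rebuild local storage from the rollup contract.
var errNonContiguous = errors.New("sealed batch indices are not contiguous")

// validateSealed sorts a copy of the indices, then checks that every batch
// is present and that each index follows its predecessor by exactly one.
// It returns an error instead of dereferencing a nil batch.
func validateSealed(indices []uint64, batches map[uint64]*sealedBatch) error {
	sorted := append([]uint64(nil), indices...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	for i, idx := range sorted {
		if batches[idx] == nil {
			return fmt.Errorf("%w: batch %d is missing", errNonContiguous, idx)
		}
		if i > 0 && idx != sorted[i-1]+1 {
			return fmt.Errorf("%w: gap between %d and %d", errNonContiguous, sorted[i-1], idx)
		}
	}
	return nil
}

// deleteUntil removes every batch with index <= upTo and returns the
// surviving indices, so what remains is a suffix of the original window.
func deleteUntil(indices []uint64, batches map[uint64]*sealedBatch, upTo uint64) []uint64 {
	kept := make([]uint64, 0, len(indices))
	for _, idx := range indices {
		if idx <= upTo {
			delete(batches, idx)
			continue
		}
		kept = append(kept, idx)
	}
	return kept
}

func main() {
	batches := map[uint64]*sealedBatch{}
	var indices []uint64
	for i := uint64(5); i <= 9; i++ {
		batches[i] = &sealedBatch{}
		indices = append(indices, i)
	}

	// A holed snapshot is rejected with an error rather than a panic.
	fmt.Println("holed snapshot:", validateSealed([]uint64{5, 6, 8, 9}, batches))

	// Range-based cleanup leaves a contiguous window that validates.
	indices = deleteUntil(indices, batches, 7)
	fmt.Println("after deleteUntil(7):", indices, validateSealed(indices, batches))
}
```

Returning an error matters because, per the root cause above, the startup retry loop and the self-heal path only react to returned errors, not to panics.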