
tx-submitter bug fix #975

@SegueII

Summary

tx-submitter panics during startup after a restart and enters a crash loop. The service cannot recover without manual intervention (deleting the local LevelDB data).

Symptom

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x8 pc=...]
morph-l2/common/batch.(*BatchStorage).LoadAllSealedBatchesAndHeader(...)
        common/batch/batch_storage.go:146

The panic is accompanied by "Batch not found in cache batch_index=..." logs from the commit loop, which indicate that the locally persisted sealed batch data is no longer contiguous.

Root cause

  1. LoadAllSealedBatchesAndHeader looks up the parent batch with batches[idx-1], assuming the persisted sealed_batch_indices list is strictly contiguous. A missing key returns a nil map value, and parentBatch.Hash dereferences nil — addr=0x8 matches the Hash field offset in eth.RPCRollupBatch (Version uint occupies offset 0–7).
  2. The holes come from finalize cleanup: after a finalize tx confirms, handleConfirmedTx deletes only the single index batchIndex-1. The finalize target jumps when other submitters advance lastFinalizedBatchIndex on L1, and batches finalized by other submitters are never deleted locally (their txs are not in this node's pending pool). The next local delete then removes a middle index, persisting a holed indices snapshot.
  3. The hole is invisible at runtime because all queries are memory-first and never iterate the indices list. Only the startup path (InitAndSyncFromDatabase → LoadAllSealedBatchesAndHeader) iterates it, so the node runs fine until the next restart and then crash-loops. The startup retry loop only handles returned errors, not panics, and the existing self-heal logic sits after the load, so it never gets a chance to run.
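
The chain above can be reproduced in isolation. The following is a minimal sketch, not the tx-submitter code: Batch, deleteSingle, sortedIndices, missingParents and hashOffset are hypothetical names, and an in-memory map stands in for the LevelDB-backed storage. It assumes batches 10 to 15 are stored locally, other submitters have already finalized up to 12, and this node's next confirmed finalize tx has batchIndex 14.

```go
package main

import (
	"fmt"
	"sort"
	"unsafe"
)

// Batch is a hypothetical stand-in for eth.RPCRollupBatch; only the
// first two fields matter for the offset argument.
type Batch struct {
	Version uint
	Hash    [32]byte
}

// hashOffset reports the byte offset of Hash within Batch. On a 64-bit
// platform uint is 8 bytes wide, so Hash starts at offset 8.
func hashOffset() uintptr { return unsafe.Offsetof(Batch{}.Hash) }

// deleteSingle mimics the cleanup described above: only batchIndex-1
// is removed, whatever else is still stored below it.
func deleteSingle(batches map[uint64]*Batch, batchIndex uint64) {
	delete(batches, batchIndex-1)
}

// sortedIndices returns the surviving indices in ascending order.
func sortedIndices(batches map[uint64]*Batch) []uint64 {
	out := make([]uint64, 0, len(batches))
	for idx := range batches {
		out = append(out, idx)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

// missingParents lists every stored index whose parent lookup
// batches[idx-1] yields nil. The lowest index is skipped here; how the
// real load path treats it is outside this sketch.
func missingParents(batches map[uint64]*Batch) []uint64 {
	var out []uint64
	for i, idx := range sortedIndices(batches) {
		if i > 0 && batches[idx-1] == nil {
			out = append(out, idx)
		}
	}
	return out
}

func main() {
	batches := map[uint64]*Batch{}
	for idx := uint64(10); idx <= 15; idx++ {
		batches[idx] = &Batch{}
	}
	// 10..12 were finalized by other submitters and are never deleted
	// locally; the local finalize for 14 then removes only 13.
	deleteSingle(batches, 14)

	fmt.Println("surviving indices:", sortedIndices(batches))
	fmt.Println("indices with nil parent:", missingParents(batches))
	fmt.Println("offset of Hash:", hashOffset())
}
```

In this scenario the parent lookup for batch 14 returns nil, so reading its Hash field would dereference a nil pointer at the field's offset, which is consistent with the addr=0x8 in the trace on a 64-bit build.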

The problem is more likely to occur when multiple submitters run concurrently or when several batches are finalized in quick succession.

Impact

  • Startup crash loop; the service cannot come back up on its own.
  • No data/funds safety impact (local cache only; all data can be rebuilt from the rollup contract).

Fix

See #991:

  • Load path: sort indices, verify contiguity, nil-check the parent batch, and return an error so the existing DeleteBatchStorageAndInitFromRollup self-heal rebuilds from the rollup contract instead of panicking.
  • Finalize cleanup: range-based DeleteUntil(batchIndex-1) keeps the surviving indices in a contiguous window (and also reclaims previously leaked header keys).
  • Storage hardening: batch data + header + indices are persisted in one atomic WriteBatch; indices update errors are no longer swallowed.
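
The first two bullets can be sketched as follows. This is an illustration under assumptions, not the code from #991: Batch, indicesOf, loadSealed, deleteUntil and ErrNonContiguous are hypothetical names, and a plain map replaces the LevelDB storage, so the atomic WriteBatch part is not covered.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Batch is a hypothetical stand-in for the stored sealed batch.
type Batch struct{ Hash [32]byte }

// ErrNonContiguous is a hypothetical sentinel the caller can use to
// trigger a rebuild instead of crashing.
var ErrNonContiguous = errors.New("sealed batch indices are not contiguous")

// indicesOf collects the stored indices in arbitrary (map) order.
func indicesOf(batches map[uint64]*Batch) []uint64 {
	out := make([]uint64, 0, len(batches))
	for idx := range batches {
		out = append(out, idx)
	}
	return out
}

// loadSealed sorts a copy of indices, then checks that each batch is
// present and that each index follows its predecessor by exactly one.
// It returns an error rather than dereferencing a missing parent.
func loadSealed(indices []uint64, batches map[uint64]*Batch) ([]uint64, error) {
	sorted := append([]uint64(nil), indices...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	for i, idx := range sorted {
		if batches[idx] == nil {
			return nil, fmt.Errorf("batch %d missing: %w", idx, ErrNonContiguous)
		}
		if i > 0 && sorted[i-1]+1 != idx {
			return nil, fmt.Errorf("gap between %d and %d: %w", sorted[i-1], idx, ErrNonContiguous)
		}
	}
	return sorted, nil
}

// deleteUntil removes every stored index <= last, so whatever survives
// is the contiguous tail above last.
func deleteUntil(batches map[uint64]*Batch, last uint64) {
	for idx := range batches {
		if idx <= last {
			delete(batches, idx) // deleting during range is safe in Go
		}
	}
}

func main() {
	// A holed snapshot: 13 is missing, 10..12 are leaked leftovers.
	batches := map[uint64]*Batch{}
	for _, idx := range []uint64{10, 11, 12, 14, 15} {
		batches[idx] = &Batch{}
	}

	if _, err := loadSealed(indicesOf(batches), batches); err != nil {
		fmt.Println("load failed, caller can rebuild:", err)
	}

	// Range-based cleanup for a finalize with batchIndex 14.
	deleteUntil(batches, 14-1)
	sorted, err := loadSealed(indicesOf(batches), batches)
	fmt.Println(sorted, err)
}
```

Returning an error here matters because, as described in the root cause, the startup retry loop reacts to returned errors but not to panics. The range delete also removes the leftover entries below the finalize target that a single-index delete would leave behind.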
