fix(client): match DEFAULT_STORE_TIMEOUT_SECS to storer CLOSENESS_LOOKUP_TIMEOUT+padding#83
Open
grumbach wants to merge 5 commits into
Open
Conversation
…KUP_TIMEOUT+padding Bump DEFAULT_STORE_TIMEOUT_SECS from 10s to 270s for merkle batch PUTs. The storer-side merkle payment verifier runs an iterative DHT lookup with CLOSENESS_LOOKUP_TIMEOUT = 240s (ant-node, post-PR #89). The old 10s client-side timeout fired long before the storer could finish verifying, with three downstream costs: 1. The storer keeps working on a chunk the client has already discarded, wasting CPU and bandwidth. 2. The client re-targets a different close-K member and may double-store the same chunk on a different peer set. 3. Cross-region close-K membership (sgp1 / syd1 storers serving a lon1 client) makes this happen on virtually every merkle chunk, not just a tail. Set client timeout = storer timeout + 30s padding (store-response RTT + storer-local LMDB put/fsync + clock skew tolerance). Invariant: client store-response timeout >= node CLOSENESS_LOOKUP_TIMEOUT + padding. Re-validate if either side's value changes.
…anged Adversarial review of the previous bulk timeout bump (270s for everyone) flagged that the chunk GET path at chunk.rs:296 also reads store_timeout_secs. Bumping the shared field to 270s silently changed GET behavior too, which was not the intent. This commit: - Introduces a dedicated DEFAULT_MERKLE_STORE_TIMEOUT_SECS = 270 const - Adds merkle_store_timeout_secs: u64 to ClientConfig (default 270) - Routes only the merkle PUT path (store_response_timeout_for_proof) to the new field - Leaves DEFAULT_STORE_TIMEOUT_SECS at 10 (matches current main behavior); the chunk GET path keeps reading store_timeout_secs unchanged - Updates doc comments to be honest about what each knob actually governs (store_timeout_secs now governs only the GET path and any direct readers, not non-merkle PUTs which use the STORE_RESPONSE_TIMEOUT const) - Strengthens the regression test to pin the invariant that non-merkle proof tags ignore the merkle timeout value Coordinates with Mick's PR WithAutonomi#78 which adds a dedicated chunk_get_timeout_secs field. After both land, the three timeout regions (merkle PUT / non-merkle PUT / GET) will be cleanly separated.
… limit foundryup curls api.github.com to resolve the nightly tag. Anonymous calls are rate-limited at 60/hour shared per IP; macOS runners hit this regularly and fail every E2E and Merkle E2E job with `curl: (56) ... 403`. Passing the workflow's GITHUB_TOKEN authenticates the call, raising the cap to 1000/hour per token. Same fix Mick's PR WithAutonomi#78 will want.
Setting GITHUB_TOKEN on foundry-toolchain@v1 didn't help: foundryup itself does not read $GITHUB_TOKEN before calling api.github.com to resolve the nightly tag, so macOS runners on shared egress IPs still hit the 60/h anonymous rate limit and 403 every install. Pin to v1.3.6 (last stable as of 2026-05-12) and curl the release tarball directly from the GitHub Releases CDN. Release assets are served from a CDN unaffected by the API rate limit. Unpacks anvil/forge/cast/chisel to /usr/local/bin and verifies versions.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds
merkle_store_timeout_secs(default 270 s) toClientConfigand routes the merkle batch PUT response timeout through it.Storer-side merkle payment verifier runs an iterative DHT lookup with
CLOSENESS_LOOKUP_TIMEOUT = 240 s(ant-node, post-PR #89). The old 10 s default fired long before the storer could finish verifying, with three downstream costs:This is the 5th merkle failure mode — orthogonal to A+G (#89), clock-skew (#91), and the eviction trio (#114/#90/#77).
Scope: merkle PUT only
This PR intentionally does not touch the non-merkle PUT path (already uses the
STORE_RESPONSE_TIMEOUTconst inchunk.rs, currently 30 s) or the chunk GET path (currently readsstore_timeout_secs, which stays at 10 s — same asmain).The GET-path rename is Mick's PR #78 which adds a dedicated
chunk_get_timeout_secsfield. After both PRs land, the three timeout regions (merkle PUT / non-merkle PUT / GET) are cleanly separated:merkle_store_timeout_secs(this PR)STORE_RESPONSE_TIMEOUTconstchunk_get_timeout_secs(Mick #78)Reasoning for 270 s
Client timeout = storer timeout + 30 s padding (store-response RTT + storer-local LMDB put / fsync + clock-skew tolerance). 240 + 30 = 270.
Invariant: client merkle store-response timeout >= node
CLOSENESS_LOOKUP_TIMEOUT+ padding. Pinned by the newdefault_merkle_store_timeout_satisfies_storer_invariantregression test so it cannot silently re-break.Empirical validation
Ran a 210-node 7-region testnet (lon1, fra1, sgp1, nyc1, ams3, tor1, syd1) on a branch combining
main+ Mick'sfix/stability-improvements+ the eviction trio (#114, #90, #77):merkle_store_timeout_secsFailure signature on the unpatched side:
Findings doc in the saorsa repo under
notes/anselme-testnet-2026-05-12/findings.md.Tests
single_node_proof_uses_store_response_timeout(existing)unknown_proof_uses_store_response_timeout(existing)merkle_proof_uses_configured_store_timeout(existing)default_merkle_store_timeout_satisfies_storer_invariant— NEW: pins the invariant thatmerkle_store_timeout_secs >= 240 + 30non_merkle_put_ignores_merkle_timeout_value— NEW: pins that non-merkle proof tags ignore whatever value is passed for the merkle timeoutFuture work / not in this PR
--merkle-store-timeout-secsCLI flag for ad-hoc tuning on residential / very-far-region setups. Not added here to keep the diff tight; can follow up.store_timeout_secsto its actual role (chunk GET) is Mick's Split chunk GET timeout from store timeout #78.