feat(upload): instrument per-chunk retries and store wall-clock #87
Merged
jacderida merged 1 commit on May 12, 2026
Track per-chunk attempt counts and store-RPC wall-clock through the upload pipeline so testnet runs can identify when slowdowns are client-side quorum/retry cost vs network or storage cost.

Surface on FileUploadResult and ant-cli --json output:
- chunk_attempts_total: sum of store-RPC attempts (>= chunks_stored)
- store_durations_ms: per-chunk wall-clock from first attempt to success
- retries_histogram: how many stored chunks needed N retries

Also emit a structured "chunk_store_wave_complete" info log per wave with p50/p95/max durations and per-round retry counts, for log-based analysis without --json parsing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
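To make the three fields concrete, here is a minimal sketch of how they could be derived from per-chunk records. The `ChunkStoreRecord` and `UploadMetrics` types, the `summarize` function, and the choice of a map keyed by retry count for the histogram are all assumptions for illustration; the actual types and histogram representation in ant-core are not shown in this PR description.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Hypothetical per-chunk record (not the ant-core type): number of
/// store-RPC attempts and wall-clock from first attempt to success.
pub struct ChunkStoreRecord {
    pub attempts: u32,     // assumed >= 1 for a stored chunk
    pub elapsed: Duration, // first attempt -> success
}

/// Hypothetical summary mirroring the three field names in the commit message.
#[derive(Debug, Default, PartialEq)]
pub struct UploadMetrics {
    pub chunk_attempts_total: u64,
    pub store_durations_ms: Vec<u64>,
    /// Assumed shape: retry count -> number of stored chunks that needed it.
    pub retries_histogram: BTreeMap<u32, u64>,
}

pub fn summarize(records: &[ChunkStoreRecord]) -> UploadMetrics {
    let mut metrics = UploadMetrics::default();
    for record in records {
        // Every stored chunk contributes at least one attempt, so the
        // total is always >= the number of stored chunks.
        metrics.chunk_attempts_total += u64::from(record.attempts);
        metrics.store_durations_ms.push(record.elapsed.as_millis() as u64);
        // Retries are the attempts beyond the first.
        let retries = record.attempts.saturating_sub(1);
        *metrics.retries_histogram.entry(retries).or_insert(0) += 1;
    }
    metrics
}
```

Under these assumptions, a large gap between `chunk_attempts_total` and the number of stored chunks points at retry cost, while long durations with few retries point at the network or storage side.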
Summary
Track per-chunk store-RPC attempt counts and wall-clock durations through the upload pipeline so testnet runs can distinguish client-side quorum/retry cost from network or storage cost when investigating upload slowdowns.
Motivation: a recent testnet showed aggregate upload throughput halving over 12 hours before any node failures, with large-file uploaders degrading ~5x and small-file uploaders only ~1.5x. The file-size sensitivity strongly implicates per-chunk client-side cost (CLOSE_GROUP quorum, slowest-peer-dominates), but neither chunk-retry counts nor per-chunk wall-clock were observable. This change adds both.
Changes
- ant-core: `WaveResult` and `FileUploadResult` gain `chunk_attempts_total`, `store_durations_ms`, and `retries_histogram` fields. `FileUploadResult` is `#[non_exhaustive]` so this is non-breaking.
- `WaveAggregateStats` helper in `batch.rs` folds multiple `WaveResult`s into one upload-level summary; threaded through `batch_upload_chunks_with_events`, `upload_waves_single`, `upload_waves_merkle`, and `merkle_upload_chunks`.
- ant-cli: `--json` output for `file upload` exposes the new fields so downstream tooling can record them without log parsing.
- `info!("chunk_store_wave_complete", ...)` log line per wave with p50/p95/max store durations and per-round retry counts, for log-based analysis.

Test plan
- `cargo check --workspace` clean
- `cargo clippy --workspace --tests -- -D warnings` clean
- `ant-core` batch tests pass (`cargo test -p ant-core --lib data::client::batch`)
- `ant --json file upload --public <file>` and verify the new JSON fields appear

Generated with Claude Code
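For reviewers who want a mental model of the wave folding and the p50/p95/max summary listed under Changes, the following is a rough sketch only. `WaveStats`, `absorb`, and `percentile_ms` are hypothetical names, not the `WaveAggregateStats` API from `batch.rs`, and the nearest-rank percentile method is an assumption; the PR description does not say which percentile definition the log line uses.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a per-wave result; field names follow the PR
/// description, but the types are assumptions.
#[derive(Debug, Clone, Default)]
pub struct WaveStats {
    pub chunks_stored: usize,
    pub chunk_attempts_total: u64,
    pub store_durations_ms: Vec<u64>,
    pub retries_histogram: BTreeMap<u32, u64>,
}

impl WaveStats {
    /// Fold one wave into an upload-level running summary.
    pub fn absorb(&mut self, wave: &WaveStats) {
        self.chunks_stored += wave.chunks_stored;
        self.chunk_attempts_total += wave.chunk_attempts_total;
        self.store_durations_ms
            .extend_from_slice(&wave.store_durations_ms);
        // Histograms merge by summing the counts per retry bucket.
        for (retries, count) in &wave.retries_histogram {
            *self.retries_histogram.entry(*retries).or_insert(0) += *count;
        }
    }
}

/// Nearest-rank percentile over an unsorted slice; `None` when empty.
/// `pct` is expected in 0..=100; `pct = 100` yields the maximum.
pub fn percentile_ms(durations: &[u64], pct: u32) -> Option<u64> {
    if durations.is_empty() {
        return None;
    }
    let mut sorted = durations.to_vec();
    sorted.sort_unstable();
    let n = sorted.len();
    // rank = ceil(pct / 100 * n), clamped into 1..=n
    let rank = ((pct as usize * n + 99) / 100).clamp(1, n);
    Some(sorted[rank - 1])
}
```

Keeping the raw per-chunk durations in the aggregate, rather than per-wave percentiles, lets the upload-level percentiles be computed exactly instead of being approximated from wave summaries.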