perf(chunk): share chunk Bytes across close-group fan-out (drop to_vec)#82
Open
jacderida wants to merge 1 commit into
Open
perf(chunk): share chunk Bytes across close-group fan-out (drop to_vec)#82jacderida wants to merge 1 commit into
jacderida wants to merge 1 commit into
Conversation
`chunk_put_with_proof` was deep-copying every chunk's content into a fresh `Vec<u8>` via `Bytes::to_vec()` before stuffing it into a `ChunkPutRequest`. Since `chunk_put_to_close_group` spawns one of these per recipient (CLOSE_GROUP_MAJORITY ≈ 5) and the AIMD store controller caps concurrent chunks at 64, peak in-flight chunk content reached ~64 × 5 × 4 MB = 2.5 GB just from these copies — enough to OOM-kill `ant` on a 4 GB client VM partway through a 300 MB-or-larger upload. Heaptrack against a clean-exit 20 MB upload (release `ant` on a 4 GB VM) showed 285 MB peak in `alloc::raw_vec::RawVecInner::finish_grow`, with 168 MB of that consumed via `ChunkMessage::encode` → `chunk_put_to_close_group` — i.e. the per-recipient copy + encode loop. ant-protocol now holds `ChunkPutRequest::content` as `bytes::Bytes` (see jacderida/ant-protocol#1), so the caller can pass the refcounted `Bytes` straight through. Each peer's spawned task now shares the single 4 MB backing buffer instead of holding an independent copy. Wire format is unchanged. The patch.crates-io stanza pins ant-protocol to the perf branch commit so the build resolves the Bytes-typed field. The pin should be removed once a crates.io release containing the protocol change is published. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0d19966 to
0a0747a
Compare
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
content.to_vec()atant-core/src/data/client/chunk.rs:173and pass the refcountedBytesstraight through toChunkPutRequest::with_payment.ant-protocolto perf(chunk): hold ChunkPutRequest content as Bytes for zero-copy fan-out ant-protocol#6 via[patch.crates-io]so the newcontent: Bytesfield type resolves until that PR lands and is released to crates.io.Why this matters
A heaptrack capture of a release
antbinary uploading a 20 MB file on a 4 GB client VM (DEV-01 testnet) shows peak heap ≈ 285 MB pinned toalloc::raw_vec::RawVecInner::finish_grow, with 168 MB consumed viaChunkMessage::encode→chunk_put_to_close_group. Trace summary:Source of the bloat:
chunk_put_to_close_groupspawnsCLOSE_GROUP_MAJORITY ≈ 5per-peer tasks viaFuturesUnordered, each callingchunk_put_with_proof, which until now did:Even though
contentarrives as a refcountedbytes::Bytes, that.to_vec()deep-copies the whole 4 MB chunk into a freshVec<u8>— once per recipient. Combined with the AIMD store concurrency cap at 64 (raised from the prior static 8 in commit768e3bdin late April), peak in-flight chunk content reaches:which is exactly the gap between the 256 MB peak the spilling design intended (file.rs:130 comment) and the actual ~3.55 GB anon-rss observed at every OOM kill on the DEV-01 1 GB / 500 MB / 300 MB uploaders. Lowering the cap would also restore 4 GB-VM viability, but this PR addresses the root cause: the per-recipient copy.
With this change, each peer's spawned task holds the same single 4 MB backing buffer via
Bytes' refcount — N copies → 1 copy per chunk.Wire compatibility
Unchanged.
BytesandVec<u8>serialise identically under postcard/serde (varint length + raw bytes). Mixed networks of patched/unpatched clients and servers interoperate.Patch.crates-io stanza
This PR adds:
The pin is against the perf branch on the jacderida fork; the ant-protocol PR is at WithAutonomi/ant-protocol#6. The stanza should be removed once that PR lands and a crates.io release containing it is published.
Test plan
cargo check --workspace— cleancargo check --bin ant— clean (the deployable binary builds).zstprofilessaorsa-deploy build ant, re-provision a 1 GB / 500 MB / 300 MB uploader on a 4 GB VM, confirm OOMs cease and memory stays well below the 3.55 GB ceilingKnown follow-up
cargo test --workspacecurrently fails on a transitive dev-dep build ofant-node 0.11.1, which readsrequest.contentasVec<u8>instorage/handler.rs:239. The deployableantbinary doesn't link ant-node, so this doesn't block the operational fix, but ant-node will need a matching update when its protocol pin advances — out of scope for this PR.🤖 Generated with Claude Code