Skip to content

Add Crr Cascade capabilities to backbeat crr replication#2747

Open
SylvainSenechal wants to merge 3 commits into
development/9.5from
improvement/BB-767
Open

Add Crr Cascade capabilities to backbeat crr replication#2747
SylvainSenechal wants to merge 3 commits into
development/9.5from
improvement/BB-767

Conversation

@SylvainSenechal

@SylvainSenechal SylvainSenechal commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Issue: BB-767

Related PRs :
Arsenal : scality/Arsenal#2628
Cloudserver : scality/cloudserver#6179
CloudserverClient : scality/cloudserverclient#24
S3utils : scality/s3utils#395

@bert-e

bert-e commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Hello sylvainsenechal,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

Comment thread package.json Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown
  • package.json:54 — @scality/cloudserverclient uses a local file path (file:../cloudserverclient/...). Must be changed to a proper registry or git-pinned reference before merge.
    - ReplicateObject.js:6 — checkCrrCascadeEvent and getMicroVersionId() do not appear to exist in arsenal 8.3.9. Arsenal version bump likely needed.
    - ReplicateObject.js:743 — Any 409 from destination putMetadata is assumed to be cascade-stale and marked COMPLETED. Consider using a more specific signal to avoid silently skipping replication if 409 is returned for other reasons.

    Review by Claude Code

@SylvainSenechal SylvainSenechal marked this pull request as ready for review June 3, 2026 16:15

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can already check this pr, but should really be reviewed after all the other cascade prs, as changes in these pr would also mean changes here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can functional tests instead of just these,
But waiting for Arsenal/cloudserver to be merged, as it will be easier to make these tests (functional tests in backbeat rely on an image of cloudserver)

Comment thread package.json Outdated
@codecov

codecov Bot commented Jun 3, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 82.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.63%. Comparing base (cd24b31) to head (f83fbd4).

Files with missing lines Patch % Lines
extensions/replication/tasks/ReplicateObject.js 82.00% 9 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

Files with missing lines Coverage Δ
extensions/replication/tasks/ReplicateObject.js 89.85% <82.00%> (-1.36%) ⬇️

... and 3 files with indirect coverage changes

Components Coverage Δ
Bucket Notification 80.22% <ø> (ø)
Core Library 80.65% <ø> (-0.62%) ⬇️
Ingestion 70.63% <ø> (ø)
Lifecycle 79.46% <ø> (ø)
Oplog Populator 85.83% <ø> (ø)
Replication 59.97% <82.00%> (+0.22%) ⬆️
Bucket Scanner 85.76% <ø> (ø)
@@                 Coverage Diff                 @@
##           development/9.5    #2747      +/-   ##
===================================================
- Coverage            74.88%   74.63%   -0.26%     
===================================================
  Files                  200      200              
  Lines                13670    13710      +40     
===================================================
- Hits                 10237    10232       -5     
- Misses                3423     3468      +45     
  Partials                10       10              
Flag Coverage Δ
api:retry 9.12% <0.00%> (-0.03%) ⬇️
api:routes 8.94% <0.00%> (-0.03%) ⬇️
bucket-scanner 85.76% <ø> (ø)
ft_test:queuepopulator 9.11% <0.00%> (-1.87%) ⬇️
ingestion 12.52% <0.00%> (-0.04%) ⬇️
lib 7.75% <0.00%> (-0.03%) ⬇️
lifecycle 18.84% <0.00%> (-0.08%) ⬇️
notification 1.02% <0.00%> (-0.01%) ⬇️
oplogPopulator 0.14% <0.00%> (-0.01%) ⬇️
unit 51.78% <82.00%> (+0.39%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown
  • package.json:54 — @scality/cloudserverclient uses a local file: path that will break CI and other developers. Must be changed to a proper npm or git-tag reference before merge.
    - extensions/replication/tasks/ReplicateObject.js:33-35 — Cascade sentinel objects (cascadeLoopDetected, cascadeDataComplete, partAlreadyAtDest) are plain {} singletons that pass through BackbeatTask.retry(), which mutates errors. Consider Object.freeze() to prevent accidental mutation.

    The cascade logic itself (putData/putMetadata detection, allPartsAlreadyAtDest propagation through the waterfall, _handleReplicationOutcome treating cascade signals as COMPLETED) is sound. The retry wrapper correctly passes cascade signals through without retrying (no retryable property). Tests cover the key scenarios well.

    Review by Claude Code

Comment thread extensions/replication/tasks/ReplicateObject.js
Comment thread package.json Outdated
Comment thread package.json
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
@claude

claude Bot commented Jun 5, 2026

Copy link
Copy Markdown
  • ReplicateObject.js:602 — Cascade sentinel objects (cascadeLoopDetected, cascadeDataComplete) are passed as err to callbacks. Verify that BackbeatTask._retry won't retry on these truthy-but-non-error sentinels, which would defeat the loop/stale detection.
    - ReplicateObject.js:433 — On error path in _getAndPutData, destLocations may contain partial results. The sentinel objects ({}) pass the filter and could reach _deleteOrphans with undefined keys. Likely harmless due to downstream filtering, but worth a defensive check.
    - ReplicateObject.js:1014 — _processQueueEntryRetryFull now passes allPartsAlreadyAtDest as mdOnly to _putMetadata, changing retry-full semantics from always-full to conditionally-metadata-only. Confirm this is intentional.
    - package.json:57 — Arsenal pinned to a raw commit hash instead of a tag. Should be updated to a tag once the Arsenal PR merges.
    - package.json:54 — scality-cloudserverclient-v1.0.9.tgz vendored as a binary blob in git. Prefer a registry or git-tag reference.
    - ReplicateObject.js:573 — Trailing whitespace.

    Review by Claude Code

Comment thread package.json Outdated
"@smithy/node-http-handler": "^3.3.3",
"JSONStream": "^1.3.5",
"arsenal": "git+https://github.com/scality/arsenal#8.3.9",
"arsenal": "git+https://github.com/scality/Arsenal#2c429ab35a5ac82c3dafa5a0296a49a23a9c8a4a",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arsenal is pinned to a raw commit hash (2c429ab...) instead of a semantic version tag. Per project conventions, git-based deps (arsenal, vaultclient, etc.) should pin to tags (e.g. #8.x.y). Commit hashes are opaque — it's unclear which features/fixes are included, and there's no semver contract. This also makes it harder for reviewers and operators to reason about what changed.

— Claude Code

Comment thread package.json
"@aws-sdk/client-sts": "^3.921.0",
"@aws-sdk/credential-providers": "^3.921.0",
"@scality/cloudserverclient": "^1.0.8",
"@scality/cloudserverclient": "file:./scality-cloudserverclient-v1.0.9.tgz",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A binary .tgz is committed to the repo and referenced via file: in package.json. This permanently inflates the git history (binary blobs can't be diffed or garbage-collected), prevents code review of the dependency contents, and bypasses the normal package registry workflow. Consider publishing @scality/cloudserverclient@1.0.9 to the npm registry (or a private registry) and referencing it as a versioned dependency instead.

— Claude Code

@claude

claude Bot commented Jun 5, 2026

Copy link
Copy Markdown
  • package.json:57 — Arsenal pinned to a raw commit hash instead of a semantic version tag. Git-based deps should use tags per project conventions.
  • package.json:54 — @scality/cloudserverclient vendored as a binary .tgz committed to the repo. Should be published to a registry.

The cascade replication logic in ReplicateObject.js (the ExistingMicroVersionId switch, MicroVersionIdExists / 409 handling in putMetadata, and the _handleReplicationOutcome cascade signal handling) is well-structured: sentinel objects flow correctly through the retry wrapper without being retried, the orphan cleanup path correctly filters out partAlreadyAtDest sentinels, and the test coverage is thorough across all cascade branches.

Review by Claude Code

@SylvainSenechal SylvainSenechal requested review from a team, benzekrimaha and maeldonn June 8, 2026 15:47
@scality scality deleted a comment from bert-e Jun 9, 2026
@bert-e

bert-e commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

method: 'ReplicateObject._getAndPutPartOnce',
entry: destEntry.getLogInfo(),
});
return doneOnce(null, partAlreadyAtDest);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source stream (incomingMsg) is not destroyed when a VersionIdCollisionException is caught. All four return paths in this handler (lines 581–636) exit early via doneOnce(...), bypassing the existing stream cleanup at lines 639–646.

For large objects, this leaks the source HTTP connection — the response body from the GET is never consumed or destroyed, so the socket stays open until it times out. Add incomingMsg.destroy() before returning in the collision handler (e.g. at the top of the if (err instanceof VersionIdCollisionException) block, or before each return doneOnce(...)).

— Claude Code

Comment thread package.json Outdated
"@smithy/node-http-handler": "^3.3.3",
"JSONStream": "^1.3.5",
"arsenal": "git+https://github.com/scality/arsenal#8.3.9",
"arsenal": "git+https://github.com/scality/Arsenal#2c429ab35a5ac82c3dafa5a0296a49a23a9c8a4a",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arsenal is pinned to a raw commit hash instead of a version tag. The project convention is to pin git-based deps to a tag (e.g. #8.4.4). This should be updated to the corresponding tag before merging.

— Claude Code

Comment thread package.json
"@aws-sdk/client-sts": "^3.921.0",
"@aws-sdk/credential-providers": "^3.921.0",
"@scality/cloudserverclient": "^1.0.8",
"@scality/cloudserverclient": "file:./scality-cloudserverclient-v1.0.9.tgz",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scality/cloudserverclient is vendored as a local .tgz tarball checked into the repo. This bloats git history and bypasses the package registry audit trail. Should be replaced with a published registry version or a git reference before merging.

— Claude Code

@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown
  • ReplicateObject.js:636 — Source stream leak: VersionIdCollisionException handler exits early without destroying incomingMsg, bypassing cleanup at lines 639-646. Leaks source HTTP connection for large objects.
    - package.json:57 — arsenal pinned to raw commit hash instead of a version tag per project convention.
    - package.json:54 — cloudserverclient vendored as a local .tgz tarball; should use a published version or git ref before merging.

    Review by Claude Code

entry: destEntry.getLogInfo(),
});
return doneOnce(null, partAlreadyAtDest);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The VersionIdCollisionException handler returns via doneOnce() without aborting the source stream or the abort controller. Compare with the generic error handler below (lines 639-646) which properly calls abortController.abort() and incomingMsg.destroy(). With the expect-continue middleware the body may not have been sent, but the source GET request and its stream are still open — this leaks a connection/fd per cascade-hit part.

Suggested change
}
return doneOnce(null, partAlreadyAtDest);
}
if (!destRequestAborted) {
abortController.abort();
sourceStreamAborted = true;
if (incomingMsg.destroy) {
incomingMsg.destroy();
}
}

— Claude Code

Comment thread package.json
"@smithy/node-http-handler": "^3.3.3",
"JSONStream": "^1.3.5",
"arsenal": "git+https://github.com/scality/arsenal#8.3.9",
"arsenal": "git+https://github.com/scality/Arsenal#83682d1ae77705877b81414f39aacce4e27e0ceb",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arsenal is pinned to a raw commit hash (83682d1...) instead of a tag. The project convention (and CLAUDE.md) requires git-based deps to pin to a tag, not a branch or commit. This makes it harder to audit what version is in use and whether it's a release or an arbitrary snapshot. Please publish a tag and reference it here.

— Claude Code

Comment thread package.json
"@aws-sdk/client-sts": "^3.921.0",
"@aws-sdk/credential-providers": "^3.921.0",
"@scality/cloudserverclient": "^1.0.8",
"@scality/cloudserverclient": "file:./scality-cloudserverclient-v1.0.9.tgz",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vendoring a binary .tgz in the repo is concerning: it can't be diffed/reviewed, permanently bloats git history, and bypasses registry integrity checks. Consider publishing @scality/cloudserverclient@1.0.9 to the npm registry (or a private one) and referencing it as a normal semver dependency, same as the previous ^1.0.8.

— Claude Code

@claude

claude Bot commented Jun 10, 2026

Copy link
Copy Markdown
  • ReplicateObject.js:637 — Source stream/connection leaked on VersionIdCollisionException: all cascade branches return via doneOnce() without calling abortController.abort() or incomingMsg.destroy(), unlike the generic error path below. This leaks an fd per cascade-hit part.
    - package.json:57 — arsenal pinned to a raw commit hash instead of a tag, violating the project's dependency pinning convention.
    - package.json:54 — cloudserverclient vendored as a binary .tgz checked into the repo. Should be published to a registry for auditability and to avoid git history bloat.

    Review by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants