INTEGRATION [PR#2749 > development/9.4] BB-780: Configure CRR queue processor poll interval #2758

Merged
bert-e merged 8 commits into development/9.4 from w/9.4/improvement/BB-780-crr-poll-interval on Jun 9, 2026

@bert-e bert-e commented Jun 9, 2026

This pull request has been created automatically.
It is linked to its parent pull request #2749.

Do not edit this pull request directly.
If you need to amend/cancel the changeset on branch
w/9.4/improvement/BB-780-crr-poll-interval, please follow this
procedure:

 git fetch
 git checkout w/9.4/improvement/BB-780-crr-poll-interval
 # <amend or cancel the changeset by _adding_ new commits>
 git push origin w/9.4/improvement/BB-780-crr-poll-interval

Please always comment on pull request #2749 instead of this one.

anurag4DSB and others added 8 commits June 9, 2026 12:43
TDD: specify the contract first. The replication queue processor
accepts an optional maxPollIntervalMs in its config block;
tests/config.json sets a distinctive value (350000) so the
pass-through is provable, the value is optional, and values below
45000 are rejected. These tests fail until the next commit adds the
schema and wiring (both pushed together so CI sees the green tip).

max.poll.interval.ms was pinned to its 5-minute default. Lowering
mpu_parts_concurrency to protect Metadata makes a large-MPU
replication exceed it, which evicts the queue processor's consumer
and redelivers the partition - losing the partial transfer and
leaving orphan parts. Let the replication queue processor set an
optional maxPollIntervalMs in its config block (min 45000, the
librdkafka session.timeout.ms floor); unset falls back to
BackbeatConsumer's existing 300000 default. Scoped to the queue
processor - the only consumer with the slow-task problem (the status
processor does fast metadata writes). Adds a matching
EXTENSIONS_REPLICATION_QUEUE_PROCESSOR_MAX_POLL_INTERVAL_MS mapping.
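
As a rough sketch of the fallback and the env var mapping described above: the helper names `applyEnvOverride` and `toConsumerProperties` are hypothetical, and the precedence (a set env var wins over the file value) is an assumption, not something this PR states. Only the default (300000), the env var name, and the librdkafka property name `max.poll.interval.ms` come from the text.

```javascript
// Default and env var name are taken from the commit message.
const DEFAULT_MAX_POLL_INTERVAL_MS = 300000;
const ENV_VAR = 'EXTENSIONS_REPLICATION_QUEUE_PROCESSOR_MAX_POLL_INTERVAL_MS';

// Hypothetical helper: apply the env override to the queue processor config.
// Assumption: a set, non-empty env var wins over the value from the file.
function applyEnvOverride(queueProcessorConfig, env) {
    const raw = env[ENV_VAR];
    if (raw === undefined || raw === '') {
        return queueProcessorConfig;
    }
    return { ...queueProcessorConfig, maxPollIntervalMs: Number(raw) };
}

// Hypothetical helper: build the librdkafka property for the consumer,
// falling back to the existing default when the field is unset.
function toConsumerProperties(queueProcessorConfig) {
    const { maxPollIntervalMs = DEFAULT_MAX_POLL_INTERVAL_MS } =
        queueProcessorConfig;
    return { 'max.poll.interval.ms': maxPollIntervalMs };
}

const fromFile = { maxPollIntervalMs: 350000 };
// Unset: falls back to the 300000 default.
console.log(toConsumerProperties({}));
// Set in the config block: passed through as-is.
console.log(toConsumerProperties(fromFile));
// Overridden through the environment (string input converted to a number).
console.log(toConsumerProperties(
    applyEnvOverride(fromFile, { [ENV_VAR]: '600000' })));
```

The override result would still need to go through the schema check, so an out-of-range value supplied through the environment is rejected the same way as one written in the config file.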

The schema only enforced a floor (min 45000). Add a 30-minute ceiling
(1800000 ms) so an operator can't set an unbounded maxPollIntervalMs.
This knob exists to let a single slow large-MPU transfer finish, not to
mask a wedged consumer: a poll gap beyond 30 min means the task is stuck,
and we want Kafka to evict and redeliver rather than hide the failure
behind an ever-growing timeout. Over-limit values are rejected, mirroring
how the floor rejects under-limit values (no silent clamping).

codecov Bot commented Jun 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.88%. Comparing base (c33e8be) to head (cd24b31).

Additional details and impacted files

| Files with missing lines | Coverage Δ |
|---|---|
| ...tensions/replication/ReplicationConfigValidator.js | 100.00% <100.00%> (ø) |
| ...sions/replication/queueProcessor/QueueProcessor.js | 72.94% <ø> (ø) |

... and 2 files with indirect coverage changes

| Components | Coverage Δ |
|---|---|
| Bucket Notification | 80.22% <ø> (ø) |
| Core Library | 81.27% <ø> (-0.06%) ⬇️ |
| Ingestion | 70.63% <ø> (ø) |
| Lifecycle | 79.46% <ø> (ø) |
| Oplog Populator | 85.83% <ø> (ø) |
| Replication | 59.74% <100.00%> (+0.06%) ⬆️ |
| Bucket Scanner | 85.76% <ø> (ø) |

@@                 Coverage Diff                 @@
##           development/9.4    #2758      +/-   ##
===================================================
- Coverage            74.89%   74.88%   -0.02%     
===================================================
  Files                  200      200              
  Lines                13668    13670       +2     
===================================================
  Hits                 10237    10237              
- Misses                3421     3423       +2     
  Partials                10       10              

| Flag | Coverage Δ |
|---|---|
| api:retry | 9.15% <100.00%> (+0.01%) ⬆️ |
| api:routes | 8.97% <100.00%> (+0.01%) ⬆️ |
| bucket-scanner | 85.76% <ø> (ø) |
| ft_test:queuepopulator | 10.97% <100.00%> (+0.02%) ⬆️ |
| ingestion | 12.56% <100.00%> (+0.01%) ⬆️ |
| lib | 7.77% <100.00%> (-0.01%) ⬇️ |
| lifecycle | 18.92% <100.00%> (+0.01%) ⬆️ |
| notification | 1.02% <0.00%> (-0.01%) ⬇️ |
| oplogPopulator | 0.14% <0.00%> (-0.01%) ⬇️ |
| replication | 18.65% <100.00%> (+0.01%) ⬆️ |
| unit | 51.38% <100.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

claude Bot commented Jun 9, 2026

LGTM — clean implementation. The new optional maxPollIntervalMs config field has correct Joi bounds (45s–30min), passes through QueueProcessor to BackbeatConsumer properly, falls back to BackbeatConsumer's existing 300s default when unset, and the docker-entrypoint.sh env var override follows established patterns. Tests cover the happy path, unset case, and both boundary violations.

Review by Claude Code

@bert-e bert-e merged commit cd24b31 into development/9.4 Jun 9, 2026
42 checks passed
@bert-e bert-e deleted the w/9.4/improvement/BB-780-crr-poll-interval branch June 9, 2026 13:33