feat(manager/container): add configurable initial splay and max jitter factors #3854

Merged
dims merged 1 commit into google:master from sambhav-jain-16:splay-and-jitter on Apr 16, 2026

Conversation

@sambhav-jain-16
Contributor

@sambhav-jain-16 sambhav-jain-16 commented Mar 17, 2026

This PR is an extension of #2918.
The initial jitter PR aimed to avoid CPU spikes when hundreds of housekeepings (HKs) are running. Although it does that as intended, the actual interval falls anywhere in [HK, 2HK], which increases variance when cadvisor metric timestamps are used for rate calculations on counter metrics. Since the jitter factor is not configurable, this variance cannot currently be reduced.
Also, a good way to avoid CPU spikes is to spread out the container HKs once at startup (initial splay), so that subsequent jitter can be set to zero while keeping the same benefit as applying random jitter on every cycle.

To keep the change backward-compatible, the default values are set to 1.0.

Attaching a screenshot showing how this change helped to reduce variance with initial splay at 1.0 and jitter factor at 0.0, without CPU spikes in cadvisor.

image (3)

@sambhav-jain-16
Contributor Author

sambhav-jain-16 commented Apr 13, 2026

Hi @dims
Can you PTAL at this PR? TIA

Comment thread manager/container.go Outdated
@sambhav-jain-16
Contributor Author

sambhav-jain-16 commented Apr 13, 2026

Integration tests are failing with:

Unable to find image 'gcr.io/k8s-staging-test-infra/bootstrap:v20250702-52f5173c3a' locally
docker: Error response from daemon: manifest for gcr.io/k8s-staging-test-infra/bootstrap:v20250702-52f5173c3a not found: manifest unknown: Failed to fetch "v20250702-52f5173c3a"

@dims
Collaborator

dims commented Apr 13, 2026

> integration tests failing with Unable to find image 'gcr.io/k8s-staging-test-infra/bootstrap:v20250702-52f5173c3a' locally

switch to gcr.io/k8s-staging-test-infra/bootstrap:v20251209-855adc2699

@sambhav-jain-16
Contributor Author

> > integration tests failing with Unable to find image 'gcr.io/k8s-staging-test-infra/bootstrap:v20250702-52f5173c3a' locally
>
> switch to gcr.io/k8s-staging-test-infra/bootstrap:v20251209-855adc2699

Created a separate PR for this, #3865; will rebase this PR once the fix is merged.

@sambhav-jain-16
Contributor Author

Hi @dims
LMK if you have other review comments

@sambhav-jain-16 sambhav-jain-16 requested a review from dims April 16, 2026 09:08
@dims dims added this pull request to the merge queue Apr 16, 2026
Merged via the queue into google:master with commit c8138ce Apr 16, 2026
10 checks passed

2 participants