fix: harden e2e against gitea throttling and container contention #121
Merged
Run the act-heavy e2e scenarios serially (-parallel 1) in CI and raise the go test timeout to 60m. Under concurrent execution the 4-core runner's gitea + act + job containers contend, throttling gitea (405 'try again later') and destabilising act runs, which surfaces as intermittent, product-unrelated failures. Serial execution removes that contention.

Wrap the gitea REST calls that have been observed to throttle (merge, create-pr, create-branch, change-files, label create/apply) with a bounded retry on transient 405 'try again later' and 5xx responses. The retry is safe: gitea returns these before applying any state change, so re-issuing cannot double-apply a mutation. Real 4xx client errors are surfaced immediately so expect-failure assertions stay deterministic.

Signed-off-by: Joshua Temple <joshua.temple@stablekernel.com>
Problem
The act-heavy e2e scenarios pass individually but fail intermittently when many run concurrently (CI `-parallel 2` on a 4-core runner; local full suite at default GOMAXPROCS). Two transient symptoms recur under load:

- `405 Method Not Allowed - {"message":"Please try again later"}` from gitea (throttling under load), notably on PR merge.
- Destabilised act runs while the gitea, act, and job containers contend for the runner.

These are resource-contention flakes, not product or scenario defects.
Fix
- `build-cli.yaml` now runs `go test -parallel 1 -timeout 60m`, with a 70m job `timeout-minutes`. `e2e.yaml`'s dispatch default parallelism drops from 2 to 1. Serial execution removes the container contention that is the root cause; the longer timeout covers the slower wall-clock time.
- The gitea REST calls that have been observed to throttle (merge, create-pr, create-branch, change-files, label create/apply) are wrapped in a bounded retry on transient `405` "try again later" and on `5xx`. Real `4xx` client errors are surfaced immediately, so expect-failure assertions stay deterministic. The retry is safe because gitea returns these throttle responses before applying any state change, so a re-issue cannot double-apply a mutation.

No retry was added at the act-run layer: a bare retry of an act run that may already have mutated gitea state is unsafe, and the act exit code does not reliably distinguish a transient infra failure from a genuine job-failure conclusion (which legitimate expect-failure scenarios produce). The gitea-client retry covers the transient setup-step throttle that is the usual upstream cause, and the serial execution removes the contention itself.
Verification
- `go build ./...`, `go vet ./...` (e2e), and `golangci-lint run ./e2e/...` all clean.
- `Hotfix Clean Apply`, which exercises the retried merge path, passes locally with the change.

Note: the `Promote force` scenario fails deterministically on current `main` in isolation (a separate pre-existing scenario defect, reproduced on a clean checkout with no concurrency), so it is tracked separately and is not the contention flake this PR targets.