
fix(e2e): close gitea lost-commit race that flakes the Orchestrate e2e job #110

Merged
joshua-temple merged 2 commits into main from fix/e2e-harness-lost-commit-flake
Jun 11, 2026



@joshua-temple

Collaborator

Problem

The Orchestrate Build (cli) / E2E Tests job on main intermittently fails with "generate-workflow exited 0 but did not produce .github/workflows/orchestrate.yaml". The root cause is a lost-commit race in the e2e harness, not a problem in generate: GenerateWorkflows pushes the workflows commit via a raw git push from the act container, and the next runner step then commits through gitea's Contents API. Under parallel-suite load, gitea's API layer can still hold the pre-push branch head, parent the API commit on it, and move the ref to that new commit, silently discarding the pushed workflows commit. The later git fetch && git reset --hard origin/main then yields a tree with no workflows directory.
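To make the failure mode concrete, here is a toy, single-threaded Go model of the sequence above. It is not the harness code or the actual lost_commit_test.go: the commit type, the next-step.txt path, and the waitForPush switch are illustrative assumptions, and the race is collapsed into a fixed ordering in which the API layer acts on a head it read before the push landed.

```go
package main

import "fmt"

const workflowPath = ".github/workflows/orchestrate.yaml"

// commit is a toy stand-in for a git commit: a parent pointer plus the set
// of paths present in its tree.
type commit struct {
	id     string
	parent *commit
	files  map[string]bool
}

// withFile returns a new commit parented on base whose tree is base's tree plus path.
func withFile(id string, base *commit, path string) *commit {
	files := map[string]bool{path: true}
	for p := range base.files {
		files[p] = true
	}
	return &commit{id: id, parent: base, files: files}
}

// reachable reports whether target is tip or one of its ancestors.
func reachable(tip, target *commit) bool {
	for c := tip; c != nil; c = c.parent {
		if c == target {
			return true
		}
	}
	return false
}

// simulate plays the push-then-API-commit sequence against one branch ref.
// waitForPush models verify-after-push: the API write is only issued once the
// server's view of the head is the pushed commit (a simplifying assumption).
func simulate(waitForPush bool) (workflowsReachable, hasWorkflowFile bool) {
	root := &commit{id: "root", files: map[string]bool{"README.md": true}}
	head := root

	// The API layer's (possibly stale) view of the branch head.
	apiView := head

	// Raw `git push` from the act container advances the ref.
	workflows := withFile("workflows", head, workflowPath)
	head = workflows

	if waitForPush {
		apiView = head
	}

	// The Contents API commit is parented on apiView and the ref is moved to it.
	head = withFile("api", apiView, "next-step.txt")

	return reachable(head, workflows), head.files[workflowPath]
}

func main() {
	for _, wait := range []bool{false, true} {
		r, h := simulate(wait)
		fmt.Printf("waitForPush=%v: workflows commit reachable=%v, orchestrate.yaml present=%v\n", wait, r, h)
	}
}
```

With the stale head, the workflows commit is unreachable from the final ref and orchestrate.yaml is absent, which is exactly what the later fetch+reset observes; when the API write waits for the head to converge, both survive.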

Fix

  • Verify-after-push: after the workflows git push, poll gitea's branch head SHA until it matches the pushed SHA before returning, closing the staleness window deterministically. This includes a fix to demux the docker exec stream before parsing the pushed SHA.
  • Bounded retry plus an accurate, distinct error message at the repo-sync call site, which previously reused the generate assertion message and misattributed the failure.
  • Transport-error discrimination in the generated-file assertion: docker-exec transport errors are retried instead of being reported as a missing file.
  • Drop the e2e suite from -parallel 4 to -parallel 2 in build-cli.yaml to match e2e.yaml and its documented memory rationale, shrinking the race window.

Verification

  • The root and e2e modules build and pass go vet cleanly; the new lost_commit_test.go unit test passes.
  • The previously failing TestMultiStepScenarios/Two_Environment_Happy_Path scenario was run repeatedly after the fix to confirm it is consistently green. The definitive proof will be consecutive green Orchestrate runs on main after merge.

The e2e harness pushed generated workflows via raw git push from the act
container, then immediately wrote the next commit through gitea's Contents
API. Under parallel-suite load gitea's API layer could hold a stale branch
head from before the push, parent the API commit on it, and move the branch
ref there, silently discarding the workflows commit. The later
fetch+reset then produced a tree with no .github/workflows, surfacing as a
misattributed generate-workflow failure.

Verify the push converged before returning: read the pushed SHA in the
container and poll gitea's branch head until it matches (bounded). Add a
bounded fetch+reset retry at the sync site with its own distinct error
message, and treat docker-exec transport errors in the workflow probe as
retryable rather than as a missing file. Cap build-cli e2e parallelism at 2
to match e2e.yaml and shrink the race window.

Signed-off-by: Joshua Temple <joshua.temple@stablekernel.com>
joshua-temple merged commit 63dc71f into main on Jun 11, 2026
6 checks passed