fix(ci): repair direct-backport-push YAML and post backport result comments #4846

Merged
Yicong-Huang merged 8 commits into apache:main from Yicong-Huang:fix/backport-yaml-and-comment-log
May 3, 2026

Contributor

@Yicong-Huang Yicong-Huang commented May 3, 2026

What changes were proposed in this PR?

Three changes to .github/workflows/direct-backport-push.yml.

1. Repair YAML. The inline python3 -c '<source>' from #4696 put Python at column 0 inside a run: | block indented at column 10. YAML treats import re, sys as a top-level key, so every push to main failed in 0 seconds with 0 jobs (e.g. run 25271247473). Python can't be re-indented (top-level statements reject leading whitespace), so the script moves to .github/scripts/compose-backport-message.py. Behavior unchanged.

2. Surface backport status on the original commit + PR. Cherry-picks produce a new SHA, so the release branch never appears in the auto-derived branch badge on the main commit. The workflow reports through explicit channels instead: on success, a commit status badge, a commit comment, and a PR comment; on failure, a commit status plus a PR comment with an inline conflict diagnosis.

Success PR comment:

Backport to release/0.4 succeeded as a1b2c3d. Run

Failure PR comment (when cherry-pick conflicts):

Backport to release/0.4 failed. See job log.

Conflicts in:

  • f.txt

Likely-missing prerequisites on main (commits that touched these files between merge-base 6343a1bc and c027f3b2^ — consider backporting these first):

  • 958b8e8 main: prereq edit f

Capped at 5 files / 10 commits; full detail stays in the job log. Rebase-race conflicts get the same shape but list the racing commits on origin/<target> instead.

3. Retry + structured logging. git push retries 5x with [0, 5, 15, 30, 60]s backoff and rebases on origin/<target> between attempts to absorb push races. Annotation API calls retry with [0, 2, 5, 15]s and degrade to warnings on final failure (a 5xx on a comment shouldn't undo a successful cherry-pick). Every phase is wrapped in ::group:: markers with a [backport <target>] ... prefix.
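
To illustrate the push-retry behavior described in item 3, here is a minimal Python sketch. The workflow itself does this in bash; `push_with_retry`, `push`, and `rebase_on_target` are hypothetical names, and the two callables stand in for `git push` and for "fetch `origin/<target>` and rebase the backport commit". Only the delay schedule comes from the description above.

```python
import time
from typing import Callable, Sequence

# Backoff schedule in seconds, taken from the PR description.
PUSH_DELAYS = (0, 5, 15, 30, 60)


def push_with_retry(
    push: Callable[[], bool],
    rebase_on_target: Callable[[], bool],
    delays: Sequence[float] = PUSH_DELAYS,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Attempt the push once per delay entry, rebasing between attempts.

    `push` and `rebase_on_target` are hypothetical stand-ins: each returns
    True on success and False on failure.
    """
    for attempt, delay in enumerate(delays, start=1):
        if delay:
            sleep(delay)
        # Before every retry, rebase onto the refreshed target branch.
        # A rebase conflict is treated as non-retryable.
        if attempt > 1 and not rebase_on_target():
            return False
        if push():
            return True
    return False
```

Passing `sleep` as a parameter is a design choice for this sketch only: it lets the schedule be exercised without waiting.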

Any related issues, documentation, discussions?

Fixes the regression introduced in #4696.

How was this PR tested?

yaml.safe_load parses the workflow. compose-backport-message.py round-trips through git interpret-trailers --parse with Co-authored-by preserved. The conflict diagnosis output above came verbatim from a throwaway repo where main introduces a prerequisite edit + feature commit and the release branch touches the same lines.
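
The contents of `compose-backport-message.py` are not shown in this conversation. As a rough sketch of what a trailer-aware composer could look like, assuming it inserts a `(backported from commit <sha>)` note ahead of the trailer block (the layout visible in the backported commit message), something like the following would keep `Co-authored-by` as the final paragraph. The function name and the trailer regex are assumptions.

```python
import re

# Assumed trailer shape: "Key: value" with a token-like key.
TRAILER_RE = re.compile(r"^[A-Za-z][A-Za-z0-9-]*: \S")


def compose_backport_message(message: str, source_sha: str) -> str:
    """Insert a backport note before the trailing trailer block, if any."""
    lines = message.rstrip("\n").split("\n")
    # Find where the final paragraph starts.
    start = len(lines)
    while start > 0 and lines[start - 1].strip():
        start -= 1
    last_para = lines[start:]
    note = f"(backported from commit {source_sha})"
    # Only treat the final paragraph as trailers when it is not the subject
    # line (start > 0) and every line in it looks like a trailer.
    if start > 0 and last_para and all(TRAILER_RE.match(l) for l in last_para):
        body = "\n".join(lines[:start]).rstrip("\n")
        return f"{body}\n\n{note}\n\n" + "\n".join(last_para) + "\n"
    return "\n".join(lines) + f"\n\n{note}\n"
```

Because the trailers stay in the last paragraph, `git interpret-trailers --parse` should still be able to find them in the composed message.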

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7, 1M context)

…mments

The inline `python3 -c '<source>'` block introduced in the trailer-aware
backport message composer had its Python source at column 0 inside a
`run: |` block whose indentation indicator was set at column 10 by the
preceding bash. YAML treated `import re, sys` as a top-level key and
GitHub Actions failed every push to main with 0 jobs / 0 seconds. Move
the Python into `.github/scripts/compose-backport-message.py` and call
it from the workflow.

While here, add post-cherry-pick steps that:
* On success, post a commit comment on the original main commit naming
  the backport target branch and the new SHA, plus a comment on the
  original PR with the same info.
* On failure, post a comment on the original PR linking to the failing
  matrix-leg job log.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the fix, python, and ci (changes related to CI) labels May 3, 2026
codecov-commenter commented May 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.11%. Comparing base (fc83951) to head (3bde08a).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #4846      +/-   ##
============================================
+ Coverage     43.06%   43.11%   +0.04%     
- Complexity     2036     2104      +68     
============================================
  Files           957      957              
  Lines         34077    34946     +869     
  Branches       3753     3893     +140     
============================================
+ Hits          14676    15067     +391     
- Misses        18629    19095     +466     
- Partials        772      784      +12     
| Flag | Coverage Δ | |
|---|---|---|
| access-control-service | 28.12% <ø> (ø) | |
| agent-service | 33.49% <ø> (-0.24%) | ⬇️ |
| amber | 41.44% <ø> (+0.39%) | ⬆️ |
| computing-unit-managing-service | 0.00% <ø> (ø) | |
| config-service | 0.00% <ø> (ø) | |
| file-service | 32.40% <ø> (-0.85%) | ⬇️ |
| frontend | 34.97% <ø> (-0.31%) | ⬇️ |
| python | 84.72% <ø> (-0.10%) | ⬇️ |
| workflow-compiling-service | 47.72% <ø> (ø) | |

Yicong-Huang and others added 5 commits May 2, 2026 23:10
Commit comments are easy to miss in busy commit pages. A commit status
shows next to CI badges on both the commit and any PR that references
it, with target_url linking to the new SHA on success or the failing
job on failure.
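
For illustration, the request body for GitHub's create-commit-status endpoint (`POST /repos/{owner}/{repo}/statuses/{sha}`) could be assembled roughly as below. This is a sketch in Python, not the workflow's github-script code; the function name, the `backport/<target>` context string, and the description wording are assumptions.

```python
def build_commit_status(target: str, ok: bool, url: str, new_sha: str = "") -> dict:
    """Build a JSON-serializable body for a commit status on the main commit.

    `url` is the new commit on success or the failing job on failure,
    matching the target_url behavior described in the commit message.
    """
    if ok:
        description = f"Backported to {target} as {new_sha[:7]}"
    else:
        description = f"Backport to {target} failed"
    return {
        "state": "success" if ok else "failure",
        "target_url": url,
        # Truncated defensively; a 140-character cap is assumed here.
        "description": description[:140],
        # Hypothetical context name; one context per target branch keeps
        # statuses for different release branches from overwriting each other.
        "context": f"backport/{target}",
    }
```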

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…on steps

The bash cherry-pick step now retries `git push` up to 5 times with
backoff [0, 5, 15, 30, 60]s. Between attempts it refreshes
`origin/<target>` and rebases the single backport commit on top, so a
race with another push to the same release branch resolves itself
instead of leaving the run in an "almost backported" state. A genuine
rebase conflict aborts the rebase and fails loudly. Each phase is
wrapped in `::group::` markers and emits a `[backport <target>] ...`
prefix for greppable logs (parent_count, base_sha, local_sha, new_sha,
remote_head per attempt).
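
The grouped, prefixed logging described above can be sketched as a small Python context manager. `::group::` and `::endgroup::` are standard GitHub Actions workflow commands; the helper name `log_group` and the sample message are hypothetical, and the real step emits these lines from bash.

```python
from contextlib import contextmanager


@contextmanager
def log_group(target: str, title: str):
    """Wrap a phase in ::group:: markers and yield a prefixed logger."""
    print(f"::group::[backport {target}] {title}", flush=True)
    try:
        # Every line carries the same prefix so logs stay greppable.
        yield lambda msg: print(f"[backport {target}] {msg}", flush=True)
    finally:
        # Close the group even if the phase raises.
        print("::endgroup::", flush=True)
```

A phase would then read `with log_group("release/0.4", "push") as log: log("attempt=1 local_sha=...")`.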

The github-script annotation steps (commit status, commit comment, PR
comment, job-URL lookup) gain a `withRetry` helper with the same
backoff schedule. Annotation failures degrade to warnings instead of
failing the whole job: when the cherry-pick + push has already
succeeded, a transient 5xx on a comment shouldn't undo that.
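
The real `withRetry` helper lives in JavaScript inside the github-script steps and is not shown here. As a Python rendering of the same idea, under the assumption that it uses the `[0, 2, 5, 15]s` schedule given in the PR description, a call that keeps failing ends in a `::warning::` annotation rather than an exception:

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Annotation backoff in seconds, per the PR description.
ANNOTATION_DELAYS = (0, 2, 5, 15)


def with_retry(
    label: str,
    call: Callable[[], T],
    delays: Sequence[float] = ANNOTATION_DELAYS,
    sleep: Callable[[float], None] = time.sleep,
    warn: Callable[[str], None] = print,
) -> Optional[T]:
    """Run `call` up to len(delays) times; warn and return None if all fail."""
    last_error: Optional[Exception] = None
    for delay in delays:
        if delay:
            sleep(delay)
        try:
            return call()
        except Exception as exc:  # any API error counts as retryable here
            last_error = exc
    # Degrade to a workflow warning so the job result is not affected.
    warn(f"::warning::{label} failed after {len(delays)} attempts: {last_error}")
    return None
```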

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rt conflict

When a cherry-pick or rebase hits a conflict, the run now logs:
* every conflicted file
* the conflict-marker line numbers in each file (`grep -nE` on
  `<<<<<<<` / `=======` / `>>>>>>>`)
* commits on `main` between merge-base and the source commit that
  touched each conflicted file — these are the likely-missing
  prerequisite commits the backport needs first
* commits on the release branch that diverged from main on the same
  file — these are what's already there
* the most recent commits anywhere that touched the file

For the rebase-during-push retry path (race with another push to the
same release branch), the diagnosis instead lists the racing commits
that landed on origin/<target> while this run was preparing.
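
As a sketch of the conflict-marker scan listed above, the following Python function reports the same information `grep -nE` would: the 1-based line number of each marker. Anchoring the pattern at the start of the line is an assumption; the exact pattern used in the workflow is not shown.

```python
import re
from typing import List, Tuple

# The three marker prefixes named in the commit message.
MARKER_RE = re.compile(r"^(<<<<<<<|=======|>>>>>>>)")


def conflict_marker_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, marker) pairs for each conflict marker in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = MARKER_RE.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits
```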

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cherry-pick step already logs the likely-missing prerequisite
commits and per-file conflict markers. Mirror that into a condensed
markdown summary written to $RUNNER_TEMP/backport-diagnosis.md (capped
at 5 conflicted files and 10 prerequisite commits to keep PR comments
scannable). The failure script reads the file and appends it to the
PR comment so the reviewer sees, in the PR itself, which commits
likely need to be backported first.

Same treatment for rebase-during-push conflicts: the comment lists the
racing commits that landed on origin/<target> between the start of the
run and the push attempt.

If no diagnosis file exists (e.g. failure was a permissions error or
network 5xx after retries), the comment falls back to the basic
"Backport failed. See job log." form.
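
The capping and fallback behavior can be sketched as follows. This is a hypothetical Python rendering: the real summary is written by the bash step and read by a github-script step, and the function names and exact markdown wording here are assumptions modeled on the sample comment in the PR description.

```python
from typing import Optional, Sequence


def render_diagnosis(
    files: Sequence[str],
    commits: Sequence[str],
    max_files: int = 5,
    max_commits: int = 10,
) -> str:
    """Render a condensed markdown summary, capped to keep comments short."""
    lines = ["**Conflicts in:**"]
    lines += [f"- `{path}`" for path in files[:max_files]]
    if commits:
        lines += ["", "**Likely-missing prerequisites on main:**"]
        lines += [f"- `{commit}`" for commit in commits[:max_commits]]
    return "\n".join(lines) + "\n"


def failure_comment(target: str, job_url: str, diagnosis: Optional[str]) -> str:
    """Build the PR comment; fall back to the basic form without a diagnosis."""
    head = f"Backport to `{target}` failed. See [job log]({job_url})."
    return f"{head}\n\n{diagnosis}" if diagnosis else head
```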

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Yicong-Huang Yicong-Huang added the release/v1.1.0-incubating (back porting to release/v1.1.0-incubating) label May 3, 2026
@bobbai00 bobbai00 self-requested a review May 3, 2026 06:30
@Yicong-Huang Yicong-Huang enabled auto-merge (squash) May 3, 2026 06:44
@Yicong-Huang Yicong-Huang merged commit af5d174 into apache:main May 3, 2026
37 checks passed
Yicong-Huang added a commit that referenced this pull request May 3, 2026
…mments (#4846)

(backported from commit af5d174)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

github-actions Bot commented May 3, 2026

Backport to release/v1.1.0-incubating succeeded as 1703832. Run

bobbai00 added a commit to bobbai00/texera that referenced this pull request May 3, 2026
* ci: nightly strict license-binary check that files a tracking issue on drift

Resolves apache#4692.

PR builds run check_binary_deps.py with --ignore-transitive-version (apache#4693)
so a benign upstream version bump on a transitive dep does not block merges.
This workflow runs the same checks **without** that flag every night on
`main` so transitive drift is still visible and actionable before each
release. On non-zero exit it files (or updates) one tracking issue
identified by the stable label `license-binary-drift`; on a clean run it
closes the issue if one is open.

Workflow shape:
  - frontend-npm | agent-npm | python | jar — one job per ecosystem,
    each rebuilds its dist exactly the way build.yml does and runs the
    strict check; failures don't fail the workflow (continue-on-error)
    so all four still run.
  - jar uses the unified check across every dist's lib/ rather than a
    per-service matrix; per-service placement errors are still caught
    by build.yml on every PR, and the nightly's job is exact-version
    drift which the unified check surfaces just as well.
  - report — aggregates per-ecosystem results from artifacts and
    creates / updates / closes the tracking issue via
    actions/github-script. Skips issue management when not on the
    default branch (so workflow_dispatch on feature branches still
    runs the checks but does not surface issues).

Trigger: schedule (07:00 UTC daily) + workflow_dispatch.
Permissions: issues:write for the report job.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): repair direct-backport-push YAML and post backport result comments (apache#4846)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: reuse build.yml for nightly via ignore_transitive_version input

Per review on apache#4734: instead of duplicating build.yml's dist-producing
steps in the nightly workflow, parametrize build.yml with a new
`ignore_transitive_version` input and have the nightly call it as a
reusable workflow with that input flipped to false. PR builds keep the
default (true). This guarantees PR and nightly runs go through identical
code paths — the only difference between them is the value of one input.

Changes:

- build.yml: add `ignore_transitive_version: boolean = true` input.
  Replace each of the 6 hard-coded `--ignore-transitive-version` flags
  (frontend/amber/platform/python/agent-service license checks) with
  `${{ inputs.ignore_transitive_version && '--ignore-transitive-version'
  || '' }}`. The platform job's check previously didn't pass the flag
  at all (strict on PRs); this commit unifies it with the rest so all
  five ecosystems behave the same: relaxed on PRs, strict on nightly.

- license-binary-nightly.yml: drop the per-ecosystem job copies. The
  workflow now has just two jobs:
    - build: `uses: ./.github/workflows/build.yml` with
      `ignore_transitive_version: false`, `secrets: inherit`.
    - report: walks the current run's jobs via listJobsForWorkflowRun,
      identifies license-check step failures (regex matches step names
      containing "license-binary" or "binary licenses"), and creates /
      updates / closes the tracking issue accordingly. Non-license
      step failures (flaky tests, network blips) are ignored so they
      don't spuriously surface as drift.

The report step's six branches (drift+new, drift+existing, clean+existing,
clean+nothing, non-license-failure-only, default-branch guard) were
exercised end-to-end with stubbed github/context/core under Node before
push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(workflow-core): add unit test coverage for VFSURIFactory (apache#4757)

### What changes were proposed in this PR?

Add `VFSURIFactorySpec` covering URI construction and decoding in
`VFSURIFactory`:

- `createResultURI` includes wid/eid/globalportid and the result
resource type
- Result URIs round-trip through `decodeURI`
- `createRuntimeStatisticsURI` omits the `opid/` segment
- `createConsoleMessagesURI` embeds the operator id and the
`consoleMessages` resource type
- `decodeURI` rejects non-vfs schemes, URIs missing required segments,
and unknown resource-type tails

### Any related issues, documentation, discussions?

Closes apache#4756

### How was this PR tested?

`sbt "WorkflowCore/testOnly
org.apache.texera.amber.core.storage.VFSURIFactorySpec"` — 7/7 tests
pass.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.7)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: rename ignore_transitive_version input to mode (PR | release)

Per review on apache#4734: replace the boolean input with a string "mode" so
the call sites name *what* they are (PR-time relaxed vs. release-time
strict) instead of *what flag they pass*.

  build.yml:
    inputs.mode: string, default "PR"
      "PR"      -> --ignore-transitive-version (relaxed)
      "release" -> no flag                     (strict exact-match)
    The five license-check invocations now read
      ${{ inputs.mode == 'PR' && '--ignore-transitive-version' || '' }}
    so any value other than "PR" falls through to strict, which is the
    safer side. workflow_call inputs cannot enforce string enums; the
    valid values are documented inline.

  license-binary-nightly.yml:
    Pass `mode: release` instead of `ignore_transitive_version: false`.
    Updated the inline comment + tracking-issue body wording to match.

  required-checks.yml is unchanged: it doesn't pass this input, so PR
  builds keep the default ("PR") and behave exactly as before.

Re-ran the three representative report scenarios (drift+new,
clean+existing, non-license failure only) under Node with stubbed
github/context/core; all three still behave correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(nightly): move schedule to 11:00 UTC (04:00 PDT / 03:00 PST)

Per review on apache#4734: 07:00 UTC was midnight PDT, when many people are
still working. Move to 11:00 UTC so it lands outside US-Pacific working
hours. GitHub cron is fixed UTC; the local clock-time shifts by an hour
at DST transitions.

Daily cadence is fine for now; if it turns out to be too frequent we
can drop to every 48–72 h.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Yicong Huang <17627829+Yicong-Huang@users.noreply.github.com>
Co-authored-by: Xinyuan Lin <xinyual3@uci.edu>