Run benchmarks in parallel with tests, not sequentially #1130
base: master
Conversation
Move benchmark jobs from bench.yml into test.yml so tests and benchmarks run in parallel instead of sequentially, reducing merge time from ~10hrs to ~5hrs. Add cross-cancellation jobs so a failure in either group cancels the entire workflow run. Docs-only PRs continue to skip both. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Note: Reviews paused. It looks like this branch is under active development; to avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough: Removed the standalone benchmark workflow; its jobs now run inside the main test workflow.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant GH as GitHub Actions (test.yml)
    participant Repo as Repository (PR & Master refs)
    participant Login as Login Node (build)
    participant SLURM as SLURM Scheduler
    participant Node as Compute Node (per-config)
    participant Diff as bench_diff / Poster
    rect rgba(93,156,236,0.5)
        GH->>Repo: clone PR and master refs
    end
    rect rgba(96,169,79,0.5)
        GH->>Login: run per-config builds (parallel)
        Login-->>GH: build artifacts/logs
    end
    rect rgba(236,151,31,0.5)
        GH->>SLURM: submit multi-node sbatch (run_frontier_all_*.sh)
        SLURM->>Node: allocate nodes and SSH per-config tasks
        Node->>Node: run frontier_test_config / frontier_bench_config
        Node-->>SLURM: write per-config YAML outputs
        SLURM-->>GH: job completion and collected outputs
    end
    rect rgba(178,120,255,0.5)
        GH->>Diff: run bench_diff & generate comment
        Diff-->>GH: diff results / comment payload
        GH->>Repo: post comment & upload logs/artifacts
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
CodeAnt AI finished reviewing your PR.
📝 Walkthrough: The benchmark workflow has been consolidated from a standalone file into the main test workflow.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/workflows/test.yml:
- Around lines 285-287: The bench job is missing the NODE_OPTIONS env used by the
self job, which can cause OOMs on Phoenix self-hosted runners. Update the bench
job's env block (the same place where ACTIONS_RUNNER_FORCE_ACTIONS_NODE_VERSION
and ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION are set) to include NODE_OPTIONS:
'--max-old-space-size=2048' so Phoenix bench runners get the same Node memory
limit as the self job.
🧹 Nitpick comments (2)
.github/workflows/test.yml (2)
301-306: Parallel build: first failure orphans the second background job. GitHub Actions' default shell uses `set -eo pipefail`. If `wait %1` returns non-zero, the shell exits immediately and `wait %2` is never called — the second build process keeps running in the background on the self-hosted runner. While the step correctly fails, the orphaned process wastes resources. A more robust pattern waits for both jobs unconditionally and then checks both exit codes:

Proposed fix

```diff
 - name: Setup & Build
   if: matrix.build_script != ''
   run: |
     (cd pr && ${{ matrix.build_script }}) &
     (cd master && ${{ matrix.build_script }}) &
-    wait %1 && wait %2
+    wait -n 2 || true  # suppress set -e for individual waits
+    wait -n 2 || true
+    wait %1; s1=$?
+    wait %2; s2=$?
+    if [ "$s1" -ne 0 ] || [ "$s2" -ne 0 ]; then exit 1; fi
```

Actually, a simpler approach since both jobs have already finished:

Simpler alternative

```diff
 - name: Setup & Build
   if: matrix.build_script != ''
   run: |
+    set +e
     (cd pr && ${{ matrix.build_script }}) &
     (cd master && ${{ matrix.build_script }}) &
-    wait %1 && wait %2
+    wait %1; s1=$?
+    wait %2; s2=$?
+    [ "$s1" -eq 0 ] && [ "$s2" -eq 0 ]
```
311-314: `bench_diff` step missing an `if: always()` or `if: success()` guard. If the "Bench (Master v. PR)" step fails, the "Generate & Post Comment" step is skipped by default (GitHub Actions skips subsequent steps on failure). If you want the diff comment posted even on partial benchmark failure (e.g., to report which benchmarks regressed), consider adding `if: always()`. If the current behavior (skip on failure) is intended, this is fine as-is.
⚠️ Caution: Review failed (failed to post review comments).
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
Benchmark failures (e.g. transient network issues) should not cancel correctness tests. Keep only cancel-on-test-failure so that test failures still cancel benchmarks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
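For reference, a sketch of what the retained test-failure cancellation job could look like; the job name and `needs` list are assumptions, while the `gh run cancel` invocation follows the one quoted in the review below:

```yaml
  cancel-on-test-failure:
    needs: [self]            # hypothetical: the test job(s) this guards
    if: failure()
    runs-on: ubuntu-latest
    steps:
      - name: Cancel Workflow Run
        run: gh run cancel ${{ github.run_id }} --repo ${{ github.repository }}
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```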
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##           master    #1130   +/-   ##
=======================================
  Coverage   44.03%   44.03%
=======================================
  Files          70       70
  Lines       20649    20649
  Branches     2053     2053
=======================================
  Hits         9093     9093
  Misses      10368    10368
  Partials     1188     1188
=======================================
```

☔ View full report in Codecov by Sentry.
Reduces Frontier queue depth from 11 SLURM jobs to 2 by consolidating test configs into a single 5-node allocation and bench configs into a single 6-node allocation. Builds remain on login nodes; tests/benchmarks run on compute nodes via ssh to allocated nodes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
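Schematically, the consolidation replaces one SLURM job per config with a single allocation that fans out over its nodes. A rough bash sketch, not the actual script; the config list and ssh fan-out details are assumptions:

```bash
#!/bin/bash
#SBATCH --nodes=5               # one allocation for all test configs

# Map each config to one node of the allocation and run it there over ssh.
mapfile -t nodes < <(scontrol show hostnames "$SLURM_JOB_NODELIST")
configs=("cpu" "gpu")           # hypothetical; the real script defines more
for i in "${!configs[@]}"; do
  ssh "${nodes[$i]}" "cd $PWD && ./frontier_test_config.sh ${configs[$i]}" &
done
wait                            # collect all per-config exit statuses
```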
CodeAnt AI incremental review completed.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @.github/scripts/frontier_bench_post.sh:
- Around lines 34-35: The module environment sourced by `. ./mfc.sh load -c "$flag" -m g` is lost because it runs in a separate subshell from the subsequent `./mfc.sh bench_diff` call. Fix it by running both commands in the same subshell, or by sourcing the loader in the parent shell so the environment persists — e.g., combine the two separate `(cd pr && ...)` invocations into a single `(cd pr && . ./mfc.sh load -c "$flag" -m g && ./mfc.sh bench_diff "../$master_yaml" "../$pr_yaml")` (or alternatively run `cd pr`, then source `mfc.sh load` in the current shell before invoking `./mfc.sh bench_diff`), ensuring the loader (./mfc.sh load) and bench_diff run in the same shell/session.
In @.github/scripts/frontier_test_config.sh:
- Around line 28-34: The GPU probe using rocm-smi (the gpus and ngpus
assignments) is executed unconditionally and will cause CPU-only runs to fail;
move the rocm-smi call and the ngpus calculation inside the branch that runs
when device equals "gpu". Specifically, relocate the lines that set
gpus=$(rocm-smi ...) and ngpus=$(...) into the if [ "$device" = "gpu" ] branch
before invoking ./mfc.sh, and add a safe fallback for ngpus (e.g., default to 1
or exit with a clear error) if rocm-smi yields no results; ensure you reference
the same variable names (gpus, ngpus, device) and preserve the existing ./mfc.sh
invocation when fixing.
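A sketch of the relocated probe; the rocm-smi flags, output parsing, and mfc.sh invocations are illustrative assumptions, not the script's exact code:

```bash
if [ "$device" = "gpu" ]; then
  # Probe GPUs only for GPU configs, so CPU-only nodes without ROCm
  # never execute rocm-smi (which would trip set -e).
  gpus=$(rocm-smi --showid)                 # illustrative; exact flags assumed
  ngpus=$(printf '%s\n' "$gpus" | grep -c '^GPU')
  if [ "$ngpus" -lt 1 ]; then
    echo "error: rocm-smi reported no GPUs on $(hostname)" >&2
    exit 1
  fi
  ./mfc.sh test -j "$ngpus" --gpu           # hypothetical invocation
else
  ./mfc.sh test -j "$(nproc)"               # hypothetical invocation
fi
```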
In @.github/workflows/test.yml:
- Around line 270-275: The "Setup & Build" workflow step is dead because
matrix.build_script is set to the empty string for every matrix entry, so the
conditional if: matrix.build_script != '' is never true; either remove the
entire "Setup & Build" step from the workflow or populate the matrix entries
with non-empty build_script values for the matrix entries that need building so
the condition can be true, and update any references to that step accordingly
(look for the job name/step "Setup & Build" and the matrix variable
matrix.build_script).
- Around line 310-319: The workflow only implements one-directional cancellation
via the job named cancel-on-test-failure; to make cancellation bidirectional add
a symmetric job (e.g., cancel-on-bench-failure) that uses needs: [bench], if:
failure(), runs-on: ubuntu-latest and the same steps (Cancel Workflow Run using
gh run cancel ${{ github.run_id }} --repo ${{ github.repository }} with GH_TOKEN
env) so benchmarks failing will cancel the tests; ensure the new job mirrors
cancel-on-test-failure but references needs: [bench] and an appropriate name and
position in the YAML.
🧹 Nitpick comments (4)
.github/scripts/frontier_bench_config.sh (1)

29-33: Quote `$(nproc)` to prevent word splitting (ShellCheck SC2046). While `nproc` returns a single integer in practice, quoting the command substitution is the safer shell idiom and silences the linter.

Proposed fix

```diff
-    ./mfc.sh bench --mem 1 -j $(nproc) -o "$job_slug.yaml" -- -c "$cluster" $device_opts -n $n_ranks
+    ./mfc.sh bench --mem 1 -j "$(nproc)" -o "$job_slug.yaml" -- -c "$cluster" $device_opts -n $n_ranks
```

.github/scripts/run_frontier_all_benchmarks.sh (2)

78-86: Duplicated config table inside the heredoc is a maintenance hazard. The config array on lines 79–86 must be kept in exact sync with lines 15–22. If someone adds, removes, or reorders a config in one place but not the other, the node-to-config mapping silently breaks (wrong benchmark runs on wrong node, or index out of bounds). The same pattern appears in `run_frontier_all_tests.sh` lines 81–88. Consider generating the inner config table dynamically (e.g., write it to a file in Phase 1 and source it inside the heredoc), or at minimum add a CI-time assertion that both arrays match.

111-118: `$?` after a failed `if wait` may not carry the original exit code in all shells. In bash, `$?` in the `else` branch of `if wait "$pid"` correctly reflects the child's exit status. However, this is a bash-specific behavior — the shebang (#!/bin/bash) ensures it, but worth a brief comment for future maintainers, since it's a subtle pattern.

.github/scripts/run_frontier_all_tests.sh (1)

56-56: Hardcoded "5-node" in the echo; the benchmarks script uses `$num_nodes`. Minor inconsistency — `run_frontier_all_benchmarks.sh` line 53 uses `"Submitting ${num_nodes}-node SLURM job..."` while this script hardcodes the number. Using `$num_nodes` keeps it self-documenting if configs change.

Proposed fix

```diff
-echo "All builds complete. Submitting 5-node SLURM job..."
+echo "All builds complete. Submitting ${num_nodes}-node SLURM job..."
```
4 issues found across 6 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name=".github/scripts/frontier_test_config.sh">
<violation number="1" location=".github/scripts/frontier_test_config.sh:28">
P1: The `rocm-smi` command is invoked unconditionally before the device type check. With `set -e` enabled, this will cause CPU-only tests to fail immediately on nodes without ROCm installed, as the script will terminate before reaching the `else` branch. Move the GPU discovery logic inside the `if [ "$device" = "gpu" ]` block.</violation>
</file>
<file name=".github/scripts/run_frontier_all_benchmarks.sh">
<violation number="1" location=".github/scripts/run_frontier_all_benchmarks.sh:140">
P1: The job ID extraction using `grep -oE '[0-9]+'` is fragile and can incorrectly match numbers from error messages (e.g., account names like 'ENG160' or error codes). This could cause the script to proceed with an invalid job ID when `sbatch` actually failed. Parse only the specific 'Submitted batch job' line to ensure a real job ID is captured.</violation>
</file>
<file name=".github/scripts/frontier_bench_config.sh">
<violation number="1" location=".github/scripts/frontier_bench_config.sh:13">
P2: `frontier_bench_config.sh` always loads GPU mode (`-m g`) even when `device=cpu`, unlike the test config. CPU benchmark runs will get the wrong module set; mirror the device-based mode selection.</violation>
</file>
<file name=".github/scripts/frontier_bench_post.sh">
<violation number="1" location=".github/scripts/frontier_bench_post.sh:35">
P1: The environment setup and `bench_diff` invocation are in separate subshells, so the environment changes from `. ./mfc.sh load` are not visible to `bench_diff`. Both commands should run in the same subshell to ensure the loaded environment is applied.</violation>
</file>
Use rsync with --link-dest and --exclude to avoid copying the target directories back into themselves when creating per-config source copies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
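The pattern is roughly the following; directory names are hypothetical:

```bash
# Copy the source tree into a per-config directory without recursing into
# the other per-config copies; --link-dest hardlinks unchanged files so
# each copy is cheap.
rsync -a \
  --exclude='/build-*/' \
  --link-dest="$PWD/src" \
  src/ "build-$config/"
```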
- Move rocm-smi GPU discovery inside the gpu branch so CPU configs don't fail with set -e (frontier_test_config.sh)
- Combine mfc.sh load and bench_diff into a single subshell so the loaded module environment persists (frontier_bench_post.sh)
- Use device-based mode selection in bench config for consistency (frontier_bench_config.sh)
- Parse the 'Submitted batch job' line instead of a bare grep for digits (run_frontier_all_tests.sh, run_frontier_all_benchmarks.sh)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
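The job-ID fix in the last bullet amounts to anchoring on sbatch's success line instead of grepping for any digits, along these lines (a sketch, not the scripts' exact code):

```bash
out=$(sbatch "$script")        # prints "Submitted batch job <id>" on success
job_id=$(printf '%s\n' "$out" | awk '/^Submitted batch job/ {print $4}')
if [ -z "$job_id" ]; then
  echo "sbatch submission failed: $out" >&2
  exit 1
fi
```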
Prevents rapid pushes from cancelling unrelated jobs. Each job type (github, self, bench) now has its own concurrency group keyed by ref and matrix params, so a new push only cancels stale instances of the same job — not all jobs across the workflow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
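Job-level concurrency groups in GitHub Actions look roughly like this; the matrix keys shown are illustrative, not the exact ones in test.yml:

```yaml
jobs:
  self:
    concurrency:
      # Keyed by ref plus matrix params: a new push cancels only the stale
      # instance of this same job, not every job in the workflow.
      group: self-${{ github.ref }}-${{ matrix.device }}
      cancel-in-progress: true
```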
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name=".github/workflows/test.yml">
<violation number="1" location=".github/workflows/test.yml:68">
P2: The concurrency group for the github matrix job omits matrix.precision, so the default-precision job and the `single` precision job share the same group and can cancel each other when `cancel-in-progress: true` is enabled. Include precision in the concurrency key to avoid unintended cancellations and preserve parallelism.</violation>
</file>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Eliminate config table duplication by writing configs to a file and reading it inside the sbatch heredoc (now unquoted for variable expansion)
- Add EXIT trap in sbatch scripts to kill orphaned SSH processes on job cancellation
- Add per-config timeout (90 min tests, 120 min benchmarks) to prevent a single hanging config from consuming the full walltime
- Extract SLURM account, partition, walltime, and node count into variables at the top of each orchestration script
- Validate GPU count from rocm-smi (1-16 range) with diagnostic output on failure in both test and benchmark compute-node scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
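A sketch of the trap-and-timeout pieces of that list, with the ssh command shape schematic and the arrays assumed defined earlier in the script:

```bash
# Inside the generated sbatch script: kill any still-running per-config SSH
# children if SLURM cancels the job or it exits early.
trap 'kill $(jobs -p) 2>/dev/null || true' EXIT

for i in "${!configs[@]}"; do
  # Cap each config (90 min for tests, 120 for benchmarks) so one hang
  # cannot consume the full walltime.
  timeout 90m ssh "${nodes[$i]}" "./frontier_test_config.sh ${configs[$i]}" &
done
wait
```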
CodeAnt AI incremental review completed.
Running all 5-6 builds simultaneously on a Frontier login node causes memory pressure that crashes the Cray compiler (optcg segfault). Cap concurrent builds at 2 using a polling semaphore to stay within login node resource limits. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CodeAnt AI incremental review completed.
Hardlink copies (cp -al / rsync --link-dest) share the same .git/index inode across all config directories. When parallel builds invoke setuptools_scm, concurrent git status calls on the shared index cause timeouts on Lustre. Bypass git entirely by setting the pretend version. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
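setuptools_scm supports bypassing VCS discovery via an environment variable; a minimal sketch, with a placeholder version string:

```bash
# Tell setuptools_scm the version up front so it never shells out to git,
# avoiding concurrent `git status` calls on the shared .git/index over Lustre.
export SETUPTOOLS_SCM_PRETEND_VERSION="0.0.0"   # placeholder value
```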
The throttle loop blocks the main process while waiting for build slots, so the post-loop heartbeat subprocess never starts until late. Add inline heartbeat every 120s during throttle waits showing the last log line of each active build. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
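Combined with the build cap from the earlier commit, the throttle-with-heartbeat loop might look like this; the log file naming is an assumption:

```bash
MAX_PARALLEL=2
waited=0
while [ "$(jobs -rp | wc -l)" -ge "$MAX_PARALLEL" ]; do
  sleep 5
  waited=$((waited + 5))
  if [ $((waited % 120)) -eq 0 ]; then
    # Heartbeat: surface the last log line of each active build.
    for log in build-*.log; do
      [ -e "$log" ] && echo "[$log] $(tail -n 1 "$log")"
    done
  fi
done
```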
CodeAnt AI incremental review completed.
read -t 0.1 (sub-second timeout) in a loop with process substitution file descriptors triggers a bash internal error (unwind_frame_run: read_builtin: frame not found) leading to a segfault. Use integer timeout (read -t 1) instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
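In context, the change is just the timeout argument on the read from the process-substitution file descriptor; the producer shown here is hypothetical:

```bash
# Integer timeout; read -t 0.1 triggered the bash internal error described above.
exec 3< <(tail -f "$build_log")          # hypothetical producer
while read -r -t 1 -u 3 line || [ $? -gt 128 ]; do
  [ -n "$line" ] && echo "$line"         # status >128 means timeout: keep polling
done
```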
Hardlinked source copies caused deterministic Cray compiler segfaults in optcg during IPA when compiling m_phase_change.fpp.f90 with OpenACC. Switching to real copies avoids inode sharing on Lustre that triggers the compiler bug. Also bump MAX_PARALLEL from 2 to 3. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CodeAnt AI incremental review completed.
Phoenix runners work fine without the max-old-space-size override. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The actions/checkout clean step fails with ESTALE errors on NFS-backed storage when build artifacts from previous runs have stale file handles. Pre-clean with rm -rf (which tolerates stale handles) and disable checkout's built-in clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
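Schematically — checkout's `clean` input is a real option, while the step placement and action version are assumptions:

```yaml
      - name: Pre-clean workspace
        # rm -rf tolerates stale NFS handles that make git clean fail
        run: rm -rf "$GITHUB_WORKSPACE"/* || true
      - uses: actions/checkout@v4
        with:
          clean: false   # skip checkout's built-in git clean/reset
```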
CodeAnt-AI Description
Run Frontier tests and benchmarks in consolidated multi-node SLURM jobs and run benchmarks in parallel with tests
Impact
✅ Faster merges (shorter CI turnaround)
✅ Fewer Frontier SLURM jobs in the queue
✅ Fewer correctness runs cancelled by benchmark flakes