
Conversation

@sbryngelson
Member

@sbryngelson sbryngelson commented Feb 9, 2026

User description

Summary

  • Move benchmark jobs from bench.yml into test.yml so they run in parallel with tests instead of sequentially after them, reducing typical PR merge time from ~10 hours to ~5 hours
  • Add cross-cancellation jobs: if tests fail, benchmarks are cancelled (and vice versa)
  • Delete bench.yml - benchmarks now live entirely in test.yml
  • Docs-only PRs (README, etc.) continue to skip both tests and benchmarks via the existing checkall gate

Test plan

  • Verify test and bench jobs both start after lint-gate passes (parallel, not sequential)
  • Verify a docs-only PR skips both self-hosted tests and benchmarks
  • Verify bench jobs clone both PR and master, build both, run benchmarks, and post comparison
  • Verify cross-cancellation: test failure cancels benchmarks and vice versa

Summary by CodeRabbit

  • Chores

    • Removed the standalone benchmark workflow and excluded it from CI path filtering.
    • Consolidated benchmark orchestration into the primary CI workflow and adjusted concurrency/cancel-on-failure behavior.
  • New Features

    • Added a unified Frontier "all-configs" path to streamline test and benchmark runs.
    • Introduced coordinated multi-node Frontier test/benchmark orchestration, automated post-processing, and improved log collection and archiving.

CodeAnt-AI Description

Run Frontier tests and benchmarks in consolidated multi-node SLURM jobs and run benchmarks in parallel with tests

What Changed

  • Benchmarks were moved into the main test workflow and now start in parallel with correctness tests instead of waiting for them; the standalone bench workflow was removed.
  • Frontier test and benchmark runs are consolidated into single multi-node SLURM allocations (one for all test configs, one for all benchmark configs) so all Frontier configs launch together rather than as many separate jobs.
  • Test failures cancel running benchmarks, but benchmark failures no longer cancel correctness tests; CI logs and outputs for Frontier runs are collected and posted per-config.
  • Improved Frontier job orchestration: per-config builds run concurrently on the login node and per-config runs execute in parallel on allocated compute nodes, with better monitoring and GPU detection to reduce unexpected failures.
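The login-node build fan-out described above boils down to a background-job pattern: launch one build per config, then collect every exit status so a single failure cannot orphan the others. A minimal sketch with a stubbed build step and hypothetical config names:

```shell
#!/usr/bin/env bash
# Illustrative sketch (stub build, hypothetical config names) of the
# login-node pattern: one background build per config, then collect
# every exit status before deciding pass/fail.
set -u
build() { echo "building $1"; }   # stand-in for the real per-config build
configs=("cpu" "gpu" "gpu-unified")
pids=()
for cfg in "${configs[@]}"; do
  build "$cfg" & pids+=("$!")
done
fail=0
for pid in "${pids[@]}"; do
  wait "$pid" || fail=1           # keep waiting even after a failure
done
echo "builds finished, fail=$fail"
```

Waiting on each recorded PID (rather than bailing at the first non-zero status) is what keeps a failed build from leaving its siblings running unattended on the login node.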

Impact

✅ Faster merges (shorter CI turnaround)
✅ Fewer Frontier SLURM jobs in the queue
✅ Fewer correctness runs cancelled by benchmark flakes

💡 Usage Guide

Checking Your Pull Request

Every pull request is automatically analyzed. We check for security issues, infrastructure misconfigurations, and common code problems to make sure your changes are solid and won't cause trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

Move benchmark jobs from bench.yml into test.yml so tests and benchmarks
run in parallel instead of sequentially, reducing merge time from ~10hrs
to ~5hrs. Add cross-cancellation jobs so a failure in either group cancels
the entire workflow run. Docs-only PRs continue to skip both.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 9, 2026 16:27
@codeant-ai
Contributor

codeant-ai bot commented Feb 9, 2026

CodeAnt AI is reviewing your PR.


Thanks for using CodeAnt! 🎉

We're free for open-source projects. If you're enjoying it, help us grow by sharing.

@coderabbitai
Contributor

coderabbitai bot commented Feb 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Walkthrough

Removed the standalone .github/workflows/bench.yml and its filter entry; consolidated benchmarking into .github/workflows/test.yml and added multi-node Frontier orchestration and post-processing scripts under .github/scripts/.

Changes

  • Workflow surface — .github/file-filter.yml, .github/workflows/bench.yml, .github/workflows/test.yml: Removed bench.yml and its file-filter entry; expanded test.yml to incorporate bench, add frontier_all matrix entries, update concurrency groups, adjust build/test gating, and add dedicated Frontier bench/test paths and artifact/log handling.
  • Frontier orchestration scripts — .github/scripts/run_frontier_all_tests.sh, .github/scripts/run_frontier_all_benchmarks.sh, .github/scripts/frontier_test_config.sh, .github/scripts/frontier_bench_config.sh, .github/scripts/frontier_bench_post.sh: Added scripts to create per-config copies, run parallel builds on the login node, submit single multi-node sbatch jobs that SSH into allocated nodes to execute per-config test/bench tasks, aggregate YAML outputs, and post-process bench diffs.
  • Integration / helpers referenced — .../monitor_slurm_job.sh, .../pr/.github/scripts/*, bench_diff/loading utilities: New scripts call existing SLURM monitoring and bench-diff/loading helpers; verify artifact paths, naming conventions, and PR vs. master ref handling.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant GH as GitHub Actions (test.yml)
    participant Repo as Repository (PR & Master refs)
    participant Login as Login Node (build)
    participant SLURM as SLURM Scheduler
    participant Node as Compute Node (per-config)
    participant Diff as bench_diff / Poster

    rect rgba(93,156,236,0.5)
    GH->>Repo: clone PR and master refs
    end

    rect rgba(96,169,79,0.5)
    GH->>Login: run per-config builds (parallel)
    Login-->>GH: build artifacts/logs
    end

    rect rgba(236,151,31,0.5)
    GH->>SLURM: submit multi-node sbatch (run_frontier_all_*.sh)
    SLURM->>Node: allocate nodes and SSH per-config tasks
    Node->>Node: run frontier_test_config / frontier_bench_config
    Node-->>SLURM: write per-config YAML outputs
    SLURM-->>GH: job completion and collected outputs
    end

    rect rgba(178,120,255,0.5)
    GH->>Diff: run bench_diff & generate comment
    Diff-->>GH: diff results / comment payload
    GH->>Repo: post comment & upload logs/artifacts
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

Review effort 4/5

Poem

🐰 I hopped through CI at dawn,
removed a bench and carried on,
nodes hum, scripts align,
YAMLs bloom in tidy line,
the rabbit cheers — tests refashioned!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main objective of the PR: moving benchmarks to run in parallel with tests instead of sequentially.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description check ✅ Passed PR description covers all major changes, includes testing plan, and links to issue via type reference; however, it lacks some template elements like explicit 'Type of change' checkboxes and GPU testing details.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai
Contributor

coderabbitai bot commented Feb 9, 2026

Note

Currently processing new changes in this PR. This may take a few minutes, please wait...

 _____________________________________________________________
< All programming is an exercise in caching. - Terje Mathisen >
 -------------------------------------------------------------
  \
   \   \
        \ /\
        ( )
      .( o ).

✏️ Tip: You can disable in-progress messages and the fortune message in your review settings.

Tip

CodeRabbit can generate a title for your PR based on the changes.

Add @coderabbitai placeholder anywhere in the title of your PR and CodeRabbit will replace it with a title based on the changes in the PR. You can change the placeholder by changing the reviews.auto_title_placeholder setting.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.


@codeant-ai codeant-ai bot added the size:L This PR changes 100-499 lines, ignoring generated files label Feb 9, 2026
@codeant-ai
Contributor

codeant-ai bot commented Feb 9, 2026

CodeAnt AI finished reviewing your PR.

@coderabbitai
Contributor

coderabbitai bot commented Feb 9, 2026

📝 Walkthrough

Walkthrough

The benchmark workflow has been consolidated from a standalone file into the main test workflow. The .github/workflows/bench.yml file was entirely removed and its functionality was integrated as a new bench job within .github/workflows/test.yml. The bench workflow reference was also removed from .github/file-filter.yml.

Changes

  • GitHub Actions Workflow Consolidation — .github/file-filter.yml, .github/workflows/bench.yml, .github/workflows/test.yml: Moved the benchmark job from a separate workflow file into the test workflow. Removed bench.yml entirely, deleted the bench reference from the file-filter configuration, and added a new bench job to test.yml that performs parallel benchmarking between PR and master branches with matrix-based cluster/device coverage and result posting.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 The workflows hop and consolidate,
Bench jobs nestled in test's estate,
No more files to separate,
One stream to accelerate,
GitHub Actions celebrate! 🚀

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarizes the main change: moving benchmarks from sequential to parallel execution with tests.
Description check ✅ Passed The description covers the key changes, motivation (reducing PR merge time), implementation details, and includes a test plan; however, most template checklist items are not addressed.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/test.yml:
- Around line 285-287: The bench job is missing the NODE_OPTIONS env used in the
self job and can cause OOMs on Phoenix self-hosted runners; update the bench
job's env block (the same place where ACTIONS_RUNNER_FORCE_ACTIONS_NODE_VERSION
and ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION are set) to include NODE_OPTIONS:
'--max-old-space-size=2048' so Phoenix bench runners get the same Node memory
limit as the self job.
🧹 Nitpick comments (2)
.github/workflows/test.yml (2)

301-306: Parallel build: first failure orphans the second background job.

GitHub Actions runs bash run steps with errexit (set -e) enabled, and adds pipefail when shell: bash is set explicitly. If wait %1 returns non-zero, the shell exits immediately and wait %2 is never called — the second build process keeps running in the background on the self-hosted runner. While the step correctly fails, the orphaned process wastes resources.

A more robust pattern waits for both jobs unconditionally and then checks both exit codes:

Proposed fix
       - name: Setup & Build
         if: matrix.build_script != ''
         run: |
           (cd pr     && ${{ matrix.build_script }}) &
           (cd master && ${{ matrix.build_script }}) &
-          wait %1 && wait %2
+          wait %1 || s1=$?   # "||" suppresses errexit so the second wait still runs
+          wait %2 || s2=$?
+          if [ "${s1:-0}" -ne 0 ] || [ "${s2:-0}" -ne 0 ]; then exit 1; fi

Alternatively, disabling errexit for the step keeps the logic flat:

Simpler alternative
       - name: Setup & Build
         if: matrix.build_script != ''
         run: |
+          set +e
           (cd pr     && ${{ matrix.build_script }}) &
           (cd master && ${{ matrix.build_script }}) &
-          wait %1 && wait %2
+          wait %1; s1=$?
+          wait %2; s2=$?
+          [ "$s1" -eq 0 ] && [ "$s2" -eq 0 ]

311-314: bench_diff step missing if: always() or if: success() guard.

If the "Bench (Master v. PR)" step fails, the "Generate & Post Comment" step is skipped by default (GitHub Actions skips subsequent steps on failure). If you want the diff comment posted even on partial benchmark failure (e.g., to report which benchmarks regressed), consider adding if: always(). If the current behavior (skip on failure) is intended, this is fine as-is.

@coderabbitai
Contributor

coderabbitai bot commented Feb 9, 2026

Caution

Review failed

Failed to post review comments


Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Benchmark failures (e.g. transient network issues) should not cancel
correctness tests. Keep only cancel-on-test-failure so that test failures
still cancel benchmarks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov

codecov bot commented Feb 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 44.03%. Comparing base (0ba6c02) to head (e2cb418).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1130   +/-   ##
=======================================
  Coverage   44.03%   44.03%           
=======================================
  Files          70       70           
  Lines       20649    20649           
  Branches     2053     2053           
=======================================
  Hits         9093     9093           
  Misses      10368    10368           
  Partials     1188     1188           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


Reduces Frontier queue depth from 11 SLURM jobs to 2 by consolidating
test configs into a single 5-node allocation and bench configs into a
single 6-node allocation. Builds remain on login nodes; tests/benchmarks
run on compute nodes via ssh to allocated nodes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codeant-ai
Contributor

codeant-ai bot commented Feb 9, 2026

CodeAnt AI is running Incremental review



@codeant-ai codeant-ai bot added size:XL This PR changes 500-999 lines, ignoring generated files and removed size:L This PR changes 100-499 lines, ignoring generated files labels Feb 9, 2026
@codeant-ai
Contributor

codeant-ai bot commented Feb 9, 2026

CodeAnt AI Incremental review completed.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @.github/scripts/frontier_bench_post.sh:
- Around line 34-35: The issue is that the module environment sourced by ".
./mfc.sh load -c \"$flag\" -m g" is lost because it runs in a separate subshell
from the subsequent "./mfc.sh bench_diff" call; fix it by running both commands
in the same subshell or sourcing the loader in the parent shell so the
environment persists — e.g., change the two separate "(cd pr && ...)"
invocations so they are combined into a single "(cd pr && . ./mfc.sh load -c
\"$flag\" -m g && ./mfc.sh bench_diff \"../$master_yaml\" \"../$pr_yaml\")" (or
alternatively run "cd pr" then source "mfc.sh load" in the current shell before
invoking "./mfc.sh bench_diff"), ensuring the loader (./mfc.sh load) and
bench_diff run in the same shell/session.

In @.github/scripts/frontier_test_config.sh:
- Around line 28-34: The GPU probe using rocm-smi (the gpus and ngpus
assignments) is executed unconditionally and will cause CPU-only runs to fail;
move the rocm-smi call and the ngpus calculation inside the branch that runs
when device equals "gpu". Specifically, relocate the lines that set
gpus=$(rocm-smi ...) and ngpus=$(...) into the if [ "$device" = "gpu" ] branch
before invoking ./mfc.sh, and add a safe fallback for ngpus (e.g., default to 1
or exit with a clear error) if rocm-smi yields no results; ensure you reference
the same variable names (gpus, ngpus, device) and preserve the existing ./mfc.sh
invocation when fixing.

In @.github/workflows/test.yml:
- Around line 270-275: The "Setup & Build" workflow step is dead because
matrix.build_script is set to the empty string for every matrix entry, so the
conditional if: matrix.build_script != '' is never true; either remove the
entire "Setup & Build" step from the workflow or populate the matrix entries
with non-empty build_script values for the matrix entries that need building so
the condition can be true, and update any references to that step accordingly
(look for the job name/step "Setup & Build" and the matrix variable
matrix.build_script).
- Around line 310-319: The workflow only implements one-directional cancellation
via the job named cancel-on-test-failure; to make cancellation bidirectional add
a symmetric job (e.g., cancel-on-bench-failure) that uses needs: [bench], if:
failure(), runs-on: ubuntu-latest and the same steps (Cancel Workflow Run using
gh run cancel ${{ github.run_id }} --repo ${{ github.repository }} with GH_TOKEN
env) so benchmarks failing will cancel the tests; ensure the new job mirrors
cancel-on-test-failure but references needs: [bench] and an appropriate name and
position in the YAML.
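The frontier_bench_post.sh item above comes down to subshell scoping: environment exported inside one (...) subshell is invisible to a later subshell, which is why the module load and bench_diff must run in the same one. A tiny repro with a hypothetical variable name:

```shell
#!/usr/bin/env bash
# Repro of the root cause (MODULE_DEMO is a hypothetical variable, not
# MFC's real one): an export inside one (...) subshell dies with that
# subshell; combining the load and the consumer into one subshell works.
set -u
(export MODULE_DEMO=/opt/modules)                # dies with this subshell
separate="${MODULE_DEMO:-unset}"
combined=$( (export MODULE_DEMO=/opt/modules && echo "$MODULE_DEMO") )
echo "separate: $separate"
echo "combined: $combined"
```

The same reasoning applies to the suggested fix: keep "./mfc.sh load" and "./mfc.sh bench_diff" inside a single "(cd pr && ... && ...)" group rather than two separate ones.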
🧹 Nitpick comments (4)
.github/scripts/frontier_bench_config.sh (1)

29-33: Quote $(nproc) to prevent word splitting (ShellCheck SC2046).

While nproc returns a single integer in practice, quoting the command substitution is the safer shell idiom and silences the linter.

Proposed fix
-    ./mfc.sh bench --mem 1 -j $(nproc) -o "$job_slug.yaml" -- -c "$cluster" $device_opts -n $n_ranks
+    ./mfc.sh bench --mem 1 -j "$(nproc)" -o "$job_slug.yaml" -- -c "$cluster" $device_opts -n $n_ranks
.github/scripts/run_frontier_all_benchmarks.sh (2)

78-86: Duplicated config table inside the heredoc is a maintenance hazard.

The config array on lines 79–86 must be kept in exact sync with lines 15–22. If someone adds, removes, or reorders a config in one place but not the other, the node-to-config mapping silently breaks (wrong benchmark runs on wrong node, or index out of bounds). The same pattern appears in run_frontier_all_tests.sh lines 81–88.

Consider generating the inner config table dynamically (e.g., write it to a file in Phase 1 and source it inside the heredoc), or at minimum add a CI-time assertion that both arrays match.
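The write-to-a-file approach can be sketched as follows; this is a minimal illustration with made-up config entries and paths, not the actual MFC scripts:

```shell
#!/bin/bash
# Sketch of the single-source config table: Phase 1 writes the table to a
# file once; the job script reads that same file instead of re-declaring
# the array inside the heredoc. Entries and paths are illustrative.
set -euo pipefail

workdir=$(mktemp -d)
config_file="$workdir/configs.txt"

# Phase 1 (submit side): the one and only copy of the config table.
cat > "$config_file" <<'EOF'
gpu 8
cpu 1
gpu 4
EOF

# Phase 2 (inside the sbatch heredoc): read, never re-declare.
mapfile -t configs < "$config_file"
for i in "${!configs[@]}"; do
    echo "node $i -> ${configs[$i]}"
done
```

Because both phases read one file, adding or reordering a config cannot silently desynchronize the node-to-config mapping.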


111-118: $? after a failed if wait may not carry the original exit code in all shells.

In bash, $? in the else branch of if wait "$pid" correctly reflects the child's exit status. However, this is a bash-specific behavior — the shebang (#!/bin/bash) ensures it, but worth a brief comment for future maintainers, since it's a subtle pattern.
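A runnable sketch of the pattern in question, carrying the comment the nitpick asks for (the failing child and its status are contrived):

```shell
#!/bin/bash
# In bash, `$?` in the else-branch of `if wait "$pid"` is the waited-for
# child's exit status; POSIX sh does not guarantee this, so the bash
# shebang is load-bearing here.
set -u

(exit 7) &   # contrived background child that fails with status 7
pid=$!

if wait "$pid"; then
    echo "child ok"
else
    status=$?   # bash: this is the child's status, 7
    echo "child failed with status $status"
fi
```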

.github/scripts/run_frontier_all_tests.sh (1)

56-56: Hardcoded "5-node" in the echo; the benchmarks script uses $num_nodes.

Minor inconsistency — run_frontier_all_benchmarks.sh line 53 uses "Submitting ${num_nodes}-node SLURM job..." while this script hardcodes the number. Using $num_nodes keeps it self-documenting if configs change.

Proposed fix
-echo "All builds complete. Submitting 5-node SLURM job..."
+echo "All builds complete. Submitting ${num_nodes}-node SLURM job..."


@cubic-dev-ai cubic-dev-ai bot left a comment


4 issues found across 6 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name=".github/scripts/frontier_test_config.sh">

<violation number="1" location=".github/scripts/frontier_test_config.sh:28">
P1: The `rocm-smi` command is invoked unconditionally before the device type check. With `set -e` enabled, this will cause CPU-only tests to fail immediately on nodes without ROCm installed, as the script will terminate before reaching the `else` branch. Move the GPU discovery logic inside the `if [ "$device" = "gpu" ]` block.</violation>
</file>
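A minimal sketch of the fix this violation describes, assuming a `device` variable and simplified discovery logic (the real script's rocm-smi parsing and ./mfc.sh invocation are omitted):

```shell
#!/bin/bash
# With `set -e`, any command that can fail on CPU-only nodes must live
# inside the gpu branch; here rocm-smi is only reached when device=gpu.
set -euo pipefail

device="${1:-cpu}"

if [ "$device" = "gpu" ]; then
    # GPU discovery runs only when GPUs are actually requested.
    gpus=$(rocm-smi 2>/dev/null || true)
    ngpus=$(printf '%s\n' "$gpus" | grep -c 'GPU' || true)
    if [ "$ngpus" -lt 1 ]; then
        echo "error: rocm-smi reported no GPUs" >&2
        exit 1
    fi
else
    ngpus=0   # CPU path never touches rocm-smi
fi

echo "device=$device ngpus=$ngpus"
```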

<file name=".github/scripts/run_frontier_all_benchmarks.sh">

<violation number="1" location=".github/scripts/run_frontier_all_benchmarks.sh:140">
P1: The job ID extraction using `grep -oE '[0-9]+'` is fragile and can incorrectly match numbers from error messages (e.g., account names like 'ENG160' or error codes). This could cause the script to proceed with an invalid job ID when `sbatch` actually failed. Parse only the specific 'Submitted batch job' line to ensure a real job ID is captured.</violation>
</file>
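The stricter parse this violation asks for can be sketched like this (the job IDs and error messages are invented examples):

```shell
#!/bin/bash
# Extract a job ID only from the canonical "Submitted batch job <id>"
# line, so numbers inside error text (account names, error codes) can
# never be mistaken for a job ID.
set -euo pipefail

parse_job_id() {
    sed -n 's/^Submitted batch job \([0-9]\{1,\}\)$/\1/p' | grep . || return 1
}

echo "Submitted batch job 2391045" | parse_job_id        # prints 2391045

if ! echo "sbatch: error: Invalid account ENG160" | parse_job_id >/dev/null; then
    echo "rejected: no real job ID present"
fi
```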

<file name=".github/scripts/frontier_bench_config.sh">

<violation number="1" location=".github/scripts/frontier_bench_config.sh:13">
P2: `frontier_bench_config.sh` always loads GPU mode (`-m g`) even when `device=cpu`, unlike the test config. CPU benchmark runs will get the wrong module set; mirror the device-based mode selection.</violation>
</file>

<file name=".github/scripts/frontier_bench_post.sh">

<violation number="1" location=".github/scripts/frontier_bench_post.sh:35">
P1: The environment setup and `bench_diff` invocation are in separate subshells, so the environment changes from `. ./mfc.sh load` are not visible to `bench_diff`. Both commands should run in the same subshell to ensure the loaded environment is applied.</violation>
</file>
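The subshell pitfall is easy to demonstrate in isolation; `fake_load` below stands in for `. ./mfc.sh load`, and the variable name is invented:

```shell
#!/bin/bash
# Environment changes die with the subshell that made them, so sourcing
# the environment and running the command that needs it must share one
# subshell.
set -u

fake_load() { export TOOL_HOME=/opt/tool; }

# Broken shape: two subshells; the export is gone before the second runs.
( fake_load )
( echo "split:  TOOL_HOME=${TOOL_HOME:-unset}" )

# Fixed shape: one subshell runs both steps.
( fake_load; echo "joined: TOOL_HOME=${TOOL_HOME:-unset}" )
```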

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

sbryngelson and others added 3 commits February 9, 2026 18:28
Use rsync with --link-dest and --exclude to avoid copying the target
directories back into themselves when creating per-config source copies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move rocm-smi GPU discovery inside the gpu branch so CPU configs
  don't fail with set -e (frontier_test_config.sh)
- Combine mfc.sh load and bench_diff into single subshell so loaded
  module environment persists (frontier_bench_post.sh)
- Use device-based mode selection in bench config for consistency
  (frontier_bench_config.sh)
- Parse 'Submitted batch job' line instead of bare grep for digits
  (run_frontier_all_tests.sh, run_frontier_all_benchmarks.sh)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents rapid pushes from cancelling unrelated jobs. Each job type
(github, self, bench) now has its own concurrency group keyed by
ref and matrix params, so a new push only cancels stale instances
of the same job — not all jobs across the workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name=".github/workflows/test.yml">

<violation number="1" location=".github/workflows/test.yml:68">
P2: The concurrency group for the github matrix job omits matrix.precision, so the default-precision job and the `single` precision job share the same group and can cancel each other when `cancel-in-progress: true` is enabled. Include precision in the concurrency key to avoid unintended cancellations and preserve parallelism.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

sbryngelson and others added 2 commits February 9, 2026 20:54
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Eliminate config table duplication by writing configs to a file and
  reading it inside the sbatch heredoc (now unquoted for variable expansion)
- Add EXIT trap in sbatch scripts to kill orphaned SSH processes on
  job cancellation
- Add per-config timeout (90 min tests, 120 min benchmarks) to prevent
  a single hanging config from consuming the full walltime
- Extract SLURM account, partition, walltime, and node count into
  variables at the top of each orchestration script
- Validate GPU count from rocm-smi (1-16 range) with diagnostic output
  on failure in both test and benchmark compute-node scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
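The per-config timeout and EXIT trap from the list above can be combined in a small sketch; the 1-second limit stands in for the real 90/120-minute caps:

```shell
#!/bin/bash
# Kill a config's run once it exceeds its time budget, and reap any
# stray background children when the job script exits or is cancelled.
set -u

trap 'jobs -p | xargs -r kill 2>/dev/null' EXIT

if timeout 1 sleep 5; then        # 1s stands in for 90/120 minutes
    echo "config finished"
else
    rc=$?
    [ "$rc" -eq 124 ] && echo "config timed out (rc=124)"
fi
```

coreutils `timeout` exits with 124 when the command is killed by the time limit, which is how the job script can distinguish a hang from a genuine failure.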

codeant-ai bot commented Feb 10, 2026

CodeAnt AI is running Incremental review



@codeant-ai codeant-ai bot added size:XL This PR changes 500-999 lines, ignoring generated files and removed size:XL This PR changes 500-999 lines, ignoring generated files labels Feb 10, 2026

codeant-ai bot commented Feb 10, 2026

CodeAnt AI Incremental review completed.

Running all 5-6 builds simultaneously on a Frontier login node causes
memory pressure that crashes the Cray compiler (optcg segfault). Cap
concurrent builds at 2 using a polling semaphore to stay within login
node resource limits.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
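One way to realize the polling semaphore described above (the build command and config names are placeholders):

```shell
#!/bin/bash
# Launch builds in the background, but poll until a slot is free before
# starting the next one, so at most MAX_PARALLEL builds run at once.
set -euo pipefail

MAX_PARALLEL=2
build_one() { sleep 0.2; echo "built $1"; }   # stand-in for a real build

pids=()
for cfg in a b c d e; do
    while [ "$(jobs -rp | wc -l)" -ge "$MAX_PARALLEL" ]; do
        sleep 0.1                  # polling semaphore: wait for a slot
    done
    build_one "$cfg" &
    pids+=($!)
done

for pid in "${pids[@]}"; do wait "$pid"; done
echo "all builds done"
```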

sbryngelson and others added 3 commits February 11, 2026 14:24
Hardlink copies (cp -al / rsync --link-dest) share the same .git/index
inode across all config directories. When parallel builds invoke
setuptools_scm, concurrent git status calls on the shared index cause
timeouts on Lustre. Bypass git entirely by setting the pretend version.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
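The pretend-version mechanism is a documented setuptools_scm escape hatch; a minimal demonstration (the version string here is arbitrary):

```shell
#!/bin/bash
# Setting SETUPTOOLS_SCM_PRETEND_VERSION makes setuptools_scm use this
# value directly and skip its git queries, so parallel builds never
# touch the shared .git/index at all.
set -euo pipefail

export SETUPTOOLS_SCM_PRETEND_VERSION="0.0.0+ci"
env | grep '^SETUPTOOLS_SCM_PRETEND_VERSION='
```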
The throttle loop blocks the main process while waiting for build slots,
so the post-loop heartbeat subprocess never starts until late. Add
inline heartbeat every 120s during throttle waits showing the last log
line of each active build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
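The inline heartbeat might look like this sketch, with a 2-second interval standing in for the real 120 seconds and a fabricated log directory:

```shell
#!/bin/bash
# During throttle waits, periodically print the last log line of each
# active build so CI output keeps showing progress.
set -u

log_dir=$(mktemp -d)
echo "compiling m_foo.fpp.f90" > "$log_dir/cfg1.log"

last_beat=0
heartbeat() {
    local now; now=$(date +%s)
    if (( now - last_beat >= 2 )); then   # 120s in the real script
        for log in "$log_dir"/*.log; do
            echo "[heartbeat] $(basename "$log"): $(tail -n 1 "$log")"
        done
        last_beat=$now
    fi
}

sleep 2
heartbeat    # prints: [heartbeat] cfg1.log: compiling m_foo.fpp.f90
```

Calling `heartbeat` from inside the throttle's polling loop is what makes it fire while the main process is otherwise blocked.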

sbryngelson and others added 2 commits February 11, 2026 21:35
read -t 0.1 (sub-second timeout) in a loop with process substitution
file descriptors triggers a bash internal error (unwind_frame_run:
read_builtin: frame not found) leading to a segfault. Use integer
timeout (read -t 1) instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
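The distinction the commit draws can be shown directly; the loop below uses the safe integer timeout:

```shell
#!/bin/bash
# `read -t` with an integer timeout; the commit message ties sub-second
# timeouts (e.g. -t 0.1) combined with process-substitution FDs to a
# bash internal error, so whole seconds are used instead.
set -u

while IFS= read -r -t 1 line; do
    echo "got: $line"
done < <(printf 'one\ntwo\n')
```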
Hardlinked source copies caused deterministic Cray compiler segfaults
in optcg during IPA when compiling m_phase_change.fpp.f90 with OpenACC.
Switching to real copies avoids inode sharing on Lustre that triggers the
compiler bug. Also bump MAX_PARALLEL from 2 to 3.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sbryngelson and others added 2 commits February 12, 2026 17:12
Phoenix runners work fine without the max-old-space-size override.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ners

The actions/checkout clean step fails with ESTALE errors on NFS-backed
storage when build artifacts from previous runs have stale file handles.
Pre-clean with rm -rf (which tolerates stale handles) and disable
checkout's built-in clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
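A sketch of the pre-clean step; the workspace path and artifact names are invented, and `find -delete` clears the contents the way an `rm -rf` of the workspace contents would:

```shell
#!/bin/bash
# Remove leftover build artifacts before checkout runs with its own
# clean disabled; unlike git's clean, plain deletion tolerates stale
# NFS file handles.
set -euo pipefail

workspace=$(mktemp -d)            # stand-in for the runner workspace
mkdir -p "$workspace/build"
touch "$workspace/build/stale.o" "$workspace/.hidden"

find "$workspace" -mindepth 1 -delete   # empty the dir, keep the dir

ls -A "$workspace" | wc -l        # prints 0
```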