fix(launcher): use afterany dependency for allow_to_fail pipelines #1248

Open
yeyu-nvidia wants to merge 1 commit into main from yeyu/afterany-pipeline-fix

Conversation

@yeyu-nvidia
Contributor

@yeyu-nvidia yeyu-nvidia commented Apr 13, 2026

Summary

  • nemo-run's SlurmExecutor defaults to dependency_type="afterok", which cancels all downstream Slurm tasks when a predecessor times out (TIMEOUT) or fails
  • For pipelines with allow_to_fail=True, this changes the dependency type to "afterany" so subsequent tasks run regardless of predecessor exit status
  • This unblocks EAGLE3 multi-step pipelines where task_0 (data generation) may time out but task_1+ should still run on whatever data was produced
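
To make the difference concrete: Slurm expresses a job dependency as `--dependency=<type>:<job_id>`, where `afterok` requires the predecessor to finish with exit code zero and `afterany` only requires it to terminate. The helper below is a hypothetical sketch, not code from nemo-run or this PR; the name `dependency_flag` and the example job IDs are assumptions made for illustration.

```python
def dependency_flag(dependency_type: str, job_ids: list[str]) -> str:
    """Build a Slurm --dependency option string (illustrative helper only)."""
    # Slurm joins multiple predecessor job IDs with ':' after the type.
    return f"--dependency={dependency_type}:{':'.join(job_ids)}"


# Default behavior: task_1 starts only if task_0 (job 1001) exits successfully.
strict = dependency_flag("afterok", ["1001"])

# With allow_to_fail: task_1 starts once task_0 terminates in any state.
lenient = dependency_flag("afterany", ["1001"])

print(strict)
print(lenient)
```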

Test plan

  • Verify existing launcher unit tests pass (uv run python3 -m pytest tests/ -v in tools/launcher/)
  • Submit an EAGLE3 pipeline with allow_to_fail: true and confirm task_1 runs after task_0 times out
  • Verify pipelines without allow_to_fail still use default afterok behavior

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Experiments can now continue executing downstream tasks even when upstream tasks fail or time out, improving workflow resilience and enabling more robust experiment pipelines.

nemo-run's SlurmExecutor defaults to dependency_type="afterok", which
cancels all downstream tasks when a predecessor times out or fails.
For pipelines with allow_to_fail=True, use "afterany" so subsequent
tasks run regardless of predecessor exit status.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
@coderabbitai
Contributor

coderabbitai bot commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 26dba2c4-106d-43c4-aa06-b0c2bf60569a

📥 Commits

Reviewing files that changed from the base of the PR and between 0b42c14 and 4edc5a6.

📒 Files selected for processing (1)
  • tools/launcher/core.py

📝 Walkthrough

Walkthrough

Added conditional logic in the run_jobs function: when job.allow_to_fail is set and the executor has a dependency_type attribute, executor.dependency_type is set to "afterany" so that downstream tasks proceed regardless of predecessor failures.
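
The check described in the walkthrough can be sketched roughly as follows. This is a minimal sketch, not the actual diff from tools/launcher/core.py: the stand-in classes `FakeSlurmExecutor`, `FakeLocalExecutor`, and `FakeJob`, and the helper name `apply_dependency_policy`, are hypothetical and exist only to make the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class FakeSlurmExecutor:
    """Stand-in for an executor that exposes dependency_type (assumed shape)."""
    dependency_type: str = "afterok"


class FakeLocalExecutor:
    """Stand-in for an executor with no dependency_type attribute."""


@dataclass
class FakeJob:
    """Stand-in for a launcher job; only allow_to_fail matters here."""
    allow_to_fail: bool = False


def apply_dependency_policy(job, executor):
    """Switch to 'afterany' only when the job tolerates upstream failure
    and the executor actually has a dependency_type to override."""
    if job.allow_to_fail and hasattr(executor, "dependency_type"):
        executor.dependency_type = "afterany"
    return executor


# A job that allows failure relaxes the dependency type.
relaxed = apply_dependency_policy(FakeJob(allow_to_fail=True), FakeSlurmExecutor())

# A job without allow_to_fail keeps the afterok default.
default = apply_dependency_policy(FakeJob(), FakeSlurmExecutor())

# Executors without the attribute are left untouched.
local = apply_dependency_policy(FakeJob(allow_to_fail=True), FakeLocalExecutor())
```

The `hasattr` guard keeps the change a no-op for executors that have no notion of job dependencies, which matches the third item of the test plan: pipelines without allow_to_fail keep the default behavior.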

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Task Dependency Configuration: tools/launcher/core.py | Added conditional logic to set executor dependency_type to "afterany" when a job allows failure, enabling downstream tasks to continue regardless of predecessor timeout or failure states. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: using afterany dependency type for pipelines with allow_to_fail enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Security Anti-Patterns | ✅ Passed | The pull request adds conditional logic to set executor.dependency_type without introducing security anti-patterns like unsafe deserialization, code execution, or dangerous configurations. |


@github-actions
Contributor

PR Preview Action v1.8.1

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1248/

Built to branch gh-pages at 2026-04-13 17:06 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@codecov

codecov bot commented Apr 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.91%. Comparing base (5ff1d7b) to head (4edc5a6).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1248   +/-   ##
=======================================
  Coverage   76.91%   76.91%           
=======================================
  Files         350      350           
  Lines       40481    40481           
=======================================
  Hits        31137    31137           
  Misses       9344     9344           
| Flag | Coverage Δ |
| --- | --- |
| unit | 55.53% <ø> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

