[6034518] Downgrade TRT support for remote autotuning in Autotune from 10.16 to 10.15#1259

Merged
gcunhase merged 9 commits into NVIDIA:main from gcunhase:dev/gcunhasergio/6034518_autotune_trt10.15
Apr 15, 2026

Conversation

@gcunhase
Contributor

@gcunhase gcunhase commented Apr 14, 2026

What does this PR do?

Type of change: Bug fix

Remote autotuning is supported in TensorRT from version 10.15, but fails with Autotune as it's checking for 10.16+. This PR fixes that check and updates documentation accordingly.

Usage

# Add a code snippet demonstrating how to use this
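
A sketch of the expected invocation, assembled from the example CLI summarized in the review notes for `examples/onnx_ptq/autotune/README.md` (the entry point and the model-path flag are assumptions, not copied from the merged docs):

```shell
# Hypothetical sketch: --use_trtexec enables the trtexec workflow, and
# remote autotuning requires --safe and --skipInference in the trtexec args.
python -m modelopt.onnx.quantization \
    --onnx_path=model.onnx \
    --use_trtexec \
    --trtexec_benchmark_args='--remoteAutoTuningConfig=<config> --safe --skipInference'
```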

Testing

See bug 6034518.

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A
  • Did you update Changelog?: ✅

Summary by CodeRabbit

  • Documentation

    • Added a Remote Autotuning guide for TensorRT 10.15+ with CLI examples; updated examples to require --safe --skipInference.
  • Updates

    • Lowered TensorRT minimum requirement for remote autotuning from 10.16 to 10.15.
    • Clarified CLI help text for trtexec/autotune arguments.
  • Bug Fixes

    • trtexec-based autotuning now verifies the trtexec executable version when checking compatibility.

Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
@gcunhase gcunhase requested a review from a team as a code owner April 14, 2026 16:49
@coderabbitai
Contributor

coderabbitai bot commented Apr 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Added a trtexec detection helper, lowered the TensorRT version requirement for remote autotuning to 10.15, updated benchmark logic to use trtexec (appending --safe and --skipInference as needed), clarified the CLI help, and added documentation/examples for trtexec-based remote autotuning.

Changes

Cohort / File(s) Summary
Changelog & docs
CHANGELOG.rst, docs/source/guides/9_autotune.rst, examples/onnx_ptq/autotune/README.md
Lowered TensorRT requirement from 10.16 → 10.15; added/updated remote-autotuning docs and example CLI showing --use_trtexec with --trtexec_benchmark_args='--remoteAutoTuningConfig=... --safe --skipInference'.
CLI help & benchmark behavior
modelopt/onnx/quantization/__main__.py, modelopt/onnx/quantization/autotune/benchmark.py
Clarified --autotune_trtexec_args help text. Benchmark now uses _check_for_trtexec, updates version checks/messages to 10.15, and conditionally appends --safe and --skipInference to trtexec args; fallback warning updated.
trtexec detection util
modelopt/onnx/quantization/ort_utils.py
Added _check_for_trtexec(min_version: str = "10.0") -> str to locate trtexec, run it, parse TensorRT version from output, validate against min_version, handle timeouts/errors, log results, and raise ImportError on failure.

Sequence Diagram(s)

sequenceDiagram
    participant User as User/CLI
    participant Bench as AutotuneBenchmark
    participant Util as _check_for_trtexec
    participant TrtExec as trtexec
    participant Remote as Remote AutoTuner

    User->>Bench: invoke autotune (--use_trtexec + args)
    Bench->>Util: probe `trtexec` binary and parse version
    Util-->>Bench: returns detected version or raises ImportError
    Bench->>TrtExec: launch `trtexec` with --remoteAutoTuningConfig + --safe + --skipInference
    TrtExec->>Remote: perform remote autotuning
    Remote-->>TrtExec: tuning results
    TrtExec-->>Bench: benchmark/tuning output

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

  • Security Anti-Patterns — ❌ Error: the PR introduces five # nosec directives without following the required exception process per SECURITY.md. Resolution: remove the # nosec comments and add inline explanations, or add a Security Exception section to the PR description and request @NVIDIA/modelopt-setup-codeowners review.
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 66.67%, below the required 80.00% threshold. Write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: the title accurately summarizes the main change: downgrading TensorRT version support for remote autotuning from 10.16 to 10.15, which is the primary bug fix across all modified files.


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelopt/onnx/quantization/autotune/benchmark.py (1)

209-228: ⚠️ Potential issue | 🔴 Critical

Remove early return so remote autotuning args are actually applied.

Line 218 returns from __init__ before Line 227 extends self._base_cmd, so the --remoteAutoTuningConfig/--safe args never make it into the executed command when TRT >= 10.15. This breaks remote autotuning despite passing the version gate.

💡 Proposed fix
-        trtexec_args = self.trtexec_args or []
+        trtexec_args = list(self.trtexec_args or [])
         has_remote_config = any("--remoteAutoTuningConfig" in arg for arg in trtexec_args)

         if has_remote_config:
             try:
                 _check_for_tensorrt(min_version="10.15")
                 self.logger.debug("TensorRT Python API version >= 10.15 detected")
                 if "--safe" not in trtexec_args:
                     self.logger.warning(
                         "Remote autotuning requires '--safe' to be set. Adding it to trtexec arguments."
                     )
-                    self.trtexec_args.append("--safe")
-                return
+                    trtexec_args.append("--safe")
             except ImportError:
                 self.logger.warning(
                     "Remote autotuning is not supported with TensorRT version < 10.15. "
                     "Removing --remoteAutoTuningConfig from trtexec arguments"
                 )
                 trtexec_args = [
                     arg for arg in trtexec_args if "--remoteAutoTuningConfig" not in arg
                 ]
         self._base_cmd.extend(trtexec_args)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 209 - 228, The
constructor's remote-autotuning branch returns early so trtexec args like
"--safe" or "--remoteAutoTuningConfig" never get appended to the final command;
remove the early return in the block that calls
_check_for_tensorrt(min_version="10.15") and instead ensure any modifications to
self.trtexec_args (and filtered local trtexec_args) are followed by extending
self._base_cmd with those trtexec_args so the arguments are applied; locate the
logic around _check_for_tensorrt, self.trtexec_args, trtexec_args and
self._base_cmd in the __init__ of the Benchmark class and adjust flow so the
successful TensorRT path does not exit before
self._base_cmd.extend(trtexec_args) executes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: aca1a5f5-c803-49d9-8889-37357600cffb

📥 Commits

Reviewing files that changed from the base of the PR and between b6c6ec3 and 2097254.

📒 Files selected for processing (5)
  • CHANGELOG.rst
  • docs/source/guides/9_autotune.rst
  • examples/onnx_ptq/autotune/README.md
  • modelopt/onnx/quantization/__main__.py
  • modelopt/onnx/quantization/autotune/benchmark.py

@ajrasane
Contributor

PR #1259 Review: Downgrade TRT support for remote autotuning from 10.16 to 10.15

Verdict: Request Changes — core fix is correct, but docs have formatting issues and there's a code/doc inconsistency.


  1. RST formatting errors — docs/source/guides/9_autotune.rst
  • Missing blank line after .. code-block:: bash — RST requires a blank line between the directive and content. All other code blocks in this file follow this pattern.
  • Wrong indentation — content uses 2-space indent but every other code block in the file uses 3-space indent. This will prevent proper rendering.
  • Stray Markdown fence — there's a ``` closing fence after the code block that doesn't belong in RST. Will render as literal text.
  • Leading space on " Other TensorRT benchmark options..." line — creates an unintended blockquote.
  2. --skipInference inconsistency between docs and code

Docs (both RST and README) now state --safe --skipInference must be enabled, but benchmark.py:211-215 only auto-adds --safe — no check or auto-add for --skipInference:

  if "--safe" not in trtexec_args:
      self.logger.warning(...)
      self.trtexec_args.append("--safe")
  # No equivalent for --skipInference

Either:

  • Add a similar auto-add for --skipInference in the code (preferred for consistency), or
  • Clarify in docs that --skipInference is recommended but not enforced
  3. Minor: __main__.py help text

The updated help says "Only relevant with the 'trtexec' workflow enabled" — would be clearer as "Only relevant when --use_trtexec is set" since that's the actual flag name.

What looks good

  • Version downgrade 10.16 → 10.15 in benchmark.py — correct and consistent across all 3 occurrences
  • CHANGELOG.rst entry — properly placed
  • README.md updates — version numbers and --skipInference addition are consistent

@codecov

codecov bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 14.63415% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.72%. Comparing base (b6c6ec3) to head (14d425c).
⚠️ Report is 7 commits behind head on main.

Files with missing lines:
  • modelopt/onnx/quantization/ort_utils.py — 14.28% patch coverage, 30 lines missing ⚠️
  • modelopt/onnx/quantization/autotune/benchmark.py — 16.66% patch coverage, 5 lines missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1259      +/-   ##
==========================================
- Coverage   76.90%   76.72%   -0.19%     
==========================================
  Files         350      355       +5     
  Lines       40524    41453     +929     
==========================================
+ Hits        31166    31804     +638     
- Misses       9358     9649     +291     
Flag coverage Δ:
  • examples — 43.78% <12.19%> (+1.15%) ⬆️
  • gpu — 57.38% <14.63%> (-0.15%) ⬇️
  • unit — 55.68% <14.63%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

…d for qdq tuning

Signed-off-by: dmoodie <dmoodie@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Apr 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
@gcunhase gcunhase force-pushed the dev/gcunhasergio/6034518_autotune_trt10.15 branch from cf42607 to c957365 on April 15, 2026 14:21
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/ort_utils.py`:
- Line 58: Remove the "# nosec" suppressions on the subprocess import and calls
in ort_utils.py; replace any unsafe subprocess usage with secure patterns (use
subprocess.run with a list of args, check=True, capture stdout/stderr, avoid
shell=True, and validate inputs or use shlex.split when necessary) and add
proper error handling/logging around the call; if you cannot convert the call to
a safe pattern, remove the nosec and add an explicit note in the PR requesting
`@NVIDIA/modelopt-setup-codeowners` approval with justification. Reference: the
subprocess import line and the subprocess call sites in ort_utils.py.
- Around line 63-70: The version extraction currently matches any dotted number
first (regex (\d+(\.\d+)+)) and so returns CUDA/driver versions from
version_str; update the logic in the TensorRT parsing block (where version_str
is inspected and the first regex run) to try a TensorRT-specific pattern first
(e.g., look for "TensorRT" or "[TensorRT v...]" and extract its dotted version
or convert vNNNNNN to x.y format) before falling back to generic dotted-number
matching; ensure references to the existing variables and match handling in this
function (the match variable, the first regex branch and the "[TensorRT v...]"
fallback branch) are updated so the TensorRT-specific match is tested and
returned prior to the generic (\d+(\.\d+)+) match.
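
The ordering the comment asks for — TensorRT-specific patterns first, generic dotted-number match last — might look like this (a sketch; the six-digit "vNNNNNN" form is taken from the comment and not verified against real trtexec output):

```python
import re


def parse_trt_version(version_str):
    """Try TensorRT-specific patterns before the generic dotted-number
    fallback, so CUDA/driver versions in the same output are not matched."""
    # 1) "TensorRT v10.15.0" style dotted version
    m = re.search(r"TensorRT\s+v?(\d+(?:\.\d+)+)", version_str)
    if m:
        return m.group(1)
    # 2) "[TensorRT v101500]" style: vNNNNNN -> "major.minor"
    m = re.search(r"\[TensorRT v(\d{6})\]", version_str)
    if m:
        raw = m.group(1)
        return f"{int(raw[:2])}.{int(raw[2:4])}"
    # 3) Generic fallback; may match non-TensorRT versions, so it runs last.
    m = re.search(r"(\d+(?:\.\d+)+)", version_str)
    return m.group(1) if m else None
```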

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 3972ea4d-99f9-463a-940c-61ade3610b89

📥 Commits

Reviewing files that changed from the base of the PR and between 2097254 and cf42607.

📒 Files selected for processing (2)
  • modelopt/onnx/quantization/autotune/benchmark.py
  • modelopt/onnx/quantization/ort_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • modelopt/onnx/quantization/autotune/benchmark.py

Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
@gcunhase
Contributor Author

gcunhase commented Apr 15, 2026


Addressed all concerns, PTAL again @ajrasane. Thanks!

Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
@kevalmorabia97
Collaborator

I think there are 3 or 4 places in modelopt.onnx where we use subprocess to run trtexec. As a separate PR, can you unify all into a single function run_trtexec(args: list[str]) or something like this and call that function everywhere so we only have 1 place where # nosec is needed?
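
A unified helper along the lines suggested might look like this (hypothetical; the `trtexec_bin` parameter is added here only so the sketch can be exercised with another binary, and is not part of the suggestion):

```python
import shutil
import subprocess


def run_trtexec(args, trtexec_bin="trtexec", timeout=600):
    """Single choke point for launching trtexec, so the security
    exception and its justification live in exactly one place."""
    path = shutil.which(trtexec_bin)
    if path is None:
        raise FileNotFoundError(f"{trtexec_bin} not found in PATH")
    # List-of-args form with no shell=True: arguments cannot be shell-injected.
    return subprocess.run(  # nosec B603 - fixed executable, vetted argument list
        [path, *args], capture_output=True, text=True, timeout=timeout, check=True
    )
```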

Contributor

@ajrasane ajrasane left a comment


LGTM. Please merge after the subprocess module import issue is resolved.

@gcunhase
Contributor Author

I think there are 3 or 4 places in modelopt.onnx where we use subprocess to run trtexec. As a separate PR, can you unify all into a single function run_trtexec(args: list[str]) or something like this and call that function everywhere so we only have 1 place where # nosec is needed?

Will do, thank you!

@kevalmorabia97
Collaborator

/ok to test 14d425c

@gcunhase gcunhase merged commit dec2952 into NVIDIA:main Apr 15, 2026
43 of 44 checks passed
@gcunhase
Contributor Author

@kevalmorabia97 PTAL at #1268. Thanks.

