[6034518] Downgrade TRT support for remote autotuning in Autotune from 10.16 to 10.15 #1259
Conversation
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
📝 Walkthrough
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User/CLI
    participant Bench as AutotuneBenchmark
    participant Util as _check_for_trtexec
    participant TrtExec as trtexec
    participant Remote as Remote AutoTuner
    User->>Bench: invoke autotune (--use_trtexec + args)
    Bench->>Util: probe trtexec binary and parse version
    Util-->>Bench: returns detected version or raises ImportError
    Bench->>TrtExec: launch trtexec with --remoteAutoTuningConfig + --safe + --skipInference
    TrtExec->>Remote: perform remote autotuning
    Remote-->>TrtExec: tuning results
    TrtExec-->>Bench: benchmark/tuning output
```
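The "probe version or raise ImportError" step in the diagram boils down to a numeric dotted-version comparison. A minimal sketch, assuming illustrative helper names (these are not the actual ModelOpt functions):

```python
def _as_tuple(version: str) -> tuple[int, ...]:
    """Split a dotted version string into integer components."""
    return tuple(int(part) for part in version.split("."))

def meets_min_version(version: str, min_version: str = "10.15") -> bool:
    """Compare dotted versions numerically: '10.15' must beat '10.9',
    which a naive string comparison would get wrong."""
    return _as_tuple(version) >= _as_tuple(min_version)

def check_remote_autotune_support(trt_version: str) -> None:
    """Mimic the gate from the diagram: raise ImportError when the
    detected TensorRT is older than the minimum (hypothetical sketch)."""
    if not meets_min_version(trt_version, "10.15"):
        raise ImportError(
            f"Remote autotuning needs TensorRT >= 10.15, found {trt_version}"
        )
```

Under the old gate of 10.16, a valid 10.15 install would have been rejected; that is the check this PR relaxes.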
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Important: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)
✅ Passed checks (2 passed)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
modelopt/onnx/quantization/autotune/benchmark.py (1)
209-228: ⚠️ Potential issue | 🔴 Critical: Remove early return so remote autotuning args are actually applied.

Line 218 returns from `__init__` before Line 227 extends `self._base_cmd`, so the `--remoteAutoTuningConfig`/`--safe` args never make it into the executed command when TRT >= 10.15. This breaks remote autotuning despite passing the version gate.

💡 Proposed fix
```diff
-        trtexec_args = self.trtexec_args or []
+        trtexec_args = list(self.trtexec_args or [])
         has_remote_config = any("--remoteAutoTuningConfig" in arg for arg in trtexec_args)
         if has_remote_config:
             try:
                 _check_for_tensorrt(min_version="10.15")
                 self.logger.debug("TensorRT Python API version >= 10.15 detected")
                 if "--safe" not in trtexec_args:
                     self.logger.warning(
                         "Remote autotuning requires '--safe' to be set. Adding it to trtexec arguments."
                     )
-                    self.trtexec_args.append("--safe")
-                return
+                    trtexec_args.append("--safe")
             except ImportError:
                 self.logger.warning(
                     "Remote autotuning is not supported with TensorRT version < 10.15. "
                     "Removing --remoteAutoTuningConfig from trtexec arguments"
                 )
                 trtexec_args = [
                     arg for arg in trtexec_args if "--remoteAutoTuningConfig" not in arg
                 ]
         self._base_cmd.extend(trtexec_args)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/quantization/autotune/benchmark.py` around lines 209 - 228, The constructor's remote-autotuning branch returns early so trtexec args like "--safe" or "--remoteAutoTuningConfig" never get appended to the final command; remove the early return in the block that calls _check_for_tensorrt(min_version="10.15") and instead ensure any modifications to self.trtexec_args (and filtered local trtexec_args) are followed by extending self._base_cmd with those trtexec_args so the arguments are applied; locate the logic around _check_for_tensorrt, self.trtexec_args, trtexec_args and self._base_cmd in the __init__ of the Benchmark class and adjust flow so the successful TensorRT path does not exit before self._base_cmd.extend(trtexec_args) executes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: aca1a5f5-c803-49d9-8889-37357600cffb
📒 Files selected for processing (5)

- CHANGELOG.rst
- docs/source/guides/9_autotune.rst
- examples/onnx_ptq/autotune/README.md
- modelopt/onnx/quantization/__main__.py
- modelopt/onnx/quantization/autotune/benchmark.py
PR #1259 Review: Downgrade TRT support for remote autotuning from 10.16 to 10.15

Verdict: Request Changes. The core fix is correct, but the docs have formatting issues and there is a code/doc inconsistency.
Docs (both RST and README) now state that --safe --skipInference must be enabled, but benchmark.py:211-215 only auto-adds --safe; there is no check or auto-add for --skipInference:

```python
if "--safe" not in trtexec_args:
    self.logger.warning(...)
    self.trtexec_args.append("--safe")
# No equivalent for --skipInference
```

Either:
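One resolution would be a symmetric auto-add for --skipInference. A hypothetical sketch mirroring the existing --safe handling (the function name, and print in place of the class logger, are illustrative only):

```python
def ensure_remote_autotune_flags(trtexec_args: list[str]) -> list[str]:
    """Auto-add every flag remote autotuning requires, warning once per
    missing flag, instead of special-casing only --safe."""
    args = list(trtexec_args)
    for flag in ("--safe", "--skipInference"):
        if flag not in args:
            # The real code would route this through self.logger.warning(...)
            print(f"Remote autotuning requires '{flag}' to be set. Adding it.")
            args.append(flag)
    return args
```

The alternative is to relax the docs so they only require the flag the code actually enforces.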
The updated help says "Only relevant with the 'trtexec' workflow enabled"; this would be clearer as "Only relevant when --use_trtexec is set", since that's the actual flag name.

What looks good
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1259      +/-   ##
==========================================
- Coverage   76.90%   76.72%    -0.19%
==========================================
  Files         350      355        +5
  Lines       40524    41453      +929
==========================================
+ Hits        31166    31804      +638
- Misses       9358     9649      +291
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
…d for qdq tuning Signed-off-by: dmoodie <dmoodie@nvidia.com>
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
Force-pushed cf42607 to c957365 (Compare)
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/onnx/quantization/ort_utils.py`:
- Line 58: Remove the "# nosec" suppressions on the subprocess import and calls
in ort_utils.py; replace any unsafe subprocess usage with secure patterns (use
subprocess.run with a list of args, check=True, capture stdout/stderr, avoid
shell=True, and validate inputs or use shlex.split when necessary) and add
proper error handling/logging around the call; if you cannot convert the call to
a safe pattern, remove the nosec and add an explicit note in the PR requesting
`@NVIDIA/modelopt-setup-codeowners` approval with justification. Reference: the
subprocess import line and the subprocess call sites in ort_utils.py.
- Around line 63-70: The version extraction currently matches any dotted number
first (regex (\d+(\.\d+)+)) and so returns CUDA/driver versions from
version_str; update the logic in the TensorRT parsing block (where version_str
is inspected and the first regex run) to try a TensorRT-specific pattern first
(e.g., look for "TensorRT" or "[TensorRT v...]" and extract its dotted version
or convert vNNNNNN to x.y format) before falling back to generic dotted-number
matching; ensure references to the existing variables and match handling in this
function (the match variable, the first regex branch and the "[TensorRT v...]"
fallback branch) are updated so the TensorRT-specific match is tested and
returned prior to the generic (\d+(\.\d+)+) match.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 3972ea4d-99f9-463a-940c-61ade3610b89
📒 Files selected for processing (2)

- modelopt/onnx/quantization/autotune/benchmark.py
- modelopt/onnx/quantization/ort_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
- modelopt/onnx/quantization/autotune/benchmark.py
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
Addressed all concerns, PTAL again @ajrasane. Thanks!
Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
I think there are 3 or 4 places in modelopt.onnx where we use subprocess to run trtexec. As a separate PR, can you unify all into a single function?
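Such a unified helper might look like the sketch below (the name and error handling are assumptions, not the requested implementation). It would also address the earlier `# nosec` concern by using list-form arguments with a checked return code and no `shell=True`:

```python
import logging
import subprocess

logger = logging.getLogger(__name__)

def run_trtexec(trtexec_path: str, args: list[str]) -> str:
    """Single entry point for every trtexec invocation in modelopt.onnx.

    List-form argv (never shell=True), checked exit status, and captured
    output, so all callers get consistent errors and logging.
    """
    cmd = [trtexec_path, *args]
    logger.debug("Running: %s", " ".join(cmd))
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except FileNotFoundError as e:
        raise ImportError(f"{trtexec_path} not found on PATH") from e
    except subprocess.CalledProcessError as e:
        logger.error("trtexec failed (exit %d): %s", e.returncode, e.stderr)
        raise
    return result.stdout
```

For example, `run_trtexec("echo", ["--version"])` shows the call shape, with `echo` standing in for a real trtexec binary.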
ajrasane left a comment
LGTM. Please merge after the subprocess module import issue is resolved.
Will do, thank you!
/ok to test 14d425c
@kevalmorabia97 PTAL at #1268. Thanks.
What does this PR do?
Type of change: Bug fix
Remote autotuning is supported in TensorRT from version 10.15, but fails with Autotune as it's checking for 10.16+. This PR fixes that check and updates documentation accordingly.
Usage

```python
# Add a code snippet demonstrating how to use this
```

Testing
See bug 6034518.
Before your PR is "Ready for review"
CONTRIBUTING.md: N/A

Summary by CodeRabbit

Documentation
--safe --skipInference.

Updates

Bug Fixes