📝 Walkthrough

Adds a unit test that quantizes a tiny Qwen3 MoE model with the NVFP4 experts-only config, exports an HF checkpoint, and asserts the exported quantization algorithm and module exclusion patterns (attention and lm_head excluded; routed expert modules not globally excluded).
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Force-pushed `17fd111` to `aba7223`.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/torch/quantization/plugins/test_huggingface.py`:
- Around line 271-285: The current substring checks on exclude_modules are
unsafe for glob-like patterns; update the test to detect glob patterns that
would match routed experts by treating each entry as a glob (e.g., use
fnmatch/fnmatchcase or equivalent) and assert there is no pattern that matches
"*mlp.experts*" while not matching "*shared*". Concretely, replace the
substring-based loop over exclude_modules with a glob-aware check that fails if
any pattern would match routed expert paths (pattern matches "*mlp.experts*" and
does not match "*shared*"), referencing the exclude_modules variable used in the
test.
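The glob-aware check the comment asks for can be sketched with the standard library's `fnmatch`; the pattern list and module path below are hypothetical stand-ins, not the test's actual `exclude_modules` contents:

```python
from fnmatch import fnmatchcase

# Hypothetical exclude_modules list, mimicking patterns an exporter might emit.
exclude_modules = ["*self_attn*", "lm_head", "*mlp.shared_expert*"]

# A pattern is unsafe if it would match a routed-expert path while not being
# a shared-expert pattern.
routed_expert_path = "model.layers.0.mlp.experts.3.gate_proj"

unsafe = [
    pattern
    for pattern in exclude_modules
    if fnmatchcase(routed_expert_path, pattern) and "shared" not in pattern
]

# No exclusion pattern should swallow the routed experts.
assert not unsafe, f"patterns would exclude routed experts: {unsafe}"
```

Unlike a substring check, `fnmatchcase` treats `*` and `?` as wildcards, so a pattern such as `"*mlp.experts*"` is caught even though the literal substring test on the pattern string would miss it.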
ℹ️ Review info — Configuration: `.coderabbit.yaml` · Profile: CHILL · Plan: Pro Plus · Run ID: 882e6f45-1a6b-4b82-b938-f2a571d15985
📒 Files selected for processing (1)
tests/unit/torch/quantization/plugins/test_huggingface.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/torch/quantization/plugins/test_huggingface.py`:
- Around line 267-269: The test unconditionally asserts
quant_section["quant_algo"] == "NVFP4" but in transformers>=5.0
get_quant_config() cannot detect per-expert quantizers (see huggingface.py
comment), so gate this by checking transformers.__version__ (or parsing with
packaging.version.parse) and call pytest.xfail(...) with a clear reason about
the transformers>=5.0 fused expert gap before the assertion; place this check
just after quant_section = hf_quant_config["quantization"] (referencing
quant_section, hf_quant_config and the get_quant_config() behavior) so
environments with transformers 5.x mark the test as expected to fail until the
exporter is updated.
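The prompt's suggestion to parse the version (rather than compare strings) matters because lexicographic comparison misorders multi-digit components. A minimal stdlib-only sketch, with illustrative version strings (a real test should use `packaging.version.Version`, which also handles pre-release suffixes):

```python
def version_tuple(version: str) -> tuple[int, ...]:
    """Parse a dotted release string into a comparable tuple of ints.

    Pre-release suffixes (e.g. "5.0.0rc1") are dropped here; use
    packaging.version.Version in real code.
    """
    return tuple(int(part) for part in version.split(".") if part.isdigit())

# String comparison gets multi-digit components wrong...
assert "10.0" < "9.0"  # lexicographic: '1' sorts before '9'
# ...while tuple comparison orders them correctly.
assert version_tuple("10.0") > version_tuple("9.0")

# The gate the review suggests, in stdlib-only form:
def is_transformers_5_or_newer(version: str) -> bool:
    return version_tuple(version) >= (5, 0)

assert is_transformers_5_or_newer("5.0.0")
assert not is_transformers_5_or_newer("4.56.1")
```

Inside the test, this boolean would guard a `pytest.xfail(reason=...)` call placed just after `quant_section` is read from the exported config.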
ℹ️ Review info — Configuration: `.coderabbit.yaml` · Profile: CHILL · Plan: Pro Plus · Run ID: 2f1b7646-bd3e-4ecf-bb8f-4f90182d2baf
📒 Files selected for processing (1)
tests/unit/torch/quantization/plugins/test_huggingface.py
```python
quant_section = hf_quant_config["quantization"]
assert quant_section["quant_algo"] == "NVFP4"
exclude_modules = quant_section["exclude_modules"]
```
🧩 Analysis chain (collapsed): verification scripts inspected tests/unit/torch/quantization/plugins/test_huggingface.py, searched the repository for existing transformers version gates and pytest.xfail/pytest.skip usage, and traced get_quant_config through modelopt/torch/export/quant_utils.py to confirm how quant_algo is derived for fused MoE experts.
Gate the quant_algo assertion for transformers>=5.0 to avoid deterministic test failures.
Line 268 will fail unconditionally in environments with transformers 5.x due to the documented structural change in fused expert quantizer detection. The comment at modelopt/torch/quantization/plugins/huggingface.py ("transformers>=5.0 has batched experts, no per-expert quantizers") confirms that get_quant_config() cannot discover quantizers in the new fused expert format, leaving quant_algo as None. Add pytest.xfail() with reason explaining the transformers >= 5.0 gap until the exporter is updated.
Proposed patch:

```diff
 def test_qwen3_moe_nvfp4_experts_only_export_exclude_modules(tmp_path):
     """..."""
     quant_section = hf_quant_config["quantization"]
+    if Version(transformers.__version__) >= Version("5.0"):
+        pytest.xfail(
+            "Known issue: transformers>=5.0 fused MoE experts are not recognized by "
+            "get_quant_config, so quant_algo is exported as None."
+        )
     assert quant_section["quant_algo"] == "NVFP4"
```
```python
assert is_homogeneous_hf_model(model)


def test_qwen3_moe_nvfp4_experts_only_export_exclude_modules(tmp_path):
```
Can you move this to tests/gpu/torch/export? Running on CPU with the 0.6B model would be too slow, and it may not work with some of the old torch/transformers versions we run in unit tests.
```python
    export_hf_checkpoint(model, export_dir=export_dir)

    # Load the generated hf_quant_config.json
    import json
```
Please move all imports to the top of the file, outside the function.
Summary

Adds a unit test that quantizes a tiny Qwen3 MoE model with the `NVFP4_EXPERTS_ONLY_CFG` quantization config, exports an HF checkpoint, and verifies that `hf_quant_config.json` correctly reports `quant_algo: NVFP4` and that non-expert modules (`self_attn`, `lm_head`) appear in `exclude_modules` while routed expert layers (`mlp.experts.*`) do not.

Type of change: New tests
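The assertions described above can be sketched as follows; the config shape is assumed from this PR's description (a `quantization` section with `quant_algo` and `exclude_modules`), and the pattern values are illustrative, not the exporter's actual output:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

# Hypothetical hf_quant_config.json, shaped as the PR description implies.
sample_config = {
    "quantization": {
        "quant_algo": "NVFP4",
        "exclude_modules": ["*self_attn*", "lm_head"],
    }
}

with TemporaryDirectory() as export_dir:
    config_path = Path(export_dir) / "hf_quant_config.json"
    config_path.write_text(json.dumps(sample_config))

    hf_quant_config = json.loads(config_path.read_text())
    quant_section = hf_quant_config["quantization"]
    exclude_modules = quant_section["exclude_modules"]

    assert quant_section["quant_algo"] == "NVFP4"
    # Non-expert modules are excluded from quantization...
    assert any("self_attn" in pattern for pattern in exclude_modules)
    assert "lm_head" in exclude_modules
    # ...while routed experts are not globally excluded.
    assert not any("mlp.experts" in pattern for pattern in exclude_modules)
```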
Known issue

On transformers>=5.0, fused MoE experts (`_QuantFusedExperts`) are not recognized by `get_quant_config`, causing `quant_algo=None` in the exported config. This test currently fails on transformers 5.x and is intended to be fixed by a follow-up change.

Testing

- New unit test (on transformers 5.x, `quant_algo` is `None` due to the fused expert export gap)

Before your PR is "Ready for review"

- Make sure you read and follow the Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: N/A