📝 Walkthrough

Adds a unit test that quantizes a tiny Qwen3 MoE model with the NVFP4 experts-only config, exports an HF checkpoint, and asserts the exported quantization algorithm and module exclusion patterns (attention and lm_head excluded; routed expert modules not globally excluded).
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Force-pushed `17fd111` to `aba7223`.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/torch/quantization/plugins/test_huggingface.py`:
- Around line 271-285: The current substring checks on exclude_modules are
unsafe for glob-like patterns; update the test to detect glob patterns that
would match routed experts by treating each entry as a glob (e.g., use
fnmatch/fnmatchcase or equivalent) and assert there is no pattern that matches
"*mlp.experts*" while not matching "*shared*". Concretely, replace the
substring-based loop over exclude_modules with a glob-aware check that fails if
any pattern would match routed expert paths (pattern matches "*mlp.experts*" and
does not match "*shared*"), referencing the exclude_modules variable used in the
test.
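The glob-aware check the comment asks for can be sketched with the standard library's `fnmatch`; the pattern list and module path below are hypothetical stand-ins, not the test's actual `exclude_modules` contents:

```python
from fnmatch import fnmatchcase

# Hypothetical exclude_modules list, mimicking patterns an exporter might emit.
exclude_modules = ["*self_attn*", "lm_head", "*mlp.shared_expert*"]

# A pattern is unsafe if it would match a routed-expert path while not being
# a shared-expert pattern.
routed_expert_path = "model.layers.0.mlp.experts.3.gate_proj"

unsafe = [
    pattern
    for pattern in exclude_modules
    if fnmatchcase(routed_expert_path, pattern) and "shared" not in pattern
]

# No exclusion pattern should swallow the routed experts.
assert not unsafe, f"patterns would exclude routed experts: {unsafe}"
```

Unlike a substring check, `fnmatchcase` treats `*` and `?` as wildcards, so a pattern such as `"*mlp.experts*"` is caught even though the literal substring test on the pattern string would miss it.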
ℹ️ Review info — Configuration: `.coderabbit.yaml` · Profile: CHILL · Plan: Pro Plus · Run ID: 882e6f45-1a6b-4b82-b938-f2a571d15985
📒 Files selected for processing (1)
tests/unit/torch/quantization/plugins/test_huggingface.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/torch/quantization/plugins/test_huggingface.py`:
- Around line 267-269: The test unconditionally asserts
quant_section["quant_algo"] == "NVFP4" but in transformers>=5.0
get_quant_config() cannot detect per-expert quantizers (see huggingface.py
comment), so gate this by checking transformers.__version__ (or parsing with
packaging.version.parse) and call pytest.xfail(...) with a clear reason about
the transformers>=5.0 fused expert gap before the assertion; place this check
just after quant_section = hf_quant_config["quantization"] (referencing
quant_section, hf_quant_config and the get_quant_config() behavior) so
environments with transformers 5.x mark the test as expected to fail until the
exporter is updated.
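The prompt's suggestion to parse the version (rather than compare strings) matters because lexicographic comparison misorders multi-digit components. A minimal stdlib-only sketch, with illustrative version strings (a real test should use `packaging.version.Version`, which also handles pre-release suffixes):

```python
def version_tuple(version: str) -> tuple[int, ...]:
    """Parse a dotted release string into a comparable tuple of ints.

    Pre-release suffixes (e.g. "5.0.0rc1") are dropped here; use
    packaging.version.Version in real code.
    """
    return tuple(int(part) for part in version.split(".") if part.isdigit())

# String comparison gets multi-digit components wrong...
assert "10.0" < "9.0"  # lexicographic: '1' sorts before '9'
# ...while tuple comparison orders them correctly.
assert version_tuple("10.0") > version_tuple("9.0")

# The gate the review suggests, in stdlib-only form:
def is_transformers_5_or_newer(version: str) -> bool:
    return version_tuple(version) >= (5, 0)

assert is_transformers_5_or_newer("5.0.0")
assert not is_transformers_5_or_newer("4.56.1")
```

Inside the test, this boolean would guard a `pytest.xfail(reason=...)` call placed just after `quant_section` is read from the exported config.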
ℹ️ Review info — Configuration: `.coderabbit.yaml` · Profile: CHILL · Plan: Pro Plus · Run ID: 2f1b7646-bd3e-4ecf-bb8f-4f90182d2baf
📒 Files selected for processing (1)
tests/unit/torch/quantization/plugins/test_huggingface.py
```python
quant_section = hf_quant_config["quantization"]
assert quant_section["quant_algo"] == "NVFP4"
exclude_modules = quant_section["exclude_modules"]
```
🧩 Analysis chain (collapsed): verification scripts inspected tests/unit/torch/quantization/plugins/test_huggingface.py, searched the repository for existing transformers version gates and pytest.xfail/pytest.skip usage, and traced get_quant_config through modelopt/torch/export/quant_utils.py to confirm how quant_algo is derived for fused MoE experts.
Gate the quant_algo assertion for transformers>=5.0 to avoid deterministic test failures.
Line 268 will fail unconditionally in environments with transformers 5.x due to the documented structural change in fused expert quantizer detection. The comment at modelopt/torch/quantization/plugins/huggingface.py ("transformers>=5.0 has batched experts, no per-expert quantizers") confirms that get_quant_config() cannot discover quantizers in the new fused expert format, leaving quant_algo as None. Add pytest.xfail() with reason explaining the transformers >= 5.0 gap until the exporter is updated.
Proposed patch:

```diff
 def test_qwen3_moe_nvfp4_experts_only_export_exclude_modules(tmp_path):
     """..."""
     quant_section = hf_quant_config["quantization"]
+    if Version(transformers.__version__) >= Version("5.0"):
+        pytest.xfail(
+            "Known issue: transformers>=5.0 fused MoE experts are not recognized by "
+            "get_quant_config, so quant_algo is exported as None."
+        )
     assert quant_section["quant_algo"] == "NVFP4"
```
```python
assert is_homogeneous_hf_model(model)


def test_qwen3_moe_nvfp4_experts_only_export_exclude_modules(tmp_path):
```
Can you move this to tests/gpu/torch/export? Running on CPU with the 0.6B model would be too slow, and it may not work with some of the old torch/transformers versions we run in unit tests.
```python
    export_hf_checkpoint(model, export_dir=export_dir)

    # Load the generated hf_quant_config.json
    import json
```
Please move all imports to the top of the file, outside the function.
Summary

Adds a unit test that quantizes a tiny Qwen3 MoE model with the `NVFP4_EXPERTS_ONLY_CFG` quantization config, exports an HF checkpoint, and verifies that `hf_quant_config.json` correctly reports `quant_algo: NVFP4` and that non-expert modules (`self_attn`, `lm_head`) appear in `exclude_modules` while routed expert layers (`mlp.experts.*`) do not.

Type of change: New tests
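The assertions described above can be sketched as follows; the config shape is assumed from this PR's description (a `quantization` section with `quant_algo` and `exclude_modules`), and the pattern values are illustrative, not the exporter's actual output:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

# Hypothetical hf_quant_config.json, shaped as the PR description implies.
sample_config = {
    "quantization": {
        "quant_algo": "NVFP4",
        "exclude_modules": ["*self_attn*", "lm_head"],
    }
}

with TemporaryDirectory() as export_dir:
    config_path = Path(export_dir) / "hf_quant_config.json"
    config_path.write_text(json.dumps(sample_config))

    hf_quant_config = json.loads(config_path.read_text())
    quant_section = hf_quant_config["quantization"]
    exclude_modules = quant_section["exclude_modules"]

    assert quant_section["quant_algo"] == "NVFP4"
    # Non-expert modules are excluded from quantization...
    assert any("self_attn" in pattern for pattern in exclude_modules)
    assert "lm_head" in exclude_modules
    # ...while routed experts are not globally excluded.
    assert not any("mlp.experts" in pattern for pattern in exclude_modules)
```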
Known issue

On transformers>=5.0, fused MoE experts (`_QuantFusedExperts`) are not recognized by `get_quant_config`, causing `quant_algo=None` in the exported config. This test currently fails on transformers 5.x and is intended to be fixed by a follow-up change.

Testing

- New unit test (on transformers 5.x, `quant_algo` is `None` due to the fused expert export gap)

Before your PR is "Ready for review"

- Make sure you read and follow the Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: N/A