
Add qwen3 moe experts only test #1274

Open

cjluo-nv wants to merge 1 commit into main from chenjiel/add_qwen_moe_test

Conversation

@cjluo-nv
Collaborator

@cjluo-nv cjluo-nv commented Apr 16, 2026

Summary

Type of change: New tests

Known issue

On transformers>=5.0, fused MoE experts (_QuantFusedExperts) are not recognized by get_quant_config, causing quant_algo=None in the exported config. This test currently fails on transformers 5.x and is intended to be fixed by a follow-up change.

Testing

  • transformers 4.57.6: PASSED
  • transformers 5.5.4: FAILED (quant_algo is None due to fused expert export gap)

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: N/A

Summary by CodeRabbit

  • Tests
    • Added unit test coverage for Qwen3 MoE quantization with NVFP4, including checkpoint export validation and verification of generated module exclusion patterns.

@coderabbitai
Contributor

coderabbitai bot commented Apr 16, 2026

📝 Walkthrough

Adds a unit test that quantizes a tiny Qwen3 MoE model with the NVFP4 experts-only config, exports an HF checkpoint, and asserts the exported quantization algorithm and module exclusion patterns (attention and lm_head excluded; routed expert modules not globally excluded).

Changes

Cohort / File(s): Test Addition — tests/unit/torch/quantization/plugins/test_huggingface.py

Summary: Added test_qwen3_moe_nvfp4_experts_only_export_exclude_modules, which builds a tiny Qwen3 MoE, applies mtq.NVFP4_EXPERTS_ONLY_CFG quantization, exports an HF checkpoint, verifies hf_quant_config["quantization"]["quant_algo"] == "NVFP4", checks that exclude_modules includes self_attn and lm_head, and ensures routed expert modules (mlp.experts.) are not globally excluded.
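The assertion flow described in the summary above can be illustrated on a mocked config dict (a hedged sketch — the dict contents here are hypothetical placeholders; in the real test they come from the exported hf_quant_config.json):

```python
# Hypothetical shape of the exported quantization config the test inspects.
hf_quant_config = {
    "quantization": {
        "quant_algo": "NVFP4",
        "exclude_modules": ["*self_attn*", "lm_head"],
    }
}

quant_section = hf_quant_config["quantization"]
# Exported algorithm must be NVFP4 (this is the assertion that fails on transformers 5.x).
assert quant_section["quant_algo"] == "NVFP4"

exclude_modules = quant_section["exclude_modules"]
# Attention and lm_head are excluded from quantization...
assert any("self_attn" in p for p in exclude_modules)
assert any("lm_head" in p for p in exclude_modules)
# ...but routed expert modules must NOT be globally excluded.
assert not any("mlp.experts" in p for p in exclude_modules)
```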

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — docstring coverage is 33.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
  • Description Check ✅ Passed — check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — the title accurately and concisely describes the main change: adding a unit test for Qwen3 MoE models with experts-only quantization configuration.
  • Security Anti-Patterns ✅ Passed — the PR modifies only test code, and SECURITY.md explicitly exempts test code from the security anti-patterns check.


Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
@cjluo-nv cjluo-nv force-pushed the chenjiel/add_qwen_moe_test branch from 17fd111 to aba7223 on April 16, 2026 06:55
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/torch/quantization/plugins/test_huggingface.py`:
- Around line 271-285: The current substring checks on exclude_modules are
unsafe for glob-like patterns; update the test to detect glob patterns that
would match routed experts by treating each entry as a glob (e.g., use
fnmatch/fnmatchcase or equivalent) and assert there is no pattern that matches
"*mlp.experts*" while not matching "*shared*". Concretely, replace the
substring-based loop over exclude_modules with a glob-aware check that fails if
any pattern would match routed expert paths (pattern matches "*mlp.experts*" and
does not match "*shared*"), referencing the exclude_modules variable used in the
test.
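The glob-aware check the bot suggests can be sketched with fnmatch. In this sketch, pattern_excludes_routed_experts and the probe module paths are illustrative names, not code from the PR; the exclude_modules contents are hypothetical:

```python
import fnmatch

# Hypothetical exclude list; entries are glob-like patterns, so plain
# substring checks can miss patterns that would still match expert paths.
exclude_modules = ["*self_attn*", "lm_head", "*mlp.shared_expert*"]

def pattern_excludes_routed_experts(patterns):
    """Return True if any glob pattern would match a routed-expert module
    path while not being a shared-expert pattern (illustrative helper)."""
    routed_probe = "model.layers.0.mlp.experts.0.gate_proj"
    shared_probe = "model.layers.0.mlp.shared_expert.gate_proj"
    for pat in patterns:
        if fnmatch.fnmatchcase(routed_probe, pat) and not fnmatch.fnmatchcase(shared_probe, pat):
            return True
    return False

# Shared-expert and attention patterns do not exclude routed experts...
assert not pattern_excludes_routed_experts(exclude_modules)
# ...but a pattern like "*mlp.experts*" would trip the check.
assert pattern_excludes_routed_experts(exclude_modules + ["*mlp.experts*"])
```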

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 882e6f45-1a6b-4b82-b938-f2a571d15985

📥 Commits

Reviewing files that changed from the base of the PR and between d45219b and 17fd111.

📒 Files selected for processing (1)
  • tests/unit/torch/quantization/plugins/test_huggingface.py

@github-actions
Contributor

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1274/

Built to branch gh-pages at 2026-04-16 06:59 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/torch/quantization/plugins/test_huggingface.py`:
- Around line 267-269: The test unconditionally asserts
quant_section["quant_algo"] == "NVFP4" but in transformers>=5.0
get_quant_config() cannot detect per-expert quantizers (see hugggingface.py
comment), so gate this by checking transformers.__version__ (or parsing with
packaging.version.parse) and call pytest.xfail(...) with a clear reason about
the transformers>=5.0 fused expert gap before the assertion; place this check
just after quant_section = hf_quant_config["quantization"] (referencing
quant_section, hf_quant_config and the get_quant_config() behavior) so
environments with transformers 5.x mark the test as expected to fail until the
exporter is updated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 2f1b7646-bd3e-4ecf-bb8f-4f90182d2baf

📥 Commits

Reviewing files that changed from the base of the PR and between 17fd111 and aba7223.

📒 Files selected for processing (1)
  • tests/unit/torch/quantization/plugins/test_huggingface.py

Comment on lines +267 to +269
quant_section = hf_quant_config["quantization"]
assert quant_section["quant_algo"] == "NVFP4"
exclude_modules = quant_section["exclude_modules"]
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

(Collapsed verification transcript: the bot ran a series of repository scripts — reading the new test around lines 267-269, grepping for existing transformers version gates and pytest.xfail usage in the test file, and inspecting get_quant_config in modelopt/torch/export/quant_utils.py and the fused-expert handling in modelopt/torch/quantization/plugins/huggingface.py — to confirm the finding below.)

Gate the quant_algo assertion for transformers>=5.0 to avoid deterministic test failures.

Line 268 will fail unconditionally in environments with transformers 5.x due to the documented structural change in fused expert quantizer detection. The comment at modelopt/torch/quantization/plugins/huggingface.py ("transformers>=5.0 has batched experts, no per-expert quantizers") confirms that get_quant_config() cannot discover quantizers in the new fused expert format, leaving quant_algo as None. Add pytest.xfail() with reason explaining the transformers >= 5.0 gap until the exporter is updated.

Proposed patch
 def test_qwen3_moe_nvfp4_experts_only_export_exclude_modules(tmp_path):
     """..."""
     quant_section = hf_quant_config["quantization"]
+    if Version(transformers.__version__) >= Version("5.0"):
+        pytest.xfail(
+            "Known issue: transformers>=5.0 fused MoE experts are not recognized by "
+            "get_quant_config, so quant_algo is exported as None."
+        )
     assert quant_section["quant_algo"] == "NVFP4"
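For context, the version comparison the patch relies on behaves as follows (a standalone sketch; should_xfail is an illustrative helper, not part of the PR, and the version strings are the ones reported in the PR's Testing section):

```python
from packaging.version import Version

def should_xfail(transformers_version: str) -> bool:
    # Gate from the proposed patch: transformers>=5.0 exports
    # quant_algo as None for fused MoE experts, so the test is
    # expected to fail there until the exporter is updated.
    return Version(transformers_version) >= Version("5.0")

assert not should_xfail("4.57.6")  # version reported as PASSED
assert should_xfail("5.5.4")       # version reported as FAILED
assert should_xfail("5.0.0")       # boundary: 5.0 itself is gated
```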

assert is_homogeneous_hf_model(model)


def test_qwen3_moe_nvfp4_experts_only_export_exclude_modules(tmp_path):
Collaborator

@kevalmorabia97 kevalmorabia97 Apr 16, 2026


Can you move this to tests/gpu/torch/export? Running on CPU with a 0.6B model would be too slow and may not work with some of the old torch / transformers versions we run in unit tests.

export_hf_checkpoint(model, export_dir=export_dir)

# Load the generated hf_quant_config.json
import json
Collaborator


Please move all imports to the top of the file, outside the function.

