53 changes: 53 additions & 0 deletions tests/gpu/torch/export/test_export.py
@@ -13,6 +13,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import json
from fnmatch import fnmatch

import pytest
import torch
from _test_utils.torch.export.utils import (
@@ -29,6 +32,7 @@
partial_nvfp4_config,
partial_w4a8_config,
)
from _test_utils.torch.transformers_models import get_tiny_qwen3_moe

import modelopt.torch.quantization as mtq
from modelopt.torch.export.model_config import (
@@ -53,13 +57,15 @@
postprocess_state_dict,
process_layer_quant_config,
)
from modelopt.torch.export.unified_export_hf import export_hf_checkpoint
from modelopt.torch.quantization.config import (
FP8_DEFAULT_CFG,
INT4_AWQ_CFG,
INT8_SMOOTHQUANT_CFG,
INT8_WEIGHT_ONLY_CFG,
NVFP4_AWQ_LITE_CFG,
NVFP4_DEFAULT_CFG,
NVFP4_EXPERTS_ONLY_CFG,
W4A8_AWQ_BETA_CFG,
)
from modelopt.torch.quantization.nn import SequentialQuantizer, TensorQuantizer
@@ -466,3 +472,50 @@ def test_get_quant_config(config, expected):
mtq.quantize(model, config, lambda x: x(torch.randn(1, 4, 10, device="cuda")))
quant_config = get_quant_config(model)
assert quant_config["quantization"] == expected


def test_qwen3_moe_nvfp4_experts_only_export_exclude_modules(tmp_path):
"""Test that NVFP4_EXPERTS_ONLY_CFG correctly excludes non-expert modules in HF export.

For a Qwen3 MoE model, only routed expert layers (mlp.experts.*) should be quantized.
Attention layers and lm_head should appear in the exported hf_quant_config.json
exclude_modules.

Reference: https://huggingface.co/nvidia/Qwen3.5-397B-A17B-NVFP4/blob/main/hf_quant_config.json
"""
model = get_tiny_qwen3_moe().to("cuda")
# from_config doesn't set architectures; export code requires it
model.config.architectures = ["Qwen3MoeForCausalLM"]

# Quantize with NVFP4_EXPERTS_ONLY_CFG (targets only *mlp.experts* patterns)
mtq.quantize(model, NVFP4_EXPERTS_ONLY_CFG, lambda m: m(**m.dummy_inputs))

# Export
export_dir = tmp_path / "qwen3_moe_nvfp4_experts_only"
export_hf_checkpoint(model, export_dir=export_dir)

# Load the generated hf_quant_config.json
hf_quant_config_path = export_dir / "hf_quant_config.json"
assert hf_quant_config_path.exists(), "hf_quant_config.json should be generated"
with open(hf_quant_config_path) as f:
hf_quant_config = json.load(f)

quant_section = hf_quant_config["quantization"]
assert quant_section["quant_algo"] == "NVFP4"
exclude_modules = quant_section["exclude_modules"]
Comment on lines +503 to +505
Contributor
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Dependency specs mentioning transformers =="
rg -n --iglob 'pyproject.toml' --iglob 'setup.py' --iglob 'setup.cfg' --iglob '*requirements*.txt' --iglob '*constraints*.txt' '\btransformers\b'

echo
echo "== Existing version guards/xfail patterns in tests =="
rg -n -C2 --iglob 'tests/**/*.py' 'xfail|skipif|transformers\.__version__|version\.parse\('

Repository: NVIDIA/Model-Optimizer

Length of output: 24668


🏁 Script executed:

# Find and examine the test_export.py file
fd -type f "test_export.py" --path "*/gpu/torch/export/*"

Repository: NVIDIA/Model-Optimizer

Length of output: 236


🏁 Script executed:

# Check the specific lines mentioned in the review
fd -type f "test_export.py" --path "*/tests/*" --exec wc -l {} \;

Repository: NVIDIA/Model-Optimizer

Length of output: 236


🏁 Script executed:

# Look for TRANSFORMERS_VERSION_GE_5_0 definition and usage
rg -n "TRANSFORMERS_VERSION_GE_5_0" --type py

Repository: NVIDIA/Model-Optimizer

Length of output: 1080


🏁 Script executed:

# Read the test_export.py file around lines 477-521
fd -type f "test_export.py" --path "*/tests/*" --exec head -n 530 {} \; | tail -n 60

Repository: NVIDIA/Model-Optimizer

Length of output: 236


🏁 Script executed:

# Find test_export.py in the gpu torch export path
find tests -name "test_export.py" -path "*/gpu/torch/export/*"

Repository: NVIDIA/Model-Optimizer

Length of output: 103


🏁 Script executed:

# Read the specific file and lines
find tests -name "test_export.py" -path "*/gpu/torch/export/*" -exec head -n 530 {} \;

Repository: NVIDIA/Model-Optimizer

Length of output: 17974


🏁 Script executed:

# Check where TRANSFORMERS_VERSION_GE_5_0 is defined
rg -n "TRANSFORMERS_VERSION_GE_5_0\s*=" --context 3

Repository: NVIDIA/Model-Optimizer

Length of output: 627


🏁 Script executed:

# Check imports at the top of test_export.py
head -n 80 tests/gpu/torch/export/test_export.py

Repository: NVIDIA/Model-Optimizer

Length of output: 2567


🏁 Script executed:

# Check if test_export.py uses TRANSFORMERS_VERSION_GE_5_0 anywhere
rg -n "TRANSFORMERS_VERSION_GE_5_0" tests/gpu/torch/export/test_export.py

Repository: NVIDIA/Model-Optimizer

Length of output: 48


🏁 Script executed:

# Check for any xfail/skipif in test_export.py
rg -n "xfail|skipif" tests/gpu/torch/export/test_export.py

Repository: NVIDIA/Model-Optimizer

Length of output: 48


🏁 Script executed:

# Get the exact line numbers and context of the test function
sed -n '477,521p' tests/gpu/torch/export/test_export.py | cat -n

Repository: NVIDIA/Model-Optimizer

Length of output: 2387


🏁 Script executed:

# Check pyproject.toml to see the exact transformers constraint
sed -n '80,90p' pyproject.toml

Repository: NVIDIA/Model-Optimizer

Length of output: 569


🏁 Script executed:

# Look at how TRANSFORMERS_VERSION_GE_5_0 is used in the similar test file
sed -n '30,40p' tests/unit/torch/quantization/plugins/test_sparse_sequential_moe.py

Repository: NVIDIA/Model-Optimizer

Length of output: 413


Add version guard to prevent deterministic test failure in transformers 5.x.

Line 504 unconditionally asserts quant_algo == "NVFP4", but this fails in transformers ≥5.0 due to how MoE expert fusion is handled (confirmed by PR context). The repository pins transformers>=4.56 in pyproject.toml, so v5.x environments are encountered in CI. Align with the established pattern in test_sparse_sequential_moe.py by using @pytest.mark.skipif with the existing TRANSFORMERS_VERSION_GE_5_0 flag.

Suggested approach
+from modelopt.torch.quantization.plugins.huggingface import (
+    TRANSFORMERS_VERSION_GE_5_0,
+)

+@pytest.mark.skipif(TRANSFORMERS_VERSION_GE_5_0, reason="Transformers v5 does not recognize fused MoE experts in get_quant_config; quant_algo may be None")
 def test_qwen3_moe_nvfp4_experts_only_export_exclude_modules(tmp_path):

This matches the skip pattern already used in tests/unit/torch/quantization/plugins/test_sparse_sequential_moe.py (lines 178, 306) for similar MoE-related transformers version constraints.
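For readers unfamiliar with this guard style, the sketch below shows the general shape of a major-version flag like TRANSFORMERS_VERSION_GE_5_0. This is illustrative only: the real flag is defined in modelopt.torch.quantization.plugins.huggingface, and the parsing here is a simplified stdlib-only stand-in (real code would typically compare against transformers.__version__, often via packaging.version).

```python
# Illustrative sketch, not the actual ModelOpt definition: the real
# TRANSFORMERS_VERSION_GE_5_0 lives in
# modelopt.torch.quantization.plugins.huggingface.

def major_version(version_string: str) -> int:
    # Take the leading numeric component, tolerating suffixes
    # such as "5.0.0.dev0" or "4.56.2".
    head = version_string.split(".")[0]
    digits = "".join(ch for ch in head if ch.isdigit())
    return int(digits) if digits else 0

def is_transformers_ge_5(version_string: str) -> bool:
    # A test decorated with @pytest.mark.skipif(<this flag>, reason=...)
    # is skipped whenever the installed major version is 5 or newer.
    return major_version(version_string) >= 5

print(is_transformers_ge_5("4.56.2"))      # False
print(is_transformers_ge_5("5.0.0.dev0"))  # True
```

A module-level boolean computed this way keeps the skip decision out of the test body, which is why the existing tests can share one flag across files.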

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In tests/gpu/torch/export/test_export.py around lines 503-505, the assertion
that quant_section["quant_algo"] == "NVFP4" fails on transformers ≥5.0; add a
pytest skip guard using the existing TRANSFORMERS_VERSION_GE_5_0 flag to skip
the test when running under transformers 5.x. Locate the test that contains the
code referencing hf_quant_config and quant_section (in
tests/gpu/torch/export/test_export.py) and decorate that test function with
@pytest.mark.skipif(TRANSFORMERS_VERSION_GE_5_0, reason="MoE expert fusion
change in transformers>=5.0 causes deterministic failure"), following the same
pattern used in
tests/unit/torch/quantization/plugins/test_sparse_sequential_moe.py.


def is_excluded(module_name: str) -> bool:
return any(fnmatch(module_name, pattern) for pattern in exclude_modules)

# Attention layers must be excluded
assert is_excluded("model.layers.0.self_attn.q_proj"), (
f"self_attn should be excluded, got patterns: {exclude_modules}"
)

# lm_head must be excluded
assert is_excluded("lm_head"), f"lm_head should be excluded, got patterns: {exclude_modules}"

# Routed experts should NOT be excluded
assert not is_excluded("model.layers.0.mlp.experts.0.down_proj"), (
f"Routed experts should not be excluded, got patterns: {exclude_modules}"
)