feat: add MobiusModelBuilder Olive pass for mobius-backed ONNX export by justinchuby · Pull Request #2406 · microsoft/Olive

justinchuby · 2026-04-09T21:24:07Z

Summary

Adds a new Olive pass (MobiusModelBuilder) that wraps mobius build() to produce ONNX models from HuggingFace model IDs.

Single-component models (LLMs) → ONNXModelHandler
Multi-component models (VLMs, encoder-decoders) → CompositeModelHandler
EP auto-detected from Olive accelerator spec (cpu/cuda/dml/webgpu)
Precision: fp32 (default), fp16, bf16
Registered in olive_config.json as MobiusModelBuilder
Example pipeline config: examples/gemma4/gemma4_int4_cuda.json
14 unit tests covering single/multi-component, EP detection, and error cases
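
For orientation, a two-pass pipeline of the kind summarized above might be assembled roughly as follows. This is a minimal sketch, not the contents of the example file: the pass type names, the model ID, the `engine.target` placement, and `log_severity_level=1` come from this PR, while the surrounding keys (`input_model`, `systems`, `passes`), the `HfModel` / `LocalSystem` types, and the `precision` parameter name are assumptions about the run-config layout.

```python
import json


def build_run_config(model_id: str, precision: str = "fp16") -> dict:
    """Return a hypothetical Olive run config: mobius export, then INT4 RTN quantization."""
    return {
        # Assumed input-model layout; only the model ID is taken from the PR.
        "input_model": {"type": "HfModel", "model_path": model_id},
        # Assumed system block; the pass is said to auto-detect the EP from this spec.
        "systems": {
            "local_system": {
                "type": "LocalSystem",
                "accelerators": [
                    {"device": "gpu", "execution_providers": ["CUDAExecutionProvider"]}
                ],
            }
        },
        # Pass order matters: build the fp16 ONNX model first, then quantize it.
        "passes": {
            "build": {"type": "MobiusModelBuilder", "precision": precision},
            "quantize": {"type": "OnnxBlockWiseRtnQuantization"},
        },
        "engine": {"target": "local_system", "log_severity_level": 1},
    }


config = build_run_config("google/gemma-4-E2B-it")
print(json.dumps(config, indent=2))
```

No `execution_provider` is set on the build pass here, which mirrors the PR's choice to rely on auto-detection from the accelerator spec.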

Validated: Gemma4 INT4 Quantization Pipeline

Successfully tested MobiusModelBuilder → OnnxBlockWiseRtnQuantization on google/gemma-4-E2B-it:

Quantized ops per component

Component	Total nodes	MatMulNBits	GatherBlockQuantized	Other ops
decoder	1,277	277	39	961
audio	1,465	135	0	1,330
vision	1,488	114	66	1,308
embedding	24	0	1	23
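
A per-component row like the ones above can be derived by counting node op types. The sketch below is an illustration rather than the script used for this PR; it works on a plain list of op-type strings, which, assuming the `onnx` package is installed, could be obtained with `[node.op_type for node in onnx.load(path).graph.node]` (top-level graph only, subgraphs not included).

```python
from collections import Counter

# The two quantized op types reported in the table above.
QUANTIZED_OPS = ("MatMulNBits", "GatherBlockQuantized")


def summarize_ops(op_types):
    """Count total nodes, each quantized op type, and everything else."""
    counts = Counter(op_types)
    row = {"total": sum(counts.values())}
    for op in QUANTIZED_OPS:
        row[op] = counts.get(op, 0)
    row["other"] = row["total"] - sum(row[op] for op in QUANTIZED_OPS)
    return row


# Synthetic op list shaped like the embedding component: 24 nodes, one quantized gather.
embedding_ops = ["GatherBlockQuantized"] + ["Cast"] * 23
embedding_row = summarize_ops(embedding_ops)
print(embedding_row)
```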

Weight quantization coverage

Component	Quantized (UINT8/INT4)	Non-quantized (FP16)	% quantized by size
decoder	316 tensors (2.4G elements)	585 tensors (71M elements)	97%
audio	135 tensors (154M elements)	768 tensors (2.8M elements)	98%
embedding	1 tensor (201M elements)	4 tensors (3.1M elements)	98%
vision	147 tensors (90M elements)	698 tensors (1.7M elements)	98%
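
The percentage column can be reproduced from the element counts in the same table. The sketch below assumes "by size" means the share of elements that live in quantized tensors; if the figures were instead computed from bytes, the exact values would differ slightly.

```python
# (quantized elements, non-quantized elements) copied from the table above.
COVERAGE = {
    "decoder": (2.4e9, 71e6),
    "audio": (154e6, 2.8e6),
    "embedding": (201e6, 3.1e6),
    "vision": (90e6, 1.7e6),
}


def quantized_share(quantized: float, non_quantized: float) -> float:
    """Percentage of all elements that are stored in quantized tensors."""
    return 100.0 * quantized / (quantized + non_quantized)


shares = {name: round(quantized_share(q, n)) for name, (q, n) in COVERAGE.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```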

Output structure (2.8GB total, down from ~5GB fp16)

models/gemma4-e2b-int4-cuda/
├── decoder.onnx      (853K) + decoder.onnx.data   (2.4G)
├── audio.onnx        (1.2M) + audio.onnx.data     (152M)
├── embedding.onnx    (8.5K) + embedding.onnx.data  (199M)
├── vision.onnx       (1.2M) + vision.onnx.data     (89M)
└── model_config.json

Pipeline timing

Pass	Time
`MobiusModelBuilder` (fp16 build)	77s
`OnnxBlockWiseRtnQuantization` (int4)	129s
Total	~3.5 min

Adds a new Olive pass that wraps mobius's build() function to produce ONNX models directly from HuggingFace model IDs.
- Single-component models (LLMs) → ONNXModelHandler
- Multi-component models (VLMs, encoder-decoders) → CompositeModelHandler
- EP auto-detected from Olive accelerator spec (cpu/cuda/dml/webgpu)
- Precision: fp32 (default), fp16, bf16
- Registered in olive_config.json as 'MobiusModelBuilder'
- Example pipeline config: examples/gemma4/gemma4_int4_pipeline.json
- 10 unit tests covering single/multi-component, EP detection, and error cases
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>

- test_ep_map_covers_common_providers now asserts DML and WebGPU in addition to CPU and CUDA, verifying full EP coverage
- Add examples/gemma4/gemma4_fp32_cpu.json showing CPU/fp32 deployment
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>

Use official model IDs:
- google/gemma-4-E2B-it and google/gemma-4-E4B-it: Any-to-Any (vision + audio + text)
- google/gemma-4-26B-A4B-it and google/gemma-4-31B-it: Image-Text to Text only (no audio encoder)
Updated both example configs to use google/gemma-4-E2B-it and added comment strings documenting the audio-capable vs image-only distinction.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>

…lds)
Fix invalid RunConfig fields in both example configs:
- Remove output_name and system (not valid engine fields)
- Move target reference to engine.target
- Use log_severity_level=1
Verified E2E with HuggingFaceTB/SmolLM2-135M-Instruct:
- olive run completed successfully
- model.onnx + model.onnx.data produced
- ORT loaded the model, correct causal-LM I/O (input_ids -> logits + KV cache)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>

Copilot

Pull request overview

Adds a new ONNX pass (MobiusModelBuilder) that uses the mobius package to build ONNX models directly from HuggingFace model IDs, returning either a single ONNXModelHandler or a CompositeModelHandler for multi-component exports.

Changes:

Introduces olive/passes/onnx/mobius_model_builder.py implementing the new pass (EP mapping, precision mapping, trust_remote_code passthrough).
Registers the pass in olive/olive_config.json and adds two Gemma4 example run configs.
Adds unit tests for single-component, multi-component, EP selection, and error paths.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

File	Description
`olive/passes/onnx/mobius_model_builder.py`	New pass wrapping `mobius.build()` and emitting Olive model handlers.
`olive/olive_config.json`	Registers `MobiusModelBuilder` and declares extras for its dependencies.
`examples/gemma4/gemma4_int4_pipeline.json`	Example pipeline: mobius export (fp16 CUDA) then INT4 quantization.
`examples/gemma4/gemma4_fp32_cpu.json`	Example pipeline: mobius export (fp32 CPU).
`test/passes/onnx/test_mobius_model_builder.py`	New unit tests for config, handler types, EP mapping, and missing dependency behavior.

@@ -0,0 +1,182 @@
+# -------------------------------------------------------------------------


- _PRECISION_TO_DTYPE: add inline comments explaining each dtype string (f32 = float32, f16 = float16, bf16 = bfloat16) and when to use a downstream quantization pass for INT4/INT8 instead
- Remove explicit execution_provider from CUDA example config so both gemma4 configs consistently rely on auto-detection from the accelerator spec; the CPU config already did this
- olive_config.json: add mobius-genai to top-level extra_dependencies map so 'olive run' can surface the install hint; remove onnx_ir (transitive dep of mobius-genai) from the pass entry
- Move AcceleratorSpec import to TYPE_CHECKING block (RUFF TC001); safe because the file already has 'from __future__ import annotations'
- Use X | Y union syntax instead of Union[X, Y] (RUFF UP007)
- Remove redundant 'import onnx_ir' check; ImportError message now correctly says 'pip install mobius-genai' (PYLINT W0611)
- Rename unused _fake_pkg 'output_dir' param to '_output_dir' to suppress lint warning (PYLINT W0613)
- Wrap long AcceleratorSpec(…) lines to stay under 120 chars (RUFF format)
- Collapse nested 'with' into single 'with' (RUFF SIM117)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
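
As a rough picture of the precision mapping this commit documents, the table might look like the sketch below. The dtype strings (f32, f16, bf16) and the advice to use a downstream pass for INT4/INT8 are taken from the commit message; the dictionary keys, the `resolve_dtype` helper, and its error text are hypothetical and only illustrate the idea.

```python
# Olive-facing precision name -> dtype string handed to the builder.
_PRECISION_TO_DTYPE = {
    "fp32": "f32",   # float32, the default precision
    "fp16": "f16",   # float16, a typical starting point before INT4 quantization
    "bf16": "bf16",  # bfloat16
}


def resolve_dtype(precision: str) -> str:
    """Translate a precision name, rejecting values the builder does not produce."""
    try:
        return _PRECISION_TO_DTYPE[precision]
    except KeyError:
        supported = ", ".join(sorted(_PRECISION_TO_DTYPE))
        raise ValueError(
            f"Unsupported precision {precision!r}; expected one of: {supported}. "
            "For INT4/INT8, run a downstream quantization pass instead."
        ) from None


print(resolve_dtype("fp16"))
```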

- EP_MAP: tighten annotation to ClassVar[dict[ExecutionProvider, str]] (keys are enum instances, not plain strings)
- olive_config.json: add onnx-ir (correct pip hyphenated name) to both the pass extra_dependencies and the top-level extra_dependencies map; was previously using wrong underscore spelling 'onnx_ir'
- Rename examples/gemma4/gemma4_int4_pipeline.json -> gemma4_int4_cuda.json so both example configs follow the same {precision}_{device}.json naming pattern
- _patch_build: expand docstring explaining why 'mobius.build' is the correct patch target (lazy import inside function body, not module-level)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
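
The tightened annotation is easier to see in a small example. In the sketch below, `ExecutionProvider` is a local stand-in for Olive's enum (its member values are assumed), `BuilderSketch` and `resolve_ep` are hypothetical names, and the fallback to "cpu" for unmapped providers is an assumption rather than the pass's confirmed behavior.

```python
from enum import Enum
from typing import ClassVar


class ExecutionProvider(str, Enum):
    """Stand-in for Olive's execution provider enum (values assumed)."""

    CPU = "CPUExecutionProvider"
    CUDA = "CUDAExecutionProvider"
    DML = "DmlExecutionProvider"
    WEBGPU = "WebGpuExecutionProvider"


class BuilderSketch:
    # Keys are enum instances, not plain strings, which is what the annotation states.
    EP_MAP: ClassVar[dict[ExecutionProvider, str]] = {
        ExecutionProvider.CPU: "cpu",
        ExecutionProvider.CUDA: "cuda",
        ExecutionProvider.DML: "dml",
        ExecutionProvider.WEBGPU: "webgpu",
    }

    @classmethod
    def resolve_ep(cls, provider, override=None):
        """Prefer an explicit override; otherwise map the accelerator's provider."""
        if override is not None:
            return override
        return cls.EP_MAP.get(provider, "cpu")  # assumed fallback


print(BuilderSketch.resolve_ep(ExecutionProvider.CUDA))
```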

…delBuilder
- After pkg.save(), verify each expected model.onnx exists and raise RuntimeError with a clear message if missing (single-component and per-component in multi-component paths)
- Log a WARNING when trust_remote_code=True is passed so users are reminded to only use this with trusted model sources
- Add 4 new tests: missing output raises RuntimeError (single and multi-component), trust_remote_code warning emitted, no warning when False (14/14 passing)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
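
The two safeguards in this commit can be sketched as follows. This is an illustration under stated assumptions, not the pass's code: `check_outputs` and `warn_if_trusting_remote_code` are hypothetical helpers, and the on-disk layout (a single `model.onnx` at the top level, or one `model.onnx` per component subdirectory) is assumed.

```python
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)


def check_outputs(output_dir, components=()):
    """Return the expected model paths, raising if any of them was not written."""
    output_dir = Path(output_dir)
    # Assumed layout: <dir>/<component>/model.onnx, or <dir>/model.onnx when single-component.
    expected = [output_dir / name / "model.onnx" for name in components]
    if not expected:
        expected = [output_dir / "model.onnx"]
    missing = [path for path in expected if not path.is_file()]
    if missing:
        raise RuntimeError(
            "Export finished but expected files are missing: "
            + ", ".join(str(path) for path in missing)
        )
    return expected


def warn_if_trusting_remote_code(trust_remote_code: bool) -> None:
    """Remind users that remote code runs locally when this flag is enabled."""
    if trust_remote_code:
        logger.warning(
            "trust_remote_code=True executes code from the model repository; "
            "only use it with trusted model sources."
        )


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model.onnx").touch()
    found = [path.name for path in check_outputs(tmp)]
print(found)
```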

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>

- Add module-scoped _stub_mobius_module fixture that injects a fake 'mobius' stub into sys.modules when the package is not installed, ensuring patch('mobius.build') works in Olive CI without mobius-genai
- Add '# pylint: disable=protected-access' on _default_config test line (PYLINT W0212: intentional test access to a pass internals method)
- Add '# noqa: PLC0415' on lazy 'from mobius import build' inside _run_for_config; the import is intentionally deferred to surface a clear ImportError only when the pass actually runs
- Run 'lintrunner -a' to auto-apply RUFF-FORMAT and FORMAT-JSON patches on mobius_model_builder.py, test file, and both example configs
- 14/14 tests pass
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
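
The stub-injection idea behind the fixture can be sketched with the standard library alone. The context manager below is a hypothetical stand-in for `_stub_mobius_module`, not its actual code; in a test suite it could be wrapped in a module-scoped pytest fixture. It relies on the fact that `unittest.mock.patch` resolves its target through `sys.modules`, so a placeholder module with a `build` attribute is enough for `patch('mobius.build')` to succeed.

```python
import importlib.util
import sys
import types
from contextlib import contextmanager
from unittest.mock import patch


@contextmanager
def stub_module_if_missing(name, attrs=("build",)):
    """Temporarily register an empty module under `name` if it cannot be imported."""
    if name in sys.modules or importlib.util.find_spec(name) is not None:
        yield  # The real package is available; leave it untouched.
        return
    stub = types.ModuleType(name)
    for attr in attrs:
        setattr(stub, attr, None)  # patch() needs the attribute to exist.
    sys.modules[name] = stub
    try:
        yield
    finally:
        sys.modules.pop(name, None)


# A single 'with' holds both the stub and the patch, as the lint fix in this PR prefers.
with stub_module_if_missing("mobius"), patch(
    "mobius.build", return_value="fake-package"
) as fake_build:
    import mobius  # Resolves to the stub when mobius is not installed.

    result = mobius.build("some/model-id")

print(result)
```

Because the pass imports `build` lazily inside the function body, patching the attribute on the `mobius` module itself takes effect at call time, which matches the patch target discussed in the `_patch_build` docstring.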

Change all references from 'mobius-genai' to 'mobius-ai':
- olive_config.json: extra_dependencies key/value and top-level mapping
- mobius_model_builder.py: docstring install snippet and ImportError message
- test file: fixture docstring comment
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

lintrunner auto-fixed RUF100 (unused noqa directive) across 15 files. The PLC0415 noqa in mobius_model_builder.py was stale — ruff does not enable PLC0415 in this repo, so the directive was unused. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>

jambayk · 2026-04-10T18:26:48Z

+                ),
+            ),
+            "execution_provider": PassConfigParam(
+                type_=str,


We could create an enum of the supported EPs for automatic validation, like `class ModelDtype(StrEnumBase):` in Olive/olive/passes/pytorch/autoawq.py (line 27 in 8b1957e), unless you think the options might keep growing and it would be hard to keep it in sync across versions.

…files to model_attributes Agent-Logs-Url: https://github.com/microsoft/Olive/sessions/d99664b1-ed7e-44a8-b3a1-4efbc09c7259 Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>

…ify test docstring Agent-Logs-Url: https://github.com/microsoft/Olive/sessions/d99664b1-ed7e-44a8-b3a1-4efbc09c7259 Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>

Olive's RunConfig uses Pydantic with extra='forbid' on EngineConfig, which causes validation errors when unknown top-level fields like 'comment' are present. Remove them so the configs validate correctly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>

Replace GptqQuantizer (requires auto_gptq) with the built-in OnnxBlockWiseRtnQuantization pass which works out of the box. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>

Replace free-form string with StrEnumBase enum matching the pattern from AutoAWQQuantizer.ModelDtype. Supports: default, cpu, cuda, dml, webgpu, trt-rtx, onnx-standard. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>
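
A minimal sketch of the enum-based validation this commit describes is shown below. The seven values are the ones listed in the commit message; `ExecutionProviderOption` and `parse_execution_provider` are hypothetical names, and a plain `str`-based `Enum` stands in for Olive's `StrEnumBase`.

```python
from enum import Enum


class ExecutionProviderOption(str, Enum):
    """Allowed execution_provider values (stand-in for a StrEnumBase subclass)."""

    DEFAULT = "default"
    CPU = "cpu"
    CUDA = "cuda"
    DML = "dml"
    WEBGPU = "webgpu"
    TRT_RTX = "trt-rtx"
    ONNX_STANDARD = "onnx-standard"


def parse_execution_provider(value: str) -> ExecutionProviderOption:
    """Convert a config string to the enum, listing valid choices on failure."""
    try:
        return ExecutionProviderOption(value)
    except ValueError:
        allowed = ", ".join(option.value for option in ExecutionProviderOption)
        raise ValueError(
            f"Unknown execution_provider {value!r}; allowed values: {allowed}"
        ) from None


print(parse_execution_provider("trt-rtx").name)
```

Compared with a free-form string, a typo such as "cuda " or "tpu" fails at config-validation time instead of surfacing later inside the build.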

Configs moved to microsoft/olive-recipes per repo convention. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>

xiaoyu-work · 2026-04-24T04:34:24Z

Will mobius generate genai_config.json and related files for ORT GenAI? Also, does mobius support custom names for the individual components of a multi-component model? I can see that all component models are named "model.onnx".

Add 'runtime' config param (default: ort-genai) that generates genai_config.json, tokenizer files, and processor configs alongside ONNX models via write_ort_genai_config(). Set to 'none' to skip. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>

+    )
+
+
+class _combine_patches:


Justin Chu and others added 4 commits April 9, 2026 14:04

Copilot AI review requested due to automatic review settings April 9, 2026 21:24

Copilot started reviewing on behalf of justinchuby April 9, 2026 21:25

Copilot AI reviewed Apr 9, 2026

github-advanced-security AI found potential problems Apr 9, 2026

justinchuby marked this pull request as draft April 9, 2026 22:00

Justin Chu and others added 4 commits April 9, 2026 19:56

docs: clarify _patch_build comment on lazy import patch target

8c1259c

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>

github-advanced-security AI found potential problems Apr 10, 2026

Comment thread examples/gemma4/gemma4_int4_cuda.json Fixed

justinchuby self-assigned this Apr 10, 2026

github-advanced-security AI found potential problems Apr 10, 2026

Comment thread olive/passes/onnx/mobius_model_builder.py Fixed

Comment thread olive/passes/onnx/mobius_model_builder.py Fixed

github-advanced-security AI found potential problems Apr 10, 2026

Comment thread olive/passes/onnx/mobius_model_builder.py Fixed

justinchuby requested a review from Copilot April 10, 2026 17:36

Copilot started reviewing on behalf of justinchuby April 10, 2026 17:37

Copilot AI reviewed Apr 10, 2026

Comment thread examples/gemma4/gemma4_int4_cuda.json Outdated

Comment thread examples/gemma4/gemma4_int4_cuda.json Outdated

Comment thread olive/passes/onnx/mobius_model_builder.py Outdated

Comment thread olive/passes/onnx/mobius_model_builder.py

justinchuby requested review from jambayk and xiaoyu-work April 10, 2026 17:54