feat: add MobiusModelBuilder Olive pass for mobius-backed ONNX export #2406
justinchuby wants to merge 19 commits into main
Conversation
Adds a new Olive pass that wraps mobius's build() function to produce ONNX models directly from HuggingFace model IDs.

- Single-component models (LLMs) → ONNXModelHandler
- Multi-component models (VLMs, encoder-decoders) → CompositeModelHandler
- EP auto-detected from Olive accelerator spec (cpu/cuda/dml/webgpu)
- Precision: fp32 (default), fp16, bf16
- Registered in olive_config.json as 'MobiusModelBuilder'
- Example pipeline config: examples/gemma4/gemma4_int4_pipeline.json
- 10 unit tests covering single/multi-component, EP detection, and error cases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
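The EP auto-detection described above can be sketched as a small mapping from ONNX Runtime execution-provider names to mobius device strings. This is an illustrative stand-in, not the actual Olive implementation; the map contents and function name are assumptions based on the PR description.

```python
# Hypothetical sketch of EP auto-detection from the accelerator spec.
# Keys/values mirror the cpu/cuda/dml/webgpu set named in this PR.
_EP_MAP = {
    "CPUExecutionProvider": "cpu",
    "CUDAExecutionProvider": "cuda",
    "DmlExecutionProvider": "dml",
    "WebGpuExecutionProvider": "webgpu",
}


def detect_device(execution_provider: str) -> str:
    """Map an Olive accelerator EP name to a mobius device string."""
    try:
        return _EP_MAP[execution_provider]
    except KeyError:
        raise ValueError(
            f"Unsupported execution provider: {execution_provider}"
        ) from None
```

Unknown providers raise rather than silently falling back, matching the "error cases" the unit tests cover.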
- test_ep_map_covers_common_providers now asserts DML and WebGPU in addition to CPU and CUDA, verifying full EP coverage
- Add examples/gemma4/gemma4_fp32_cpu.json showing CPU/fp32 deployment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Use official model IDs:

- google/gemma-4-E2B-it and google/gemma-4-E4B-it: Any-to-Any (vision + audio + text)
- google/gemma-4-26B-A4B-it and google/gemma-4-31B-it: Image-Text to Text only (no audio encoder)

Updated both example configs to use google/gemma-4-E2B-it and added comment strings documenting the audio-capable vs image-only distinction.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
…lds)

Fix invalid RunConfig fields in both example configs:

- Remove output_name and system (not valid engine fields)
- Move target reference to engine.target
- Use log_severity_level=1

Verified E2E with HuggingFaceTB/SmolLM2-135M-Instruct:

- olive run completed successfully
- model.onnx + model.onnx.data produced
- ORT loaded the model, correct causal-LM I/O (input_ids -> logits + KV cache)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Pull request overview
Adds a new ONNX pass (MobiusModelBuilder) that uses the mobius package to build ONNX models directly from HuggingFace model IDs, returning either a single ONNXModelHandler or a CompositeModelHandler for multi-component exports.
Changes:
- Introduces olive/passes/onnx/mobius_model_builder.py implementing the new pass (EP mapping, precision mapping, trust_remote_code passthrough).
- Registers the pass in olive/olive_config.json and adds two Gemma4 example run configs.
- Adds unit tests for single-component, multi-component, EP selection, and error paths.
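The two example run configs mentioned above follow Olive's usual shape: an input model plus a passes map. A minimal sketch of that structure, expressed as a Python dict (field names follow the examples described in this PR; exact values are assumptions):

```python
# Illustrative run-config fragment for the new pass (an assumption-level
# sketch, not a copy of the checked-in examples/gemma4 configs).
run_config = {
    "input_model": {
        "type": "HfModel",
        "model_path": "google/gemma-4-E2B-it",
    },
    "passes": {
        # EP is auto-detected from the accelerator spec, so it is omitted here.
        "builder": {"type": "MobiusModelBuilder", "precision": "fp16"},
    },
}
```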
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| olive/passes/onnx/mobius_model_builder.py | New pass wrapping mobius.build() and emitting Olive model handlers. |
| olive/olive_config.json | Registers MobiusModelBuilder and declares extras for its dependencies. |
| examples/gemma4/gemma4_int4_pipeline.json | Example pipeline: mobius export (fp16 CUDA) then INT4 quantization. |
| examples/gemma4/gemma4_fp32_cpu.json | Example pipeline: mobius export (fp32 CPU). |
| test/passes/onnx/test_mobius_model_builder.py | New unit tests for config, handler types, EP mapping, and missing dependency behavior. |
- _PRECISION_TO_DTYPE: add inline comments explaining each dtype string (f32 = float32, f16 = float16, bf16 = bfloat16) and when to use a downstream quantization pass for INT4/INT8 instead
- Remove explicit execution_provider from CUDA example config so both gemma4 configs consistently rely on auto-detection from the accelerator spec; the CPU config already did this
- olive_config.json: add mobius-genai to top-level extra_dependencies map so 'olive run' can surface the install hint; remove onnx_ir (transitive dep of mobius-genai) from the pass entry
- Move AcceleratorSpec import to TYPE_CHECKING block (RUFF TC001); safe because the file already has 'from __future__ import annotations'
- Use X | Y union syntax instead of Union[X, Y] (RUFF UP007)
- Remove redundant 'import onnx_ir' check; ImportError message now correctly says 'pip install mobius-genai' (PYLINT W0611)
- Rename unused _fake_pkg 'output_dir' param to '_output_dir' to suppress lint warning (PYLINT W0613)
- Wrap long AcceleratorSpec(…) lines to stay under 120 chars (RUFF format)
- Collapse nested 'with' into single 'with' (RUFF SIM117)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
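The TYPE_CHECKING move mentioned above is a standard pattern: when annotations are not evaluated at runtime (postponed evaluation, or quoted strings as below), a type-only import can live behind the guard. A minimal, self-contained sketch, using pathlib.Path as a stand-in for AcceleratorSpec:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Type-checker-only import (stand-in for AcceleratorSpec in the pass).
    # It is never executed at runtime, which is what RUFF TC001 asks for.
    from pathlib import Path


def describe(spec: "Path") -> str:
    # The quoted annotation is never evaluated at runtime, so the guarded
    # import above is safe even though Path is undefined here at run time.
    return f"accelerator={spec}"
```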
- EP_MAP: tighten annotation to ClassVar[dict[ExecutionProvider, str]]
(keys are enum instances, not plain strings)
- olive_config.json: add onnx-ir (correct pip hyphenated name) to both
the pass extra_dependencies and the top-level extra_dependencies map;
was previously using wrong underscore spelling 'onnx_ir'
- Rename examples/gemma4/gemma4_int4_pipeline.json ->
gemma4_int4_cuda.json so both example configs follow the same
{precision}_{device}.json naming pattern
- _patch_build: expand docstring explaining why 'mobius.build' is the
correct patch target (lazy import inside function body, not module-level)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
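The tightened EP_MAP annotation from the first bullet above can be sketched as follows; the ExecutionProvider enum here is a stand-in for Olive's own, with assumed member names:

```python
from enum import Enum
from typing import ClassVar


class ExecutionProvider(str, Enum):
    # Stand-in for Olive's ExecutionProvider enum (member names assumed).
    CPUExecutionProvider = "CPUExecutionProvider"
    CUDAExecutionProvider = "CUDAExecutionProvider"


class MobiusModelBuilderSketch:
    # ClassVar[dict[ExecutionProvider, str]]: keys are enum instances,
    # not plain strings, so lookups must go through the enum.
    EP_MAP: ClassVar[dict[ExecutionProvider, str]] = {
        ExecutionProvider.CPUExecutionProvider: "cpu",
        ExecutionProvider.CUDAExecutionProvider: "cuda",
    }
```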
…delBuilder

- After pkg.save(), verify each expected model.onnx exists and raise RuntimeError with a clear message if missing (single-component and per-component in multi-component paths)
- Log a WARNING when trust_remote_code=True is passed so users are reminded to only use this with trusted model sources
- Add 4 new tests: missing output raises RuntimeError (single and multi-component), trust_remote_code warning emitted, no warning when False (14/14 passing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
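The post-save verification described above can be sketched as a small helper; the function name and exact error text are assumptions, but the behavior (check model.onnx per component, raise RuntimeError if missing) follows the commit message:

```python
from pathlib import Path


def verify_onnx_outputs(output_dir, component_names=None):
    """Sketch: after pkg.save(), confirm each expected model.onnx exists.

    Single-component builds expect output_dir/model.onnx; multi-component
    builds expect output_dir/<component>/model.onnx for each component.
    """
    output_dir = Path(output_dir)
    if component_names:
        expected = [output_dir / name / "model.onnx" for name in component_names]
    else:
        expected = [output_dir / "model.onnx"]
    missing = [str(p) for p in expected if not p.is_file()]
    if missing:
        raise RuntimeError(f"mobius build produced no ONNX output at: {missing}")
```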
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
- Add module-scoped _stub_mobius_module fixture that injects a fake
'mobius' stub into sys.modules when the package is not installed,
ensuring patch('mobius.build') works in Olive CI without mobius-genai
- Add '# pylint: disable=protected-access' on _default_config test line
(PYLINT W0212 — intentional test access to a pass internals method)
- Add '# noqa: PLC0415' on lazy 'from mobius import build' inside
_run_for_config — import is intentionally deferred to surface a clear
ImportError only when the pass actually runs
- Run 'lintrunner -a' to auto-apply RUFF-FORMAT and FORMAT-JSON patches
on mobius_model_builder.py, test file, and both example configs
- 14/14 tests pass
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
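The stub-module fixture described above relies on how unittest.mock.patch resolves its target: patch('mobius.build') imports 'mobius' and replaces its 'build' attribute, so a fake module pre-loaded into sys.modules is enough. A self-contained sketch (the fixture body here is an approximation, not the test file's exact code):

```python
import sys
import types
from unittest.mock import patch


def ensure_mobius_stub() -> None:
    """Inject a fake 'mobius' module into sys.modules so patch('mobius.build')
    resolves even when the real mobius-ai package is not installed."""
    if "mobius" not in sys.modules:
        stub = types.ModuleType("mobius")
        stub.build = lambda *args, **kwargs: None  # placeholder patch target
        sys.modules["mobius"] = stub


ensure_mobius_stub()
with patch("mobius.build", return_value="fake_pkg") as mock_build:
    import mobius

    result = mobius.build("some/model-id")
```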
Change all references from 'mobius-genai' to 'mobius-ai': - olive_config.json: extra_dependencies key/value and top-level mapping - mobius_model_builder.py: docstring install snippet and ImportError message - test file: fixture docstring comment Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
lintrunner auto-fixed RUF100 (unused noqa directive) across 15 files. The PLC0415 noqa in mobius_model_builder.py was stale — ruff does not enable PLC0415 in this repo, so the directive was unused. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
    "execution_provider": PassConfigParam(
        type_=str,
We could create an enum of the supported EPs for automatic validation, like in olive/passes/pytorch/autoawq.py (Line 27 in 8b1957e), unless you think the options might keep growing and it would be hard to keep it in sync across versions.
…files to model_attributes Agent-Logs-Url: https://github.com/microsoft/Olive/sessions/d99664b1-ed7e-44a8-b3a1-4efbc09c7259 Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
…ify test docstring Agent-Logs-Url: https://github.com/microsoft/Olive/sessions/d99664b1-ed7e-44a8-b3a1-4efbc09c7259 Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
Olive's RunConfig uses Pydantic with extra='forbid' on EngineConfig, which causes validation errors when unknown top-level fields like 'comment' are present. Remove them so the configs validate correctly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>
Replace GptqQuantizer (requires auto_gptq) with the built-in OnnxBlockWiseRtnQuantization pass which works out of the box. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>
Replace free-form string with StrEnumBase enum matching the pattern from AutoAWQQuantizer.ModelDtype. Supports: default, cpu, cuda, dml, webgpu, trt-rtx, onnx-standard. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>
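The enum-backed config value described above can be approximated with str + Enum (StrEnumBase is Olive's helper, approximated here), using the value set listed in the commit message:

```python
from enum import Enum


class ExecutionProviderOption(str, Enum):
    # Approximation of the StrEnumBase-based enum; values from the commit
    # message: default, cpu, cuda, dml, webgpu, trt-rtx, onnx-standard.
    DEFAULT = "default"
    CPU = "cpu"
    CUDA = "cuda"
    DML = "dml"
    WEBGPU = "webgpu"
    TRT_RTX = "trt-rtx"
    ONNX_STANDARD = "onnx-standard"
```

Because members are also strings, an invalid config value like "rocm" fails enum lookup instead of passing through silently, which is the automatic validation the reviewer asked for.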
Configs moved to microsoft/olive-recipes per repo convention. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>
Will mobius generate genai_config.json and related files for ORT GenAI? Also, does mobius support customized naming for the different components of a multi-component model? I can see all component models are named "model.onnx".
Add 'runtime' config param (default: ort-genai) that generates genai_config.json, tokenizer files, and processor configs alongside ONNX models via write_ort_genai_config(). Set to 'none' to skip. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Justin Chu <justinchu@microsoft.com>
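The 'runtime' gate described above can be sketched as a small helper; the function name is hypothetical and write_fn stands in for the write_ort_genai_config() call:

```python
def maybe_write_runtime_files(runtime: str, write_fn) -> bool:
    """Sketch: emit genai_config.json/tokenizer/processor files via write_fn
    unless runtime == 'none'. Returns whether anything was written."""
    if runtime == "none":
        return False
    write_fn()
    return True
```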
| ) | ||
|
|
||
|
|
||
| class _combine_patches: |
Summary

Adds a new Olive pass (MobiusModelBuilder) that wraps mobius build() to produce ONNX models from HuggingFace model IDs.

- Single-component models → ONNXModelHandler
- Multi-component models → CompositeModelHandler
- Registered in olive_config.json as MobiusModelBuilder
- Example config: examples/gemma4/gemma4_int4_cuda.json

Validated: Gemma4 INT4 Quantization Pipeline
Successfully tested MobiusModelBuilder → OnnxBlockWiseRtnQuantization on google/gemma-4-E2B-it:

- Quantized ops per component
- Weight quantization coverage
- Output structure (2.8GB total, down from ~5GB fp16)
- Pipeline timing: MobiusModelBuilder (fp16 build), OnnxBlockWiseRtnQuantization (int4)