feat: add MobiusModelBuilder Olive pass for mobius-backed ONNX export#2406

Open
justinchuby wants to merge 19 commits into main from justinchu/mobius-model-builder

Conversation

@justinchuby
Contributor

@justinchuby justinchuby commented Apr 9, 2026

Summary

Adds a new Olive pass (MobiusModelBuilder) that wraps mobius build() to produce ONNX models from HuggingFace model IDs.

  • Single-component models (LLMs) → ONNXModelHandler
  • Multi-component models (VLMs, encoder-decoders) → CompositeModelHandler
  • EP auto-detected from Olive accelerator spec (cpu/cuda/dml/webgpu)
  • Precision: fp32 (default), fp16, bf16
  • Registered in olive_config.json as MobiusModelBuilder
  • Example pipeline config: examples/gemma4/gemma4_int4_cuda.json
  • 14 unit tests covering single/multi-component, EP detection, and error cases
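
The precision and handler rules in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `PRECISION_TO_DTYPE`, `resolve_dtype`, and `select_handler` are hypothetical names, the dtype strings are assumed from the commit notes, and handler names are returned as plain strings rather than real Olive classes.

```python
from __future__ import annotations

# Hypothetical mapping from the pass's precision option to a dtype string.
# The commit notes mention f32/f16/bf16 strings; the exact table is an assumption.
PRECISION_TO_DTYPE = {"fp32": "f32", "fp16": "f16", "bf16": "bf16"}


def resolve_dtype(precision: str = "fp32") -> str:
    """Return the dtype string for a precision, rejecting unsupported values."""
    try:
        return PRECISION_TO_DTYPE[precision]
    except KeyError:
        supported = ", ".join(sorted(PRECISION_TO_DTYPE))
        raise ValueError(
            f"Unsupported precision {precision!r}; expected one of: {supported}"
        ) from None


def select_handler(components: dict[str, str]) -> str:
    """Pick a handler name from the number of exported components."""
    if not components:
        raise ValueError("The build produced no components")
    # One component maps to a single-model handler, several to a composite one.
    return "ONNXModelHandler" if len(components) == 1 else "CompositeModelHandler"
```

INT4 is deliberately absent from the mapping: in the pipeline described here, quantization is left to a downstream pass.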

Validated: Gemma4 INT4 Quantization Pipeline

Successfully tested MobiusModelBuilder → OnnxBlockWiseRtnQuantization on google/gemma-4-E2B-it:

Quantized ops per component

| Component | Total nodes | MatMulNBits | GatherBlockQuantized | Other ops |
| --- | --- | --- | --- | --- |
| decoder | 1,277 | 277 | 39 | 961 |
| audio | 1,465 | 135 | 0 | 1,330 |
| vision | 1,488 | 114 | 66 | 1,308 |
| embedding | 24 | 0 | 1 | 23 |

Weight quantization coverage

| Component | Quantized (UINT8/INT4) | Non-quantized (FP16) | % quantized by size |
| --- | --- | --- | --- |
| decoder | 316 tensors (2.4G elements) | 585 tensors (71M elements) | 97% |
| audio | 135 tensors (154M elements) | 768 tensors (2.8M elements) | 98% |
| embedding | 1 tensor (201M elements) | 4 tensors (3.1M elements) | 98% |
| vision | 147 tensors (90M elements) | 698 tensors (1.7M elements) | 98% |
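
As a sanity check, the percentages above can be reproduced from the rounded element counts. This sketch assumes the "% quantized by size" column is the share of elements held in quantized tensors; the PR does not state how the column was computed, so treat that as an assumption.

```python
# Element counts in millions, copied from the coverage figures (rounded values).
# Each entry is (quantized elements, non-quantized elements).
coverage = {
    "decoder": (2400.0, 71.0),
    "audio": (154.0, 2.8),
    "embedding": (201.0, 3.1),
    "vision": (90.0, 1.7),
}


def percent_quantized(quantized: float, non_quantized: float) -> int:
    """Share of elements held in quantized tensors, rounded to a whole percent."""
    return round(100 * quantized / (quantized + non_quantized))


percentages = {name: percent_quantized(q, n) for name, (q, n) in coverage.items()}
print(percentages)
# {'decoder': 97, 'audio': 98, 'embedding': 98, 'vision': 98}
```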

Output structure (2.8GB total, down from ~5GB fp16)

models/gemma4-e2b-int4-cuda/
├── decoder.onnx      (853K) + decoder.onnx.data   (2.4G)
├── audio.onnx        (1.2M) + audio.onnx.data     (152M)
├── embedding.onnx    (8.5K) + embedding.onnx.data  (199M)
├── vision.onnx       (1.2M) + vision.onnx.data     (89M)
└── model_config.json

Pipeline timing

| Pass | Time |
| --- | --- |
| MobiusModelBuilder (fp16 build) | 77s |
| OnnxBlockWiseRtnQuantization (int4) | 129s |
| Total | ~3.5 min |
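
For reference, the two pass times add up as below. The per-pass figures are taken from the timing data above; anything outside those two passes is not accounted for in this sketch.

```python
# Per-pass wall-clock times in seconds, as reported in the timing data.
pass_seconds = {"MobiusModelBuilder": 77, "OnnxBlockWiseRtnQuantization": 129}

total_seconds = sum(pass_seconds.values())
total_minutes = total_seconds / 60
print(f"{total_seconds}s is about {total_minutes:.1f} min")
```

The two passes alone come to a little under the rounded total quoted above.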

Justin Chu and others added 4 commits April 9, 2026 14:04
Adds a new Olive pass that wraps mobius's build() function to produce
ONNX models directly from HuggingFace model IDs.

- Single-component models (LLMs) → ONNXModelHandler
- Multi-component models (VLMs, encoder-decoders) → CompositeModelHandler
- EP auto-detected from Olive accelerator spec (cpu/cuda/dml/webgpu)
- Precision: fp32 (default), fp16, bf16
- Registered in olive_config.json as 'MobiusModelBuilder'
- Example pipeline config: examples/gemma4/gemma4_int4_pipeline.json
- 10 unit tests covering single/multi-component, EP detection, and error cases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
- test_ep_map_covers_common_providers now asserts DML and WebGPU in
  addition to CPU and CUDA, verifying full EP coverage
- Add examples/gemma4/gemma4_fp32_cpu.json showing CPU/fp32 deployment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Use official model IDs:
- google/gemma-4-E2B-it and google/gemma-4-E4B-it: Any-to-Any
  (vision + audio + text)
- google/gemma-4-26B-A4B-it and google/gemma-4-31B-it: Image-Text
  to Text only (no audio encoder)

Updated both example configs to use google/gemma-4-E2B-it and added
comment strings documenting the audio-capable vs image-only distinction.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
…lds)

Fix invalid RunConfig fields in both example configs:
- Remove output_name and system (not valid engine fields)
- Move target reference to engine.target
- Use log_severity_level=1

Verified E2E with HuggingFaceTB/SmolLM2-135M-Instruct:
- olive run completed successfully
- model.onnx + model.onnx.data produced
- ORT loaded the model, correct causal-LM I/O (input_ids -> logits + KV cache)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Copilot AI review requested due to automatic review settings April 9, 2026 21:24
Contributor

Copilot AI left a comment

Pull request overview

Adds a new ONNX pass (MobiusModelBuilder) that uses the mobius package to build ONNX models directly from HuggingFace model IDs, returning either a single ONNXModelHandler or a CompositeModelHandler for multi-component exports.

Changes:

  • Introduces olive/passes/onnx/mobius_model_builder.py implementing the new pass (EP mapping, precision mapping, trust_remote_code passthrough).
  • Registers the pass in olive/olive_config.json and adds two Gemma4 example run configs.
  • Adds unit tests for single-component, multi-component, EP selection, and error paths.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| olive/passes/onnx/mobius_model_builder.py | New pass wrapping mobius.build() and emitting Olive model handlers. |
| olive/olive_config.json | Registers MobiusModelBuilder and declares extras for its dependencies. |
| examples/gemma4/gemma4_int4_pipeline.json | Example pipeline: mobius export (fp16 CUDA) then INT4 quantization. |
| examples/gemma4/gemma4_fp32_cpu.json | Example pipeline: mobius export (fp32 CPU). |
| test/passes/onnx/test_mobius_model_builder.py | New unit tests for config, handler types, EP mapping, and missing dependency behavior. |

@justinchuby justinchuby marked this pull request as draft April 9, 2026 22:00
Justin Chu and others added 4 commits April 9, 2026 19:56
- _PRECISION_TO_DTYPE: add inline comments explaining each dtype string
  (f32 = float32, f16 = float16, bf16 = bfloat16) and when to use a
  downstream quantization pass for INT4/INT8 instead
- Remove explicit execution_provider from CUDA example config so both
  gemma4 configs consistently rely on auto-detection from the accelerator
  spec; the CPU config already did this
- olive_config.json: add mobius-genai to top-level extra_dependencies map
  so 'olive run' can surface the install hint; remove onnx_ir (transitive
  dep of mobius-genai) from the pass entry
- Move AcceleratorSpec import to TYPE_CHECKING block (RUFF TC001) —
  safe because the file already has 'from __future__ import annotations'
- Use X | Y union syntax instead of Union[X, Y] (RUFF UP007)
- Remove redundant 'import onnx_ir' check; ImportError message now
  correctly says 'pip install mobius-genai' (PYLINT W0611)
- Rename unused _fake_pkg 'output_dir' param to '_output_dir' to
  suppress lint warning (PYLINT W0613)
- Wrap long AcceleratorSpec(…) lines to stay under 120 chars (RUFF format)
- Collapse nested 'with' into single 'with' (RUFF SIM117)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
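
The TYPE_CHECKING and union-syntax changes in the commit above follow a common pattern, sketched here in isolation. `Decimal` is only a stand-in for a type such as AcceleratorSpec, and `describe` is a hypothetical function; none of this is copied from the PR.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated by type checkers only; this import does not run at runtime.
    # Decimal stands in for a type that is needed purely for annotations.
    from decimal import Decimal


def describe(value: Decimal | None) -> str:
    """Annotations are stored as strings, so Decimal need not exist at runtime."""
    return "none" if value is None else f"value={value}"
```

Because the future import postpones annotation evaluation, the `X | Y` spelling is also safe on interpreters that predate native support for it in annotations.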
- EP_MAP: tighten annotation to ClassVar[dict[ExecutionProvider, str]]
  (keys are enum instances, not plain strings)
- olive_config.json: add onnx-ir (correct pip hyphenated name) to both
  the pass extra_dependencies and the top-level extra_dependencies map;
  was previously using wrong underscore spelling 'onnx_ir'
- Rename examples/gemma4/gemma4_int4_pipeline.json ->
  gemma4_int4_cuda.json so both example configs follow the same
  {precision}_{device}.json naming pattern
- _patch_build: expand docstring explaining why 'mobius.build' is the
  correct patch target (lazy import inside function body, not module-level)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
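
The patch-target point in the commit above can be illustrated with a small stand-alone sketch. `demo_builder_lib` and `run_pass` are hypothetical stand-ins for the real package and pass method; the only thing shown is that a function imported lazily inside a function body is affected by patching the attribute on its source module.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical stand-in for the real package, registered so it can be imported.
demo_lib = types.ModuleType("demo_builder_lib")
demo_lib.build = lambda model_id: f"real:{model_id}"
sys.modules["demo_builder_lib"] = demo_lib


def run_pass(model_id: str) -> str:
    # Lazy import: the name is looked up on the module every time this runs,
    # so a patch applied to the source module is what this call sees.
    from demo_builder_lib import build

    return build(model_id)


with patch("demo_builder_lib.build", return_value="fake-package") as mock_build:
    patched_result = run_pass("some/model")

# Outside the context manager the original function is restored.
unpatched_result = run_pass("some/model")
```

Had the import been at module level in the file under test, the name would already be bound there and the patch target would need to be that module instead.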
…delBuilder

- After pkg.save(), verify each expected model.onnx exists and raise
  RuntimeError with a clear message if missing (single-component and
  per-component in multi-component paths)
- Log a WARNING when trust_remote_code=True is passed so users are
  reminded to only use this with trusted model sources
- Add 4 new tests: missing output raises RuntimeError (single and
  multi-component), trust_remote_code warning emitted, no warning
  when False (14/14 passing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
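
The output check and the trust_remote_code warning described in the commit above might look roughly like this. It is a sketch under stated assumptions: `check_outputs` and `warn_if_trusting_remote_code` are hypothetical helpers, and the directory layout (one model.onnx per component folder) is assumed rather than taken from the PR.

```python
from __future__ import annotations

import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("mobius_sketch")


def check_outputs(output_dir: Path, components: list[str]) -> list[Path]:
    """Return the expected model paths, raising if any file is missing."""
    # Assumed layout: a single component saves to <dir>/model.onnx and each
    # component of a multi-component model to <dir>/<name>/model.onnx.
    if len(components) == 1:
        expected = [output_dir / "model.onnx"]
    else:
        expected = [output_dir / name / "model.onnx" for name in components]
    missing = [path for path in expected if not path.is_file()]
    if missing:
        raise RuntimeError(f"Expected ONNX output not found: {', '.join(map(str, missing))}")
    return expected


def warn_if_trusting_remote_code(trust_remote_code: bool) -> None:
    """Remind the user that remote code should come from trusted sources only."""
    if trust_remote_code:
        logger.warning("trust_remote_code=True: only use this with trusted model sources.")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "model.onnx").touch()
    single_ok = check_outputs(root, ["model"]) == [root / "model.onnx"]
    try:
        check_outputs(root, ["decoder", "vision"])
        multi_error = None
    except RuntimeError as exc:
        multi_error = str(exc)
```

Failing right after the save step keeps the error close to its cause instead of surfacing later in a downstream pass.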
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
@justinchuby justinchuby self-assigned this Apr 10, 2026
- Add module-scoped _stub_mobius_module fixture that injects a fake
  'mobius' stub into sys.modules when the package is not installed,
  ensuring patch('mobius.build') works in Olive CI without mobius-genai
- Add '# pylint: disable=protected-access' on _default_config test line
  (PYLINT W0212 — intentional test access to a pass internals method)
- Add '# noqa: PLC0415' on lazy 'from mobius import build' inside
  _run_for_config — import is intentionally deferred to surface a clear
  ImportError only when the pass actually runs
- Run 'lintrunner -a' to auto-apply RUFF-FORMAT and FORMAT-JSON patches
  on mobius_model_builder.py, test file, and both example configs
- 14/14 tests pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
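
The stub-module idea from the commit above can be sketched without pytest. `ensure_stub_module` is a hypothetical helper, and the module name used in the usage lines is made up so that it is certain not to be installed; the real fixture in the PR may differ.

```python
import importlib.util
import sys
import types


def ensure_stub_module(name: str) -> bool:
    """Register an empty stub for `name` if it cannot be imported.

    Returns True when a stub was injected, False when a real (or already
    registered) module is available.
    """
    if name in sys.modules or importlib.util.find_spec(name) is not None:
        return False
    stub = types.ModuleType(name)
    # Placeholder attribute so that patch("<name>.build") has something to replace.
    stub.build = None
    sys.modules[name] = stub
    return True


# Made-up package name, chosen so the stub path is exercised.
first_call = ensure_stub_module("hypothetical_missing_pkg_for_demo")
second_call = ensure_stub_module("hypothetical_missing_pkg_for_demo")
```

In a test fixture, the stub would normally be removed from `sys.modules` again at teardown so it does not leak into unrelated tests.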
Change all references from 'mobius-genai' to 'mobius-ai':
- olive_config.json: extra_dependencies key/value and top-level mapping
- mobius_model_builder.py: docstring install snippet and ImportError message
- test file: fixture docstring comment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

lintrunner auto-fixed RUF100 (unused noqa directive) across 15 files.
The PLC0415 noqa in mobius_model_builder.py was stale — ruff does not
enable PLC0415 in this repo, so the directive was unused.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Comment thread olive/passes/onnx/mobius_model_builder.py Outdated
),
),
"execution_provider": PassConfigParam(
type_=str,
Contributor

We could create an enum of the supported EPs for automatic validation, like class ModelDtype(StrEnumBase), unless you think the options might keep growing and it would be hard to keep the enum in sync across versions.

Contributor Author

Done.

Copilot AI and others added 2 commits April 10, 2026 19:18
…files to model_attributes

Agent-Logs-Url: https://github.com/microsoft/Olive/sessions/d99664b1-ed7e-44a8-b3a1-4efbc09c7259

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
…ify test docstring

Agent-Logs-Url: https://github.com/microsoft/Olive/sessions/d99664b1-ed7e-44a8-b3a1-4efbc09c7259

Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
justinchuby and others added 3 commits April 23, 2026 22:58
Olive's RunConfig uses Pydantic with extra='forbid' on EngineConfig,
which causes validation errors when unknown top-level fields like
'comment' are present. Remove them so the configs validate correctly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
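
The effect of a forbid-extras schema on a stray 'comment' field can be mimicked with a plain-Python stand-in. This sketch does not use Pydantic or Olive; `ALLOWED_FIELDS` is an illustrative subset built from field names mentioned in this PR, not the real schema, and `validate_strict` is a hypothetical helper.

```python
# Illustrative subset of accepted keys; the real schema has more fields.
ALLOWED_FIELDS = {"target", "log_severity_level"}


def validate_strict(config: dict) -> dict:
    """Reject any key that is not explicitly allowed, as a forbid-extras schema would."""
    unknown = sorted(set(config) - ALLOWED_FIELDS)
    if unknown:
        raise ValueError(f"Extra fields not permitted: {unknown}")
    return config


accepted = validate_strict({"target": "local_system", "log_severity_level": 1})

try:
    validate_strict({"target": "local_system", "comment": "audio-capable model"})
    rejection = None
except ValueError as exc:
    rejection = str(exc)
```

Since strict validation rejects the whole config, explanatory notes have to live outside the JSON file, which is why the comment strings were removed.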
Replace GptqQuantizer (requires auto_gptq) with the built-in
OnnxBlockWiseRtnQuantization pass which works out of the box.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
Replace free-form string with StrEnumBase enum matching the
pattern from AutoAWQQuantizer.ModelDtype. Supports: default, cpu,
cuda, dml, webgpu, trt-rtx, onnx-standard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
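
An enum-backed option of the kind described in the commit above could look like the sketch below. `ExecutionProviderOption` and `parse_execution_provider` are hypothetical names, and a plain `str`/`Enum` mix-in is used here as a stand-in for Olive's StrEnumBase; the member values are the ones listed in the commit message.

```python
from enum import Enum


class ExecutionProviderOption(str, Enum):
    """Hypothetical stand-in for a StrEnumBase subclass listing supported EPs."""

    DEFAULT = "default"
    CPU = "cpu"
    CUDA = "cuda"
    DML = "dml"
    WEBGPU = "webgpu"
    TRT_RTX = "trt-rtx"
    ONNX_STANDARD = "onnx-standard"


def parse_execution_provider(value: str) -> ExecutionProviderOption:
    """Convert user input to an enum member; unknown strings raise ValueError."""
    return ExecutionProviderOption(value)
```

With this shape, a typo such as "cuad" fails at config validation time rather than deep inside the build.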
@justinchuby justinchuby force-pushed the justinchu/mobius-model-builder branch from 21ab3e2 to 2af889f Compare April 23, 2026 23:39
justinchuby and others added 2 commits April 23, 2026 23:41
Configs moved to microsoft/olive-recipes per repo convention.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
@xiaoyu-work
Collaborator

Will mobius generate genai_config.json and related files for ORT GenAI? Also, does mobius support custom names for the individual components of a multi-component model? I can see that all component models are named "model.onnx".

Add 'runtime' config param (default: ort-genai) that generates
genai_config.json, tokenizer files, and processor configs alongside
ONNX models via write_ort_genai_config(). Set to 'none' to skip.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
@justinchuby justinchuby force-pushed the justinchu/mobius-model-builder branch from dcce655 to 68ed349 Compare April 24, 2026 04:41
6 participants