Add ORTGenAI backend option to benchmark CLI#2420
GopalakrishnanN wants to merge 1 commit into `main`.
Conversation
Pull request overview
This PR adds an explicit backend selection option to the olive benchmark CLI so users can choose between ONNX Runtime and ORTGenAI evaluation backends when benchmarking ONNX inputs, while keeping the default automatic behavior unchanged.
Changes:
- Added `--backend {auto,ort,ortgenai}` to the benchmark CLI (default: `auto`).
- Implemented fast, offline validation to reject explicit backends for non-ONNX inputs before any HuggingFace hub checks.
- Added CLI tests to confirm `model_class` wiring for `ortgenai` and to ensure invalid usage fails without hitting the HF hub.
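The offline validation described above can be sketched as follows (function and parameter names here are hypothetical; the real check lives in `olive/cli/benchmark.py`):

```python
def validate_backend_choice(backend: str, input_model_type: str) -> None:
    """Reject an explicit backend for non-ONNX inputs before any HF hub access.

    Hypothetical helper illustrating the check; names are assumptions.
    """
    if backend != "auto" and input_model_type.lower() != "onnxmodel":
        raise ValueError(
            f"--backend '{backend}' is only supported for ONNX input models; "
            f"got input model type '{input_model_type}'."
        )

# 'auto' always passes; explicit backends require an OnnxModel input.
validate_backend_choice("auto", "HfModel")        # accepted
validate_backend_choice("ortgenai", "OnnxModel")  # accepted
```

Because the check only inspects the already-parsed model type string, it runs before any network access, which is what makes the new tests fast and offline.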
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `olive/cli/benchmark.py` | Adds the `--backend` flag, performs offline ONNX validation, and sets evaluator `model_class` when backend is explicitly selected. |
| `test/cli/test_cli.py` | Adds coverage for `--backend ortgenai` config generation and for early error behavior on non-ONNX inputs without HF hub access. |
Force-pushed 6e4e1b3 to 60a5f37.
@microsoft-github-policy-service agree company="Microsoft"
```python
def _get_run_config(self, tempdir: str) -> dict:
    config = deepcopy(TEMPLATE)

    # Validate --backend before get_input_model_config, which may trigger a
```
The Copilot suggestions are overcomplicating things. I think it's better to remove the `is_local_onnx_model` changes completely, and just check that the `input_model_config` you get after line 103 is `onnxmodel` when `args.backend` is not `auto`.
Force-pushed 7491f39 to eda6f0b.
End-to-end verification on a real ONNX model. Confirms the full chain from the CLI `--backend` flag to the evaluator `model_class`.

Side note (not part of this PR): Olive's evaluation cache key does not include `model_class`, so switching backends on the same model can reuse stale results.
Force-pushed eda6f0b to 236701a.
vraspar left a comment:
Nice, clean PR. Minimal surface, follows the existing `to_replace` pattern, and correctly implements jambayk's feedback from #2396. A few actionable items below, none blocking.
1. Missing backward-compat assertion in existing tests
The existing `test_benchmark_command_hfmodel` and `test_benchmark_command_onnxmodel` don't assert that `model_class` is absent from the config when `--backend` is omitted entirely. Adding `assert "model_class" not in config["evaluators"]["evaluator"]` to those two tests would lock in the backward-compat guarantee.
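The suggested backward-compat assertion could be factored roughly like this (a sketch; the config shape and the evaluator `type` value are assumptions, not Olive's actual schema):

```python
def assert_no_model_class(config: dict) -> None:
    """Backward-compat check: omitting --backend must leave model_class unset."""
    assert "model_class" not in config["evaluators"]["evaluator"]

# Example config as generated when --backend is omitted (shape assumed).
config = {"evaluators": {"evaluator": {"type": "OliveEvaluator"}}}
assert_no_model_class(config)  # passes: no model_class key present
```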
2. Cache key follow-up
As you noted in the comments, the evaluation cache key doesn't include `model_class` (or `tasks`, `limit`, `batch_size`, etc.), so switching `--backend` on the same model silently reuses stale results. This PR makes that easier to hit. Not in scope here, but worth a tracking issue.
```python
type=str,
default="auto",
choices=["auto", "ort", "ortgenai"],
help="Backend for ONNX model evaluation. Use 'auto' to infer backend from model type.",
```
nit: The help string says "Backend for ONNX model evaluation" but auto is also valid (and the default) for HF/PT models. It just falls through to evaluator auto-detection. Consider something like:
"Backend for lm-eval model evaluation. 'ort' and 'ortgenai' require ONNX input. 'auto' infers backend from model type."
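With the suggested wording, the full argument definition would read roughly as follows (a sketch; the parser wiring shown here is standard `argparse`, and the actual integration in the CLI is assumed):

```python
import argparse

# Sketch of the --backend argument with the proposed help text.
parser = argparse.ArgumentParser(prog="olive benchmark")
parser.add_argument(
    "--backend",
    type=str,
    default="auto",
    choices=["auto", "ort", "ortgenai"],
    help=(
        "Backend for lm-eval model evaluation. 'ort' and 'ortgenai' require "
        "ONNX input. 'auto' infers backend from model type."
    ),
)

args = parser.parse_args([])  # no flag given -> falls back to the default "auto"
```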
```python
    "onnxmodel",
}, "Only HfModel, PyTorchModel and OnnxModel are supported in benchmark command."

if self.args.backend != "auto" and input_model_config["type"].lower() != "onnxmodel":
```
Optional: `ortgenai` requires GenAI-packaged model assets (`genai_config.json`, etc.), not just any `.onnx` file. Right now a user can pass `--backend ortgenai` on a plain ONNX model and get a confusing runtime error deep inside `lm_eval`.
If you want to keep this simple (and I think you should), maybe just extend the error message or help text to hint at the requirement. No need for asset validation here.
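Extending the error message along those lines might look like this (a sketch; the helper name and wording are hypothetical, not the PR's actual code):

```python
def backend_error_message(backend: str, model_type: str) -> str:
    """Build the validation error; hint at ortgenai's packaging requirement."""
    msg = f"--backend '{backend}' requires an ONNX input model, got '{model_type}'."
    if backend == "ortgenai":
        # Packaging hint suggested in the review: a plain .onnx file is not enough.
        msg += (
            " Note: 'ortgenai' also expects GenAI-packaged model assets "
            "(e.g. genai_config.json), not just a plain .onnx file."
        )
    return msg
```

This keeps the check itself simple while surfacing the packaging requirement at the point of failure instead of deep inside `lm_eval`.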
Note: this review was generated with help from GitHub Copilot CLI.
Context
The benchmark command currently defaults to the ONNX Runtime lm-eval model path. Olive already has ORTGenAI lm-eval support in the evaluator layer, but the benchmark CLI had no way to select it.
This PR exposes that capability through benchmark CLI while preserving existing defaults.
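Concretely, the flag maps onto the evaluator config roughly like this (a sketch of the `to_replace`-style wiring; the template shape and evaluator `type` are assumptions, not Olive's actual schema):

```python
from copy import deepcopy

# Illustrative template; the real TEMPLATE lives in olive/cli/benchmark.py.
TEMPLATE = {"evaluators": {"evaluator": {"type": "OliveEvaluator"}}}

def apply_backend(config: dict, backend: str) -> dict:
    """Set evaluator model_class only when a backend is explicitly chosen."""
    config = deepcopy(config)
    if backend != "auto":
        config["evaluators"]["evaluator"]["model_class"] = backend
    return config

# 'auto' leaves the template untouched, preserving existing defaults.
assert "model_class" not in apply_backend(TEMPLATE, "auto")["evaluators"]["evaluator"]
```

Leaving `model_class` out entirely for `auto` (rather than writing `model_class="auto"`) is what keeps generated configs byte-identical to today's behavior.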
What This Changes
- Adds `--backend` with choices: `auto` (default), `ort`, `ortgenai`.
- Sets evaluator `model_class` when backend is not `auto`.
- Behavior is unchanged when `--backend auto` is used (or omitted).
- Explicit `--backend` is only accepted for ONNX input models.

Why This Approach
User-Facing Behavior
Examples:
- `olive benchmark -m <model> --tasks arc_easy`
- `olive benchmark -m <onnx_model> --tasks arc_easy --backend ort`
- `olive benchmark -m <onnx_model> --tasks arc_easy --backend ortgenai`

If `--backend` is provided for non-ONNX inputs, benchmark now raises a clear error.

Tests Added/Updated
- Verifies `--backend ortgenai` writes evaluator `model_class=ortgenai`.
- Verifies invalid usage raises `ValueError`.

Validation
- `pip install -e .`
- `python -m olive --help`
- `python -m olive benchmark --help`
- `python -m pytest test/cli/test_cli.py -k benchmark_command -q`