Add ORTGenAI backend option to benchmark CLI#2420

Open
GopalakrishnanN wants to merge 1 commit into main from dev/AddORTGenAIBackEndOption

Conversation


GopalakrishnanN commented Apr 17, 2026

Context

The benchmark command currently defaults to the ONNX Runtime lm-eval model path. Olive already has ORTGenAI lm-eval support in the evaluator layer, but the benchmark CLI had no way to select it.

This PR exposes that capability through the benchmark CLI while preserving existing defaults.

What This Changes

  • Adds a new benchmark CLI argument: --backend with choices:
    • auto (default)
    • ort
    • ortgenai
  • Wires explicit backend selection into generated workflow config by setting evaluator model_class when backend is not auto.
  • Keeps current behavior unchanged when --backend auto is used (or omitted).
  • Adds validation: explicit --backend is only accepted for ONNX input models.
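
A minimal sketch of the flag and the config wiring described above (the function names and config shape are illustrative assumptions, not Olive's actual code):

```python
import argparse

def add_backend_option(parser: argparse.ArgumentParser) -> None:
    # Hypothetical sketch of the new benchmark CLI argument.
    parser.add_argument(
        "--backend",
        type=str,
        default="auto",
        choices=["auto", "ort", "ortgenai"],
        help="Backend for lm-eval model evaluation. 'ort' and 'ortgenai' require ONNX input.",
    )

def apply_backend(config: dict, backend: str) -> dict:
    # When the backend is explicit, set the evaluator's model_class;
    # 'auto' leaves the config untouched so existing behavior is preserved.
    if backend != "auto":
        config["evaluators"]["evaluator"]["model_class"] = backend
    return config

parser = argparse.ArgumentParser()
add_backend_option(parser)
args = parser.parse_args(["--backend", "ortgenai"])
config = apply_backend({"evaluators": {"evaluator": {}}}, args.backend)
print(config["evaluators"]["evaluator"]["model_class"])  # ortgenai
```

Omitting the flag leaves `model_class` unset, which is what keeps the change non-breaking.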

Why This Approach

  • Non-breaking by default: existing benchmark flows continue to infer model class automatically.
  • Minimal change surface: only benchmark CLI config generation and tests are touched.
  • Leverages existing evaluator support rather than introducing new runtime logic.

User-Facing Behavior

Examples:

  • Existing behavior (unchanged):
    • olive benchmark -m <model> --tasks arc_easy
  • Explicit ORT:
    • olive benchmark -m <onnx_model> --tasks arc_easy --backend ort
  • Explicit ORTGenAI:
    • olive benchmark -m <onnx_model> --tasks arc_easy --backend ortgenai

If --backend is provided for non-ONNX inputs, benchmark now raises a clear error.
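
The fail-fast validation could look roughly like this (the model-type string and error text are assumptions based on the PR description):

```python
def validate_backend(input_model_type: str, backend: str) -> None:
    # Reject explicit backends for non-ONNX inputs up front, before any
    # potentially slow HuggingFace hub lookups.
    if backend != "auto" and input_model_type.lower() != "onnxmodel":
        raise ValueError(
            f"--backend {backend} requires an ONNX input model, got {input_model_type}"
        )

validate_backend("OnnxModel", "ortgenai")  # passes silently
try:
    validate_backend("HfModel", "ortgenai")
except ValueError as err:
    print(err)
```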

Tests Added/Updated

  • Verifies ONNX benchmark accepts --backend ortgenai and writes evaluator model_class=ortgenai.
  • Verifies non-ONNX model with explicit backend raises expected ValueError.
  • Existing benchmark tests continue to pass.

Validation

  • pip install -e .
  • python -m olive --help
  • python -m olive benchmark --help
  • python -m pytest test/cli/test_cli.py -k benchmark_command -q

Copilot AI review requested due to automatic review settings April 17, 2026 18:37

Copilot AI left a comment


Pull request overview

This PR adds an explicit backend selection option to the olive benchmark CLI so users can choose between ONNX Runtime and ORTGenAI evaluation backends when benchmarking ONNX inputs, while keeping the default automatic behavior unchanged.

Changes:

  • Added --backend {auto,ort,ortgenai} to the benchmark CLI (default: auto).
  • Implemented fast, offline validation to reject explicit backends for non-ONNX inputs before any HuggingFace hub checks.
  • Added CLI tests to confirm model_class wiring for ortgenai and to ensure invalid usage fails without hitting the HF hub.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • olive/cli/benchmark.py: Adds the --backend flag, performs offline ONNX validation, and sets evaluator model_class when backend is explicitly selected.
  • test/cli/test_cli.py: Adds coverage for --backend ortgenai config generation and for early error behavior on non-ONNX inputs without HF hub access.

Comment thread olive/cli/benchmark.py Outdated
GopalakrishnanN force-pushed the dev/AddORTGenAIBackEndOption branch from 6e4e1b3 to 60a5f37 on April 17, 2026 18:41
@GopalakrishnanN
Author

@microsoft-github-policy-service agree company="Microsoft"

Comment thread olive/cli/benchmark.py Outdated
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread olive/cli/benchmark.py Outdated
Comment thread olive/cli/benchmark.py
Comment thread olive/cli/benchmark.py Outdated
def _get_run_config(self, tempdir: str) -> dict:
    config = deepcopy(TEMPLATE)

    # Validate --backend before get_input_model_config, which may trigger a

The Copilot suggestions are overcomplicating things. I think it's better to remove the is_local_onnx_model changes completely, and just check that the input_model_config you get after line 103 is onnxmodel when args.backend is not auto.

GopalakrishnanN force-pushed the dev/AddORTGenAIBackEndOption branch 4 times, most recently from 7491f39 to eda6f0b on April 23, 2026 01:22
@GopalakrishnanN
Author

End-to-end verification on real ONNX model

Ran olive benchmark against microsoft/Phi-3-mini-4k-instruct-onnx (cpu-int4-rtn-block-32-acc-level-4, GenAI-packaged with genai_config.json) with --tasks arc_easy --device cpu --limit 5 --batch_size 1, exercising both backends end-to-end.

| Metric | --backend ortgenai | --backend ort |
| --- | --- | --- |
| Generated workflow model_class | "ortgenai" | "ort" |
| Loglikelihood requests | 20/20 @ 1.44 it/s | 20/20 @ 1.43 it/s |
| arc_easy acc / acc_norm | 0.6 / 0.6 | 0.6 / 0.6 |
| Underlying runtime | og.Config / og.Model / og.Generator (LMEvalORTGenAIEvaluator) | onnxruntime.InferenceSession (LMEvalORTEvaluator) |

Confirms the full chain: CLI --backend flag -> workflow config evaluators.evaluator.model_class -> LMEvaluator.evaluate() -> lm_eval.api.registry.get_model(model_class) -> @register_model("ort") / @register_model("ortgenai") class in olive/evaluator/lmeval_ort.py. Accuracy numbers agree across backends on the same weights, validating both paths produce sensible results.
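
The registry dispatch in that chain follows lm_eval's @register_model pattern; the following is a self-contained mimic of the mechanism (not lm_eval's actual implementation, and the evaluator class bodies here are placeholders):

```python
# Minimal mimic of lm_eval's model registry; the real registry lives in
# lm_eval.api.registry, and the decorated classes in olive/evaluator/lmeval_ort.py.
MODEL_REGISTRY = {}

def register_model(*names):
    def decorator(cls):
        for name in names:
            MODEL_REGISTRY[name] = cls
        return cls
    return decorator

def get_model(model_class: str):
    return MODEL_REGISTRY[model_class]

@register_model("ort")
class LMEvalORTEvaluator:
    runtime = "onnxruntime.InferenceSession"

@register_model("ortgenai")
class LMEvalORTGenAIEvaluator:
    runtime = "og.Config / og.Model / og.Generator"

# The workflow config's model_class string selects the evaluator class:
print(get_model("ortgenai").__name__)  # LMEvalORTGenAIEvaluator
```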

Side note (not part of this PR): Olive's evaluation cache keys on model_id and does not include model_class; back-to-back runs with different --backend values on the same model currently reuse the first result unless .olive-cache/<workflow>/evaluations/ is cleared. Flagging for a possible follow-up.
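
A toy cache-key function makes the collision concrete; Olive's real cache layout is assumed here, not quoted:

```python
import hashlib
import json
from typing import Optional

def cache_key(model_id: str, model_class: Optional[str] = None) -> str:
    # Including model_class (and other evaluation settings) in the key would
    # keep --backend ort and --backend ortgenai results from colliding.
    payload = {"model_id": model_id}
    if model_class is not None:
        payload["model_class"] = model_class
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

# Keying on model_id alone collides across backends; adding model_class does not.
print(cache_key("phi-3-mini", "ort") != cache_key("phi-3-mini", "ortgenai"))  # True
```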

GopalakrishnanN force-pushed the dev/AddORTGenAIBackEndOption branch from eda6f0b to 236701a on April 23, 2026 19:54

vraspar left a comment


Nice, clean PR. Minimal surface, follows the existing to_replace pattern, and correctly implements jambayk's feedback from #2396. A few actionable items below, none blocking.

1. Missing backward-compat assertion in existing tests
The existing test_benchmark_command_hfmodel and test_benchmark_command_onnxmodel don't assert that model_class is absent from the config when --backend is omitted entirely. Adding assert "model_class" not in config["evaluators"]["evaluator"] to those two tests would lock in the backward-compat guarantee.
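
Folded into the tests, that assertion might look like the following sketch (the config fixture here is a stand-in; the real tests would inspect the actual generated workflow config):

```python
def generated_config_without_backend() -> dict:
    # Stand-in for the config the benchmark CLI generates when --backend is
    # omitted; illustrative only, not Olive's real template.
    return {"evaluators": {"evaluator": {"type": "LMEvaluator"}}}

def test_benchmark_command_default_has_no_model_class():
    config = generated_config_without_backend()
    # Locks in the backward-compat guarantee: no model_class unless explicit.
    assert "model_class" not in config["evaluators"]["evaluator"]

test_benchmark_command_default_has_no_model_class()
```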

2. Cache key follow-up
As you noted in the comments, evaluation cache doesn't include model_class (or tasks, limit, batch_size, etc.), so switching --backend on the same model silently reuses stale results. This PR makes that easier to hit. Not in scope here, but worth a tracking issue.

Comment thread olive/cli/benchmark.py
type=str,
default="auto",
choices=["auto", "ort", "ortgenai"],
help="Backend for ONNX model evaluation. Use 'auto' to infer backend from model type.",


nit: The help string says "Backend for ONNX model evaluation" but auto is also valid (and the default) for HF/PT models. It just falls through to evaluator auto-detection. Consider something like:

"Backend for lm-eval model evaluation. 'ort' and 'ortgenai' require ONNX input. 'auto' infers backend from model type."

Comment thread olive/cli/benchmark.py
"onnxmodel",
}, "Only HfModel, PyTorchModel and OnnxModel are supported in benchmark command."

if self.args.backend != "auto" and input_model_config["type"].lower() != "onnxmodel":


Optional: ortgenai requires GenAI-packaged model assets (genai_config.json, etc.), not just any .onnx file. Right now a user can pass --backend ortgenai on a plain ONNX model and get a confusing runtime error deep inside lm_eval.

If you want to keep this simple (and I think you should), maybe just extend the error message or help text to hint at the requirement. No need for asset validation here.

@vraspar

vraspar commented Apr 24, 2026

Note: this review was generated with help from GitHub Copilot CLI.
