Objective
Create a benchmarking suite that systematically compares popular LLMs across llama.cpp and OpenVINO GenAI, spanning multiple runtimes and hardware backends (CPU/GPU/NPU). The goal is to establish reproducible benchmarks that capture performance profiles and hardware utilization, helping the community understand model and backend strengths.
Summary of Proposed Benchmark
- Models: ~15 leading LLMs, including Llama, Qwen, DeepSeek, Phi, Gemma, etc.
- Frameworks:
  - OpenVINO GenAI (IR & GGUF)
  - llama.cpp (GGUF, Q4_0)
- Benchmarks run via OpenVINO GenAI llm_bench and llama.cpp llama-bench, covering these backends: llama.cpp default CPU, llama.cpp Vulkan, and OpenVINO CPU/GPU/NPU (see the driver sketch after this list).
- Metrics: Load/compile times, prompt evaluation speed, time to first token (TTFT), token generation speed, memory use, and quantization configuration, plus hardware and software details.
- Output: Tabular benchmarking results, observations, and reproducibility instructions/scripts.
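To make the intended workflow concrete, here is a minimal Python sketch of how a driver could orchestrate both tools and collect raw results for the tables above. The model paths, device list, and `Result` layout are illustrative assumptions, not part of this proposal; llama-bench's `-m`/`-p`/`-n`/`-o` flags and llm_bench's `-m`/`-d`/`-n` options are documented, but output formats differ across versions, so parsing into the metric columns is left as a later normalization step.

```python
import json
import subprocess
from dataclasses import dataclass

# Hypothetical model matrix for illustration only; the actual ~15-model
# list and file layout are not fixed by this proposal.
MODELS = {
    "llama-3.1-8b": {
        "gguf": "models/llama-3.1-8b-instruct-q4_0.gguf",
        "ov_ir": "models/llama-3.1-8b-instruct-ov-int4",
    },
}
OV_DEVICES = ["CPU", "GPU", "NPU"]


@dataclass
class Result:
    model: str
    framework: str   # "llama.cpp" or "OpenVINO GenAI"
    backend: str     # e.g. "CPU", "Vulkan", "GPU", "NPU"
    raw: object      # tool output, normalized later into TTFT/tok/s/memory columns


def run_llama_bench(gguf: str, n_prompt: int = 512, n_gen: int = 128) -> list:
    """Invoke llama.cpp's llama-bench; -m/-p/-n/-o are real llama-bench
    flags, but JSON field names can differ between llama.cpp versions."""
    proc = subprocess.run(
        ["llama-bench", "-m", gguf, "-p", str(n_prompt), "-n", str(n_gen),
         "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)


def run_llm_bench(model_dir: str, device: str, iters: int = 3) -> str:
    """Invoke OpenVINO GenAI's llm_bench (benchmark.py). -m/-d/-n are
    documented llm_bench options; report-file flags vary by release, so
    stdout is captured raw here."""
    proc = subprocess.run(
        ["python", "benchmark.py", "-m", model_dir, "-d", device,
         "-n", str(iters)],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout


def main() -> None:
    results = []
    for name, paths in MODELS.items():
        # llama.cpp default CPU build (the Vulkan backend would use a
        # Vulkan-enabled build of llama-bench).
        results.append(Result(name, "llama.cpp", "CPU",
                              run_llama_bench(paths["gguf"])))
        # OpenVINO GenAI on each target device.
        for device in OV_DEVICES:
            results.append(Result(name, "OpenVINO GenAI", device,
                                  run_llm_bench(paths["ov_ir"], device)))
    # Downstream steps would normalize `raw` into the tabular metrics above.
    for r in results:
        print(r.model, r.framework, r.backend)


if __name__ == "__main__":
    main()
```

A reproducibility script along these lines, plus pinned tool versions, would let others rerun the exact matrix and append their own hardware results.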
Intent
- Guide users toward model/framework/device best practices
- Expose gaps and optimization opportunities
- Build a resource others can use to contribute and compare performance results