
Performance Benchmarking suite comparing llama.cpp and OpenVINO GenAI #166

@ravi9

Description
Objective

Create a benchmarking suite that systematically compares popular LLMs across llama.cpp and OpenVINO GenAI, spanning multiple runtimes and hardware backends (CPU/GPU/NPU). The goal is to establish reproducible benchmarks that capture performance profiles and hardware utilization, helping the community understand model and backend strengths.

Summary of Proposed Benchmark

  • Models: ~15 leading LLMs, including Llama, Qwen, DeepSeek, Phi, Gemma, etc.
  • Frameworks:
    • OpenVINO GenAI (IR & GGUF)
    • llama.cpp (GGUF, Q4_0)
  • Benchmarks run via OpenVINO GenAI llm_bench and llama.cpp llama-bench, covering the llama.cpp default CPU and Vulkan backends, plus OpenVINO on CPU/GPU/NPU.
  • Metrics: Load/compile times, prompt evaluation speed, TTFT, token generation speed, memory use, quantization config, plus hardware and software details.
  • Output: Tabular benchmarking results, observations, and reproducibility instructions/scripts.
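To make the headline metrics concrete, here is a minimal sketch of how the reported numbers relate to raw timings. This is illustrative only: the field names, values, and `summarize` helper are hypothetical and not taken from llm_bench or llama-bench, which each have their own output formats.

```python
# Hypothetical helper showing how the suite's headline metrics
# (TTFT, token generation speed, load time) derive from raw timings.
# All names and numbers here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class RunTiming:
    load_s: float          # model load/compile time in seconds
    first_token_s: float   # time from request start to first generated token
    total_gen_s: float     # wall time spent generating tokens
    tokens_generated: int  # number of tokens produced

def summarize(t: RunTiming) -> dict:
    """Derive the headline metrics from one benchmark run."""
    return {
        "ttft_ms": t.first_token_s * 1000.0,                 # time to first token
        "gen_tok_per_s": t.tokens_generated / t.total_gen_s, # generation throughput
        "load_s": t.load_s,                                  # load/compile time
    }

# Example run with made-up timings:
print(summarize(RunTiming(load_s=2.4, first_token_s=0.18,
                          total_gen_s=4.0, tokens_generated=128)))
```

A real harness would collect these figures per model, framework, backend, and quantization config, then tabulate them alongside hardware and software details.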

Intent

  • Help guide users on model/framework/device best practices
  • Expose any gaps or optimization opportunities
  • Build a resource for others to contribute/compare performance
