
feat: add [ LiteLLM AI Gateway ] for provider independence #186

Open

Aarish Alam (RheagalFire) wants to merge 3 commits into braintrustdata:main from RheagalFire:feat/add-litellm-provider

Conversation


@RheagalFire Aarish Alam (RheagalFire) commented Apr 21, 2026

Summary

  • Add LiteLLMClient / AsyncLiteLLMClient in py/autoevals/litellm.py: OpenAI-compatible adapters backed by litellm.completion() / litellm.acompletion() (plus embeddings and moderation).
  • Export both from py/autoevals/__init__.py so users can do init(client=LiteLLMClient()).
  • Add litellm to extras_require: install with pip install 'autoevals[litellm]'.
  • Add py/autoevals/test_litellm.py with 9 mocked unit tests covering chat, embeddings, moderation, async, end-to-end init() wiring, and the Responses-API shim.
  • A follow-up commit adds a Responses-API shim in LiteLLMClient.responses.create / AsyncLiteLLMClient.responses.create. Without it, init(client=LiteLLMClient()) with autoevals' default gpt-5-mini model would crash: oai.py routes gpt-5 models through prepare_responses_params, which sends input=... and a flat tool schema, but litellm.completion expects messages=... with a nested tool schema. The shim translates back.

Fits cleanly into the existing LLMClient architecture (py/autoevals/oai.py:129) which is duck-typed on the OpenAI v1 protocol. The adapter implements that surface; no changes to core.
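To illustrate how small the duck-typed surface is, here is a minimal sketch of an adapter in the OpenAI v1 client shape. Names and structure are illustrative, not the PR's actual implementation; the completion callable is injected so the sketch runs without litellm installed (in the real adapter it would be litellm.completion):

```python
from types import SimpleNamespace


class MiniClient:
    """Sketch of an OpenAI-v1-shaped adapter around a completion callable.

    `completion_fn` stands in for litellm.completion; injecting it keeps
    the sketch self-contained.
    """

    def __init__(self, completion_fn):
        # Mirror the OpenAI v1 attribute layout: client.chat.completions.create(...)
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=completion_fn)
        )


def fake_completion(**kwargs):
    # Stand-in for litellm.completion: echo back what it was called with.
    return {"model": kwargs["model"], "messages": kwargs["messages"]}


client = MiniClient(fake_completion)
resp = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "2 + 2?"}],
)
print(resp["model"])  # openai/gpt-4o-mini
```

Because autoevals' LLMClient only calls through this attribute surface, any object shaped like this slots in without touching core code.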

Changes

  • py/autoevals/litellm.py: LiteLLMClient / AsyncLiteLLMClient + _LiteLLMResponses adapter that translates Responses-API params (input=, flat tool schema) back to Chat-Completions params (messages=, nested tool schema) before calling litellm.completion.

  • py/autoevals/__init__.py: re-exports the new clients.

  • setup.py: litellm optional extra.

  • py/autoevals/test_litellm.py: 9 mocked tests (adds coverage for Responses-API shim input→messages translation and flat→nested tool-schema translation).
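To make the shim's translation concrete, here is a hedged sketch of the two conversions described above (function and field names are assumptions for illustration, not the PR's code): input= becomes messages=, and the Responses-API flat tool schema becomes the nested Chat-Completions form.

```python
def responses_to_chat_params(params: dict) -> dict:
    """Translate Responses-API kwargs into Chat-Completions kwargs (sketch)."""
    out = dict(params)

    # 1) input= -> messages= (a bare string becomes a single user message).
    inp = out.pop("input", None)
    if isinstance(inp, str):
        out["messages"] = [{"role": "user", "content": inp}]
    elif inp is not None:
        out["messages"] = inp  # already a list of message dicts

    # 2) Flat tool schema {"type": "function", "name": ..., "parameters": ...}
    #    -> nested {"type": "function", "function": {...}}.
    tools = out.get("tools")
    if tools:
        nested = []
        for tool in tools:
            if tool.get("type") == "function" and "function" not in tool:
                fn = {k: v for k, v in tool.items() if k != "type"}
                nested.append({"type": "function", "function": fn})
            else:
                nested.append(tool)
        out["tools"] = nested
    return out


chat_params = responses_to_chat_params({
    "model": "azure/gpt-4o",
    "input": "What is 5 + 5?",
    "tools": [{"type": "function", "name": "select_choice",
               "parameters": {"type": "object", "properties": {}}}],
})
print("input" in chat_params)  # False: translated to messages=
```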

Testing & Usage

Unit tests (all pass):

  $ pytest py/autoevals/test_litellm.py -v
  py/autoevals/test_litellm.py::test_litellm_client_exposes_openai_v1_surface PASSED
  py/autoevals/test_litellm.py::test_litellm_chat_completions_forwards_to_litellm PASSED
  py/autoevals/test_litellm.py::test_litellm_client_without_api_key_does_not_forward_key PASSED
  py/autoevals/test_litellm.py::test_litellm_embeddings_forwards_to_litellm PASSED
  py/autoevals/test_litellm.py::test_litellm_moderations_forwards_to_litellm PASSED
  py/autoevals/test_litellm.py::test_litellm_responses_create_translates_input_to_messages PASSED
  py/autoevals/test_litellm.py::test_litellm_responses_create_translates_responses_api_tool_schema PASSED
  py/autoevals/test_litellm.py::test_async_litellm_chat_completions_forwards PASSED
  py/autoevals/test_litellm.py::test_init_accepts_litellm_client PASSED
  ============================== 9 passed in 0.61s ===============================

Live end-to-end smoke test against Azure OpenAI (azure/gpt-4o):

  [Test 1] LiteLLMClient.chat.completions.create, model=azure/gpt-4o
    response: '4'
  [Test 2] Factuality scorer with init(client=LiteLLMClient())
    score: 0.6
    metadata: {'choice': 'B', 'rationale': 'Step 1: The expert answer states "George Washington." ... Step 3: Therefore, the submitted answer includes the information found in the expert answer and expresses it in a broader form, but remains fully consistent with the expert answer. Conclusion: The submitted answer is a superset of the expert answer and is fully consistent with it.'}

  [Test 3] Responses-API shim: client.responses.create(input=..., model=azure/gpt-4o)
           (Path autoevals takes for gpt-5 models. Shim translates input=
           back to messages= before calling litellm.completion.)
    response: '10'

  autoevals LiteLLM live test PASSED (chat + scorer + responses-shim).

This exercised three paths: (1) raw chat.completions.create routed to litellm.completion; (2) the full scorer path, init(client=LiteLLMClient()) → Factuality.eval() → LLMClient.complete → shim → litellm.completion → parsed score with rationale; (3) the Responses-API shim with an input=... kwarg, translated to messages=... before reaching LiteLLM (exercising the fix for the default gpt-5-mini routing).

Example usage

from autoevals import init
from autoevals.litellm import LiteLLMClient
from autoevals.llm import Factuality

init(
    client=LiteLLMClient(),
    default_model="anthropic/claude-3-5-sonnet-20241022",
)

evaluator = Factuality()
result = evaluator.eval(input="...", output="...", expected="...")

init(client=LiteLLMClient(), default_model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0")
init(client=LiteLLMClient(), default_model="gemini/gemini-1.5-pro")
init(client=LiteLLMClient(), default_model="ollama/llama3")

from autoevals.litellm import AsyncLiteLLMClient
init(client=AsyncLiteLLMClient(), default_model="openai/gpt-4o-mini")
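The mocked unit tests presumably follow the standard pattern of substituting a fake litellm module before the adapter calls it. A generic sketch of that pattern (independent of the PR's actual test code, and runnable without litellm installed) via sys.modules injection with unittest.mock:

```python
import sys
from unittest.mock import MagicMock

# Install a stand-in litellm module so `import litellm` resolves to the mock.
fake_litellm = MagicMock()
fake_litellm.completion.return_value = {"choices": [{"message": {"content": "4"}}]}
sys.modules["litellm"] = fake_litellm

import litellm  # resolves to fake_litellm


def create_chat(**kwargs):
    # What an adapter's chat.completions.create would do: forward to litellm.
    return litellm.completion(**kwargs)


resp = create_chat(model="openai/gpt-4o-mini",
                   messages=[{"role": "user", "content": "2 + 2?"}])

# Verify exact forwarding and the canned response, with no network calls.
fake_litellm.completion.assert_called_once_with(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "2 + 2?"}],
)
print(resp["choices"][0]["message"]["content"])  # 4
```

This is why the suite runs in well under a second: every provider call is intercepted at the litellm boundary.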

@RheagalFire
Author

cc Ankur Goyal (@ankrgyl) and Olmo Maldonado (@ibolmo). Would like your review here.
