Introduce Generic Structured Output Handling (CrewAI Inspired) #244

Open

ZhengKai91 wants to merge 1 commit into browserbase:main from
Conversation
1 issue found across 2 files
Prompt for AI agents (all 1 issue)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="stagehand/llm/client.py">
<violation number="1" location="stagehand/llm/client.py:140">
P1: Catching all `BadRequestError` is too broad. This fallback to `StructuredOutputHandler` will fail if the error was unrelated to `response_format` (e.g., invalid model name, rate limits) or if `response_format` wasn't in the request. Consider checking if `response_format` is present in `filtered_params` before attempting the fallback, or catching a more specific error condition.</violation>
</file>
File context (stagehand/llm/client.py, around line 140):

```diff
@@ -134,6 +137,13 @@ async def create_response(
             self.metrics_callback(response, inference_time_ms, function_name)
         return response
+    except litellm.BadRequestError as e:
+        handler = StructuredOutputHandler(litellm)
+        response = await handler.handle_structured_inference(**filtered_params)
```
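A minimal sketch of the narrower fallback the reviewer suggests, assuming `filtered_params` is the kwargs dict passed to `litellm.acompletion` (the names `StructuredOutputHandler` and `handle_structured_inference` come from the diff above; the guard itself is the suggested change, not code from this PR):

```python
try:
    response = await litellm.acompletion(**filtered_params)
except litellm.BadRequestError:
    # Fall back only when response_format was actually part of the request;
    # otherwise (bad model name, malformed params, ...) re-raise the error.
    if "response_format" not in filtered_params:
        raise
    handler = StructuredOutputHandler(litellm)
    response = await handler.handle_structured_inference(**filtered_params)
```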
Author

@filip-michalsky Would you be able to review this PR? It’d be great to get your feedback before merging.
This pull request introduces a robust mechanism to handle Structured Outputs (Pydantic BaseModel enforcement) for LLMs that do not natively support the response_format parameter (e.g., Deepseek Chat, certain open-source models).
Why
Currently, when structured output is requested via the `response_format` argument (in `client.py`) with models that don't support it (such as `deepseek/deepseek-chat`), the call to `litellm.acompletion` fails with a `litellm.BadRequestError`:

```
litellm.BadRequestError: DeepseekException - {"error":{"message":"This response_format type is unavailable now","type":"invalid_request_error","param":null,"code":"invalid_request_error"}}
```
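For context, a minimal reproduction of the failure, assuming litellm's standard `acompletion` signature (the model class and message content here are illustrative):

```python
import asyncio

import litellm
from pydantic import BaseModel


class City(BaseModel):
    name: str
    country: str


async def main():
    # deepseek/deepseek-chat rejects response_format, so this raises the
    # litellm.BadRequestError / DeepseekException shown above.
    await litellm.acompletion(
        model="deepseek/deepseek-chat",
        messages=[{"role": "user", "content": "Name a European capital."}],
        response_format=City,
    )


asyncio.run(main())
```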
This issue severely limits the range of models compatible with structured output tasks.
What Changed
The core logic has been refactored into a new class, StructuredOutputHandler, adopting a proven strategy implemented by projects like CrewAI (Kudos to the CrewAI team for pioneering this pattern! 👏).
- The logic is encapsulated in the `StructuredOutputHandler` class.
- We stop passing the `response_format` argument to `litellm.acompletion`.
- Instead, the Pydantic model's schema is parsed, optimized, and injected as a strict system-prompt instruction that guides the LLM's output format (the `format_messages` function).
- We introduce a `Converter` class that manages the post-processing workflow (see the sketch after this list):
  - It attempts to parse the raw text output into the target Pydantic `BaseModel`.
  - If parsing fails (due to partial or invalid JSON), it attempts to extract the JSON robustly (`handle_partial_json`).
  - If necessary, it retries (up to 3 attempts) by calling the LLM again and asking it to fix the improperly formatted output.
- The successfully validated Pydantic `BaseModel` instance is converted to a Python dictionary (`model.model_dump()`) and replaces the original text content in the final response object.
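To make the flow concrete, here is a condensed sketch of the two pieces described above. Only the names `format_messages` and `handle_partial_json` and the 3-retry budget come from this PR's description; the signatures and bodies below are assumptions, not the PR's actual implementation:

```python
import json
import re
from typing import Type

from pydantic import BaseModel, ValidationError


def format_messages(messages: list[dict], output_model: Type[BaseModel]) -> list[dict]:
    """Inject the model's JSON schema as a strict system instruction."""
    schema = json.dumps(output_model.model_json_schema(), indent=2)
    instruction = (
        "Respond ONLY with a JSON object that conforms to this schema, "
        "with no surrounding prose or markdown fences:\n" + schema
    )
    return [{"role": "system", "content": instruction}, *messages]


def handle_partial_json(raw: str) -> dict:
    """Best-effort extraction of the first JSON object embedded in raw text."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))


def convert(raw: str, output_model: Type[BaseModel]) -> BaseModel:
    """Parse raw output into the target model, falling back to extraction.

    The real Converter additionally retries (up to 3 attempts) by asking the
    LLM to fix its own malformed output; that loop is omitted here for brevity.
    """
    try:
        return output_model.model_validate_json(raw)
    except ValidationError:
        return output_model.model_validate(handle_partial_json(raw))
```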
Test Plan
The easiest way to verify this fix is by testing with a non-natively-supported model like Deepseek.
1. Set the environment variable `DEEPSEEK_API_KEY`.
2. Change the model parameter in `examples/quickstart.py` from the current default to `deepseek/deepseek-chat`.
3. Run the example.
Expected Result: The program should now successfully execute the structured inference task and return a validated Pydantic-based dictionary, without encountering the BadRequestError related to the response_format parameter.
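Alternatively, a standalone check against the handler directly. The import path is an assumption, as is the idea that the handler accepts the same kwargs shape as `litellm.acompletion`; the constructor and method names come from this PR's diff:

```python
import asyncio

import litellm
from pydantic import BaseModel

# Module path is an assumption -- adjust to wherever the handler lands in this PR.
from stagehand.llm.structured_output import StructuredOutputHandler


class Answer(BaseModel):
    city: str
    country: str


async def main():
    handler = StructuredOutputHandler(litellm)
    response = await handler.handle_structured_inference(
        model="deepseek/deepseek-chat",
        messages=[{"role": "user", "content": "Where is the Eiffel Tower?"}],
        response_format=Answer,
    )
    # Per the PR description, the content is a validated dict from model_dump().
    print(response.choices[0].message.content)


asyncio.run(main())
```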
Summary by cubic
Adds a generic structured output path that enforces Pydantic models for LLMs without native response_format support (e.g., Deepseek), and updates the client to fall back to this handler on errors. Structured inference now works across more models without BadRequestError.
New Features
Bug Fixes
Written for commit 49f7054. Summary will update automatically on new commits.