
Fix blank responses on reasoning models#13

Open
matteo-brandolino wants to merge 1 commit into regolo-ai:master from matteo-brandolino:fix/reasoning-display

Conversation

@matteo-brandolino

Problem

When using regolo chat with reasoning models, the assistant appeared completely frozen: no output was printed, either during the thinking phase or for the final response.

Two independent bugs caused this:

  1. max_tokens too low

run_chat had a hardcoded default of max_tokens=200. Reasoning models spend a large portion of the token budget on internal thinking
(reasoning_content) before producing the visible answer. With only 200 tokens, the budget was exhausted during reasoning, leaving 0
tokens for the final response.
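A minimal sketch of the failure mode (the payload shape is illustrative; field and model names here are assumptions, not the actual client code):

```python
# Reasoning models draw reasoning_content and the visible answer from
# the SAME max_tokens budget. With the old hardcoded default of 200,
# the budget could be fully consumed before any visible text was produced.
payload = {
    "model": "a-reasoning-model",  # hypothetical model name
    "messages": [{"role": "user", "content": "Explain X"}],
    "max_tokens": 200,  # old hardcoded default in run_chat
}

# If the model spends >= 200 tokens on internal reasoning, the stream
# ends with zero tokens left for "content" -> a blank response.
```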

  2. reasoning_content silently ignored

handle_search_text_chat_completions only read delta.get("content") and discarded reasoning_content entirely. During the whole thinking
phase, nothing was printed — the assistant looked stuck.
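To illustrate, a hedged sketch of the old delta handling (the helper name is hypothetical; the actual function is handle_search_text_chat_completions):

```python
def handle_delta_old(delta: dict):
    # Old behavior: only "content" was read. During the thinking phase,
    # streamed deltas carry "reasoning_content" and no "content", so this
    # returns None for every chunk and nothing is printed.
    return delta.get("content")
```

Every thinking-phase chunk therefore resolved to `None`, which is why the assistant looked stuck until (and unless) visible content arrived.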


Solution

src/regolo/client/regolo_client.py

  • Added an inner resolve() function that distinguishes content from reasoning_content and returns a "thinking" role tag for reasoning
    tokens.
  • Added _state["in_reasoning"] to track the transition between thinking and answer phases.
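A sketch of the idea behind these two changes (a closure standing in for the inner `resolve()`; the exact signature and return shape in the PR may differ):

```python
def make_resolver():
    # _state["in_reasoning"] tracks whether we are in the thinking phase,
    # so the caller can detect the transition to the final answer.
    _state = {"in_reasoning": False}

    def resolve(delta: dict):
        # Reasoning tokens get a "thinking" role tag; visible answer
        # tokens get "content". Empty deltas resolve to None.
        reasoning = delta.get("reasoning_content")
        if reasoning:
            _state["in_reasoning"] = True
            return ("thinking", reasoning)
        text = delta.get("content")
        if text:
            _state["in_reasoning"] = False
            return ("content", text)
        return None

    return resolve
```

Keeping the flag in shared `_state` rather than a local lets the streaming loop ask "did we just leave the thinking phase?" when deciding whether to print the separator.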

src/regolo/cli.py

  • Added --max-tokens flag (default: 2048, user-configurable) forwarded to run_chat.
  • Added differentiated rendering for the thinking phase:
    • Thinking... header (dim + italic) on first reasoning token
    • Reasoning text rendered in dim style
    • Visual separator ───────────────────── on transition to the final answer
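The rendering rules above can be sketched with raw ANSI escapes (the real CLI may use a styling library; this helper and its state keys are assumptions for illustration):

```python
DIM, ITALIC, RESET = "\x1b[2m", "\x1b[3m", "\x1b[0m"

def render_chunk(role: str, text: str, state: dict) -> str:
    out = []
    if role == "thinking":
        # Dim + italic "Thinking..." header on the first reasoning token only.
        if not state.get("shown_header"):
            out.append(f"{DIM}{ITALIC}Thinking...{RESET}\n")
            state["shown_header"] = True
        # Reasoning text itself is rendered dim.
        out.append(f"{DIM}{text}{RESET}")
        state["was_thinking"] = True
    else:
        # Visual separator on the transition from thinking to the answer;
        # the answer is printed at full brightness.
        if state.pop("was_thinking", False):
            out.append("\n" + "\u2500" * 21 + "\n")
        out.append(text)
    return "".join(out)
```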

Test Plan

  • Run regolo chat with a reasoning model and verify thinking tokens stream in real time with dim styling
  • Verify the ───────────────────── separator appears before the final answer, which is printed at full brightness
  • Test --max-tokens 512 and --max-tokens 4096 to confirm the flag is forwarded correctly
  • Verify non-reasoning models (no reasoning_content in deltas) continue to work as before
