
feat: reverse streaming order so changes arrive before text #407

Open
elias-ba wants to merge 1 commit into main from tweaks-for-streaming

Conversation


@elias-ba commented Mar 9, 2026

Short Description

Reverses the streaming order in both job_chat and workflow_chat so that structured changes (code edits / workflow YAML) are sent to the client before the text explanation streams, enabling a better UX where users see results first.

Implementation Details

When the AI assistant streams a response, the text explanation ("I've updated your workflow to do X, Y, Z") appeared first, and then seconds later the actual changes landed on the canvas or code editor. Users would read about changes that hadn't happened yet.

The fix is to reverse the JSON field order via the assistant prefill so that structured data (code_edits or YAML) is generated first and text second. During streaming, the structured data is buffered silently. Once the delimiter between fields is detected, the structured data is parsed and sent to the client as a new custom SSE event called changes, and then the text explanation streams normally.
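The buffer-until-delimiter approach can be sketched roughly as follows. This is a simplified illustration, not the actual service code: the field names, delimiter string, and event names are assumptions, and end-of-stream cleanup of the trailing quote and brace is omitted.

```python
import json

# Assumed boundary between the structured field and the text field in the
# model's JSON output (the real services detect this on their actual schema).
DELIMITER = ', "text": "'

def stream_response(tokens, send_event):
    """Buffer the structured field silently; once the delimiter appears,
    parse it and emit a custom 'changes' event, then stream the rest of
    the text as normal message events."""
    buffer = ""
    changes_sent = False
    for token in tokens:
        if changes_sent:
            send_event("message", token)
            continue
        buffer += token
        idx = buffer.find(DELIMITER)
        if idx == -1:
            continue  # still inside the structured field: stay silent
        structured = json.loads(buffer[:idx])  # e.g. the code_edits array
        send_event("changes", structured)
        changes_sent = True
        rest = buffer[idx + len(DELIMITER):]
        if rest:
            send_event("message", rest)
```

Because the assistant prefill opens the structured field, everything before the delimiter is parseable on its own once the delimiter arrives, so the client gets the full changes payload in one event.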

For job_chat, the raw code_edits are patches (replace/rewrite), so a new _resolve_code_edits() method applies them to the user's current code at stream time to produce the final code the client needs for diff preview. This is a best-effort version of apply_code_edits() without error correction to avoid blocking the stream.
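A best-effort patch applier along those lines might look like this. The edit schema (`type`/`old`/`new`/`content` keys) is a guess at the shape, not the real one:

```python
def resolve_code_edits(current_code, edits):
    """Best-effort application of replace/rewrite patches at stream time.
    No error correction: a replace whose target text is not found is
    skipped rather than blocking the stream (the corrected final code
    still arrives in the closing payload)."""
    code = current_code
    for edit in edits:
        if edit["type"] == "rewrite":
            code = edit["content"]  # whole-file rewrite
        elif edit["type"] == "replace" and edit["old"] in code:
            code = code.replace(edit["old"], edit["new"], 1)
    return code
```

The point of the skip-on-miss behaviour is latency: the diff preview can render immediately, and any misapplied edit is reconciled by the fully corrected code in the final payload.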

Both services also add _unescape_json_string() to convert JSON escape sequences (\n, \") back to real characters during streaming, since Claude generates text inside a JSON string value.
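A minimal sketch of that unescaping, assuming each chunk arrives with complete escape sequences (a real streaming implementation must buffer a trailing lone backslash so an escape split across chunks is not decoded incorrectly):

```python
import json

def unescape_json_string(chunk):
    """Decode JSON escape sequences (\\n, \\", \\\\, \\uXXXX) appearing
    inside a streamed string value by re-wrapping the fragment in quotes
    and letting the JSON parser do the unescaping."""
    return json.loads('"' + chunk + '"')
```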

This works together with a companion Lightning PR that handles the changes event on the client side.

AI Usage

  • Code generation (copilot but not intellisense)
  • Strategy / design

Reverses the JSON field order in both job_chat and workflow_chat so that
structured data (code_edits / YAML) is generated first and text explanation
second. This allows the client to apply changes to the canvas or editor
before the text explanation finishes streaming.

Key changes:

- Reverse assistant prefill: code_edits/yaml generated before text_answer/text
- Add send_changes() to StreamManager for custom SSE "changes" event
- job_chat: resolve code_edits into final code before sending changes event
- workflow_chat: send parsed YAML in changes event before text streams
- Add _unescape_json_string() to fix markdown rendering during streaming
- Update prompt examples to match reversed field order
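The send_changes() addition presumably frames the parsed structured data as a named SSE event. A minimal sketch of that wire format (the `write` callback and payload shape are assumptions):

```python
import json

def send_changes(write, payload):
    """Frame structured data as a named server-sent event. Per the SSE
    wire format, 'event:' sets the event type the client listens for,
    'data:' carries the payload, and a blank line ends the message."""
    write(f"event: changes\ndata: {json.dumps(payload)}\n\n")
```

On the client, an `EventSource` listener registered for `"changes"` receives this without touching the default `message` stream.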
@josephjclark
Collaborator

This looks a bit better to me when I run it in the app - although I did seem to get different results.

My big concern here is that the model generates the text before it generates the code. So we're artificially suppressing the text, delaying when the user gets any information, until the code is finished.

But surely the benefit of streaming is that we can serve content to the user as soon as it's ready?

I totally agree that it's weird for the model to say "I've changed x to y" and the user sees that before they see the code. Really bothers me.

But if we're going to change the order, don't we have to look for a deeper fix? Have the model generate the code first and then the explanation? Maybe even over two calls?

I'd like to get @hanna-paasivirta 's take on this tomorrow.

I'm a bit nervous about _resolve_code_edits - why is this needed now? Was it not needed in prod?

On the escaping - I note that the prompt explicitly asks for the json output to be escaped. Now we're adding logic to unescape it. I feel like we're coding around in a circle there - maybe we need both steps but I'd like to give it deeper thought.

@josephjclark
Collaborator

Chatting with Hanna

  • Happy with the escaping stuff, looks legit!
  • Happy with order reversing
  • We need to investigate resolve_code_edits

@hanna-paasivirta will investigate further

@hanna-paasivirta
Contributor

On the code/text order: @josephjclark I initially referenced a paper incorrectly. Starting with code changes can be helpful in training only, as you would expect (https://proceedings.mlr.press/v267/liu25ah.html); in inference, thinking should always come first. The reason I still think it's probably okay to switch the order of the text and the code here is that our text field is not a proper structured reasoning step, so the switch is unlikely to affect the quality of the output much. We'll eventually add reasoning before both of these fields, in the form of Anthropic thinking blocks or, more generally, as the planner agent's actions in the multi-agent system. So the order in standard coding assistants is roughly: think (extended thinking) → act (write code) → brief status text.

I referred to these on the topic of CoT
"Structured Chain-of-Thought Prompting for Code Generation" — Jia Li, Ge Li, Yongmin Li, Zhi Jin (2023)
"Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs" — Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, et al. (2025)

We should watch out for differences in quality or increased warnings/errors in code generation. The proper fix in that case wouldn’t be to revert to the current order, but to add a thinking step in some form.

@hanna-paasivirta
Contributor

@elias-ba What does the user see in this new implementation? Do they see a flash of code that might be unchanged/partially changed (code edits failed to apply), which then quickly changes as more edits come in (corrections applied)? My priority is to make sure that we know that the first code block is provisional, and we don’t forget to display the final corrected code that is not streamed and only returned in the final payload.

Second, if it’s ok for us to wait to resolve code edits, why is it necessary to postpone the error correction that happens in apply_code_edits? This introduces quite a bit of complexity with the preliminary version of the code handled by resolve_code_edits. Is it unacceptable to occasionally wait for the code to be corrected?

@hanna-paasivirta
Contributor

hanna-paasivirta commented Mar 11, 2026

This order changing + streaming + split of the code into preliminary and final versions is quite tricky. For reference, here are the options I'm comparing the resolve_code_edits implementation against:

a) just call apply_code_edits, even if that causes streaming to pause, so that we deal with just one version of the code (the stream might then look more like [codeblock, large-accumulated-textblock, textblock, textblock])
b) just call apply_code_edits, but run it in a thread so that we return the code whenever it's ready while continuing to receive AND send tokens from the stream; these might be interleaved, but the frontend can distinguish them by stream event type (so the streamed output to the frontend might look like [textblock, codeblock, textblock, textblock, textblock])
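Option (b) can be sketched with a background thread and a ready flag. Here `apply_fn` stands in for apply_code_edits, and the (event_type, data) tuples are assumptions about the stream shape:

```python
import threading

def stream_with_background_apply(tokens, apply_fn, code, edits):
    """Yield (event_type, data) tuples: text tokens stream immediately
    while apply_fn runs in a worker thread; the codeblock event is
    emitted as soon as the result is ready, possibly interleaved with
    textblock events."""
    result = {}
    done = threading.Event()

    def worker():
        result["code"] = apply_fn(code, edits)
        done.set()

    threading.Thread(target=worker, daemon=True).start()

    code_sent = False
    for token in tokens:
        if done.is_set() and not code_sent:
            yield ("codeblock", result["code"])
            code_sent = True
        yield ("textblock", token)
    if not code_sent:
        done.wait()  # text finished first; block only now for the code
        yield ("codeblock", result["code"])
```

The frontend never stalls on text, and the single corrected version of the code replaces the need for a provisional resolve step.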

