feat: reverse streaming order so changes arrive before text #407
Conversation
Reverses the JSON field order in both job_chat and workflow_chat so that structured data (code_edits / YAML) is generated first and the text explanation second. This allows the client to apply changes to the canvas or editor before the text explanation finishes streaming. Key changes:
- Reverse assistant prefill: code_edits/yaml generated before text_answer/text
- Add send_changes() to StreamManager for a custom SSE "changes" event
- job_chat: resolve code_edits into final code before sending the changes event
- workflow_chat: send parsed YAML in the changes event before text streams
- Add _unescape_json_string() to fix markdown rendering during streaming
- Update prompt examples to match the reversed field order
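To illustrate the prefill change, here is a rough sketch. The exact prefill strings are not shown in this PR, so both constants below are assumptions; only the field names come from the description above.

```python
# Hypothetical illustration of the reversed assistant prefill.
# Field names mirror the PR description; the exact strings are assumptions.

# Before: prefilling the text field makes the model generate the
# explanation first, so edits arrive only after the prose finishes.
PREFILL_OLD = '{"text_answer": "'

# After: prefilling the structured field forces the model to emit
# code_edits first, so the client can apply changes to the canvas or
# editor before the explanation finishes streaming.
PREFILL_NEW = '{"code_edits": '
```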
This looks a bit better to me when I run it in the app, although I did seem to get different results.

My big concern here is that the model generates the text before it generates the code. So we're artificially suppressing the text, delaying the time to give the user information, until the code is finished. But surely the benefit of streaming is that we can serve content to the user as soon as it's ready? I totally agree that it's weird for the model to say "I've changed x to y" and for the user to see that before they see the code. Really bothers me. But if we're going to change the order, don't we have to look for a deeper fix? Have the model generate the code first and then the explanation? Maybe even over two calls? I'd like to get @hanna-paasivirta's take on this tomorrow. I'm a bit nervous about it.

On the escaping: I note that the prompt explicitly asks for the JSON output to be escaped. Now we're adding logic to unescape it. I feel like we're coding around in a circle there. Maybe we need both steps, but I'd like to give it deeper thought.
Chatting with Hanna
@hanna-paasivirta will investigate further
On the code/text order: @josephjclark I initially wrongly referenced a paper. Starting with code changes can be helpful in training only, as you would expect (https://proceedings.mlr.press/v267/liu25ah.html); in inference, thinking should always come first.

The reason I still think it's probably okay to switch the order of the text and the code here is that our text field is not a proper structured reasoning step, so it's unlikely to affect the quality of the output much. We'll eventually add reasoning before both of these fields, in the form of Anthropic thinking blocks, or more generally as the planner agent's actions in the multi-agent system. So the order is roughly think (extended thinking) → act (write code) → brief status text in standard coding assistants (these are what I was referring to on the topic of CoT).

We should watch out for differences in quality or increased warnings/errors in code generation. The proper fix in that case wouldn't be to revert to the current order, but to add a thinking step in some form.
@elias-ba What does the user see in this new implementation? Do they see a flash of code that might be unchanged or partially changed (code edits failed to apply), which then quickly changes as more edits come in (corrections applied)? My priority is to make sure that we know the first code block is provisional, and that we don't forget to display the final corrected code, which is not streamed and only returned in the final payload.

Second, if it's ok for us to wait to resolve code edits, why is it necessary to postpone the error correction that happens in apply_code_edits? This introduces quite a bit of complexity, with the preliminary version of the code handled by resolve_code_edits. Is it unacceptable to occasionally wait for the code to be corrected?
This order changing + streaming + split of the code into preliminary and final code is quite tricky. For reference, here are the potential options I'm comparing: a) just call
Short Description
Reverses the streaming order in both job_chat and workflow_chat so that structured changes (code edits / workflow YAML) are sent to the client before the text explanation streams, enabling a better UX where users see results first.
Implementation Details
Previously, when the AI assistant streamed a response, the text explanation ("I've updated your workflow to do X, Y, Z") appeared first, and the actual changes landed on the canvas or code editor seconds later. Users would read about changes that hadn't happened yet.
The fix is to reverse the JSON field order via the assistant prefill so that structured data (code_edits or YAML) is generated first and text second. During streaming, the structured data is buffered silently. Once the delimiter between fields is detected, the structured data is parsed and sent to the client as a new custom SSE event called `changes`, and then the text explanation streams normally.

For job_chat, the raw code_edits are patches (replace/rewrite), so a new
`_resolve_code_edits()` method applies them to the user's current code at stream time to produce the final code the client needs for the diff preview. This is a best-effort version of `apply_code_edits()` without error correction, to avoid blocking the stream.

Both services also add
`_unescape_json_string()` to convert JSON escape sequences (`\n`, `\"`) back to real characters during streaming, since Claude generates the text inside a JSON string value.

This works together with a companion Lightning PR that handles the `changes` event on the client side.

AI Usage