Keep live KV reusable when clients strip transient metadata blocks #378
Open
adv0r wants to merge 1 commit into
This was referenced Jun 10, 2026
Author
In this other PR I pointed out that fable may not work with LLM-dev tasks, and that there is no way to find out. Heads up.
What: keep the live KV checkpoint reusable when clients strip transient metadata blocks (`<environment_details>`, `<system-reminder>`) from historical user messages.

Why: those blocks are fossilized into the live KV, so every user turn token-mismatches at the stripped span and pays a full reprefill. Related to #364.

Verified: `make` and `./ds4_test --server` on Apple Silicon / Metal; three new unit tests cover the stripping helper, the remember gate, and the stripped-key-is-a-byte-prefix-of-the-next-render invariant.

This is the same shape as hidden thinking (live state richer than the visible replay), so the fix reuses the existing visible-key continuation instead of adding a new mechanism: after a finished turn (final answer or tool call), remember the transcript the next request is expected to render, with transient spans stripped, keyed to the live frontier via `thinking_live`. The next turn then continues from live KV and tokenizes only the new suffix. No KV rewrite, no change to what the model sees, and clients that replay the blocks verbatim never match the key and keep exact token-prefix matching. The stripped key also flows into the disk-cache text key through `kv_cache_store_current()`, so recovery after restart aligns too.

The known tags live in a static two-entry array (`transient_block_tags`); happy to wire a flag or rework the approach if you prefer.
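For reviewers who want the idea in isolation, here is a minimal sketch of the stripping step and the byte-prefix check described above. It is not the code in this PR: the function names `strip_transient_blocks` and `is_byte_prefix`, their signatures, and the exact contents of the tag array are assumptions made for illustration, and the sketch ignores details such as whitespace around a stripped block or tags that carry attributes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed contents: the PR only says the array has two entries. */
static const char *const transient_block_tags[] = {
    "environment_details",
    "system-reminder",
};

/* Hypothetical helper: copy src into dst, dropping every complete
 * <tag>...</tag> span whose tag is listed above. An opening tag with
 * no matching close is copied through unchanged.
 * dst must have room for strlen(src) + 1 bytes.
 * Returns the number of bytes written, not counting the NUL. */
size_t strip_transient_blocks(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    const size_t ntags =
        sizeof transient_block_tags / sizeof transient_block_tags[0];
    size_t out = 0;

    while (*src) {
        bool skipped = false;
        if (*src == '<') {
            for (size_t i = 0; i < ntags && !skipped; i++) {
                const char *tag = transient_block_tags[i];
                size_t tlen = strlen(tag);
                /* Match "<tag>" exactly at the current position. */
                if (strncmp(src + 1, tag, tlen) == 0 && src[1 + tlen] == '>') {
                    char close[64];
                    snprintf(close, sizeof close, "</%s>", tag);
                    const char *end = strstr(src + tlen + 2, close);
                    if (end) {
                        src = end + strlen(close); /* jump past the block */
                        skipped = true;
                    }
                }
            }
        }
        if (!skipped)
            dst[out++] = *src++;
    }
    dst[out] = '\0';
    return out;
}

/* Hypothetical check for the invariant the unit test names: the
 * remembered (stripped) key must be a byte prefix of the next render. */
bool is_byte_prefix(const char *key, size_t key_len,
                    const char *render, size_t render_len)
{
    return key_len <= render_len && memcmp(key, render, key_len) == 0;
}
```

In this sketch, a client that strips the blocks renders a transcript that starts with the stripped key, so `is_byte_prefix` succeeds and only the new suffix would need tokenizing. A client that replays the blocks verbatim diverges from the key at the first `<`, fails the check, and would fall back to the usual exact token-prefix matching.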