fix(tui): resolve checkpoint mismatch and optimize rendering performance #34

Merged. EngineerProjects merged 30 commits into dev from fix/tui-rewrite on Jun 8, 2026.

@EngineerProjects

Owner

Resolves the checkpoint mismatch issue on session load/delete and optimizes the TUI rendering performance to eliminate keystroke and scrolling latency.

The header now shows ● execute (muted), ◈ plan (orange), or ◎ pair (lighter orange),
derived from ExecutionMode() instead of the binary PlanMode() check.

AddToolProgress now intercepts enter/exit_pair_programming_mode alongside the
existing plan-mode tools: it updates the pairDepth counter and suppresses their tool rows.
…anel

The engine's FormatTextWithLineNumbers embeds a "File:/Lines:" header and
"N→" line-number prefixes in the content metadata field. Glamour cannot parse
that as markdown, so the detail sidebar rendered raw symbol noise.

parseReadContent strips the header block and N→ prefixes, returning the
clean file body plus the 0-based start line. detailBody and inlinePreview
now use the clean body; code files pass startLine as offset to renderCodeBody
so line numbers still reflect actual file positions.
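A minimal sketch of the stripping step, assuming the header is a run of "File:"/"Lines:" lines (plus blank lines) and each body line looks like `   12→text`. The function mirrors the name parseReadContent, but its signature and the exact input format are assumptions, not the engine's actual code.

```go
package main

import (
	"strconv"
	"strings"
)

// parseReadContent (hypothetical re-implementation) assumes input such as:
//
//	File: path/to/file.go
//	Lines: 10-12
//
//	    10→first line
//	    11→second line
//
// It drops the header block and the "N→" prefixes, and returns the clean
// body plus the 0-based line offset of the first body line.
func parseReadContent(raw string) (body string, startLine int) {
	lines := strings.Split(raw, "\n")
	i := 0
	// Skip the "File:/Lines:" header and any blank lines that follow it.
	for i < len(lines) {
		l := strings.TrimSpace(lines[i])
		if strings.HasPrefix(l, "File:") || strings.HasPrefix(l, "Lines:") || l == "" {
			i++
			continue
		}
		break
	}
	out := make([]string, 0, len(lines)-i)
	first := true
	for _, l := range lines[i:] {
		num, text, ok := strings.Cut(l, "→")
		n, err := strconv.Atoi(strings.TrimSpace(num))
		if !ok || err != nil {
			out = append(out, l) // not a numbered line: keep it verbatim
			continue
		}
		if first {
			startLine = n - 1 // 1-based file line -> 0-based offset
			first = false
		}
		out = append(out, text)
	}
	return strings.Join(out, "\n"), startLine
}
```

The returned startLine is what would be passed to renderCodeBody so the gutter still shows real file positions.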

Deleting the active session from the sessions panel now immediately:
- clears m.activeSession and resets to the welcome screen
- clears the chat (messages, tool selections, planDepth, pairDepth)
- clears lastTurnErr / lastErr / busy so no stale error leaks into new sessions

Clear() also resets planDepth and pairDepth so mode badges start clean
on every session switch.

Previously, loading an earlier session showed only "Resumed session" and
an empty chat. Now the full conversation is replayed from the stored
transcript: user messages, assistant text, thinking blocks, and completed
tool rows (with their input/result metadata for the detail sidebar).

The conversion happens in buildSessionHistory (cmd/cli/tui.go), which
pairs each ToolUseContent with its matching ToolResultContent, then sends
the result as []HistoryEntry in SessionLoadedMsg.

Also exports sdk.ToolResultContent from pkg/sdk/types.go.
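The pairing step in buildSessionHistory can be sketched as an ID lookup. The types below are simplified, hypothetical stand-ins for sdk.ToolUseContent and sdk.ToolResultContent (the field names ID and ToolUseID are assumed), and pairedTool is illustrative only.

```go
package main

// Simplified, hypothetical stand-ins for the sdk content types.
type ToolUseContent struct {
	ID    string
	Name  string
	Input map[string]any
}

type ToolResultContent struct {
	ToolUseID string
	Metadata  map[string]any
}

// pairedTool couples a tool call with its result (nil if none was stored).
type pairedTool struct {
	Use    ToolUseContent
	Result *ToolResultContent
}

// pairToolCalls matches each tool use with the result carrying the same ID,
// preserving the order in which the tools were called.
func pairToolCalls(uses []ToolUseContent, results []ToolResultContent) []pairedTool {
	byID := make(map[string]*ToolResultContent, len(results))
	for i := range results {
		byID[results[i].ToolUseID] = &results[i]
	}
	out := make([]pairedTool, 0, len(uses))
	for _, u := range uses {
		out = append(out, pairedTool{Use: u, Result: byID[u.ID]})
	}
	return out
}
```

Indexing results by ID keeps the pairing linear and tolerates results stored in a different order than the calls.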

buildSessionHistory now reads ToolResultContent.Metadata (already written
by buildToolResultMessages in the engine) which carries the complete TUI
metadata map: content, execution_duration_ms, lines_added, lines_removed,
exit_code, cwd, type, url, title, provider, result_count, etc.

The replay loop in model.go copies this map and injects tool_input so all
detail panel renderers (file content, diff, bash output, web results) work
exactly as during the live session.

HistoryTool.Result removed in favour of HistoryTool.Metadata; fallback to
{content: rawString} for old sessions that predate this change.
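A sketch of the replay-side map construction, including the fallback for old sessions. replayMetadata is a hypothetical helper; copying into a fresh map means injecting tool_input never mutates the metadata stored in the transcript.

```go
package main

// replayMetadata builds the map handed to the detail-panel renderers for a
// replayed tool row (illustrative, not the project's actual code).
func replayMetadata(stored map[string]any, raw string, input map[string]any) map[string]any {
	out := make(map[string]any, len(stored)+1)
	if len(stored) == 0 {
		// Legacy session: only the raw result string was persisted.
		out["content"] = raw
	}
	for k, v := range stored {
		out[k] = v // copy so the transcript's map stays untouched
	}
	out["tool_input"] = input
	return out
}
```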

Migration 20260607_007_session_files creates session_files with:
  (session_id, file_path, operation, timestamp_unix, lines_added, lines_removed)
Two indexes: by session (for fast per-session lookup) and by path
(for cross-session "who touched this file?" queries).

Live recording: onProgress writes to session_files whenever write_file,
edit_file, or apply_patch completes during an active session.

Backfill: when LoadSession runs, if no session_files rows exist for that
session the transcript is scanned and rows are inserted retroactively,
covering all sessions created before this change.

operation values: "create" | "update" (write_file), "edit" (edit_file),
"patch" (apply_patch). file_path and line counters come from the
ToolResultContent.Metadata already stored in the transcript.

…tadata

Without tool_use_id, retrieving a diff from session_files required scanning
the entire transcript JSON. With tool_use_id stored:

  session_files.tool_use_id → session_transcript_entries.entry_json
    → ToolResultContent.Metadata["structured_patch" | "git_diff" | "content"]

The migration was updated before it ever ran (the table did not yet exist in
any live DB), so no ALTER TABLE is needed. A dedicated index on tool_use_id
is added for direct lookup by tool call.

…SW RAG backend

DB / SQLite:
- Add perf pragmas: 20 MB page cache, 128 MB mmap, WAL temp in RAM, autocheckpoint
- Run PRAGMA optimize on Close for query-planner housekeeping
- Fix UpsertSessionFile to INSERT OR IGNORE + unique partial index on tool_use_id
- Fix HasSessionFileEntry and HasNamespace to use SELECT EXISTS instead of COUNT(*)
- Fix DeleteSession to rely solely on FK CASCADE (single DELETE FROM session_metadata)
- Fix GetTeamAgents to use raw SQL SELECT DISTINCT instead of full-struct GORM scan
- Add migration 008: dedup session_files + mailbox unread/history partial indexes
- Add migration 009: FTS5 virtual table session_transcript_fts with insert/delete
  triggers that stay in sync with CASCADE deletes from session_metadata
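The pragma settings above map onto SQLite PRAGMA statements roughly as follows. This is a sketch: reading "WAL temp in RAM" as temp_store = MEMORY and the wal_autocheckpoint value (SQLite's default of 1000 pages) are assumptions, and the project may apply these through its driver's DSN instead.

```go
package main

import "fmt"

const (
	cacheKiB  = 20 * 1024         // ~20 MB page cache
	mmapBytes = 128 * 1024 * 1024 // 128 MB memory-mapped I/O
)

// sqlitePerfPragmas returns statements to run on each new connection.
// A negative cache_size is interpreted by SQLite as KiB instead of pages.
func sqlitePerfPragmas() []string {
	return []string{
		fmt.Sprintf("PRAGMA cache_size = -%d", cacheKiB),
		fmt.Sprintf("PRAGMA mmap_size = %d", mmapBytes),
		"PRAGMA temp_store = MEMORY",
		"PRAGMA wal_autocheckpoint = 1000",
	}
}

// closePragma runs on Close for query-planner housekeeping.
const closePragma = "PRAGMA optimize"

// hasSessionFileSQL uses EXISTS, which can stop at the first matching row
// instead of counting every match the way COUNT(*) does.
const hasSessionFileSQL = "SELECT EXISTS(SELECT 1 FROM session_files WHERE session_id = ?)"
```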

Vector / RAG:
- Add BackendHNSW: pure-Go HNSW store (github.com/coder/hnsw), no CGO, no external service
- Per-namespace persistence: <slug>.hnsw (graph) + <slug>.meta.json (text + metadata)
- O(log n) ANN search vs previous O(n) brute-force; scores normalized to cosine similarity
- Blend keyword scores into the ranking (hybrid search) when HybridWeight > 0 and QueryText is set
- Wire HNSW backend into CLI via buildRAGService; activates only when
  RAG_EMBEDDING_URL + RAG_EMBEDDING_MODEL env vars are present
- Add HNSWDataDir helper to runtimepath
- Add tests: upsert/search/persistence/delete and hybrid keyword ranking
- Add complete database schema doc (docs/database-schema.md)
- Replace mouse-click hint with ctrl+t keyboard hint in thinking block footer
- Replace HandleMouseDown/Up tool detail zone click with HasSelectedTool + ToggleDetails
  (tool detail pane is now keyboard-driven, not mouse-zone-click-driven)
- Update golden snapshots to match new rendering
- Update TUI roadmap: mark config isolation, credentials DB, clipboard paste as done;
  expand in-progress and upcoming sections
C1 — Session leak in task manager:
  Add committed bool + defer pattern; session.Close() is called if
  RegisterTools fails before the goroutine takes ownership.

C2 — HNSW partial write undetected:
  Replace two separate error checks with errors.Join so both saveErr
  and metaErr are always surfaced to the caller.

C3 — FTS5 migration errors silently ignored:
  Replace `_ = err` with log.Printf in both migrateSQLiteVectorFTS5
  and migrateSQLiteTranscriptFTS5; startup still succeeds, but the
  operator now sees a warning when hybrid search degrades to a LIKE scan.

C4 — JSON metadata unmarshal silently ignored:
  Replace `_ = json.Unmarshal(...)` with explicit error logging in
  hnsw_store.go and sqlite_store.go; corrupted metadata is visible
  in logs instead of silently returning nil.

C5 — context.Background() hardcoded in sqlite_backend:
  Add dbCtx() helper returning context.WithTimeout; DeleteSession uses
  10s timeout, AppendTranscriptEntries and ReplaceTranscript use 30s.
  Full ctx propagation on the Backend interface is tracked as L-A.

M1 — Embedding dimension never validated:
  Both embedOpenAI and embedOllama now check that every returned vector
  is non-empty and that all vectors in a batch share the same dimension.
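The M1 check can be factored into a helper shared by embedOpenAI and embedOllama. validateEmbeddings is a hypothetical name, and treating an empty response as an error is a choice made by this sketch.

```go
package main

import "fmt"

// validateEmbeddings checks that every vector is non-empty and that all
// vectors in the batch share one dimension, which it returns.
func validateEmbeddings(vecs [][]float32) (int, error) {
	if len(vecs) == 0 {
		return 0, fmt.Errorf("embedding response contained no vectors")
	}
	dim := len(vecs[0])
	for i, v := range vecs {
		if len(v) == 0 {
			return 0, fmt.Errorf("embedding %d is empty", i)
		}
		if len(v) != dim {
			return 0, fmt.Errorf("embedding %d has dimension %d, want %d", i, len(v), dim)
		}
	}
	return dim, nil
}
```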

Also fix: HNSW hybrid search result order
  hnswBlendKeyword was modifying scores but not re-sorting, causing
  keyword-boosted records to be returned out of order. Add sort.Slice
  descending by score after blending. Caught by TestHNSWStore_HybridKeywordBlend.

Add docs/audit/codebase-audit-2026-06.md with full audit findings
split into NOW (fixed) and LATER (community issues L-A through L-M).

…eld aliases

Three root causes identified from runtime observation (agents completing in 20-52ms,
never making LLM calls):

1. Missing InputSchema on AgentTool.Definition():
   The 'agent' tool had no JSON Schema, forcing the LLM to guess field names from
   description text. With spawn_agent (which has a full schema) registered in the
   same registry, the LLM was cross-contaminating field names.
   Fix: add InputSchema with type enum, task, maxTurns, run_in_background, fork,
   isolation, and tools properties — matching the Description contract exactly.

2. 'agent_type' alias not handled:
   Call() only read parsedInput["type"], not parsedInput["agent_type"]. The LLM
   used "agent_type" (the spawn_agent convention), so agentType was "" and
   Call() returned "type is required" immediately, every time.
   Fix: accept "agent_type" as fallback alias for "type".

3. Error message for missing type hid valid values:
   "Error: type is required" gave the LLM nothing to self-correct with.
   Fix: include the full list of available agent types in the error response,
   matching the existing behavior for "unknown agent type".

Also improve wait_agent error hint when the agent_id looks like a tool_use_id
(UUID format), helping the LLM distinguish spawn_agent IDs from tool_use_ids.

Redirect TUI stdlib log output to ~/.config/nexus-cli/logs/cli.log so
errors are observable in TUI mode instead of silently discarded.

Strip orphaned tool_result blocks before sending to OpenAI-compat APIs
to prevent invalid_request_message_order errors (z-ai/GLM-4.5 etc.)
when parallel agent failures leave tool_results without a matching
assistant tool_call. Sanitizer is a no-op when no assistant tool_calls
exist in the conversation, preserving valid single-turn tool results.
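The sanitizer's core rule can be sketched on a simplified message type. msg and dropOrphanToolResults are hypothetical; real OpenAI-compatible payloads carry tool call IDs inside structured tool_calls objects rather than a flat string slice.

```go
package main

// msg is a simplified chat message.
type msg struct {
	Role       string   // "user", "assistant", or "tool"
	ToolCalls  []string // tool_call IDs on assistant messages
	ToolCallID string   // set on role == "tool" messages
}

// dropOrphanToolResults removes tool messages with no matching assistant
// tool_call. With no assistant tool_calls at all it returns msgs unchanged.
func dropOrphanToolResults(msgs []msg) []msg {
	known := map[string]bool{}
	for _, m := range msgs {
		if m.Role == "assistant" {
			for _, id := range m.ToolCalls {
				known[id] = true
			}
		}
	}
	if len(known) == 0 {
		return msgs
	}
	out := make([]msg, 0, len(msgs))
	for _, m := range msgs {
		if m.Role == "tool" && !known[m.ToolCallID] {
			continue // orphaned result: would break message ordering
		}
		out = append(out, m)
	}
	return out
}
```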

Pass cumulative toolUses count through RunConfig.Callback so AsyncAgent.ToolUses
stays accurate turn-by-turn instead of remaining 0 during execution. Sync final
ToolUses from RunResult after RunAgent() completes to cover any missed updates.

Call Cleanup() in Shutdown() after all goroutines finish to release memory held
by completed/failed/cancelled agents. Removed lazy cleanup from StartAgent() since
it would break wait_agent by deleting completed agents before the LLM retrieves them.

ESC (and ctrl+c) now cancels the running agent turn immediately by
cancelling the per-submit context, stopping the API call in progress.
The footer shows "interrupting…" while waiting for the goroutine to
drain. context.Canceled errors are suppressed so no red error banner
appears after a deliberate user interrupt.

SearXNG is now configurable from the web search panel: pressing Enter
on SearXNG opens an "Instance URL" field (not masked, not a secret).
The URL is persisted to the DB under "SEARXNG_BASE_URL" and applied as
an env var at startup via loadCredsIntoConfig, so NewSearXNGProvider()
picks it up on every run. The mode selector can then be set to "searxng"
to route all web searches through the configured self-hosted instance.

Root cause: SetSize() reset detailKey to "", which forced GotoTop() on
every streaming update (chat height grows → SetSize called → detailKey=""
→ next render sees detailKey≠key → GotoTop). Removed the reset.

Rewrote renderToolDetail cache logic with three distinct cases:
- New tool selected (detailToolID changed): reset to top
- Same tool, content grew (streaming) or size changed: preserve yOffset
- Size only changed, identical content: re-layout preserving yOffset
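The case analysis above can be sketched as a pure decision function. The parameter names and the extra "nothing changed, reuse the cached render" case are assumptions; the property that matters is that only a change of tool resets the scroll position.

```go
package main

type detailAction int

const (
	detailReuse      detailAction = iota // nothing changed: keep cached render
	detailResetTop                       // a different tool was selected
	detailKeepOffset                     // same tool, content grew or size changed
)

// nextDetailAction decides how the tool-detail viewport reacts on render.
func nextDetailAction(prevToolID, toolID string, prevLen, curLen int, sizeChanged bool) detailAction {
	switch {
	case toolID != prevToolID:
		return detailResetTop
	case curLen != prevLen || sizeChanged:
		return detailKeepOffset // re-layout, preserving yOffset
	default:
		return detailReuse
	}
}
```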

Also fixed ctrl+o so that opening the sidebar switches focus to uiFocusMain
automatically; arrow keys now scroll immediately without an extra Tab
press. Closing the sidebar returns focus to the editor input.

Background sub-agents spawned via spawn_agent were running with the
parent session's turn context. When that turn ended, defer cancel()
fired and killed the still-running sub-agent's API calls and permission
prompts, producing 'permission denied: prompt failed: context canceled'.

Three changes:
- async.go runAgent: replace config.Context with agent.Ctx so the
  goroutine uses its own independent context regardless of parent state
- runner.go RunConfig: add PermissionMode field to let callers override
  the session's permission mode after creation
- spawn_agent.go: set PermissionMode=bypass so background agents auto-
  approve tools without blocking on interactive prompts that no longer
  have a valid TUI context

Sub-agents now automatically collect every file path, URL, and search
query they consult during execution. Sources are deduplicated and
attached to RunResult.Sources as []SourceRef{Type, Value}.

The parent agent receives sources in the tool_result data payload
(agent tool, wait_agent, fork mode, worktree mode). wait_agent also
appends a formatted source list at the end of its Content string so
the parent LLM sees them inline without parsing JSON.

Extracted automatically from: read_file, write_file, edit_file, glob,
grep, web_search, web_fetch, web_crawl, web_map, browser_navigate,
browser_open, wikipedia, scholarly_search, langsearch.
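Order-preserving deduplication can be sketched with a map keyed by the whole SourceRef. Only the {Type, Value} shape comes from the text above; sourceCollector, Add, and the type labels are hypothetical.

```go
package main

// SourceRef mirrors the {Type, Value} shape described above.
type SourceRef struct {
	Type  string // e.g. "file", "url", "query" (labels assumed)
	Value string
}

// sourceCollector records sources in first-seen order, dropping duplicates.
type sourceCollector struct {
	seen map[SourceRef]bool
	refs []SourceRef
}

func (c *sourceCollector) Add(typ, value string) {
	if value == "" {
		return
	}
	r := SourceRef{Type: typ, Value: value}
	if c.seen == nil {
		c.seen = make(map[SourceRef]bool)
	}
	if c.seen[r] {
		return
	}
	c.seen[r] = true
	c.refs = append(c.refs, r)
}
```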

Sub-agents now persist their session ID in RunResult and AsyncAgent after each
run. A new resume_agent tool reopens the persisted session via Engine.OpenSession
and submits a new task into the existing conversation history, so the agent
retains full context of everything it read, fetched, and wrote previously.

Changes:
- engine.go: add OpenSession(ctx, sessionID) via optional sessionRestorer interface
- session.go: add GetSessionID() accessor
- runner.go: add SessionID to RunResult, ResumeFromSessionID to RunConfig;
  RunAgent branches on ResumeFromSessionID to restore instead of create
- async.go: add SessionID field to AsyncAgent, captured from RunResult on completion
- wait_agent.go: expose session_id in result JSON + update description
- resume_agent.go: new tool accepting session_id (or agent_id) + task;
  supports sync (blocking) and async (background) modes
- sdk/client.go: register resume_agent alongside spawn_agent

Three bugs fixed in the session browser (Ctrl+S):

1. UpdatedAt/CreatedAt not propagated — sessions always showed "—" for age.
   cmd/cli/tui.go was discarding the int64 unix timestamps from state.SessionInfo
   instead of converting them to time.Time for tui.SessionInfo.

2. Session load errors silently swallowed — when RestoreSessionState failed
   (checkpoint mismatch, compaction boundary error, etc.) the error was stored
   in m.lastErr which is never rendered, leaving the user with a blank chat and
   no feedback. Now also sets m.lastTurnErr so the status bar shows the failure.

3. No session preview — the session picker only showed an 8-char ID, age, and
   turn count with no context about what the session was about. Like Codex, the
   first user message is now extracted during SaveSessionState, stored in
   metadata.Additional["canonical_transcript"]["first_user_message"], surfaced
   in SessionInfo.Preview, and rendered below the meta line in the picker.
   Search now also matches against the preview text.
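The fix for bug 1 is a conversion with time.Unix. This sketch assumes the stored values are Unix seconds and keeps 0 mapped to the zero time.Time so the picker can still show its placeholder for missing timestamps; unixToTime is a hypothetical name.

```go
package main

import "time"

// unixToTime converts a stored int64 Unix timestamp (seconds) to time.Time.
func unixToTime(sec int64) time.Time {
	if sec == 0 {
		return time.Time{} // genuinely missing: keep the zero value
	}
	return time.Unix(sec, 0)
}
```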

…earer

api.z.ai returns 'x-api-key header is required' on 401, meaning the endpoint
uses Anthropic-style authentication despite serving OpenAI-compat request bodies.

- Add zAiAdapter that embeds openAICompatAdapter (same /chat/completions body
  and response format) but overrides applyAuthHeaders to send x-api-key
- Update adapterForProvider to route APIProviderZAi to zAiAdapter
- Update BuildAuthHeaders in config.go to use x-api-key for ZAi
- Update the provider test to match the new auth header

DeleteSession now removes associated browser artifacts from storage
(screenshots, downloads) and cleans up in-memory plan state/files.

Session list shows the first user message line as the primary title
with ID/age/turns as secondary metadata, matching Codex's approach.

All session data now lives under sessions/{session_id}/ directly in the
app root (~/.config/nexus-cli/). Screenshots go to sessions/{id}/images/,
downloads to sessions/{id}/tools/, and plan files to sessions/{id}/plans/.

Deleting a session is now two calls: store.DeleteSession for the DB and
appdir.DeleteSessionDir (os.RemoveAll) for all physical files — no more
per-namespace artifact listing.

New appdir package centralises all path resolution for cmd/cli. Session
directories are created via EnsureSessionDir at session open/resume.
Web-scraped content moves from global artifacts/web/ into
sessions/{id}/artifacts/web/, making it session-scoped and cleaned up
automatically on session delete. Browser screenshots move to
sessions/{id}/screenshots/ (renamed from images/ to avoid ambiguity).

Adds key builders and store functions for:
- GeneratedImageKey / StoreGeneratedImageRef → sessions/{id}/artifacts/images/
- AudioKey / StoreAudioRef                  → sessions/{id}/artifacts/audio/
- WebArtifactKey / StoreWebArtifactRef       → sessions/{id}/artifacts/web/

Threads sessionID through the fetch pipeline (Fetch → fetchViaHTTP →
persistArtifact) so web fetches land in the right session directory.
Storage GC reaper no longer needed for these namespaces.

Root causes fixed:
- loadRuntimeOptions applied overrides.Model AFTER loadCredsIntoConfig,
  so the credential lookup used the wrong provider (anthropic default)
  and returned an empty API key for z-ai
- SaveProviderField called reloadClient with no model set, building a
  keyless client that raced with SetModel's correct reload
- CreateSession and LoadSession goroutines could grab w.client before
  SetModel's reloadClient goroutine finished (reloadMu added to serialise)

Also fixed:
- LangSearch API key not restored from DB on startup
- LangSearch missing from 'nexus config --search' and config summary
- pendingSubmitMsg dropped when Enter pressed before session created
- /cli binary added to .gitignore

Refactors the TUI and configuration logic so API keys take effect immediately and file paths are not accidentally inserted during setup.

- Fixes circular dependency in loadCredsIntoConfig.
- Triggers client reload on all sensitive configuration changes in TUI.
- Normalizes Z.ai provider identifiers (zai, z-ai, z.ai) across the stack.
- Strictly disables file completions during configuration states.
- Refactors TUI chat components into specialized files for better maintainability.
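Provider-ID normalization can be as small as a switch over the accepted spellings. normalizeProvider and the canonical value "zai" are assumptions for this sketch, as is lower-casing other provider IDs.

```go
package main

import "strings"

// normalizeProvider folds the Z.ai spellings onto one canonical ID.
func normalizeProvider(p string) string {
	s := strings.ToLower(strings.TrimSpace(p))
	switch s {
	case "zai", "z-ai", "z.ai":
		return "zai"
	}
	return s
}
```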

EngineerProjects merged commit 85eec1c into dev on Jun 8, 2026 (5 checks passed).
EngineerProjects deleted the fix/tui-rewrite branch on June 8, 2026 at 07:37.