feat(agents-server-ui): show per-response token usage in meta row#4502

Open
kevin-dp wants to merge 4 commits into main from kevin/agent-token-usage

@kevin-dp

@kevin-dp kevin-dp commented Jun 4, 2026

Contributor

Summary

Adds a token-usage label to the agent response meta row, e.g.
Thinking · 12s · 1.2k ↑ 412 ↓ while streaming and
✓ done in 12s · 1.2k ↑ 412 ↓ · 14:18 once settled. The counter updates at
step boundaries: for a single-turn LLM call it lands once at done;
for tool-using runs it jumps as each step completes. The LLM SDK only
emits usage at end-of-step, so the counter can't tick smoothly between
streamed tokens; the elapsed-time ticker still ticks every second
alongside it.

Plumbing

The runtime already had the token data — pi-adapter.ts:358-359
extracts tokenInput/tokenOutput from the provider's per-step
usage payload — but the bridge silently dropped them before
persistence. This PR closes that gap and surfaces them all the way to
the UI:

  • StepValue gains optional input_tokens / output_tokens columns
    (Zod + TS). Strictly additive: events recorded before this change
    still validate (both fields optional), so no migration is needed.
  • outbound-bridge.ts:onStepEnd now persists the values it was
    already receiving from pi-adapter.ts.
  • IncludesStep / EntityTimelineStepItem surface the new fields,
    and the three .select() blocks that materialize step rows include
    them.
  • The cached agent_response section grows a
    tokens?: { input?, output? } summed across the run's steps at
    section-build time, and the fingerprintRun cache key includes
    per-step token deltas so a late-arriving onStepEnd invalidates a
    stale cached section.
  • New <TokenUsage> component in agents-server-ui with
    tabular-nums so digits don't jitter, locale-aware compact
    formatting via Intl.NumberFormat. Renders next to <ElapsedTime>
    in both the live and cached meta rows.
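
To illustrate the cache-invalidation point in the list above, here is a minimal sketch of a fingerprint that folds per-step token values into the cache key. The `StepRow` shape and the `fingerprintSteps` name are hypothetical and are not taken from the PR's `fingerprintRun`; the sketch only shows why a late `onStepEnd` write changes the key.

```typescript
// Hypothetical, trimmed-down step row; the real StepValue carries more fields.
interface StepRow {
  key: string;
  status: "running" | "done";
  input_tokens?: number;
  output_tokens?: number;
}

// Build a fingerprint string that changes whenever any step's token counts change.
// Missing counts are encoded as "-" so "absent" and "0" produce different keys.
function fingerprintSteps(steps: StepRow[]): string {
  return steps
    .map((s) =>
      [s.key, s.status, s.input_tokens ?? "-", s.output_tokens ?? "-"].join(":"),
    )
    .join("|");
}

// Same step before and after a late onStepEnd persists its usage.
const before: StepRow[] = [{ key: "s1", status: "done" }];
const after: StepRow[] = [
  { key: "s1", status: "done", input_tokens: 1200, output_tokens: 412 },
];

console.log(fingerprintSteps(before) !== fingerprintSteps(after)); // true
```

Because the fingerprints differ, a section cached under the first key would be rebuilt once the token columns land.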

Test plan

  • pnpm typecheck clean in agents-runtime + agents-server-ui
  • pnpm test in agents-server-ui (66 passed)
  • pnpm test outbound-bridge use-chat entity-timeline in
    agents-runtime (74 passed)
  • Full agents-runtime test suite: the branch shows the same 401
    pre-existing failures as clean main (unrelated permission-system
    breakage in the test harness, not introduced by this PR)
  • Manual: launch a turn that uses tools and watch the counter
    jump at each step boundary
  • Manual: pure-text turn — counter lands once at done

Notes

  • Historical responses recorded before this change have no token data
    persisted (older steps rows lack the columns). The tokens field
    is conditional on at least one step reporting a number, so those
    sections continue to render with no token row instead of "0 / 0".
  • Display format 1.2k ↑ 412 ↓ chosen for compactness in the meta
    row. Open to changing to 1.2k in / 412 out or similar if the
    arrow direction is unclear — input goes up to the model, output
    comes down.
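
As a rough sketch of the two notes above (no token row for historical responses, and the compact arrow format), the helper below formats a `tokens` object with `Intl.NumberFormat` and returns `null` when neither side is present. The `formatTokenUsage` name, the `Tokens` type, and the `maximumFractionDigits: 1` setting are assumptions for illustration, not the actual `<TokenUsage>` implementation.

```typescript
// Hypothetical shape mirroring the PR's `tokens?: { input?, output? }` field.
interface Tokens {
  input?: number;
  output?: number;
}

// Returns a label such as "412 ↓", or null when no side was reported,
// so callers can skip rendering instead of showing "0 / 0".
function formatTokenUsage(
  tokens: Tokens | undefined,
  locale: string = "en-US",
): string | null {
  const fmt = new Intl.NumberFormat(locale, {
    notation: "compact",
    maximumFractionDigits: 1,
  });
  const parts: string[] = [];
  // `!= null` keeps a genuine 0 but drops both undefined and null.
  if (tokens?.input != null) parts.push(`${fmt.format(tokens.input)} ↑`);
  if (tokens?.output != null) parts.push(`${fmt.format(tokens.output)} ↓`);
  return parts.length > 0 ? parts.join(" ") : null;
}

console.log(formatTokenUsage({ input: 1234, output: 412 }));
console.log(formatTokenUsage(undefined)); // null
```

Note that the `en-US` compact format produces an uppercase suffix ("1.2K"), so the lowercase "1.2k" shown in the examples would need an extra step such as lowercasing the result.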

🤖 Generated with Claude Code

Sums input/output tokens across every step of the run and renders them
next to the elapsed-time ticker (e.g. `Thinking · 12s · 1.2k ↑ 412 ↓`).
Counter updates at step boundaries — the LLM SDK only reports `usage`
at end-of-step, so within a single text stream the value stays flat;
tool-using runs see jumps as each step settles.

Token plumbing (additive, no migration):

- `StepValue` Zod + TS gains optional `input_tokens` / `output_tokens`
- `outbound-bridge.ts:onStepEnd` now persists the `tokenInput` /
  `tokenOutput` values it was already receiving but dropping
- `IncludesStep` / `EntityTimelineStepItem` and the three step
  `.select()` blocks surface the new fields
- The cached `agent_response` section gets a summed `tokens?: { input?,
  output? }`, and the section-cache fingerprint includes per-step token
  deltas so a late `onStepEnd` invalidates a stale section
@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

Electric Agents Desktop Builds

Build artifacts for commit 36ccc20.

| Platform | Status | Artifact |
| --- | --- | --- |
| macOS Apple Silicon | Passed | DMG |
| macOS Intel | Passed | DMG |
| Windows x64 | Passed | Installer |
| Linux x64 | Passed | AppImage / deb |

Workflow run

@codecov

codecov Bot commented Jun 4, 2026


Codecov Report

❌ Patch coverage is 49.78541% with 117 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@7709c9a). Learn more about missing BASE report.
⚠️ Report is 22 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/agents-runtime/src/entity-timeline.ts | 47.82% | 84 Missing ⚠️ |
| .../agents-server-ui/src/components/AgentResponse.tsx | 0.00% | 17 Missing ⚠️ |
| ...ges/agents-server-ui/src/components/TokenUsage.tsx | 0.00% | 14 Missing ⚠️ |
| packages/agents-runtime/src/pi-adapter.ts | 93.54% | 2 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4502   +/-   ##
=======================================
  Coverage        ?   56.56%           
=======================================
  Files           ?      359           
  Lines           ?    39243           
  Branches        ?    11028           
=======================================
  Hits            ?    22198           
  Misses          ?    16974           
  Partials        ?       71           
| Flag | Coverage Δ |
| --- | --- |
| packages/agents | 70.75% <ø> (?) |
| packages/agents-mcp | 77.54% <ø> (?) |
| packages/agents-mobile | 66.92% <ø> (?) |
| packages/agents-runtime | 80.07% <57.42%> (?) |
| packages/agents-server | 74.16% <ø> (?) |
| packages/agents-server-ui | 6.20% <0.00%> (?) |
| packages/electric-ax | 46.42% <ø> (?) |
| packages/experimental | 87.73% <ø> (?) |
| packages/react-hooks | 86.48% <ø> (?) |
| packages/start | 82.83% <ø> (?) |
| packages/typescript-client | 91.83% <ø> (?) |
| packages/y-electric | 56.05% <ø> (?) |
| typescript | 56.56% <49.78%> (?) |
| unit-tests | 56.56% <49.78%> (?) |

@github-actions

github-actions Bot commented Jun 4, 2026

Contributor

Electric Agents Mobile Build

Local mobile checks ran for commit 36ccc20.

The EAS Android preview build was skipped because the mobile-eas-build label is not present.
Add the mobile-eas-build label to this PR to produce an installable preview build.

Workflow run

@samwillis samwillis left a comment

Contributor

Interactive review with GPT.

Thanks for wiring this through. I traced the token usage path end-to-end:

pi-adapter message_end.usage
→ bridge.onStepEnd({ tokenInput, tokenOutput })
→ steps.update({ input_tokens, output_tokens })
→ timeline step rows
→ UI meta row.

Overall the stream write looks sound: token usage is attached to the step completion update, so it appears once at the end of a pure text generation step, and jumps at step boundaries for tool-using runs.

A couple of suggestions/questions:

  1. Can we compute the per-run total in the timeline query/view model?

Right now AgentResponseLive subscribes to run.steps and sums input_tokens / output_tokens in React, while buildAgentSection separately performs the same aggregation for materialized sections.

If TanStack DB supports this shape cleanly, I think it would be better to expose a single per-run tokens field from createEntityTimelineQuery / the includes query, e.g. via a scoped aggregate over steps for the run. Then the UI only renders run.tokens / section.tokens, and the aggregation logic lives in one layer.

Since usage only lands at step end, not token-by-token, updating the parent run/timeline row at those boundaries seems acceptable to me.

  2. Missing usage fields currently become real zeroes

In pi-adapter, when msg.usage exists, missing sides are coerced to 0:

...(usage && {
  tokenInput: usage.input ?? usage.inputTokens ?? 0,
  tokenOutput: usage.output ?? usage.outputTokens ?? 0,
}),

Now that these values are persisted and displayed, that can make an unknown side look like a real 0. If pi-ai guarantees both input and output are always present, a small regression test would be useful. Otherwise I’d preserve undefined for missing sides and only write numeric values.

  3. Test coverage

I’d like to see a targeted regression test that proves token usage reaches the steps.update event, and another around the timeline/view-model total if we move the aggregation there. That would lock down the important stream contract introduced by this PR.
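
For the point about missing usage fields becoming zeroes, one possible shape for preserving `undefined` is sketched below. The `UsageLike` type and the `readUsage` and `firstNumber` helpers are hypothetical names for illustration; only the field names `input`, `inputTokens`, `output`, and `outputTokens` come from the snippet quoted above.

```typescript
// Hypothetical usage payload; only the four fields quoted from pi-adapter are modeled.
interface UsageLike {
  input?: number;
  inputTokens?: number;
  output?: number;
  outputTokens?: number;
}

// First candidate that is actually a number, or undefined if none is.
const firstNumber = (...candidates: unknown[]): number | undefined =>
  candidates.find((c): c is number => typeof c === "number");

// Forward only the sides that were reported; never fabricate a 0.
function readUsage(
  usage: UsageLike | undefined,
): { tokenInput?: number; tokenOutput?: number } {
  if (!usage) return {};
  const tokenInput = firstNumber(usage.input, usage.inputTokens);
  const tokenOutput = firstNumber(usage.output, usage.outputTokens);
  return {
    ...(tokenInput !== undefined && { tokenInput }),
    ...(tokenOutput !== undefined && { tokenOutput }),
  };
}

console.log(readUsage({ output: 412 })); // { tokenOutput: 412 }
console.log(readUsage({ input: 0, output: 0 })); // { tokenInput: 0, tokenOutput: 0 }
```

With this shape a reported `0` survives as a real value, while an unreported side is simply absent from the object passed on to `onStepEnd`.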

@KyleAMathews

Contributor

👍 yeah just showing total tokens for the run at the bottom makes sense to me

kevin-dp and others added 3 commits June 8, 2026 11:17
… layer

Move the input/output token summation out of `AgentResponseLive` and
`buildAgentSection` into a single `leftJoin` against a per-run aggregate
subquery (groupBy run_id, sum + count) in both `createEntityTimelineQuery`
and `createEntityIncludesQuery`. Consumers now read `run.tokens` directly
without re-summing step rows.

- `IncludesRun` and `EntityTimelineRunRow` gain `tokens?: { input?, output? }`.
- `buildIncludesRuns` (in-memory builder) computes the same shape from
  materialized steps.
- `fingerprintRun` hashes the resolved tokens instead of per-step deltas.
- `AgentResponseLive` drops the React-side step reducer; coerces
  TanStack DB's SQL-style `null` for absent sides to `undefined` so
  `TokenUsage`'s `!= null` checks stay correct.
- Adds 7 regression tests covering the live query path, in-memory
  builder, and section plumbing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
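
The per-run aggregate described in this commit (group by `run_id`, then sum and count) can be sketched in plain TypeScript as follows. This is an in-memory illustration under assumed names (`StepRow`, `aggregateTokensByRun`), not the TanStack DB query from the PR; it shows how the count decides whether a side is present at all.

```typescript
// Hypothetical step row; null models a SQL-style absent column.
interface StepRow {
  run_id: string;
  input_tokens?: number | null;
  output_tokens?: number | null;
}

interface Tokens {
  input?: number;
  output?: number;
}

// Group steps by run, summing each side and counting how many steps reported it.
function aggregateTokensByRun(steps: StepRow[]): Map<string, Tokens> {
  const acc = new Map<
    string,
    { inSum: number; inCount: number; outSum: number; outCount: number }
  >();
  for (const s of steps) {
    const a = acc.get(s.run_id) ?? { inSum: 0, inCount: 0, outSum: 0, outCount: 0 };
    if (typeof s.input_tokens === "number") {
      a.inSum += s.input_tokens;
      a.inCount += 1;
    }
    if (typeof s.output_tokens === "number") {
      a.outSum += s.output_tokens;
      a.outCount += 1;
    }
    acc.set(s.run_id, a);
  }

  const result = new Map<string, Tokens>();
  acc.forEach((a, runId) => {
    // A run where no step reported usage gets no tokens entry at all.
    if (a.inCount === 0 && a.outCount === 0) return;
    result.set(runId, {
      ...(a.inCount > 0 && { input: a.inSum }),
      ...(a.outCount > 0 && { output: a.outSum }),
    });
  });
  return result;
}

const totals = aggregateTokensByRun([
  { run_id: "r1", input_tokens: 1000, output_tokens: 300 },
  { run_id: "r1", input_tokens: 200, output_tokens: 112 },
  { run_id: "r2", input_tokens: null, output_tokens: null },
]);
console.log(totals.get("r1")); // { input: 1200, output: 412 }
console.log(totals.has("r2")); // false
```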
…-adapter

`pi-adapter`'s `message_end` handler was coercing a missing
`usage.input` / `usage.output` to `0` via `?? 0`, making an unreported
side indistinguishable from a real zero-token step in the rendered
meta row. With the per-run aggregate now landing in the query layer
(see prior commit), a fabricated `0` would also poison
`count(input_tokens)` / `count(output_tokens)`, marking absent sides
as present.

Forward `undefined` for any side that doesn't arrive as a `number`;
`onStepEnd` already conditionally writes those columns, so the
`steps` row stays null on the missing side.

Adds regression tests for both ends of the contract:
- `outbound-bridge.test.ts`: `onStepEnd` with token fields produces a
  `steps.update` event whose `input_tokens` / `output_tokens` match,
  and omits a column when the corresponding arg is undefined.
- `pi-adapter.test.ts`: a synthetic `message_end` with a `Usage`
  payload routes through to the step update; deleting a side from
  the payload omits that column instead of writing `0`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pi-ai's `Usage` splits the input side across three counters: `input`
(new uncached tokens), `cacheRead` (prompt-cache hits — the system
prompt + history once the cache is warm) and `cacheWrite` (tokens
added to the cache this turn). The adapter was reading only
`usage.input`, which on second+ turns of any cache-using provider
(Anthropic, etc.) collapses to a handful of tokens because everything
else hits the cache. The meta row was showing "3 ↑" regardless of how
large the actual prompt was.

Sum all three input-side counters into `tokenInput` so the displayed
total reflects the prompt volume the model actually saw. `inputTokens`
remains as a legacy flat-field fallback for non-pi-ai providers that
don't split the cache columns.

Adds a regression test that a `Usage` payload with all three counters
populated yields the sum on the `steps.update` event.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
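
The input-side summation from this commit can be sketched as below. The `input`, `cacheRead`, `cacheWrite`, and `inputTokens` field names come from the commit message; the `UsageLike` type, the `totalInputTokens` name, and the choice to sum whichever of the three counters are present are assumptions for illustration.

```typescript
// Hypothetical subset of a usage payload with the input-side counters.
interface UsageLike {
  input?: number; // new uncached tokens
  cacheRead?: number; // prompt-cache hits
  cacheWrite?: number; // tokens added to the cache this turn
  inputTokens?: number; // legacy flat field
}

// Sum every input-side counter that is a number; fall back to the legacy
// flat field, and return undefined when nothing was reported.
function totalInputTokens(usage: UsageLike): number | undefined {
  const parts = [usage.input, usage.cacheRead, usage.cacheWrite].filter(
    (n): n is number => typeof n === "number",
  );
  if (parts.length > 0) return parts.reduce((sum, n) => sum + n, 0);
  return typeof usage.inputTokens === "number" ? usage.inputTokens : undefined;
}

// A warm-cache turn: only 3 new tokens, but 1200 tokens of prompt overall.
console.log(totalInputTokens({ input: 3, cacheRead: 1180, cacheWrite: 17 })); // 1200
```

Reading `usage.input` alone in that example would display 3 rather than 1200, which is the symptom the commit describes.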
@netlify

netlify Bot commented Jun 8, 2026


Deploy Preview for electric-next ready!

Name Link
🔨 Latest commit 36ccc20
🔍 Latest deploy log https://app.netlify.com/projects/electric-next/deploys/6a26912d2ee12f0008b8dbd8
😎 Deploy Preview https://deploy-preview-4502--electric-next.netlify.app

@kevin-dp kevin-dp requested a review from samwillis June 8, 2026 11:21

3 participants