Skip to content

enhance(daily): failure-notification slug fallback + chunked delivery (#423 §C)#492

Merged
eanzhao merged 3 commits intodevfrom
feat/2026-04-28_skill-runner-failure-fallback-and-chunked-delivery
Apr 28, 2026
Merged

enhance(daily): failure-notification slug fallback + chunked delivery (#423 §C)#492
eanzhao merged 3 commits intodevfrom
feat/2026-04-28_skill-runner-failure-fallback-and-chunked-delivery

Conversation

@eanzhao
Copy link
Copy Markdown
Contributor

@eanzhao eanzhao commented Apr 28, 2026

Summary

Closes the remaining acceptance items on #423 — the §C cross-cutting safety net.
After PRs #458 (§A structured prompt) and #469 (§B streaming-edit), this PR
delivers the last two items so the issue can close cleanly:

  1. §C-1 Failure-notification cross-tenant fallback. When the primary outbound
    proxy is structurally rejected (e.g. cross-tenant 99992364), the previous
    TrySendFailureAsync retried through the same proxy and the user saw
    nothing. Now the agent captures the inbound channel-bot's NyxID slug at
    create time — by definition reachable, since the user just messaged it —
    and TrySendFailureAsync tries that slug first, falling back to the primary
    only when the captured slug also rejects (or wasn't captured / equals
    primary).

  2. §C-2 Section-boundary chunked delivery. Outputs > 30 KB used to be
    truncated at the cap with a …[truncated] marker. The new
    SkillRunnerOutputChunker.Split() paragraph-aware splitter keeps the full
    report by emitting [part k/N • continued ↑/continues ↓] chunks. chunk[0]
    flows through the existing streaming-edit sink (the in-flight message
    lands as part 1); chunks 1..N go as fresh one-shot POSTs.

Implementation

Layer Change
Proto (skill_runner.proto) SkillRunnerOutboundConfig.failure_notification_provider_slug (field 12)
Channel runtime (ChannelMetadataKeys.cs, ChannelConversationTurnRunner.cs) New InboundChannelBotProxySlug key; BuildReplyMetadata populates it from ChannelInboundEvent.NyxProviderSlug
Agent builder (AgentBuilderTool.cs) Refactored ResolveProxyServiceIdsAsync to also return the eligible slug map (no second /user-services round-trip). ResolveFailureNotificationContext decides whether to capture the inbound slug + expand allowed_service_ids to include its UserService.id
Skill runner (SkillRunnerGAgent.cs) SendOutputAsync parameterized by providerSlugOverride; TrySendFailureAsync tries failure slug first then falls back to primary; ExecuteSkillAsync invokes SkillRunnerOutputChunker.Split() and dispatches chunk[0] via sink + chunks 1..N via SendOutputAsync
Output chunker (SkillRunnerOutputChunker.cs, new) Pure-function paragraph-boundary splitter with char-boundary fallback for runaway single-paragraph inputs
Docs (docs/canon/daily-command-pipeline.md) §3, §8, §9.3, §11, §17 updated to mark §C landed and reflect the new flow

The two paths are independent and could ship separately, but they're both
small and both close the same issue, so one PR keeps the change ergonomically
scoped to "everything #423 §C asked for".

Out of scope (deliberate)

  • Catalog (UserAgentCatalogEntry) not extended. Runtime routing reads from
    actor State; the catalog is for listing/status. If a future workflow agent
    needs the same fallback exposed at the catalog reader level, add the field
    then — premature now.
  • Streaming sink does NOT mid-stream switch messages when accumulated text
    passes 30 KB.
    Truncate-and-finalize-then-chunk keeps the streaming-edit
    UX (one growing message) for the common case and only fans out to N
    messages after the LLM finishes. The issue text recommends "split on section
    boundaries, send N messages" — this is what landed. A future enhancement
    could do multi-message streaming-edit, but it's a bigger surface and not
    required for §C.

Test coverage

  • 15 new tests, 633 existing tests still pass:
    • SkillRunnerOutputChunkerTests (7) — empty, exactly-at-cap, multi-paragraph, no-paragraph hard split, content preservation, tail-only overflow, all-content-round-trip
    • SkillRunnerGAgentTests (5) — primary distinct → failure slug used; failure slug rejects → fall back; failure slug == primary → no double POST; empty failure slug (legacy) → primary path; both reject → swallowed
    • AgentBuilderToolTests (3) — captured + svc-id in allowed_service_ids; inbound == primary → empty; inbound missing from user-services → empty + creation succeeds + allowed_service_ids unchanged

Test plan

  • dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/... — 648 passed
  • dotnet test test/Aevatar.AI.Tests/... — 483 passed
  • dotnet test test/Aevatar.AI.Core.Tests/... — 68 passed
  • dotnet test test/Aevatar.Architecture.Tests/... — 105 passed
  • dotnet test test/Aevatar.GAgents.Channel.Protocol.Tests/... — 123 passed
  • Targeted CI guards (query_projection_priming, projection_state_version, projection_state_mirror_current_state, workflow_binding_boundary, test_stability) — all pass
  • bash tools/docs/lint.sh — 32 files, 0 errors
  • Manual smoke: revoke outbound s/api-lark-bot access for a test user → trigger /run-agent → verify the failure message lands via the inbound channel-bot slug
  • Manual smoke: force a > 30 KB daily report → verify chunked delivery (streamed part 1 + N additional messages with continuation markers)

Closes #423.

🤖 Generated with Claude Code

eanzhao and others added 2 commits April 28, 2026 17:20
…#423 §C)

Closes the remaining acceptance items on #423. Two paired changes that complete
the §C cross-cutting safety net:

§C-1 — failure-notification cross-tenant fallback
  - New proto field `SkillRunnerOutboundConfig.failure_notification_provider_slug`
    captures the inbound channel-bot's NyxID slug at agent-create time, surfaced
    via new `ChannelMetadataKeys.InboundChannelBotProxySlug` (populated by
    `BuildReplyMetadata` from `ChannelInboundEvent.NyxProviderSlug`)
  - `AgentBuilderTool` resolves the inbound slug against `/user-services` and,
    when distinct from the primary slug AND eligible, both stores it on
    `OutboundConfig.FailureNotificationProviderSlug` AND appends its per-user
    UserService.id to the new API key's `allowed_service_ids` so proxy
    enforcement (#418) actually permits routing through it at runtime
  - Refactored `ResolveProxyServiceIdsAsync` to also return the eligible slug
    map so optional slug lookups don't require a second `/user-services` round
    trip and don't risk breaking creation when an optional slug is missing
  - `SkillRunnerGAgent.SendOutputAsync` parameterized by `providerSlugOverride`
    (default null = primary slug, current behavior). `TrySendFailureAsync` now
    tries the captured slug FIRST when set + distinct, falling back to the
    primary slug on rejection. Both attempts swallow exceptions —
    HandleTriggerAsync's failure-event persist must not be masked

§C-2 — section-boundary chunked delivery
  - New `SkillRunnerOutputChunker.Split()` paragraph-aware splitter (`\n\n`
    boundaries when present, char-boundary fallback for runaway single-
    paragraph inputs). Each chunk fits under `MaxLarkTextLength=30000` with
    headroom reserved for `[part k/N • continued ↑/continues ↓]` markers
  - `ExecuteSkillAsync` invokes the chunker after the stream loop. chunk[0]
    flows through the streaming-edit sink (so the in-flight message lands
    cleanly as part 1); chunks 1..N go as fresh `SendOutputAsync` POSTs.
    `≤30K` outputs return a single-element chunk list, collapsing the
    dispatch loop to the existing single-message path — no behavior change
    for the common case
  - The pre-existing in-stream `TruncateForLark` cap stays as a runtime
    safety net for runaway streams that exceed 30K mid-flight; chunking
    operates on the full final output so the truncation marker only fires
    if the stream itself overflows before finalize

Test coverage:
  - 7 new tests in `SkillRunnerOutputChunkerTests` pinning split behavior
    (empty, exact-cap, paragraph boundaries, no-paragraph fallback,
    content preservation, tail-only overflow)
  - 5 new tests in `SkillRunnerGAgentTests` pinning failure-slug routing
    (primary distinct → use failure slug; failure slug rejects → fall back
    to primary; failure slug == primary → skip duplicate attempt; legacy
    state with empty failure slug → primary path; both reject → swallow)
  - 3 new tests in `AgentBuilderToolTests` pinning capture (inbound slug
    distinct + in user-services → captured + svc-id in allowed_service_ids;
    inbound slug == primary → empty; inbound slug missing from user-services
    → empty + creation succeeds + allowed_service_ids unchanged)
  - `docs/canon/daily-command-pipeline.md` updated: §C marked landed,
    proto field added, "已知短板" turned into "已落地"

Out of scope:
  - The catalog (`UserAgentCatalogEntry`) is NOT extended with the failure-
    notification slug — runtime routing reads from actor State, and the
    catalog is for listing/status. If a future workflow agent needs the
    same fallback at the catalog reader level, add the field then.
  - The streaming sink does NOT switch to a new message mid-stream when
    accumulated text exceeds 30K. The truncate-and-finalize-then-chunk
    design keeps streaming-edit UX (one growing message) for the common
    case and only fans out to additional messages once the LLM finishes
    and the full length is known. A future enhancement could switch to
    multi-message streaming-edit, but the current design is well within
    "ship-it" budget for §C and matches the issue's recommendation
    ("split on section boundaries, send N messages")

Test plan:
  - [x] `dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/...` —
    648 passed (15 new + 633 existing regression)
  - [x] `dotnet test test/Aevatar.AI.Tests/...` — 483 passed
  - [x] `dotnet test test/Aevatar.AI.Core.Tests/...` — 68 passed
  - [x] `dotnet test test/Aevatar.Architecture.Tests/...` — 105 passed
  - [x] `dotnet test test/Aevatar.GAgents.Channel.Protocol.Tests/...` —
    123 passed
  - [x] `bash tools/docs/lint.sh` — 32 files, 0 errors
  - [x] Targeted CI guards: query_projection_priming,
    projection_state_version, projection_state_mirror_current_state,
    workflow_binding_boundary, test_stability — all pass
  - [ ] Manual smoke: revoke the outbound `s/api-lark-bot` access for a
    test user, trigger `/run-agent`, verify the failure-notification
    message lands through the inbound channel-bot's slug instead of
    being silently dropped
  - [ ] Manual smoke: force a >30 KB daily report (e.g. user with many
    repos), verify chunked delivery — first message is the streamed-edit
    "[part 1/N]", subsequent messages arrive in order with continuation
    markers

Closes #423.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…runner-failure-fallback-and-chunked-delivery

# Conflicts:
#	agents/Aevatar.GAgents.Scheduled/SkillRunnerGAgent.cs
Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one blocking edge case in the optional failure-notification fallback path.

// Dedupe — if the inbound slug's UserService.id is already in requiredIds the
// expanded list is identical, but we still surface the slug on OutboundConfig so
// the runtime knows to use it for failure notifications.
var allowed = requiredIds.Contains(inboundId, StringComparer.Ordinal)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This optional fallback can accidentally make /daily creation fail for org-shared inbound bots. /api/v1/user-services includes org-inherited rows as credential_source.type = org with allowed = true, so eligibleIdBySlug can return an org-owned UserService.id. But NyxID POST /api/v1/api-keys validates allowed_service_ids against the new key owner's own active UserService rows (user_id == key owner). If the required services are personal but the inbound channel-bot slug only resolves through an org-owned service, this append sends an org-owned id in allowed_service_ids and the key creation fails with “UserService ... not found or not owned by user”. That turns the optional failure-notification fallback into a creation blocker. Please either only capture personal rows for this optional slug, or create/scope the child key through a contract that can accept the org-owned id, and add a regression test with credential_source: { type: "org", allowed: true }.

Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Overall this is a well-structured PR — two independent features (failure-notification fallback + chunked delivery) cleanly separated with solid test coverage (15 new tests). The proto field numbering, backward-compatibility handling, and graceful degradation paths are all correct.

I found a few issues worth addressing before merge:


1. Case-sensitivity mismatch in slug comparison (should fix)

AgentBuilderTool.cs:1255ResolveFailureNotificationContext compares inboundSlug vs primarySlug with StringComparison.Ordinal (case-sensitive), but the eligibleIdBySlug dictionary is built with StringComparer.OrdinalIgnoreCase. If inbound and primary slugs differ only in casing (e.g. "API-LARK-BOT" vs "api-lark-bot"), the equality check passes through, the dictionary lookup succeeds, and the failure-notification slug gets set to what is effectively the same proxy — the exact scenario the guard is meant to prevent.

// Line 1255 — Ordinal but dictionary is OrdinalIgnoreCase
if (string.Equals(inboundSlug, primarySlug, StringComparison.Ordinal))

Should use StringComparison.OrdinalIgnoreCase to match the dictionary semantics, or align both consistently.

2. MarkerOverhead = 60 is insufficient for >=100 chunks (theoretical but worth a defensive fix)

SkillRunnerOutputChunker.cs:35 — For outputs producing >=100 chunks (each ~30K), marker text like "[part 100/100 • continues ↓]" is 31 chars. A middle chunk with both prefix and suffix would be 31 + rawContent + 31 = 30,002, exceeding MaxLarkTextLength = 30,000.

Practically this requires ~3MB output which won't happen for daily reports, but the invariant "every rendered chunk <= MaxLarkTextLength" should hold regardless. Options:

  • Compute overhead dynamically from rawChunks.Count (two-digit vs three-digit formatting width)
  • Or clamp and re-split at the render stage

3. Sink FinalizeAsync applies TruncateForLark on top of chunker output (harmless but confusing)

SkillRunnerStreamingReplySink.cs:179FinalizeAsync calls TruncateForLark which truncates at 30K. The chunker already ensures each chunk is <= 30K, so this truncation never fires for chunked output. However, it means the sink silently re-truncates if someone changes the chunker logic without realizing the sink has its own cap. Consider adding a brief comment or assertion that the sink's truncation is a safety net for chunk[0] only.

4. Missing integration test for end-to-end chunked delivery

There's no test that exercises DispatchOutputChunksAsync with multi-chunk output through the sink path — verifying chunk[0] lands via FinalizeAsync and chunks 1..N via SendOutputAsync. The chunker tests and agent tests cover their respective units, but the glue between them is untested. A single integration test producing >30K output would close this gap.

5. Hard-split test doesn't assert rendered chunk size

SkillRunnerOutputChunkerTests.Split_WithNoParagraphBoundaries_ShouldHardSplitAtBudget doesn't assert that rendered chunks (with markers) stay <= MaxLarkTextLength. The paragraph-boundary test does this check; adding it to the hard-split test would catch a future MarkerOverhead regression.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.03%. Comparing base (f6fee45) to head (7de878a).
⚠️ Report is 4 commits behind head on dev.

@@           Coverage Diff           @@
##              dev     #492   +/-   ##
=======================================
  Coverage   71.03%   71.03%           
=======================================
  Files        1230     1230           
  Lines       89062    89062           
  Branches    11654    11654           
=======================================
+ Hits        63267    63269    +2     
+ Misses      21259    21258    -1     
+ Partials     4536     4535    -1     
Flag Coverage Δ
ci 71.03% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@eanzhao
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 28, 2026

Review: §C failure-notification fallback + chunked delivery

Overall: clean design, good test coverage (15 new tests), and the two features are well-scoped to #423. One real bug, a few nits.

P1: Org-owned UserService rows can block /daily creation

Already raised by @eanzhaoeligibleIdBySlug includes org-inherited rows (credential_source.type = org), but POST /api/v1/api-keys validates allowed_service_ids against the owner's personal UserService rows. If the inbound channel-bot only resolves through an org-owned service, appending its id to allowed_service_ids will fail key creation with "not found or not owned by user". The optional fallback becomes a creation blocker.

Fix: filter eligibleIdBySlug to personal-only rows, or catch the key-creation error and retry without the inbound slug.

P2: Chunker LastIndexOf comment is misleading

SkillRunnerOutputChunker.cs:68-73 — the comment says the count argument "intentionally overshoots by 1 at the trailing edge so a \n\n anchored at (offset + contentBudget - 1) with its second \n just outside the budget is still picked up". But LastIndexOf(string, int startIndex, int count) searches the range [startIndex - count + 1, startIndex] — with count = contentBudget and startIndex = offset + contentBudget - 1, the window is [offset, offset + contentBudget - 1]. A \n\n starting at offset + contentBudget - 1 has its second char at offset + contentBudget, which is outside the window and won't be found. The "overshoot" doesn't help here.

Practical impact: if the only \n\n in the budget window straddles the boundary, we hard-split instead of splitting at the paragraph. No data loss, just slightly suboptimal chunking. Either fix the comment to match reality, or adjust the search to count = contentBudget + 1 to actually catch the straddling case.

P3: Case-sensitivity inconsistency in slug comparison

ResolveFailureNotificationContext compares inboundSlug vs primarySlug with StringComparison.Ordinal (case-sensitive), but the eligibleIdBySlug dictionary uses StringComparer.OrdinalIgnoreCase. If NyxID ever returns mixed-case slugs, the equality check could miss a match that the dictionary finds. Low risk today (slugs are lowercase), but worth a StringComparison.OrdinalIgnoreCase on the equality check for consistency.

Test quality

Tests are thorough: happy path, fallback, same-slug skip, legacy empty-slug, both-reject swallow. The SequencedHandler for multi-request assertions is a nice pattern. Chunker tests cover empty, exact-cap, paragraph boundaries, no-paragraph hard split, content preservation, and tail-only overflow. Solid.

Verdict: P1 blocks merge (can break /daily creation for org-shared bots). P2-P3 are follow-ups.

@eanzhao eanzhao merged commit dff5b4b into dev Apr 28, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

enhance(daily): richer report content + progressive delivery (streaming-edit or batched)

1 participant