enhance(daily): failure-notification slug fallback + chunked delivery (#423 §C) by eanzhao · Pull Request #492 · aevatarAI/aevatar

eanzhao · 2026-04-28T09:21:04Z

Summary

Closes the remaining acceptance items on #423 — the §C cross-cutting safety net.
After PRs #458 (§A structured prompt) and #469 (§B streaming-edit), this PR
delivers the last two items so the issue can close cleanly:

§C-1 Failure-notification cross-tenant fallback. When the primary outbound
proxy is structurally rejected (e.g. cross-tenant 99992364), the previous
TrySendFailureAsync retried through the same proxy and the user saw
nothing. Now the agent captures the inbound channel-bot's NyxID slug at
create time — by definition reachable, since the user just messaged it —
and TrySendFailureAsync tries that slug first, falling back to the primary
only when the captured slug also rejects (or wasn't captured / equals
primary).
§C-2 Section-boundary chunked delivery. Outputs > 30 KB used to be
truncated at the cap with a …[truncated] marker. The new
SkillRunnerOutputChunker.Split() paragraph-aware splitter keeps the full
report by emitting [part k/N • continued ↑/continues ↓] chunks. chunk[0]
flows through the existing streaming-edit sink (the in-flight message
lands as part 1); chunks 1..N go as fresh one-shot POSTs.

Implementation

Layer	Change
Proto (`skill_runner.proto`)	`SkillRunnerOutboundConfig.failure_notification_provider_slug` (field 12)
Channel runtime (`ChannelMetadataKeys.cs`, `ChannelConversationTurnRunner.cs`)	New `InboundChannelBotProxySlug` key; `BuildReplyMetadata` populates it from `ChannelInboundEvent.NyxProviderSlug`
Agent builder (`AgentBuilderTool.cs`)	Refactored `ResolveProxyServiceIdsAsync` to also return the eligible slug map (no second `/user-services` round-trip). `ResolveFailureNotificationContext` decides whether to capture the inbound slug + expand `allowed_service_ids` to include its `UserService.id`
Skill runner (`SkillRunnerGAgent.cs`)	`SendOutputAsync` parameterized by `providerSlugOverride`; `TrySendFailureAsync` tries failure slug first then falls back to primary; `ExecuteSkillAsync` invokes `SkillRunnerOutputChunker.Split()` and dispatches chunk[0] via sink + chunks 1..N via `SendOutputAsync`
Output chunker (`SkillRunnerOutputChunker.cs`, new)	Pure-function paragraph-boundary splitter with char-boundary fallback for runaway single-paragraph inputs
Docs (`docs/canon/daily-command-pipeline.md`)	§3, §8, §9.3, §11, §17 updated to mark §C landed and reflect the new flow

The two paths are independent and could ship separately, but they're both
small and both close the same issue, so one PR keeps the change ergonomically
scoped to "everything #423 §C asked for".

Out of scope (deliberate)

Catalog (UserAgentCatalogEntry) not extended. Runtime routing reads from
actor State; the catalog is for listing/status. If a future workflow agent
needs the same fallback exposed at the catalog reader level, add the field
then — premature now.
Streaming sink does NOT mid-stream switch messages when accumulated text
passes 30 KB. Truncate-and-finalize-then-chunk keeps the streaming-edit
UX (one growing message) for the common case and only fans out to N
messages after the LLM finishes. The issue text recommends "split on section
boundaries, send N messages" — this is what landed. A future enhancement
could do multi-message streaming-edit, but it's a bigger surface and not
required for §C.

Test coverage

15 new tests, 633 existing tests still pass:
- SkillRunnerOutputChunkerTests (7) — empty, exactly-at-cap, multi-paragraph, no-paragraph hard split, content preservation, tail-only overflow, all-content-round-trip
- SkillRunnerGAgentTests (5) — primary distinct → failure slug used; failure slug rejects → fall back; failure slug == primary → no double POST; empty failure slug (legacy) → primary path; both reject → swallowed
- AgentBuilderToolTests (3) — captured + svc-id in allowed_service_ids; inbound == primary → empty; inbound missing from user-services → empty + creation succeeds + allowed_service_ids unchanged

Test plan

dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/... — 648 passed
dotnet test test/Aevatar.AI.Tests/... — 483 passed
dotnet test test/Aevatar.AI.Core.Tests/... — 68 passed
dotnet test test/Aevatar.Architecture.Tests/... — 105 passed
dotnet test test/Aevatar.GAgents.Channel.Protocol.Tests/... — 123 passed
Targeted CI guards (query_projection_priming, projection_state_version, projection_state_mirror_current_state, workflow_binding_boundary, test_stability) — all pass
bash tools/docs/lint.sh — 32 files, 0 errors
Manual smoke: revoke outbound s/api-lark-bot access for a test user → trigger /run-agent → verify the failure message lands via the inbound channel-bot slug
Manual smoke: force a > 30 KB daily report → verify chunked delivery (streamed part 1 + N additional messages with continuation markers)

Closes #423.

🤖 Generated with Claude Code

…#423 §C) Closes the remaining acceptance items on #423. Two paired changes that complete the §C cross-cutting safety net: §C-1 — failure-notification cross-tenant fallback - New proto field `SkillRunnerOutboundConfig.failure_notification_provider_slug` captures the inbound channel-bot's NyxID slug at agent-create time, surfaced via new `ChannelMetadataKeys.InboundChannelBotProxySlug` (populated by `BuildReplyMetadata` from `ChannelInboundEvent.NyxProviderSlug`) - `AgentBuilderTool` resolves the inbound slug against `/user-services` and, when distinct from the primary slug AND eligible, both stores it on `OutboundConfig.FailureNotificationProviderSlug` AND appends its per-user UserService.id to the new API key's `allowed_service_ids` so proxy enforcement (#418) actually permits routing through it at runtime - Refactored `ResolveProxyServiceIdsAsync` to also return the eligible slug map so optional slug lookups don't require a second `/user-services` round trip and don't risk breaking creation when an optional slug is missing - `SkillRunnerGAgent.SendOutputAsync` parameterized by `providerSlugOverride` (default null = primary slug, current behavior). `TrySendFailureAsync` now tries the captured slug FIRST when set + distinct, falling back to the primary slug on rejection. Both attempts swallow exceptions — HandleTriggerAsync's failure-event persist must not be masked §C-2 — section-boundary chunked delivery - New `SkillRunnerOutputChunker.Split()` paragraph-aware splitter (`\n\n` boundaries when present, char-boundary fallback for runaway single- paragraph inputs). Each chunk fits under `MaxLarkTextLength=30000` with headroom reserved for `[part k/N • continued ↑/continues ↓]` markers - `ExecuteSkillAsync` invokes the chunker after the stream loop. chunk[0] flows through the streaming-edit sink (so the in-flight message lands cleanly as part 1); chunks 1..N go as fresh `SendOutputAsync` POSTs. `≤30K` outputs return a single-element chunk list, collapsing the dispatch loop to the existing single-message path — no behavior change for the common case - The pre-existing in-stream `TruncateForLark` cap stays as a runtime safety net for runaway streams that exceed 30K mid-flight; chunking operates on the full final output so the truncation marker only fires if the stream itself overflows before finalize Test coverage: - 7 new tests in `SkillRunnerOutputChunkerTests` pinning split behavior (empty, exact-cap, paragraph boundaries, no-paragraph fallback, content preservation, tail-only overflow) - 5 new tests in `SkillRunnerGAgentTests` pinning failure-slug routing (primary distinct → use failure slug; failure slug rejects → fall back to primary; failure slug == primary → skip duplicate attempt; legacy state with empty failure slug → primary path; both reject → swallow) - 3 new tests in `AgentBuilderToolTests` pinning capture (inbound slug distinct + in user-services → captured + svc-id in allowed_service_ids; inbound slug == primary → empty; inbound slug missing from user-services → empty + creation succeeds + allowed_service_ids unchanged) - `docs/canon/daily-command-pipeline.md` updated: §C marked landed, proto field added, "已知短板" turned into "已落地" Out of scope: - The catalog (`UserAgentCatalogEntry`) is NOT extended with the failure- notification slug — runtime routing reads from actor State, and the catalog is for listing/status. If a future workflow agent needs the same fallback at the catalog reader level, add the field then. - The streaming sink does NOT switch to a new message mid-stream when accumulated text exceeds 30K. The truncate-and-finalize-then-chunk design keeps streaming-edit UX (one growing message) for the common case and only fans out to additional messages once the LLM finishes and the full length is known. A future enhancement could switch to multi-message streaming-edit, but the current design is well within "ship-it" budget for §C and matches the issue's recommendation ("split on section boundaries, send N messages") Test plan: - [x] `dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/...` — 648 passed (15 new + 633 existing regression) - [x] `dotnet test test/Aevatar.AI.Tests/...` — 483 passed - [x] `dotnet test test/Aevatar.AI.Core.Tests/...` — 68 passed - [x] `dotnet test test/Aevatar.Architecture.Tests/...` — 105 passed - [x] `dotnet test test/Aevatar.GAgents.Channel.Protocol.Tests/...` — 123 passed - [x] `bash tools/docs/lint.sh` — 32 files, 0 errors - [x] Targeted CI guards: query_projection_priming, projection_state_version, projection_state_mirror_current_state, workflow_binding_boundary, test_stability — all pass - [ ] Manual smoke: revoke the outbound `s/api-lark-bot` access for a test user, trigger `/run-agent`, verify the failure-notification message lands through the inbound channel-bot's slug instead of being silently dropped - [ ] Manual smoke: force a >30 KB daily report (e.g. user with many repos), verify chunked delivery — first message is the streamed-edit "[part 1/N]", subsequent messages arrive in order with continuation markers Closes #423. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…runner-failure-fallback-and-chunked-delivery # Conflicts: # agents/Aevatar.GAgents.Scheduled/SkillRunnerGAgent.cs

eanzhao

Found one blocking edge case in the optional failure-notification fallback path.

eanzhao · 2026-04-28T09:40:35Z

+        // Dedupe — if the inbound slug's UserService.id is already in requiredIds the
+        // expanded list is identical, but we still surface the slug on OutboundConfig so
+        // the runtime knows to use it for failure notifications.
+        var allowed = requiredIds.Contains(inboundId, StringComparer.Ordinal)


This optional fallback can accidentally make /daily creation fail for org-shared inbound bots. /api/v1/user-services includes org-inherited rows as credential_source.type = org with allowed = true, so eligibleIdBySlug can return an org-owned UserService.id. But NyxID POST /api/v1/api-keys validates allowed_service_ids against the new key owner's own active UserService rows (user_id == key owner). If the required services are personal but the inbound channel-bot slug only resolves through an org-owned service, this append sends an org-owned id in allowed_service_ids and the key creation fails with “UserService ... not found or not owned by user”. That turns the optional failure-notification fallback into a creation blocker. Please either only capture personal rows for this optional slug, or create/scope the child key through a contract that can accept the org-owned id, and add a regression test with credential_source: { type: "org", allowed: true }.

eanzhao

Review Summary

Overall this is a well-structured PR — two independent features (failure-notification fallback + chunked delivery) cleanly separated with solid test coverage (15 new tests). The proto field numbering, backward-compatibility handling, and graceful degradation paths are all correct.

I found a few issues worth addressing before merge:

1. Case-sensitivity mismatch in slug comparison (should fix)

AgentBuilderTool.cs:1255 — ResolveFailureNotificationContext compares inboundSlug vs primarySlug with StringComparison.Ordinal (case-sensitive), but the eligibleIdBySlug dictionary is built with StringComparer.OrdinalIgnoreCase. If inbound and primary slugs differ only in casing (e.g. "API-LARK-BOT" vs "api-lark-bot"), the equality check passes through, the dictionary lookup succeeds, and the failure-notification slug gets set to what is effectively the same proxy — the exact scenario the guard is meant to prevent.

// Line 1255 — Ordinal but dictionary is OrdinalIgnoreCase
if (string.Equals(inboundSlug, primarySlug, StringComparison.Ordinal))

Should use StringComparison.OrdinalIgnoreCase to match the dictionary semantics, or align both consistently.

2. `MarkerOverhead = 60` is insufficient for >=100 chunks (theoretical but worth a defensive fix)

SkillRunnerOutputChunker.cs:35 — For outputs producing >=100 chunks (each ~30K), marker text like "[part 100/100 • continues ↓]" is 31 chars. A middle chunk with both prefix and suffix would be 31 + rawContent + 31 = 30,002, exceeding MaxLarkTextLength = 30,000.

Practically this requires ~3MB output which won't happen for daily reports, but the invariant "every rendered chunk <= MaxLarkTextLength" should hold regardless. Options:

Compute overhead dynamically from rawChunks.Count (two-digit vs three-digit formatting width)
Or clamp and re-split at the render stage

3. Sink `FinalizeAsync` applies `TruncateForLark` on top of chunker output (harmless but confusing)

SkillRunnerStreamingReplySink.cs:179 — FinalizeAsync calls TruncateForLark which truncates at 30K. The chunker already ensures each chunk is <= 30K, so this truncation never fires for chunked output. However, it means the sink silently re-truncates if someone changes the chunker logic without realizing the sink has its own cap. Consider adding a brief comment or assertion that the sink's truncation is a safety net for chunk[0] only.

4. Missing integration test for end-to-end chunked delivery

There's no test that exercises DispatchOutputChunksAsync with multi-chunk output through the sink path — verifying chunk[0] lands via FinalizeAsync and chunks 1..N via SendOutputAsync. The chunker tests and agent tests cover their respective units, but the glue between them is untested. A single integration test producing >30K output would close this gap.

5. Hard-split test doesn't assert rendered chunk size

SkillRunnerOutputChunkerTests.Split_WithNoParagraphBoundaries_ShouldHardSplitAtBudget doesn't assert that rendered chunks (with markers) stay <= MaxLarkTextLength. The paragraph-boundary test does this check; adding it to the hard-split test would catch a future MarkerOverhead regression.

codecov · 2026-04-28T09:56:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.03%. Comparing base (f6fee45) to head (7de878a).
⚠️ Report is 4 commits behind head on dev.

@@           Coverage Diff           @@
##              dev     #492   +/-   ##
=======================================
  Coverage   71.03%   71.03%           
=======================================
  Files        1230     1230           
  Lines       89062    89062           
  Branches    11654    11654           
=======================================
+ Hits        63267    63269    +2     
+ Misses      21259    21258    -1     
+ Partials     4536     4535    -1

Flag	Coverage Δ
ci	`71.03% <ø> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 1 file with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

eanzhao · 2026-04-28T09:59:27Z

Review: §C failure-notification fallback + chunked delivery

Overall: clean design, good test coverage (15 new tests), and the two features are well-scoped to #423. One real bug, a few nits.

P1: Org-owned UserService rows can block `/daily` creation

Already raised by @eanzhao — eligibleIdBySlug includes org-inherited rows (credential_source.type = org), but POST /api/v1/api-keys validates allowed_service_ids against the owner's personal UserService rows. If the inbound channel-bot only resolves through an org-owned service, appending its id to allowed_service_ids will fail key creation with "not found or not owned by user". The optional fallback becomes a creation blocker.

Fix: filter eligibleIdBySlug to personal-only rows, or catch the key-creation error and retry without the inbound slug.

P2: Chunker `LastIndexOf` comment is misleading

SkillRunnerOutputChunker.cs:68-73 — the comment says the count argument "intentionally overshoots by 1 at the trailing edge so a \n\n anchored at (offset + contentBudget - 1) with its second \n just outside the budget is still picked up". But LastIndexOf(string, int startIndex, int count) searches the range [startIndex - count + 1, startIndex] — with count = contentBudget and startIndex = offset + contentBudget - 1, the window is [offset, offset + contentBudget - 1]. A \n\n starting at offset + contentBudget - 1 has its second char at offset + contentBudget, which is outside the window and won't be found. The "overshoot" doesn't help here.

Practical impact: if the only \n\n in the budget window straddles the boundary, we hard-split instead of splitting at the paragraph. No data loss, just slightly suboptimal chunking. Either fix the comment to match reality, or adjust the search to count = contentBudget + 1 to actually catch the straddling case.

P3: Case-sensitivity inconsistency in slug comparison

ResolveFailureNotificationContext compares inboundSlug vs primarySlug with StringComparison.Ordinal (case-sensitive), but the eligibleIdBySlug dictionary uses StringComparer.OrdinalIgnoreCase. If NyxID ever returns mixed-case slugs, the equality check could miss a match that the dictionary finds. Low risk today (slugs are lowercase), but worth a StringComparison.OrdinalIgnoreCase on the equality check for consistency.

Test quality

Tests are thorough: happy path, fallback, same-slug skip, legacy empty-slug, both-reject swallow. The SequencedHandler for multi-request assertions is a nice pattern. Chunker tests cover empty, exact-cap, paragraph boundaries, no-paragraph hard split, content preservation, and tail-only overflow. Solid.

Verdict: P1 blocks merge (can break /daily creation for org-shared bots). P2-P3 are follow-ups.

…-and-chunked-delivery

eanzhao and others added 2 commits April 28, 2026 17:20

Merge remote-tracking branch 'origin/dev' into feat/2026-04-28_skill-…

7cc193b

…runner-failure-fallback-and-chunked-delivery # Conflicts: # agents/Aevatar.GAgents.Scheduled/SkillRunnerGAgent.cs

eanzhao commented Apr 28, 2026

View reviewed changes

Merge branch 'dev' into feat/2026-04-28_skill-runner-failure-fallback…

7de878a

…-and-chunked-delivery

eanzhao merged commit dff5b4b into dev Apr 28, 2026
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

enhance(daily): failure-notification slug fallback + chunked delivery (#423 §C)#492

enhance(daily): failure-notification slug fallback + chunked delivery (#423 §C)#492
eanzhao merged 3 commits intodevfrom
feat/2026-04-28_skill-runner-failure-fallback-and-chunked-delivery

eanzhao commented Apr 28, 2026

Uh oh!

eanzhao left a comment

Uh oh!

eanzhao Apr 28, 2026

Uh oh!

eanzhao left a comment

Uh oh!

codecov Bot commented Apr 28, 2026 •

edited

Loading

Uh oh!

eanzhao commented Apr 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

eanzhao commented Apr 28, 2026

Summary

Implementation

Out of scope (deliberate)

Test coverage

Test plan

Uh oh!

eanzhao left a comment

Choose a reason for hiding this comment

Uh oh!

eanzhao Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

eanzhao left a comment

Choose a reason for hiding this comment

Review Summary

1. Case-sensitivity mismatch in slug comparison (should fix)

2. MarkerOverhead = 60 is insufficient for >=100 chunks (theoretical but worth a defensive fix)

3. Sink FinalizeAsync applies TruncateForLark on top of chunker output (harmless but confusing)

4. Missing integration test for end-to-end chunked delivery

5. Hard-split test doesn't assert rendered chunk size

Uh oh!

codecov Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

eanzhao commented Apr 28, 2026

Review: §C failure-notification fallback + chunked delivery

P1: Org-owned UserService rows can block /daily creation

P2: Chunker LastIndexOf comment is misleading

P3: Case-sensitivity inconsistency in slug comparison

Test quality

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

2. `MarkerOverhead = 60` is insufficient for >=100 chunks (theoretical but worth a defensive fix)

3. Sink `FinalizeAsync` applies `TruncateForLark` on top of chunker output (harmless but confusing)

codecov Bot commented Apr 28, 2026 •

edited

Loading

P1: Org-owned UserService rows can block `/daily` creation

P2: Chunker `LastIndexOf` comment is misleading