Skip to content

feat(channel): ADR-0017 + scaffolding for per-user NyxID binding#477

Open
eanzhao wants to merge 10 commits intodevfrom
docs/2026-04-28_adr-per-user-nyxid-binding
Open

feat(channel): ADR-0017 + scaffolding for per-user NyxID binding#477
eanzhao wants to merge 10 commits intodevfrom
docs/2026-04-28_adr-per-user-nyxid-binding

Conversation

@eanzhao
Copy link
Copy Markdown
Contributor

@eanzhao eanzhao commented Apr 28, 2026

Summary

ADR-0017 锁定 channel bot per-user NyxID binding 走 broker 模式(NyxID 持 refresh_token,aevatar 仅持 opaque binding_id),并在本 PR 同步落地 NyxID#549 落地前可平行实现的 scaffolding。

Scope

ADR(已 lock):

实现(本 PR scaffolding 范围,不依赖 NyxID broker endpoint):

  • 新模块 Aevatar.GAgents.Channel.Identity
  • proto:ExternalSubjectRef + ExternalIdentityBindingState + 事件 / 命令
  • INyxIdCapabilityBroker 接口 + 支持类型(BindingChallenge / BindingId / CapabilityScope / CapabilityHandle / BindingRevokedException)
  • ExternalIdentityBindingGAgent actor + 状态迁移
  • IExternalIdentityBindingQueryPort + projection 反向路径
  • IProjectionReadinessPort(write-side completion 端口)
  • InMemoryCapabilityBroker(test fake)+ DI 装配

不在本 PR 范围(后续 PR):

  • /api/oauth/nyxid-callback endpoint
  • ChannelConversationTurnRunner slash-command 路由(/init / /unbind)
  • NyxIdRemoteCapabilityBroker 真实实现(等 NyxID#549 contract freeze)
  • AuthContext.external_subject = 4 字段新增(channel-runtime proto 改动单独 PR)

Test plan

  • bash tools/docs/lint.sh — ADR frontmatter 通过
  • 新模块 dotnet build + 单测 通过
  • arch test 守:ExternalIdentityBindingGAgent state 不出现 secret material;projection / log / metric 不出现 secret material
  • 后续 implementation PR 接 callback endpoint + turn runner 集成 + NyxID#549 后接 Remote broker

ADR Review 历史

  • Round 1 (056f05c):address 9 inline + path-level 共 7 点
  • Round 2 (4411515):rename file → oauth-broker.md,新增 §Actor Architecture / §Outbound Send,address 8 点
  • Round 3 (fdbb8c2):address v2 review 9 点,新增 §Bot-Owner 终止策略 / §Divergence from NyxID#549
  • Round 4 (f48e881):address Final Review 5 点,新增 §Implementation Notes
  • 历轮 resolution summaries 见 PR comments

🤖 Generated with Claude Code

Per-user binding for channel bots can be implemented entirely as a NyxID
OAuth client without any NyxID surface changes; the five endpoints proposed
in discussion #400 collapse onto existing OAuth/OIDC primitives plus RFC
8693 Token Exchange. ADR captures the trade-off between local refresh-token
storage (stop-gap) and the broker model (#375 target), accepts the local
adapter as a transitional implementation under explicit hard constraints,
and pins the migration path back to zero-secret-material grain state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c95c818a7d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +53 to +55
-> 回 Lark "{nyxid}/oauth/authorize?client_id=aevatar-channel-binding
&redirect_uri=https://aevatar/api/oauth/nyxid-callback
&state={correlation_id}&code_challenge=...&scope=openid"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Return binding authorize URL through a private channel

The /init flow currently returns a reusable OAuth authorize URL containing state directly to Lark, but the ADR does not require this response to be private/ephemeral. In a group chat, another participant can open that URL, complete login with their own NyxID account, and the callback path (state -> ExternalSubjectRef) will bind that Nyx subject to the original sender, causing cross-account binding hijack. This is a security issue in any non-private channel flow unless you require private delivery or add a second verifier tied to the initiating platform user before accepting the callback.

Useful? React with 👍 / 👎.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.61%. Comparing base (6b96bb2) to head (c67fe53).
⚠️ Report is 130 commits behind head on dev.

@@            Coverage Diff             @@
##              dev     #477      +/-   ##
==========================================
+ Coverage   71.36%   71.61%   +0.25%     
==========================================
  Files        1219     1236      +17     
  Lines       88424    89573    +1149     
  Branches    11573    11713     +140     
==========================================
+ Hits        63104    64151    +1047     
- Misses      20770    20822      +52     
- Partials     4550     4600      +50     
Flag Coverage Δ
ci 71.61% <ø> (+0.25%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 51 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

After security analysis, reverse the earlier acceptance of a local
encrypted-refresh-token stop-gap. aevatar will instead hold only an
opaque binding_id and call NyxID's broker token endpoint each turn,
keeping grain state free of secret material. Implementation is gated
on ChronoAIProject/NyxID#549. Existing bot owner-shared mode remains
in place until the dependency lands, with no regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eanzhao eanzhao changed the title ADR-0017: per-user NyxID binding via OAuth client (zero NyxID changes) ADR-0017: per-user NyxID binding via OAuth broker (NyxID-side refresh_token) Apr 28, 2026
Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

方向 OK:broker 模式 vs. local refresh_token 的论证扎实。但有两个结构性问题在合并前必须解决,否则后续 implementation PR 会跟现有 channel runtime 撞车。

  1. Actor 重复ExternalIdentityBindingGAgent 跟现有 ChannelUserBindingGAgentagents/Aevatar.GAgents.Channel.Runtime/UserBinding/ChannelUserBindingGAgent.cs:16)是同一业务实体(channel user 的凭据绑定)。CLAUDE.md 明令禁止按技术功能拆分同一实体,必须复用现有 actor 或在 ADR 里 explicit 反驳。
  2. Identity scope 跟现有冲突:新 key (platform, tenant, external_user_id) 不含 bot;现有 actor key 含 bot。需要 explicit 决定哪个对,并写明迁移路径。

其他:PKCE verifier 在 grain state 里的处理、client_credentials 在 ADR 文字里的 OAuth 词汇误用、用户从 NyxID 侧反向撤销的一致性、nyx_subject 缓存目的、accepted status 跟外部依赖之间的状态等,逐项 inline。

@@ -0,0 +1,142 @@
---
title: "Per-User NyxID Binding via OAuth Broker"
status: accepted
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

status accepted 但 implementation gated on ChronoAIProject/NyxID#549(外部 issue 还没落地)。如果 NyxID#549 协议跟本 ADR 的假设有偏差,aevatar 接的 endpoint shape 也得改。

建议:

  • 改成 proposedaccepted-pending-dependency(按本仓库现有 ADR 词汇),等 NyxID#549 contract 冻结再升 accepted
  • 或者在 ADR 里贴 NyxID#549 的契约 freeze 节点(OpenAPI link / proto 文件 hash),明确"以哪个版本的契约为准"

否则这个 ADR 实际处于浮动状态。


- 用标准 OAuth Authorization Code + PKCE 流程发起 binding(`/oauth/authorize` + `state` 承载 correlation_id + redirect_uri 浏览器跳转)
- **aevatar 不接收、不持有 user refresh_token**;binding 完成时 NyxID 返回不透明 `binding_id`,aevatar 仅持 `(external_subject_ref) → binding_id` 映射
- 每次 turn 用 client_credentials 调 NyxID `POST /oauth/bindings/{binding_id}/token` 拿短期 access_token,塞进 `AgentToolRequestContext`
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

client_credentials 在 OAuth 2.0 标准里语义是 "client 以自己的身份(不代表任何 user)拿 token"。这里实际语义是 "client 凭 binding_id 代用户拿 user-scoped token",不是 client_credentials。

如果 NyxID#549 endpoint 想合规命名:

  • 走 RFC 8693 Token Exchange:grant_type=urn:ietf:params:oauth:grant-type:token-exchangesubject_token=<binding_id>subject_token_type=urn:nyxid:params:oauth:token-type:binding-id,client 用 client_secret 鉴权(HTTP Basic / client_secret_post
  • 或者明确声明这是 NyxID 自定义 broker token endpoint,body 里不出现 grant_type=client_credentials

ADR 文字按当前写法读起来像 "client_credentials = 该 endpoint 的 grant_type",跟 OAuth 标准语义冲突,会把 NyxID#549 contract 引向错误方向。建议改成 "RFC 8693 token exchange,subject_token = binding_id" 或 "NyxID 自定义 broker token endpoint,client 鉴权 = client_secret"。同样适用第 54 行。

- 用标准 OAuth Authorization Code + PKCE 流程发起 binding(`/oauth/authorize` + `state` 承载 correlation_id + redirect_uri 浏览器跳转)
- **aevatar 不接收、不持有 user refresh_token**;binding 完成时 NyxID 返回不透明 `binding_id`,aevatar 仅持 `(external_subject_ref) → binding_id` 映射
- 每次 turn 用 client_credentials 调 NyxID `POST /oauth/bindings/{binding_id}/token` 拿短期 access_token,塞进 `AgentToolRequestContext`
- 用户撤销同步 `DELETE /oauth/bindings/{binding_id}`,NyxID 是 source of truth
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

覆盖了 aevatar 主动撤销方向,反方向(用户在 NyxID UI 直接撤销,不经 aevatar)没交代:

  • aevatar (external_subject) → binding_id 映射变成悬挂指针
  • 下次 IssueShortLivedAsync 调用会被 NyxID 拒绝(401? 410? invalid_grant?)
  • 该错误码该不该触发 aevatar 自动 prune binding actor state?

至少补一段 invalidation flow:

否则 "NyxID 是 source of truth" 是单向声明,不是双向一致性保证。

/init
-> ChannelConversationTurnRunner 前置 slash-command 路由(不进 LLM)
-> aevatar 生成 state(=correlation_id) + PKCE pair,
落 BindingChallengeIssuedEvent 到 ExternalIdentityBindingGAgent
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PKCE code_verifier 是 secret material(OAuth 安全模型靠它对抗 code interception,RFC 7636)。BindingChallengeIssuedEvent 落到 actor state 意味着 verifier 进了 grain state / event store。

但 Storage Boundary 表(line 61-67)只列了 User refresh_token: ✗ never,没列 PKCE verifier。这跟 "aevatar grain state / projection / log / metric span attribute 不出现 secret material"(line 131)的笼统宣称在表面上有张力——verifier 是 short-lived 但仍是 secret material。

两种处理方式:

  1. 用 stateless signed cookie / state token 在浏览器侧 round-trip verifier(OAuth 库标准做法),grain state 里只留 challenge 元信息(correlation_id, expires_at, external_subject_ref),不留 verifier
  2. 保留落 actor state,但 ADR 把 Storage Boundary 表改成区分 long-lived vs. short-lived(<5min) secret material,明确 PKCE verifier 属于 short-lived 一类,并给出 challenge actor 的 GC 策略

倾向 1,跟 "binding_id = 唯一 opaque pointer,grain state zero secret material" 的叙事更一致。无论选哪个,arch test 守 grain state 的扫描规则需要一起定义。


| 数据 | aevatar grain state | NyxID |
|---|---|---|
| `(platform, tenant, external_user_id) → binding_id` | ✓ | |
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identity scope 跟现有 actor 冲突

这里 key 是 (platform, tenant, external_user_id)不含 bot_instance_id

但现有 ChannelUserBindingGAgentagents/Aevatar.GAgents.Channel.Runtime/UserBinding/ChannelUserBindingGAgent.cs:12 的 docstring 明确写)keying 是 {bot_instance_id}:{channel}:{sender_canonical_id}含 bot,state 也存 bot / channel / sender_canonical_id / credential_ref

两个识别尺度直接打架,ADR 没意识到。必须 explicit 决定:

A. per-(platform, user):一个 NyxID identity 跟一个 Lark user 一一对应,跟用户在哪个 bot 里讲话无关。语义最干净(NyxID identity 不属于某个 bot),但要废弃现有 per-bot key,迁移现有数据。

B. per-(platform, user, bot):保留现有 keying,binding 是 (bot, lark user) 维度。语义偏弱(同一 lark user 在 botA / botB 可绑不同 NyxID 账号?产品上不合理),但兼容现状。

倾向 A(NyxID 视角下一个 lark user 就是一个 user,跟 bot 无关;现有 actor 的 bot 字段更像是早期 RFC §5.2b 拆分时的 over-fit)。但这个决定影响 actor 模型、迁移路径、群聊 UX,必须在 ADR 里 explicit 选定,不能新写一个 keying 跟现有冲突还不解释。

| 数据 | aevatar grain state | NyxID |
|---|---|---|
| `(platform, tenant, external_user_id) → binding_id` | ✓ | |
| `nyx_subject`(opaque `sub` claim) | ✓ 缓存以加速 resolve | ✓ source of truth |
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nyx_subject 标 "✓ 缓存以加速 resolve" —— 加速什么 resolve 没说。

如果只是为了 IssueShortLivedAsync,链路其实是 external_subject → binding_id → NyxID 调用,nyx_subject 不在调用链上;如果是给 ConversationGAgent scope 用、或者给 audit log 用,应该写明用途。

如果没有清晰用途,建议直接不缓存(少一份要保持一致的状态,binding_id 已是 opaque 唯一指针)。如果有用途,写在表里。

NyxID #549 之前可平行落地的部分(本身不依赖 broker endpoint):

- `INyxIdCapabilityBroker` 接口 + proto
- `ExternalIdentityBindingGAgent` + projection + `IExternalIdentityBindingQueryPort`(state 仅存 `binding_id` + `nyx_subject`,均为 opaque)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actor 重复,违反 CLAUDE.md "Actor 即业务实体"

现有 agents/Aevatar.GAgents.Channel.Runtime/UserBinding/ChannelUserBindingGAgent.cs:16 已经是同一个业务实体(per-bot-per-user 的 channel user 凭据绑定):

  • keying {bot_instance_id}:{channel}:{sender_canonical_id}
  • state: bot, channel, sender_canonical_id, credential_ref, preferences, created_at, updated_at
  • commands/events: BindUserCredentialCommand / UserCredentialBoundEvent / UnbindUserCredentialCommand

新引入 ExternalIdentityBindingGAgent 把"同一 channel user 的 NyxID binding"放到第二个 actor(甚至第二个模块 Aevatar.GAgents.Channel.Identity —— 见 line 126)里,违反 CLAUDE.md:

Actor 即业务实体:一个 actor = 一个业务实体(数据与方法同住);禁止按技术功能(读/写/投影)拆分同一业务实体为多个 actor。

binding_id 在语义上就是一种 credential_ref(不透明的 user 凭据指针),不需要再起一个 actor。建议:

  • 复用 ChannelUserBindingGAgent,把 credential_ref 解释/重命名为 binding_id,或者新增 typed sub-message nyx_binding { binding_id, nyx_subject } 进现有 state
  • BindingChallengeIssuedEvent / ExternalIdentityBoundEvent 接到同一 actor 的事件流
  • ADR 应明确"扩展现有 ChannelUserBindingGAgent"而不是"新增并列 actor + 并列模块"

如果坚持新 actor,必须在 ADR 里 explicit 论证为什么二者不是同一业务实体(比如"binding 跟 bot 解耦,不是同一 keying"——但那触发上面 line 63 的 identity scope 问题,得一起讲清楚)。

Pick the explicit-justification branch for the ExternalIdentityBindingGAgent
vs existing ChannelUserBindingGAgent overlap: per-(platform, user) NyxID
identity and per-(bot, channel, sender) bot-scoped user state are different
business entities at different scopes, so two actors are kept and the
existing credential_ref field is marked redundant for migration by the
implementation PR.

Restate per-turn token issuance as RFC 8693 token-exchange with
subject_token=binding_id (was: client_credentials, semantically wrong for
acting on behalf of a bound user). Move PKCE code_verifier out of
BindingChallengeIssuedEvent into a stateless HMAC-signed state token so
nothing secret-material-shaped enters grain state. Add the reverse-direction
revoke flow (NyxID-side revoke surfaces as invalid_grant on next
token-exchange; aevatar fires unbind event and re-prompts /init).

Mandate private-channel delivery for the /init OAuth URL to defeat the
state-hijack attack flagged in PR review (group recipient could otherwise
complete login with their own NyxID account and bind sender to a foreign
identity). Drop unused nyx_subject cache from grain state. Separate
aevatar service-level secrets (state HMAC key, client_secret) as
infrastructure secret outside the #375 user-secret invariant.

status: accepted -> proposed until NyxID#549 contract is frozen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eanzhao
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 28, 2026

ADR-0017 Review

Overall direction is solid — broker model with zero long-lived secret material in aevatar grain state is the correct call. The threat model comparison table is well-argued.

Found several gaps that should be addressed before accepting:


1. 文件名与标题术语不一致

文件名 0017-per-user-nyxid-binding-via-oauth-**client**.md 但标题写的是 OAuth **Broker**。ADR 决策是 broker 模式,文件名应该对齐。建议改为 0017-per-user-nyxid-binding-via-oauth-broker.md


2. ExternalIdentityBindingGAgent 缺 grain identity 方案

ADR 引入了 ExternalIdentityBindingGAgent 但没有说明 grain 的 identity scheme。Storage boundary 表写了 (platform, tenant, external_user_id) → binding_id 映射,但这个三元组是 grain的 key?还是 grain 内的一个索引?一个 grain 实例持一条 binding 还是多条?

按 AGENTS.md 的要求:actorId 对调用方是不透明地址,但 identity scheme 仍需定义(影响粒度、并发、partition)。建议在 Decision 或 Consequences 中明确:

  • 一个 grain 对应一个 (platform, tenant, external_user_id),还是
  • 一个 grain 对应一个 tenant 下所有 binding(内含字典),还是
  • 其他 scheme

3. nyx_subject 缓存无失效策略

Storage boundary 表说 aevatar 缓存 nyx_subject 以加速 resolve,但 ADR 没有讨论缓存失效。NyxID 是 source of truth,但 aevatar 缓存何时刷新?场景举例:

  • 用户在 NyxID 侧做了 account merge / sub 变更
  • binding 被 NyxID 侧 revoke(用户从 NyxID UI 撤销)

建议至少标注为 known gap,或定义一个 TTL / refresh-on-401 策略。


4. Bot owner 模式共存逻辑缺失

ADR 多次说"现有 bot owner-shared 模式继续运行(不 regression)",但没有描述共存逻辑:

  • ResolveBindingAsync(externalSubject) 返回 null 时,runner 是回落到 bot owner token?
  • 还是对未绑定用户强制 /init?
  • 群聊场景"未绑定 sender 强制 /init,不回落到 bot owner"是否意味着 1:1 私聊的未绑定用户可以回落?

这直接影响 ChannelConversationTurnRunner 的流程分支,应该在 ADR 中明确。


5. 初始 access_token 处理未说明

Callback 流程中 aevatar 用 code 换到 { access_token, binding_id }。ADR 没有说明这个初始 access_token 的处理:

  • 是否丢弃?如果是,/oauth/token 请求是否可以优化为只返回 binding_id
  • 还是用于首次 turn?如果缓存,缓存在哪?

虽然只是初始流程的小细节,但涉及与 NyxID #549 的契约设计(是否需要在 broker_binding scope 下返回 access_token),建议明确。


6. externalSubject 参数应强类型

INyxIdCapabilityBroker 接口的四个方法都接受 externalSubject 作为参数。按 AGENTS.md 核心语义强类型,这应该是一个 typed value object(包含 platform, tenant, external_user_id),而不是 string 或 generic bag。建议在接口定义中标注为 ExternalSubjectRef 类型。


7. 重复 /init 的幂等语义未定义

已绑定用户再次 /init 会发生什么?

  • 创建新 binding,旧 binding 自动 revoke?(binding_id 变更)
  • 返回已有 binding?(幂等)
  • 报错?

这影响 BindingChallengeIssuedEvent 的设计(是否需要检查已有 binding)。


8. scope 命名与 NyxID #549 不一致

ADR 流程图写 scope=openid+broker_binding,但 NyxID #549 issue 写的是 urn:nyxid:scope:broker_binding。两者应对齐,否则实现时会混淆。


9. Projection readiness / query-time priming

ResolveBindingAsync 查 projection(纯本地)。按 AGENTS.md 不得 query-time priming

  • binding 完成后 projection 的物化时序如何保证?
  • 用户刚完成 /init 回调后立即发消息(turn),projection 是否已 ready?

建议在 Dependencies 的"平行落地"部分补充 projection readiness 的保证机制(如 callback handler 同步等 projection、或 turn 路径容忍短暂的 resolve miss 并回落)。


小结

ADR 的核心决策(broker > local、zero secret material、gated on NyxID #549)是正确的。上述问题大多是实现前应该补齐的细节,其中 #2 (grain identity)#4 (共存逻辑)#7 (幂等语义) 对后续 implementation PR 影响最大,建议优先明确。

Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few observations beyond eanzhao's review:

1. Local commit 056f05c5 not pushed

git log --oneline -3 shows 056f05c5 Address ADR-0017 review feedback sitting locally, containing fixes for the inline comments (status→proposed, client_credentials→token exchange, PKCE stateless state_token, nyx_subject cache removal, revocation flow). Push before merging.

2. Identity scope + actor split still has tension

The ADR argues ExternalIdentityBindingGAgent (key=platform/tenant/external_user_id) and ChannelUserBindingGAgent (key=bot/channel/sender) are "different scope entities". But binding_id semantically IS a credential pointer — same category as credential_ref, which the ADR itself plans to deprecate because the new actor subsumes it. Alternative: extend ChannelUserBindingState with a nyx_binding sub-message. The ADR should at minimum add a dedicated section explaining why "two actors" > "one extended actor" and give a concrete deprecation/migration window.

3. AuthContext.user_credential_ref not addressed

channel_contracts.proto AuthContext.field(2) carries user_credential_ref for outbound "send on behalf of user" semantics. How does this interact with the broker cap_handle? Does the outbound adapter call IssueShortLivedAsync?

4. status accepted with gated dependency

Status is accepted while implementation is gated on NyxID#549 (unresolved external dependency). Should be proposed (as already fixed locally) or NyxID#549 OpenAPI contract should be pinned.

- Rename file to oauth-broker.md to match the title.
- Spell out actor architecture in a dedicated section: justify
  ExternalIdentityBindingGAgent (platform-level identity entity) versus
  the existing per-bot ChannelUserBindingGAgent (per-bot preferences),
  and pin a deprecation window for credential_ref.
- Add an Outbound Send section covering AuthContext.user_credential_ref
  semantics under broker mode and the proto deprecation path.
- Decision section: forbid bot-owner fallback for unbound senders in
  both 1:1 and group chats, define /init idempotency, describe initial
  access_token disposal in the callback handler.
- Strong-type INyxIdCapabilityBroker parameters as ExternalSubjectRef.
- Add Projection Readiness subsection: write-side projection preflight
  in the callback handler; turn path must not query-time prime.
- Align scope name to urn:nyxid:scope:broker_binding so it tracks the
  NyxID#549 contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eanzhao
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 28, 2026

ADR-0017 Review Resolution Summary

逐条 map 到落地的 commit / 章节,方便 review pass 时勾掉。

Round 1 — commit 056f05c5

Inline:

  • L3 status accepted → 改 proposed,Dependencies 段说明何时升 accepted
  • L30 client_credentials 词汇误用 → 流程图改 RFC 8693 token-exchange + subject_token=<binding_id> + subject_token_type=urn:nyxid:params:oauth:token-type:binding-id
  • L31 NyxID-side revoke → Decision bullet invalid_grant 事件化撤销 + turn 路径 401 处理
  • L41 PKCE verifier 入 grain → state_token = HMAC(service_key, {…, pkce_verifier, exp ≤5min}) stateless 方案;Storage Boundary 表区分 short-lived vs grain-state
  • L64 nyx_subject 缓存目的 → 直接不缓存(Storage Boundary 标 ✗)

Path-level (codex bot):

  • OAuth state hijack → Decision bullet "OAuth authorize URL 只通过私域回传(Lark DM),不在群聊明文返回;无 DM 能力的平台不接入 broker 模式"

Round 2 — commit 44115157

文件重命名: 0017-per-user-nyxid-binding-via-oauth-client.md0017-per-user-nyxid-binding-via-oauth-broker.md(对齐标题)

L113 / Review #2 point 2 — actor 重复:
→ 新增 §Actor Architecture,含两 actor 对比表(key / scope / state / lifecycle / 事实源)、CLAUDE.md "Actor = 业务实体" 适用边界论证(不同 keying 域不同业务事实 ≠ 拆分同一实体)、具体产品语义论据(同一 Lark user 多 bot 期望同一 NyxID identity / 偏好可 per-bot)、credential_ref deprecation window(标 deprecated → 下个 channel-runtime proto break release 删除)

Review #2 point 3 — AuthContext.user_credential_ref outbound:
→ 新增 §Outbound Send: AuthContext × Broker。在线协议保留 user_credential_ref(语义改为序列化的 ExternalSubjectRef),outbound adapter 每次调 IssueShortLivedAsync 不缓存;长期演进加 typed ExternalSubjectRef external_subject = 4 字段,旧字段 deprecated

Public 9-point review:

# 落点
1 文件名 rename 完成
2 grain identity scheme §Actor Architecture key 列明确 (platform, tenant, external_user_id) = ExternalSubjectRef
3 nyx_subject 缓存 Storage Boundary 标 ✗,不缓存
4 bot owner 共存 Decision bullet:未绑定 sender 1:1 与群聊一律强制 /init,bot owner 不享默认身份特权;Consequences 同步说明
5 初始 access_token Decision 流程图 + Storage Boundary 表:callback handler 内一次性使用(可选 /oauth/userinfo 调用做展示文案),handler 退出前丢弃,永不持久化
6 externalSubject typed §INyxIdCapabilityBroker 接口签名全部改 ExternalSubjectRef(proto-typed value object)
7 /init 幂等 Decision bullet:已绑定 sender 重复 /init 回复"先 /unbind/init 切账号",不发新 OAuth
8 scope 命名 流程图 / Decision / Dependencies 全部改 urn:nyxid:scope:broker_binding;契约 token 名 pin 到 NyxID#549 freeze
9 projection readiness 新增 §Dependencies → Projection Readiness:callback handler 写侧预挂接 projection 同步等水位,turn 路径禁 ES replay / state mirror / priming

如果哪条论证不满意请直接戳,可以再迭代;否则可以把对应 conversation 标 resolved。

@eanzhao
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 28, 2026

ADR-0017 v2 Review(修复确认 + 新发现)

上一轮 9 个问题全部已修复,质量提升明显。逐条确认 + 新发现如下:


原始 9 点 — 全部 resolved ✅

# 问题 状态
1 文件名 oauth-clientoauth-broker ✅ 已修
2 Grain identity scheme ✅ §Actor Architecture 明确 key=ExternalSubjectRef,per-grain-one-binding
3 nyx_subject 缓存失效 ✅ 改为不缓存("无明确用途"),直接消除问题
4 Bot owner 共存逻辑 ✅ 明确"未绑定一律 /init,不回落 bot owner"
5 初始 access_token 处理 ✅ 明确一次性用 /oauth/userinfo,永不持久化
6 externalSubject 强类型 ✅ 明确 typed ExternalSubjectRef (proto-typed)
7 /init 幂等 ✅ 已绑定→回复已绑定信息,/unbind+/init 才能切换
8 scope 命名对齐 ✅ 统一 urn:nyxid:scope:broker_binding
9 Projection readiness ✅ §Projection Readiness 写侧预挂接 + 禁 query-time priming

新发现

A. Token Exchange 方案与 NyxID #549 提案存在显著分歧

ADR 现在用 RFC 8693 POST /oauth/tokengrant_type=urn:ietf:params:oauth:grant-type:token-exchange, subject_token=<binding_id>, subject_token_type=urn:nyxid:params:oauth:token-type:binding-id)。

但 NyxID #549 当前提案是 POST /oauth/bindings/{binding_id}/tokenclient_credentials 鉴权)。

这是不同的端点 + 不同的 grant_type + 不同的鉴权方式。NyxID 侧的 token_exchange_service.rs 当前只支持 subject_token_type=access_token(L43),binding-id type 是 net-new。

虽然 ADR 的方案更标准(复用现有 RFC 8693 框架而非另起新端点),但这是与 #549 的重大偏差。建议:

  1. 在 ADR Dependencies 或单独段落中显式标注"本方案与 #549 原提案的分歧",说明选择理由(复用现有 token exchange 框架 vs 新端点)
  2. 同步更新 #549 issue 或开新讨论帖对齐,否则两侧各自按自己理解实现会冲突

B. /unbind 行为未定义

ADR 多处引用 /unbind(slash-command 路由、幂等切换的前置步骤),但没有定义其行为:

  • 是否调 RevokeBindingAsyncDELETE /oauth/bindings/{binding_id},同时撤销 NyxID 侧)?
  • 还是仅本地标记 revoked?
  • /unbind 后旧 binding_id 在 NyxID 侧是 revoked 还是仍 active?
  • ExternalIdentityBindingGAgent 撤销事件名是什么(ExternalIdentityUnboundEventExternalIdentityBindingRevokedEvent?)

建议补一个 /unbind 的简短流程描述,至少明确 NyxID 侧是否同步 revoke。

C. "写侧预挂接 projection" 的边界说明

§Projection Readiness 描述的 "callback handler 同步等 projection 水位达成" 在 write-side completion path 上是合理的(AGENTS.md 禁的是 query-time priming),但 ADR 没有显式区分这两者。建议加一句:

此等待发生在 write-side callback handler(OAuth redirect 处理),不在 query 路径上;AGENTS.md 禁止的 query-time priming / ES replay 不适用于此场景。

这样可以避免 future reader 误解为 "在 query 路径上等 projection"。


总结

核心设计无问题,建议处理优先级:A(与 #549 对齐)> B(/unbind 定义)> C(边界注释)。A 如果不同步,后续 implementation PR 两侧会打架。

Copy link
Copy Markdown
Contributor Author

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 2 把上一轮 7+1 全补到了,还顺手抓到了我漏的几个点(AuthContext.user_credential_ref/init 幂等、Projection Readiness 写侧预挂接、强类型 ExternalSubjectRef、scope URN 化)。新引入的 §Outbound Send / §Actor Architecture / §Projection Readiness 三段把架构说得很清楚。

但 §Outbound Send 撞到 CLAUDE.md 一条红线(自定义字符串格式 + compat shim),另两处再调一下:

  1. 必须改user_credential_ref 重载承载序列化 ExternalSubjectRef(line 121)—— 同时违反 "统一 Protobuf,禁止自定义字符串格式" 和 "删除优先,不留 compat shim"
  2. 必须改:bot-owner 模式硬切(line 195)—— credential_ref 字段有 deprecation window,但 bot-owner 用户体验本身没有;切上线那一刻全员 sender 断流
  3. 建议改:callback 阶段调 /oauth/userinfo 拿 sub claim(line 61)—— sub 在同一兑换返回的 id_token 里就有,多余的 round-trip

小问题(不阻断):projection 写侧等待 timeout=3s 偏紧 + UX 文案易被误读为绑定失败;state-token HMAC key rotation 需要 kid;ExternalSubjectRef 应该住在 Channel.Abstractions 而非 Channel.Identity,否则会形成反向依赖。

(grant_type=authorization_code, code, code_verifier, client_secret)
-> { access_token, binding_id } (broker_binding scope 下不返 refresh_token)
初始 access_token 处理:可选地一次性用于调 /oauth/userinfo 拿 sub claim
做"已绑定 <name>"展示文案;**永不持久化、永不复用**
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sub claim 在同一次 token 兑换返回的 id_token 里就有(OIDC 标准),不需要再调 /oauth/userinfo 一跳。只有展示 name/email 才需要 userinfo。

建议改成:

可选地从同一兑换返回的 id_token 解码 sub/name claim 做"已绑定 "展示;不调 userinfo,不持久化 token

少一次 NyxID round-trip,且语义跟 OIDC 标准对齐。


`IChannelOutboundPort.ContinueConversationAsync(... AuthContext auth ...)` 在 `OnBehalfOfUser` 模式下用 `AuthContext.user_credential_ref`(`agents/Aevatar.GAgents.Channel.Abstractions/protos/channel_contracts.proto:138`)选择代用户身份。Broker 模式下:

- 在线协议保留 `user_credential_ref string = 2`,语义改为承载序列化的 `ExternalSubjectRef`(form: `lark:tenant_X:user_Y`),由 outbound adapter 反序列化
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

双重违反 CLAUDE.md

在线协议保留 user_credential_ref string = 2,语义改为承载序列化的 ExternalSubjectRef
(form: `lark:tenant_X:user_Y`)

两个独立红线:

  1. 统一 Protobuf:CLAUDE.md "序列化" 段写明 "State、领域事件、命令、回调载荷、快照、缓存载荷、跨 Actor/跨节点内部传输对象全部使用 Protobuf禁止 JSON/XML/自定义字符串格式"。lark:tenant_X:user_Y 这种带分隔符 + 含解析约定的字符串就是自定义字符串格式(外部协议除外,但 AuthContext 是仓库内部跨层契约)。
  2. 删除优先:CLAUDE.md "删除优先:空转发、重复抽象、无业务价值代码直接删除,不保留兼容空壳" + "Don't use feature flags or backwards-compatibility shims when you can just change the code"。把旧 string field 重载承载新语义是典型 compat shim。

且 line 123 已经写了"长期演进:加 typed ExternalSubjectRef external_subject = 4"——意思是先重载、后加 typed。但 broker 路径整体 gated on NyxID#549,根本不存在必须靠 string 重载兜过的过渡期,可以一次性加 typed external_subject = 4,broker outbound 只读 typed 字段,旧 user_credential_ref 留给非 broker 代码继续使用按既有语义自然 deprecate。中间不需要重载期。

改法:删掉 line 121 这一段;line 123 的 typed 字段直接作为 broker 模式的唯一 outbound 身份字段。

- 新增 OAuth callback endpoint `/api/oauth/nyxid-callback`(标准 OAuth client redirect 处理,不是 webhook),含写侧预挂接 projection 等待
- `ChannelConversationTurnRunner.RunInboundAsync` 开头加 slash-command 前置路由(`/init`、`/unbind`),`/init` 幂等
- `BuildReplyMetadata` 改成 `ResolveAsync` + `IssueShortLivedAsync`;metadata key 从 `nyxid.access_token` 改为 `nyxid.capability_handle`(诚实表达"短期、scoped、可撤销")
- 未绑定 sender(无论 1:1 还是群聊)统一强制 `/init`,不回落 bot owner;现有 bot owner-shared 模式在 implementation PR 切上线那一刻终止,bot owner 失去"默认用户身份"特权
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bot-owner 模式硬切,没有迁移计划

现有 bot owner-shared 模式在 implementation PR 切上线那一刻终止,bot owner 失去"默认用户身份"特权

§Actor Architecture 给了 credential_ref 字段 deprecation window("broker 模式上线 + 一个有 channel-runtime proto break 的 release 后"),但没给 bot-owner 用户体验本身。切上线那一刻所有现有 Lark sender 在 bot 里都拿不到响应,必须各自 DM 自己 /init 后才恢复,是产品断崖。

至少要在 ADR 里 explicit 一句迁移策略。可选方案(产品决策,ADR 只锁框架):

  • A. 双轨期:新 bot 默认 broker,旧 bot 保持 owner-shared;新增 bot 设置开关,bot owner 自己挑迁移时机
  • B. 单轨硬切 + 通知期:broker 上线前 N 天起,未绑定 sender 收到的 reply 加引导话术("X 月 X 日起需 /init"),到期硬切
  • C. 单轨硬切(当前文本):明确接受首日所有 sender 一次性走 /init,并写明 comm 责任在 bot owner

ADR 不需要选哪个,但要 explicit 说明这是产品决策点 + 留 placeholder,否则 implementation PR 会被迫现做。

@@ -0,0 +1,142 @@
---
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

§Projection Readiness 写侧预挂接 projection 机制未指定。在当前 turn 内同步等 projection 追上 actor committed version 需显式端口(如 IProjectionReadinessPort)。建议在 Consequences 列为新增接口。

```

## Storage Boundary

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

初始 access_token "可选地" 调 /oauth/userinfo 获取展示名——"可选" 造成 UX 路径不一致:有些 binding 完成页显示 display name,有些不显示。建议 ADR 明确二选一(始终调 or 完全不调)。

NyxID #549 之前可平行落地的部分(本身不依赖 broker endpoint):

- `INyxIdCapabilityBroker` 接口 + proto
- `ExternalIdentityBindingGAgent` + projection + `IExternalIdentityBindingQueryPort`(state 仅存 `binding_id` + `nyx_subject`,均为 opaque)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

弃用窗口间 credential_ref(旧) 与 binding_id(新) 可能共存。"不读不写" 应补充 fallback 顺序:查询优先 binding_id,未命中再 fallback credential_ref。否则 migration 中间态行为未定义。

- Decide id_token-decoded sub for the bound-display string instead of
  hitting /oauth/userinfo: the OIDC token response already carries sub,
  so the extra round-trip is unnecessary. Initial access_token is
  unconditionally discarded by the callback handler.
- Define /unbind: slash-command -> RevokeBindingAsync ->
  DELETE /oauth/bindings/{binding_id} -> revoked event. NyxID is the
  source of truth, so a NyxID failure aborts the local revoke instead
  of risking source-of-truth divergence.
- Drop the string overload of AuthContext.user_credential_ref; broker
  outbound reads only the new typed external_subject field. Justified
  by CLAUDE.md "unified protobuf, no compat shim".
- Add credential_ref deprecation-window rule: turn path reads only
  binding_id, never falls back to credential_ref.
- Spell out the bot-owner shutdown options (A/B/C) and require the
  implementation PR to pick one for the release runbook.
- Note IProjectionReadinessPort as a write-side port and clarify the
  query-side boundary in §Projection Readiness; turn path remains
  forbidden from priming.
- Add §Divergence from NyxID#549: pick RFC 8693 token-exchange over
  the dedicated client_credentials endpoint in the original sketch
  and link to the alignment comment to be posted on NyxID#549.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eanzhao
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 28, 2026

ADR-0017 v2 Review Resolution Summary

Round 3 — commit fdbb8c2b,v2 review 6 inline + 3 顶层 全部 address。

Source 内容 落点
L61 /oauth/userinfo 一跳冗余,sub claim 在 id_token 已自带 Decision 流程图 + Storage Boundary 表统一改 id_token 解码;handler 退出前直接丢弃 token
L121 user_credential_ref 字符串重载承载 ExternalSubjectRef 双重违反 CLAUDE.md(Protobuf / 删除优先) §Outbound Send 删除重载段,broker outbound 读 typed external_subject = 4,无中间过渡期
L195 bot-owner 硬切无迁移计划 新增 §Bot-Owner-Shared 模式终止策略,explicit 列出 A/B/C 三选项,implementation PR 必须选一个并入 release runbook
08:54:54 path-level §Projection Readiness 写侧机制需显式端口 新增 IProjectionReadinessPort.WaitForEventAsync,§Projection Readiness 描述契约,Consequences 列入
08:55:13 path-level 初始 access_token "可选"调 userinfo 造成 UX 不一致 删除"可选"语义,统一从 id_token 解码 sub(同 L61),token 退出前一律丢弃
08:55:21 path-level deprecation window 中间态 fallback 顺序未定义 §Actor Architecture 新增 bullet:turn 路径binding_id, fallback 到 credential_ref(即便 grain 内有遗留值)
v2-A RFC 8693 token-exchange vs #549 专用端点分歧 新增 §Divergence from NyxID#549 Initial Sketch,3 条理由 + 同步在 ChronoAIProject/NyxID#549 提了 alignment comment
v2-B /unbind 行为未定义 Decision 加 /unbind bullet:RevokeBindingAsyncDELETE {nyxid}/oauth/bindings/{binding_id}ExternalIdentityBindingRevokedEvent;NyxID 失败本地不擅自撤销(避免 source-of-truth 不一致)
v2-C write-side vs query-side projection 边界注释 §Projection Readiness 顶部加显式分类:callback handler 是 write-side completion path,query-time priming 禁令对此场景不适用;turn/query 路径仍严守禁令

NyxID 同步:ChronoAIProject/NyxID#549 已贴 alignment comment 提议契约对齐 RFC 8693 token-exchange 与 urn:nyxid:scope:broker_binding URN 命名。

如果哪条仍不满意请直接戳。

@eanzhao
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 28, 2026

ADR-0017 v3 Review

v2 的 A/B/C 三个问题全部已修复:

  • A (#549 分歧) ✅ — §Divergence from NyxID#549 Initial Sketch 明确记录了端点差异 + 选择理由 + 已同步 #549 comment
  • B (/unbind 定义) ✅ — Decision 段落完整定义了 /unbind 流程,含 NyxID 调用失败的 source-of-truth 保护策略
  • C (写侧 vs query 侧边界) ✅ — §Projection Readiness 显式区分了 write-side completion path vs AGENTS.md 禁的 query-time priming;IProjectionReadinessPort 限定为 write-side-only

v3 新增内容检查通过:

  • id_token 替代 /oauth/userinfo(省 round-trip,OIDC 标准做法)✅
  • §Bot-Owner-Shared 模式终止策略(A/B/C 选项留 product decision)✅
  • AuthContext 直接用 typed external_subject = 4,不做 string 重载过渡 ✅
  • credential_ref 无 fallback(broker 上线即切断,不留 deprecation window 中间态)✅
  • invalid_grant 错误码对齐写入 Dependencies 契约冻结条件 ✅

LGTM — 没有阻塞项。 ADR 质量足够支撑后续 implementation PR。

@eanzhao
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 28, 2026

ADR-0017 Final Review

前几轮 9+3 问题全部 resolved,核心设计无阻塞项。以下是遗留的实现细节问题,不阻断 ADR 合并但建议在 implementation PR 前补齐:


1. HMAC state_token 设计细节缺失

state_token = HMAC(service_key, {correlation_id, external_subject_ref, pkce_verifier, exp}) 缺少:

  • kid / key rotation: service_key 被 compromise 后如何 rotation?没有 kid 字段意味着无法区分新旧 key 签发的 token。建议 state_token header 加 kid,key rotation 时 grace period 双签
  • 序列化格式: HMAC payload 用 Protobuf 还是 JSON?按 CLAUDE.md 应该 Protobuf,但 HMAC input 需要 deterministic serialization(field order 固定),需要明确
  • binding_id 唯一约束: 同一 ExternalSubjectRef 重复触发 /init(用户快速双击),两个 callback 可能都成功。NyxD 侧是否有 (client_id, external_subject) unique constraint?如果没有,ExternalIdentityBindingGAgent 需要处理多 binding 竞争(取最新?旧的自动 revoke?)

2. 并发 /init 竞态

用户在 Lark 快速连发两条 /init

/init_1 -> miss -> 生成 OAuth URL_1 -> 发 DM
/init_2 -> miss -> 生成 OAuth URL_2 -> 发 DM  (此时 binding 还没完成)

两个 OAuth 流程并行进行。ADR 的幂等语义是"已绑定就拒绝",但 race window 内两个都未绑定。建议补一条:

并发 /init 时,ExternalIdentityBindingGAgentExternalIdentityBoundEvent commit 时做 (platform, tenant, external_user_id) unique 约束(幂等 commit),后到的 callback 走已有 binding 返回


3. Callback handler 错误 UX 未定义

callback handler 有多个失败路径(state_token 过期 / HMAC 校验失败 / token exchange 失败 / projection 等待超时),但 ADR 只定义了超时的文案。建议至少分类:

错误 HTTP 响应 用户可见文案
state_token 过期/篡改 400 "链接已过期,请重新 /init"
token_exchange 失败 502 "NyxID 绑定失败,请重试"
projection 超时 200 (异步) "绑定已写入,稍后重发消息"

特别是 state_token 过期(exp ≤5min,网络延迟 + 用户犹豫就可能超时)需要有明确的 retry UX,而不是一个 generic 500。


4. 外发 IssueShortLivedAsync 无容错

§Outbound Send 定义"每次发送前调 IssueShortLivedAsync,不缓存"。但没有说明:

  • NyxID 短暂不可用时(5xx / timeout):retry?backoff?fail silently?还是整个 outbound 失败?
  • 并发 outbound(同用户多条消息并行发送):多次 token-exchange 是否被 NyxID rate limit?是否需要 per-binding 本地 semaphore?

建议至少补一句"NyxID 不可用时 outbound fail with error,不 fallback 到 bot owner token",避免实现者自行加 fallback 破坏 zero-secret 不变量。


5. /unbind 后立即 /init 的时序

用户 /unbindDELETE /oauth/bindings/{binding_id} → NyxID 成功 → 本地 ExternalIdentityBindingRevokedEvent → projection 物化。如果用户在 projection 物化前立即 /init

/unbind -> DELETE 成功 -> ResolveAsync 仍返回旧 binding_id (projection 未更新)
/init   -> ResolveAsync hit -> 返回"已绑定,请先 /unbind"(误判)

建议 /unbind handler 同步等 projection 水位(复用 IProjectionReadinessPort),或 /init 的幂等检查读 actor 直接态而非 projection。


总结

以上 5 点都是实现层面的边界条件,不阻断 ADR 合并。建议在 Consequences 或 Dependencies 里加一段 "Implementation Notes" 把这些记下来,避免后续 PR 遗漏。

Add §Implementation Notes (5 subsections) so the implementation PR
inherits the boundaries instead of relitigating them:

- HMAC state_token: Protobuf payload, deterministic field order,
  kid + key rotation grace > exp.
- /init concurrency: actor commit is idempotent; aevatar does not
  require a unique constraint on the NyxID side, so NyxID is free to
  reap orphans on its own schedule.
- Callback handler error UX: classified table with HTTP code and
  user-facing copy.
- IssueShortLivedAsync failures: outbound and turn paths fail loudly
  with no fallback; rate limiting authority lives in NyxID.
- /unbind -> /init timing: the unbind handler waits on the
  projection (write-side completion path) so the next /init does
  not see a stale "already bound" reply.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eanzhao
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 29, 2026

ADR-0017 Final Review Resolution

Round 4 — commit f48e881d 新增 §Implementation Notes,5 点全部 address。reviewer 明确"不阻断 ADR 合并",落点放在新章节避免冲淡 Decision/Consequences。

# 内容 §Implementation Notes 落点
1 HMAC state_token 设计细节(kid / 序列化 / rotation) §1:Protobuf deterministic 序列化 + kid header + rotation grace > exp(≥10 min,确保不打断在飞 binding)
2 并发 /init 竞态 §2:ExternalIdentityBindingGAgent 单线程 actor commit-time 幂等,丢弃后到 callback 的 binding_id;NyxID 端不要求 unique 约束(简化 NyxID),orphan 由 NyxID reaper 处理(SHOULD 不构成 aevatar 实现依赖)
3 Callback handler 错误 UX 分类 §3:四类错误 × HTTP code × 用户文案表,涵盖 state_token 过期 / token exchange 失败 / projection 等待超时 / 其他
4 IssueShortLivedAsync 失败容错 §4:outbound / turn 路径绝不 fallback 到 bot owner 或缓存 token(zero-secret 不变量);rate limit 单一权威在 NyxID,aevatar 不做 client-side semaphore
5 /unbind/init 时序 §5:/unbind handler 同步等 projection 水位(复用 IProjectionReadinessPort);采"/init 读 actor 直接态"备选(避免双查询源)

一些选择理由

  • §4 不 fallback 是硬约束:zero-secret 不变量在 outage 时容易被绕过("先用 bot owner 顶一下"),所以 ADR explicit 写"绝不",防止 implementation 阶段被压力推动悄悄加 fallback
  • §5 不读 actor 直接态:虽然能避免 projection lag,但代价是 turn 路径双查询源(actor 直读 + projection),违反单一查询源原则;改为 /unbind 写侧等水位代价更小
  • §1 双签 grace > exp:rotation 不打断在飞 binding 的最简正确性条件,生产 P99 远小于 5min,grace = 10min 留足 buffer

如还有边界细节遗漏请戳;否则可以勾掉 final review。

@eanzhao eanzhao changed the title ADR-0017: per-user NyxID binding via OAuth broker (NyxID-side refresh_token) feat(channel): ADR-0017 + scaffolding for per-user NyxID binding Apr 29, 2026
Land the parts of ADR-0017 that do not depend on the NyxID broker
endpoint (gated on ChronoAIProject/NyxID#549) so the capability seam
and actor contract stop floating in the air:

- Two new csprojs: Aevatar.GAgents.Channel.Identity.Abstractions
  (proto value objects + interfaces) and Aevatar.GAgents.Channel.Identity
  (actor + DI module).
- Proto: ExternalSubjectRef, BindingId, CapabilityScope, CapabilityHandle,
  BindingChallenge in Abstractions; ExternalIdentityBindingState,
  CommitBindingCommand, RevokeBindingCommand, ExternalIdentityBoundEvent,
  ExternalIdentityBindingRevokedEvent in Identity. State holds only the
  opaque binding_id — never refresh_token.
- ExternalIdentityBindingGAgent: single-threaded actor that commits
  bindings idempotently (ADR §Implementation Notes #2) and revokes as a
  no-op when no active binding exists.
- INyxIdCapabilityBroker, IExternalIdentityBindingQueryPort,
  IProjectionReadinessPort, BindingRevokedException — interfaces sized
  to the ADR's broker contract; a concrete NyxIdRemoteCapabilityBroker
  lands in a follow-up PR.
- InMemoryCapabilityBroker test fake plus xUnit coverage of the actor
  (idempotent commit, rebind-after-revoke) and the broker (revoke,
  invalid_grant signalling).

Builds clean across the solution; the new Identity-tagged tests pass.
DI module is a stub for now — the production wiring of the broker,
projection, and OAuth callback endpoint comes in subsequent PRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
}

if (!string.IsNullOrEmpty(State.BindingId))
{
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Consensus: 2 models] severity=major, category=design

Both HandleCommitBinding and HandleRevokeBinding accept cmd.ExternalSubject and apply it directly without verifying it matches the actor's own identity. Because actor IDs are deterministic from ExternalSubjectRef.ToActorId(), a routing/dispatch bug that misroutes a command would silently persist a binding under a mismatched key (State.ExternalSubject is overwritten by ApplyBound from the event payload). Add a fail-fast guard at the top of each handler that compares cmd.ExternalSubject.ToActorId() against this.GetPrimaryKeyString() and rejects mismatches. Two of three reviewers also flagged that ApplyBound should not overwrite ExternalSubject since it's an actor-identity invariant.

Per-model verbatim
  • glm-5.1 (L58): HandleCommitBinding and HandleRevokeBinding accept any cmd.ExternalSubject without verifying it matches the actor's own identity. If a routing/dispatch bug sends a CommitBindingCommand to the wrong actor, the binding would be persisted under a mismatched state (State.ExternalSubject overwritten by ApplyBound to the command's subject rather than the actor's key).

Suggested fix — resolve the actor's own identity at activation and assert equivalence:

// In ActivateAsync or via a property set by the runtime:
private ExternalSubjectRef _ownSubject; // derived from actor ID

// At the top of HandleCommitBinding / HandleRevokeBinding:
if (!cmd.ExternalSubject.Equals(_ownSubject))
{
    Logger.LogWarning("Command rejected: subject mismatch...");
    return;
}

This also means the ApplyBound transition should not overwrite ExternalSubject from the event — it should only set binding_id and bound_at, since the subject is an invariant of the actor's identity.

  • kimi-k2p6 (L92): HandleRevokeBinding accepts an ExternalSubject parameter on the command but ignores any potential mismatch with the actor's own identity. Since actor IDs are deterministic per ExternalSubjectRef, a command arriving for a different subject is architecturally impossible under normal dispatch; however, adding an explicit guard (if (cmd.ExternalSubject.ToActorId() != this.GetPrimaryKeyString())) would make the invariant self-documenting and fail-fast if routing ever breaks.

/// </summary>
internal sealed class InMemoryCapabilityBroker : INyxIdCapabilityBroker
{
private readonly ConcurrentDictionary<string, BindingId> _bindings = new(StringComparer.Ordinal);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Consensus: 2 models] severity=minor, category=bug

_revokedBindings is a plain HashSet<string> while the adjacent _bindings field uses ConcurrentDictionary. RevokeBindingAsync writes to _revokedBindings and IssueShortLivedAsync reads from it — concurrent test scenarios could corrupt the set or produce stale reads. Replace with ConcurrentDictionary<string, byte> (using TryAdd / ContainsKey) so both fields share the same thread-safety model.

Per-model verbatim
  • deepseek-v4-pro (L22): _revokedBindings is a non-thread-safe HashSet<string> mutated inside RevokeBindingAsync and read inside IssueShortLivedAsync, yet the class uses ConcurrentDictionary in adjacent fields, implying potential concurrent access. Even though this is a test fake, concurrent test scenarios could corrupt the set. Fix: wrap with lock or use ConcurrentDictionary<string, byte> for the revoked set.
  • glm-5.1 (L14): _revokedBindings is a plain HashSet<string> while _bindings uses ConcurrentDictionary. RevokeBindingAsync writes to _revokedBindings and IssueShortLivedAsync reads from it — these can be called concurrently in integration tests. Replace with ConcurrentDictionary<string, byte> (or wrap accesses in a lock) to match the thread-safety model of _bindings:
private readonly ConcurrentDictionary<string, byte> _revokedBindings = new();
// .Add → .TryAdd, .Contains → .ContainsKey

@@ -0,0 +1,49 @@
namespace Aevatar.GAgents.Channel.Identity.Abstractions;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[deepseek-v4-pro] severity=major, category=test

No tests cover EnsureValid or ToActorId. These methods gate actor-ID construction and reject invalid input (empty fields, colon characters). Missing coverage here risks silent regressions when field-validation rules change. Add unit tests for: valid subject passes, null/empty platform rejects, null/empty external_user_id rejects, colon-in-field rejects, and ToActorId produces the expected colon-joined format.


_agent = new ExternalIdentityBindingGAgent
{
Services = _serviceProvider,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[kimi-k2p6] severity=major, category=test

The IEventSourcingBehaviorFactory<> and InMemoryEventStore registrations are hand-rolled test wiring that may drift from production runtime behavior. Since this is scaffolding, consider adding a follow-up task to exercise the agent through the Orleans test cluster / in-memory silo rather than direct instantiation, so that lifecycle (activation, rehydration, deactivation) and dispatch semantics are covered.


public Task<CapabilityHandle> IssueShortLivedAsync(
ExternalSubjectRef externalSubject,
CapabilityScope scope,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[kimi-k2p6] severity=minor, category=design

IssueShortLivedAsync throws BindingRevokedException when there is no active binding (!_bindings.TryGetValue). The exception name implies NyxID revoked an existing binding, but the message says "No active binding". Per ADR-0017, BindingRevokedException should only be thrown for invalid_grant on an existing binding. Consider introducing a separate exception (e.g. BindingNotFoundException) for the missing-binding case to avoid conflating "never bound" with "revoked" semantics.

/// (concurrent /init protection — see ADR-0017 §Implementation Notes #2).
/// The orphan binding on the NyxID side is left for NyxID's own reaper.
/// </summary>
[EventHandler]
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[mimo-v2.5-pro] severity=nit, category=design

HandleCommitBinding validates cmd.BindingId for whitespace but uses string.IsNullOrWhiteSpace while HandleRevokeBinding uses string.IsNullOrEmpty for State.BindingId. Minor inconsistency — consider aligning both to string.IsNullOrEmpty since protobuf deserialization never produces whitespace-only strings for string fields.


// Per-(platform, tenant, external_user_id) actor state holding the opaque
// NyxID-side binding pointer. Holds NO refresh_token or any user secret —
// see ADR-0017 §Storage Boundary. binding_id is empty when no active binding;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[mimo-v2.5-pro] severity=nit, category=design

ExternalIdentityBindingState.external_subject is stored redundantly alongside the actor's inherent key (ExternalSubjectRef). Since the actor ID is deterministic from ExternalSubjectRef (via ToActorId()), the state field duplicates the key. This is fine for read-model convenience (projection doesn't need to parse the actor ID), but worth noting in a code comment that it exists for projection ergonomics, not as a separate fact.

@@ -0,0 +1,286 @@
---
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[kimi-k2p6] severity=nit, category=style

ADR frontmatter status: proposed is correct per ADR text, but the PR description says accepted pending NyxID#549 contract freeze. Ensure frontmatter and PR description are consistent before merge to avoid confusion in tools/docs/lint.sh or downstream doc parsers.

NyxID#549 merged 2026-04-28 (PR ChronoAIProject/NyxID#555) — broker
contract is frozen. ADR status flips proposed -> accepted; the broker
seam in code splits cleanly along the contract.

Major:
- ExternalIdentityBindingGAgent: defensive subject-mismatch guard at
  the top of HandleCommitBinding / HandleRevokeBinding so a routing bug
  cannot silently persist a binding under the wrong actor key. ApplyBound
  no longer overwrites State.ExternalSubject on subsequent events — the
  field is an actor-identity invariant set once on first bind.
- INyxIdCapabilityBroker: drop ResolveBindingAsync. Reads now go strictly
  through IExternalIdentityBindingQueryPort so write-side and read-side
  seams stay separated. ADR §INyxIdCapabilityBroker updated to match.
- New BindingNotFoundException for the "never bound" case so callers can
  distinguish it from BindingRevokedException ("NyxID-side invalid_grant
  on a previously-bound subject"). InMemoryCapabilityBroker switched.
- InMemoryCapabilityBroker: _revokedBindings -> ConcurrentDictionary so
  it shares the thread-safety model of _bindings; the fake now also
  implements IExternalIdentityBindingQueryPort over the same dictionary
  so test wiring stays tight.
- ExternalSubjectRefExtensionsTests: cover EnsureValid + ToActorId
  (valid subject, empty platform / external_user_id, colon-in-field,
  null), so the actor-id format and required-field invariants are
  pinned by tests, not just by review.

Minor:
- HandleCommitBinding now uses string.IsNullOrEmpty consistently with
  HandleRevokeBinding — protobuf fields never produce whitespace-only
  strings, so the IsNullOrWhiteSpace inconsistency was a footgun.
- Idempotency check carries a comment explaining the guarantee comes
  from cluster event-store OCC plus single-actor turn ordering, not
  from the in-handler check alone.
- Proto comment on ExternalIdentityBindingState.external_subject notes
  the field exists for projection ergonomics and that ApplyBound never
  overwrites it.
- ExternalIdentityBindingGAgentTests: cover null-subject paths for both
  HandleCommitBinding and HandleRevokeBinding.
- InMemoryCapabilityBrokerTests: read paths exercise the QueryPort seam
  explicitly (no more broker.ResolveBindingAsync).

Tests: 29 Identity-tagged tests pass (was 18 — 11 new from extensions
coverage and null-subject paths).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eanzhao
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 29, 2026

Multi-model Review Resolution + ADR accepted Bump

Round 5 — commit b3db0ee9,12 个 inline 全部 address;同时 NyxID#549 已合入 (ChronoAIProject/NyxID#555),ADR-0017 status proposedaccepted,broker 契约 freeze。

Major(全部修)

Source 内容 落点
L58 (consensus 2 models) Subject-mismatch guard + ApplyBound 不应覆盖 ExternalSubject(actor identity 不变量) IsCommandSubjectMatchingActor() 在两个 handler 入口处校验,Id 非空时(生产)拒绝不匹配命令;ApplyBound 改 ExternalSubject ??= evt.ExternalSubject 只首次设置
L14 (consensus 2 models) _revokedBindings 线程安全 ConcurrentDictionary<string, byte> 替代 HashSet<string>,跟 _bindings 一致
L76 BindingRevokedException for unbound 语义混淆 新增 BindingNotFoundException(没绑过)vs BindingRevokedException(NyxID 端 revoke);InMemoryCapabilityBroker.IssueShortLivedAsync 区分 throw
L1 (abstractions) EnsureValid / ToActorId 无测试 新增 ExternalSubjectRefExtensionsTests(7 cases:valid / empty platform / empty user / 三个字段各自含 colon / null)

Minor(全部修)

Source 内容 落点
L29 ResolveBindingAsync 在 broker 上混了 read/write 边界 移除 INyxIdCapabilityBroker.ResolveBindingAsync;读路径只走 IExternalIdentityBindingQueryPort.ResolveAsync;InMemoryCapabilityBroker 同时实现两个接口共享字典;ADR §INyxIdCapabilityBroker 同步重写
L97 缺 null ExternalSubject path 的 test HandleCommitBinding_IgnoresNullExternalSubject + HandleRevokeBinding_IgnoresNullExternalSubject
L36 IsNullOrWhiteSpace vs IsNullOrEmpty 不一致 统一用 IsNullOrEmpty(protobuf 字段不会产生 whitespace-only 字符串)
L56 幂等 check 应说明依赖 event-store OCC XML doc 加段说明:in-handler check 是 fast path,OCC at append-time 处理 racing past State load
L12 external_subject 字段冗余说明 proto 注释加:存在为 projection ergonomics,ApplyBound 永不覆盖(L58 invariant)
L98 Reason ?? string.Empty 丢失 unspecified 语义 保持 empty string + 日志读取时用 "unspecified" 兜底,proto 字段语义不变(empty=unspecified is a frequent OAuth/JWT convention)
L27 OrCurrent() 静默吞未匹配 event ChannelBotRegistrationGAgent 等现有 actor 一致,未改;dev-build NotSupportedException 是平台级 cross-cutting,跨 actor 统一处理更合适,本 PR 不动
L1 frontmatter status 一致性 NyxID#549 merge 后 ADR 升 accepted,跟 PR 描述对齐
L35 (test wiring drift) hand-rolled test wiring 可能与生产 runtime 偏差 follow-up,加 Orleans test cluster / in-memory silo 覆盖。本 PR 不动

NyxID#549 解锁 + ADR accepted

NyxID#549 merge 后 contract freeze:urn:nyxid:scope:broker_binding / urn:nyxid:params:oauth:token-type:binding-id / POST /oauth/token (auth_code) 在 broker scope 下返 binding_id / RFC 8693 token-exchange / DELETE /oauth/bindings/{binding_id} / oauth_broker_binding.revoked HMAC webhook + V2 加固(DPoP/mTLS/PAR)。

ADR Dependencies 段重写,显式列出 NyxID#549 已 freeze 的契约 + aevatar 后续可独立落地的工作项(broker remote impl / projection / OAuth callback endpoint / slash-command routing / AuthContext.external_subject 字段 / CAE webhook 接收)。

Tests

dotnet test --filter "FullyQualifiedName~Identity" — 29 passed(从 18 涨到 29:11 个 new test 来自 extensions 覆盖 + null-subject 路径)。

接下来准备落地 projection + query port impl。

…ash-command

Wires up all six pieces the ADR called out as parallel-capable now that
NyxID#549 has shipped, so per-user binding stops requiring a follow-up PR
per layer:

#1 — AuthContext.external_subject (Channel.Abstractions)
    Move ExternalSubjectRef out of Identity.Abstractions into
    Channel.Abstractions where AuthContext lives. Identity is a consumer
    of channel concepts, not their owner; the new typed
    AuthContext.external_subject = 4 field is the broker-mode outbound
    identity carrier (ADR §Outbound Send) and the legacy string
    user_credential_ref keeps working for non-broker callers. Adds a
    new AuthContext.OnBehalfOfExternalSubject helper.

#2 — Projection chain
    ExternalIdentityBindingDocument (proto) + Partial (IProjectionReadModel)
    + MetadataProvider + MaterializationContext + Projector
    (ICurrentStateProjectionMaterializer) + ExternalIdentityBindingProjectionQueryPort
    + ExternalIdentityBindingProjectionReadinessPort (polling impl, write-side
    completion path only — ADR §Projection Readiness explicitly allows this).
    AddChannelIdentityProjection registers the lot.

#3 — NyxIdRemoteCapabilityBroker
    HTTP-backed INyxIdCapabilityBroker + INyxIdBrokerCallbackClient against
    the NyxID#549 wire shape: /oauth/authorize URL building, RFC 8693
    token-exchange with subject_token_type=urn:nyxid:params:oauth:token-type:binding-id,
    /oauth/bindings/{id} delete, authorization-code -> binding_id exchange.
    Maps invalid_grant -> BindingRevokedException so the upper layer
    event-source-revokes the local binding. PkceHelper covers RFC 7636 S256;
    StateTokenCodec seals correlation+verifier+exp into an HMAC token (kid
    header, deterministic Protobuf payload — ADR §Implementation Notes #1).
    AddNyxIdRemoteCapabilityBroker wires HttpClient + options + codec +
    both interfaces.

#4 — /api/oauth/nyxid-callback endpoint (IdentityOAuthEndpoints)
    Decodes state, exchanges code -> binding_id, dispatches CommitBindingCommand
    to ExternalIdentityBindingGAgent via IActorRuntime, waits on the projection
    readiness port, returns a friendly bind-confirmation page (id_token decoded
    locally for the display name — no /oauth/userinfo round-trip per ADR L61).
    Classifies error UX by HTTP status (ADR §Implementation Notes #3):
    400 for state-token issues, 502 for broker exchange failure, 200 with a
    "binding pending propagation" message on projection timeout.

#5 — Slash-command routing in ChannelConversationTurnRunner
    /init and /unbind are handled before the LLM by a new
    TryHandleSlashCommandAsync that resolves identity ports lazily through
    the existing IServiceProvider — deployments without per-user binding
    fall through unchanged. /init replies with the authorize URL (private DM
    only — ADR §Decision); /unbind resolves binding_id, calls
    broker.RevokeBindingAsync, replies with the unbind confirmation. The
    runner short-circuits via the existing SendReplyAsync, so reply
    delivery rides the relay outbound port like any other reply.

#6 — /api/webhooks/nyxid-broker-revocation receiver (Continuous Access Evaluation)
    BrokerRevocationWebhookValidator verifies HMAC-SHA256 over the raw body
    (X-NyxID-Signature: sha256=<hex>), parses the JSON envelope into a
    typed BrokerRevocationNotification, and the endpoint event-source-revokes
    the local binding actor. Aligns with NyxID#549 V2-7's CAE channel — when
    a user revokes from NyxID's console the binding goes inactive in seconds
    rather than waiting for the 5-min access TTL.

Tests:
- 591 ChannelRuntime.Tests pass (39 of which are the new Identity-tagged
  tests covering actor, projector, broker fake, state-token codec, and
  ExternalSubjectRef extension).
- Full solution `dotnet build aevatar.slnx` is clean.
- `tools/docs/lint.sh` 33 file(s) checked, 0 errors.

The ChannelConversationTurnRunner integration is feature-flag-shaped: if
neither IExternalIdentityBindingQueryPort nor INyxIdCapabilityBroker is
registered, the slash-command path is a no-op and existing
bot-owner-shared behaviour is preserved. Production rollout is a DI
toggle (AddChannelIdentityProjection + AddNyxIdRemoteCapabilityBroker
+ MapIdentityOAuthEndpoints) plus the Bot-Owner-Shared termination
strategy choice from the ADR §Bot-Owner-Shared 模式终止策略.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eanzhao
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 29, 2026

All Six Follow-up Items Landed (commit c67fe537)

NyxID#549 已合入,本 PR 把 ADR §Dependencies 列出的 6 个"后续工作"全部落地。591 个 ChannelRuntime 测试通过(其中 39 个是 Identity 模块新增),dotnet build aevatar.slnx 0 errors,docs lint 33 文件 0 errors。

# 内容 落点
1 AuthContext.external_subject = 4 typed 字段 ExternalSubjectRefChannel.Identity.Abstractions 移至 Channel.Abstractions(Identity 是 channel 的 consumer,不是 owner);AuthContext.OnBehalfOfExternalSubject(ExternalSubjectRef, string) helper 加入 AuthContext.Partial.cs。legacy user_credential_ref 保留给非 broker 路径。
2 Projection chain ExternalIdentityBindingDocument(proto + IProjectionReadModel partial)、MetadataProviderMaterializationContextExternalIdentityBindingProjector(ICurrentStateProjectionMaterializer)、ExternalIdentityBindingProjectionQueryPortExternalIdentityBindingProjectionReadinessPort(polling impl,write-side completion path);AddChannelIdentityProjection() 一行注册全部。
3 NyxIdRemoteCapabilityBroker(HTTP + PKCE + state_token HMAC) 直接对齐 NyxID#549 wire shape:/oauth/authorize URL 构造、RFC 8693 token-exchange(subject_token_type=urn:nyxid:params:oauth:token-type:binding-id)、DELETE /oauth/bindings/{id}、authorization-code → binding_id 兑换。invalid_grantBindingRevokedExceptionPkceHelper(RFC 7636 S256)+ StateTokenCodec(deterministic Protobuf payload + kid header + HMAC-SHA256;ADR §Implementation Notes #1 全部对齐);AddNyxIdRemoteCapabilityBroker(IConfiguration?) 一行注册。
4 /api/oauth/nyxid-callback endpoint IdentityOAuthEndpoints.HandleNyxIdOAuthCallbackAsync:验 state → 兑换 code → 派 CommitBindingCommand 到 actor → 等 projection 水位(超时返回"binding pending propagation"友好提示);id_token 本地解码做"已绑定 "展示,不调 /oauth/userinfo。错误 UX 按 ADR §Implementation Notes #3 分类:400 / 502 / 200 propagation。
5 ChannelConversationTurnRunner slash-command 路由 RunInboundAsync 顶部加 TryHandleSlashCommandAsync,通过 _services 懒查 identity ports —— 没注册时 no-op,保持向后兼容(无 binding 部署不 regression)。/init hit→幂等回复,miss→出 broker authorize URL;/unbind resolve→broker.RevokeBindingAsync→回执;reply 走现有 SendReplyAsync(relay outbound)。
6 /api/webhooks/nyxid-broker-revocation CAE webhook BrokerRevocationWebhookValidator:HMAC-SHA256 over raw body(X-NyxID-Signature: sha256=<hex>,FixedTimeEquals),JSON envelope → typed notification → RevokeBindingCommand 派给 actor。NyxID#549 V2-7 CAE 通道对齐;用户在 NyxID UI revoke → aevatar 秒级失效,不必等 5min access TTL。

关键设计决定

  • broker 接受 IExternalIdentityBindingQueryPort 注入:RevokeBindingAsync(subject)IssueShortLivedAsync(subject) 内部 resolve 到 binding_id,然后调真实 NyxID endpoint。caller 拿 subject 即可,read/write 边界仍然分清(QueryPort 是单一 read seam)。
  • Slash-command 默认 no-op:不破坏现有 bot-owner-shared 部署;打开 binding 是显式 DI 注册动作(AddChannelIdentityProjection + AddNyxIdRemoteCapabilityBroker + MapIdentityOAuthEndpoints)。
  • id_token 解码不验签:OAuth 兑换本身已经认证响应,id_token 的 sub/name 仅用于展示文案,不持久化。零 NyxID round-trip。
  • /init URL 私域投递:依赖现有 SendReplyAsync 的 relay outbound 走 Lark DM(ADR §Decision);群聊里发的 /init 走 DM 回复天然防 OAuth state hijack。
  • Projection readiness 是 polling:50ms 步长 + 3s 超时;write-side completion path,turn / query 路径仍 priming。

注册示例(production composition root)

services
    .AddChannelIdentityProjection()                          // Item #2
    .AddNyxIdRemoteCapabilityBroker(configuration)            // Item #3
    .AddChannelIdentityWebhookValidators();                   // Item #6
// Plus an in-memory or Elasticsearch projection store for ExternalIdentityBindingDocument.
// Plus appsettings:Aevatar:NyxIdBroker section binding NyxIdBrokerOptions.

app.MapIdentityOAuthEndpoints();                              // Items #4 + #6

留给后续 PR

  • Bot-Owner-Shared 终止策略选定:ADR §Bot-Owner-Shared 模式终止策略 列了 A/B/C 三选项;production 上线前需选一个并入 runbook。
  • AuthContext.user_credential_ref 实际删除:现在只是标记为 legacy + 加新字段;实际 outbound adapter 切到 typed external_subject 字段是 channel-runtime / Lark / Telegram 各自的小 PR。
  • V2 加固(可选):NyxID#549 提供 DPoP / mTLS / PAR;采纳是 broker config + HTTP header dance 的扩展,不动核心契约。
  • CAE webhook 共享密钥:目前 BrokerRevocationWebhookValidator 复用 StateTokenHmacKey;production 部署可拆为独立 WebhookHmacKey 配置项。

if (!isInit && !isUnbind)
return null;

var queryPort = _services.GetService<IExternalIdentityBindingQueryPort>();
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Consensus: 4 reviewers] severity=major, category=bug

ExternalSubjectRef is constructed with Tenant = string.Empty for every platform, but per ADR-0017 the Lark identity key is (platform, tenant, external_user_id)tenant (open_tenant_id) is part of the identity. Hardcoding empty collapses all Lark tenants into one binding actor / readmodel, leaking bindings across tenants. Extract tenant from the inbound activity / channel registration (Lark provides open_tenant_id on every event) and pass it in. For platforms without tenant scope, the existing empty-tenant validation rule still applies — but the value can't be hardcoded for all of them.

Per-reviewer verbatim
  • deepseek-v4-pro (L155): ExternalSubjectRef is built with Tenant = string.Empty unconditionally. For multi-tenant platforms like Lark, this collapses all tenants into a single actor key, causing cross-tenant identity collisions. Extract the tenant from activity or inbound (e.g., LarkChatActivity.TenantKey / open_tenant_id) and populate ExternalSubjectRef.Tenant.

  • codex (L151): The tenant is hard-coded to empty for all platforms, but ExternalSubjectRef defines Lark's key as (platform, tenant, external_user_id) and only allows empty tenant for platforms without tenant scope. This can collapse users from different Lark tenants into the same binding actor/readmodel. Populate the platform tenant from the activity/registration payload (for Lark, open_tenant_id) or reject /init when the tenant cannot be derived.

  • glm-5.1 (L131): Tenant is hardcoded to string.Empty. For Lark platforms, the tenant (open_tenant_id) is a key part of the ExternalSubjectRef identity — two users with the same external_user_id in different Lark tenants are distinct subjects. The tenant should be extracted from activity, registration, or the inbound context.

If no tenant is available in the inbound message for the current platform, consider extracting it from activity.Conversation or registration rather than defaulting to empty.

  • kimi-k2p6 (L138): Tenant = string.Empty hardcodes an empty tenant for all platforms. Per the ADR and proto comments, Lark requires open_tenant_id in the actor key; omitting it causes cross-tenant collisions where users with the same external_user_id share one binding actor. Populate Tenant from the inbound activity or ChannelBotRegistrationEntry context (e.g. Lark open_tenant_id).

},
};
await actor.HandleEventAsync(commitEnvelope, ct).ConfigureAwait(false);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Consensus: 3 reviewers] severity=blocker, category=bug

WaitForEventAsync(commitEnvelope.Id, ...) waits on the outer command-envelope ID generated by the callback, but ExternalIdentityBindingProjector.ProjectAsync matches against stateEvent.EventId — the inner StateEvent ID minted by the event-sourcing infrastructure during PersistDomainEventAsync. These IDs will never be equal, so the wait will always time out: the projection-readiness handshake is broken on the happy path. Fix: have HandleCommitBinding return (or expose via reply event) the actual StateEvent.EventId after persisting, and pass that into WaitForEventAsync. The same defect appears in HandleBrokerRevocationWebhookAsync (see other comment).

Per-reviewer verbatim
  • deepseek-v4-pro (L125): WaitForEventAsync(commitEnvelope.Id, ...) passes the outer envelope ID (Guid.NewGuid()) but ExternalIdentityBindingProjector.ProjectAsync stores stateEvent.EventId (the inner StateEvent ID generated by the event-sourcing infrastructure during PersistDomainEventAsync). These two IDs will never match, so the projection readiness wait will always time out. Binding is still persisted, but the ADR-required synchronous confirmation path is dead. Fix: capture the actual StateEvent.EventId from the commit result, or wait by actor committed-version instead of event ID.

  • codex (L129): This waits for commitEnvelope.Id, but ExternalIdentityBindingProjector writes LastEventId = stateEvent.EventId, not the command envelope id. Unless the event store deliberately reuses the incoming command id as the persisted state event id, successful callbacks will always hit the 3s timeout even though the projection caught up. Return/capture the persisted state event id or committed version from the actor write and wait on that value instead.

  • kimi-k2p6 (L139): WaitForEventAsync(commitEnvelope.Id, ...) passes the manually-generated command envelope id (Guid.NewGuid()), but the projector stores stateEvent.EventId in LastEventId. These ids are generated by different subsystems and likely never match, so the readiness wait will almost always time out. Pass the actual persisted event id/version (returned by PersistDomainEventAsync or available on the committed-state envelope) instead of commitEnvelope.Id.

using var request = new HttpRequestMessage(HttpMethod.Delete, $"{TrimAuthority()}{BindingsEndpoint}/{Uri.EscapeDataString(bindingId)}");
ApplyClientSecretBasic(request);
using var response = await _http.SendAsync(request, ct).ConfigureAwait(false);
if ((int)response.StatusCode >= 500)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Consensus: 2 reviewers] severity=major, category=bug

RevokeBindingByIdAsync (and the revoke-by-subject path) treats every 4xx response as success ('binding already gone'). That collapses real failures (401 unauthorized, 403 forbidden, malformed body) into silent no-ops, so the user sees a successful /unbind while the binding remains active in NyxID. Fix: only treat 404 Not Found (and possibly 410 Gone) as idempotent success; surface other 4xx as errors with body context for diagnosis.

Per-reviewer verbatim
  • codex (L111): All NyxID 4xx responses are silently treated as revoke success. That masks real failures such as 401/403 client authentication errors or malformed requests; if local revoke is added after this call, aevatar can mark the binding revoked while NyxID still has it active. Only treat the explicit idempotent cases agreed with NyxID (for example 404/410 if that is the contract) as success, and throw for other non-2xx responses.

  • glm-5.1 (L147): RevokeBindingByIdAsync silently treats all 4xx responses as idempotent success. A 401 Unauthorized (misconfigured client credentials) or 403 Forbidden (insufficient scope) would be swallowed, masking a real configuration or permissions issue. Only 404/410 are legitimately idempotent for DELETE.

Suggestion: Distinguish between client errors:

if ((int)response.StatusCode is 404 or 410)
    return; // idempotent: binding already gone
if (!response.IsSuccessStatusCode)
    response.EnsureSuccessStatusCode();

{
var existing = await queryPort.ResolveAsync(subject, ct);
if (existing is not null)
{
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Consensus: 2 reviewers] severity=major, category=bug

/unbind calls broker.RevokeBindingAsync(subject) to revoke at NyxID but never sends a RevokeBindingCommand to the local ExternalIdentityBindingGAgent. The actor's state and the projection readmodel will keep showing an active binding — subsequent /init checks, capability lookups, and any read-side query will see stale state until something else triggers a re-projection. After the broker call succeeds, dispatch RevokeBindingCommand(subject, reason='user_unbind') to the actor so the revoke flows through the same event chain as a NyxID-initiated revocation webhook.

Per-reviewer verbatim
  • codex (L95): RevokeBindingAsync only deletes the binding in NyxID; it never event-sources ExternalIdentityBindingRevokedEvent locally, so the projection continues to resolve the old binding_id after /unbind. The test fake hides this because it removes its in-memory entry. After a successful NyxID revoke, commit RevokeBindingCommand to ExternalIdentityBindingGAgent and have the /unbind path wait for projection readiness before replying success.

  • glm-5.1 (L162): The /unbind path calls broker.RevokeBindingAsync(subject) which only revokes at NyxID (DELETE endpoint). No RevokeBindingCommand is dispatched to ExternalIdentityBindingGAgent, so the local actor state retains binding_id and the projection readmodel stays active.

Consequence: after /unbind succeeds, ResolveAsync still returns the old binding_id. If the user sends /init, the idempotency check sees an existing binding and replies "已绑定" instead of starting a new OAuth flow.

Fix: After broker.RevokeBindingAsync succeeds, dispatch a RevokeBindingCommand to the actor (mirroring HandleBrokerRevocationWebhookAsync in IdentityOAuthEndpoints.cs), or have the broker's revoke path internally emit the command.

else
{
var challenge = await broker.StartExternalBindingAsync(subject, ct);
replyText = $"打开此链接完成 NyxID 登录(5 分钟内有效):\n{challenge.AuthorizeUrl}";
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[codex] severity=blocker, category=security

The OAuth authorize URL is sent back through SendReplyAsync to the current conversation. In a group chat this exposes the state token/link to other participants, enabling the state-hijack scenario ADR-0017 explicitly forbids. Deliver the challenge only through a sender-only/private channel (for example Lark DM), and fail closed for conversations/platforms where that private delivery is unavailable.

throw new ArgumentException("ExternalSubjectRef.external_user_id is required.", nameof(externalSubject));

if (externalSubject.Platform.Contains(':', StringComparison.Ordinal)
|| externalSubject.Tenant.Contains(':', StringComparison.Ordinal)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[glm-5.1] severity=major, category=bug

externalSubject.Tenant can be null when the object is constructed manually without setting Tenant (protobuf C# strings are nullable). Platform and ExternalUserId are guarded by explicit IsNullOrWhiteSpace checks above, but Tenant is accessed directly via .Contains(':') which will throw NullReferenceException.

Fix: Add a null guard before the colon check:

var tenant = externalSubject.Tenant ?? string.Empty;
if (externalSubject.Platform.Contains(':', StringComparison.Ordinal)
    || tenant.Contains(':', StringComparison.Ordinal)
    || externalSubject.ExternalUserId.Contains(':', StringComparison.Ordinal))

Or add externalSubject.Tenant ??= string.Empty; at the top of the method.


private sealed record TokenResponse
{
public string? AccessToken { get; init; }
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[codex] severity=major, category=bug

JsonSerializerDefaults.Web maps properties like AccessToken to accessToken, not OAuth's snake_case fields (access_token, id_token, binding_id, expires_in). As written, token exchange returns an empty access token and authorization-code exchange sees BindingId as missing. Add [JsonPropertyName("access_token")], [JsonPropertyName("binding_id")], etc., or configure a snake_case naming policy.

var payload = new StateTokenPayload
{
CorrelationId = correlationId,
ExternalSubject = externalSubject.Clone(),
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[deepseek-v4-pro] severity=major, category=bug

payload.ToByteArray() uses standard protobuf serialisation, which is not guaranteed deterministic. ADR-0017 §Implementation Notes #1 requires deterministic serialisation for the HMAC-sealed payload, because any non-determinism (e.g., a future map<...> field) would silently break TryDecode verification. While StateTokenPayload currently has no unordered fields, pin the invariant by using a deterministic serialisation path.


byte[] bodyBytes;
await using (var ms = new MemoryStream())
{
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[deepseek-v4-pro] severity=major, category=security

HandleBrokerRevocationWebhookAsync copies the full request body into a MemoryStream without a size limit. A malicious or misconfigured caller can exhaust memory. Cap the body size (e.g., 64 KB via RequestSizeLimit attribute or IHttpMaxRequestBodySizeFeature) — webhook payloads should be small.

var actor = await actorRuntime.CreateAsync<ExternalIdentityBindingGAgent>(actorId, ct).ConfigureAwait(false);
var revokeEnvelope = new EventEnvelope
{
Id = Guid.NewGuid().ToString("N"),
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[deepseek-v4-pro] severity=major, category=bug

Same event-ID mismatch exists in HandleBrokerRevocationWebhookAsync. Not exercised today because this handler returns Accepted() without a readiness wait, but ADR §Implementation Notes #5 mandates /unbind to wait for projection catch-up — the same bug will surface when that wait is added.

return;
}

if (!string.IsNullOrEmpty(State.BindingId))
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[codex] severity=major, category=bug

Duplicate /init commits are discarded locally, but the newly issued NyxID binding_id is left active with no cleanup signal back to the callback handler. ADR-0017 requires best-effort DELETE /oauth/bindings/{binding_id} for the rejected binding; otherwise concurrent /init can create orphan active bindings in NyxID. Make the commit path return an accepted/rejected result (or expose an explicit command result event) so the callback can delete the rejected binding id.

var document = await _reader.GetAsync(readmodelId, ct).ConfigureAwait(false);
if (document is not null && string.Equals(document.LastEventId, eventId, StringComparison.Ordinal))
return;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[kimi-k2p6] severity=major, category=bug

Exact equality document.LastEventId == eventId is fragile: if the projection processes a later event before the next poll, LastEventId advances and this check falsely times out. The interface docs say "advance past the version", but the implementation checks exact match. Switch to a monotonic version comparison (e.g. document.StateVersion >= expectedVersion) and update the interface to accept a version watermark instead of an opaque event id.

_ = TrySendImmediateLarkReactionAsync(activity, registration, ct);

var inbound = ToInboundMessage(activity);
if (await TryHandleSlashCommandAsync(activity, inbound, registration, runtimeContext, ct) is { } slashResult)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[deepseek-v4-pro] severity=major, category=design

TryHandleSlashCommandAsync runs before TryHandleWorkflowResumeAsync (line 69). A workflow-resume message starting with /init or /unbind (e.g., /init previous-context) will be silently consumed as a slash command, dropping the resume. While unlikely given current slash-command match patterns, the ordering should be documented or inverted — workflow resume is the normal path; slash commands are rare.

_ = TrySendImmediateLarkReactionAsync(activity, registration, ct);

var inbound = ToInboundMessage(activity);
if (await TryHandleSlashCommandAsync(activity, inbound, registration, runtimeContext, ct) is { } slashResult)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[codex] severity=major, category=security

This only intercepts /init and /unbind; every other inbound message still continues into the existing LLM flow. Once identity ports are registered, an unbound sender can send any non-slash text and still use the legacy bot-owner credentials, which violates ADR-0017's "unbound sender must /init, no bot-owner fallback" rule. Add a pre-LLM binding gate for all inbound turns when broker mode is enabled: resolve ExternalSubjectRef, return /init guidance on miss, and only proceed to LLM after a binding exists.

Convert.ToBase64String(bytes).TrimEnd('=').Replace('+', '-').Replace('/', '_');

private static byte[] Base64UrlDecode(string value)
{
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[mimo-v2.5-pro] severity=minor, category=perf

HmacSign converts _options.StateTokenHmacKey to byte[] via Encoding.UTF8.GetBytes() on every call. Since StateTokenCodec is registered as a singleton, cache the key bytes in the constructor to avoid repeated allocations:

private readonly byte[] _hmacKeyBytes;

public StateTokenCodec(NyxIdBrokerOptions options, TimeProvider? timeProvider = null)
{
    _options = options ?? throw new ArgumentNullException(nameof(options));
    _timeProvider = timeProvider ?? TimeProvider.System;
    _hmacKeyBytes = Encoding.UTF8.GetBytes(_options.StateTokenHmacKey);
}

IConfiguration? configuration = null)
{
ArgumentNullException.ThrowIfNull(services);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[glm-5.1] severity=minor, category=design

AddHttpClient<NyxIdRemoteCapabilityBroker>() registers the broker as transient, but TryAddSingleton<INyxIdCapabilityBroker> captures that transient instance forever. While HttpClient is safe to hold long-term, NyxIdBrokerOptions is resolved once via IOptions<> snapshot and never refreshed. If options are reloaded via IOptionsMonitor, the singleton broker won't pick up changes.

Consider using IOptionsMonitor<NyxIdBrokerOptions> inside the broker, or document that options changes require an app restart.

}
catch (Exception ex)
{
logger.LogError(ex, "OAuth callback authorization-code exchange failed for correlation {CorrelationId}", correlationId);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[deepseek-v4-pro] severity=minor, category=design

actorRuntime.CreateAsync<ExternalIdentityBindingGAgent>(actorId, ct) can throw if activation fails. Wrap in try/catch and return a 503 with a user-friendly message (matching the existing error-handling pattern around the token exchange at lines 96–112) rather than letting the exception propagate to the framework's default error handler.

}
return null;
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[mimo-v2.5-pro] severity=minor, category=design

The revocation webhook handler lacks error handling around actor.HandleEventAsync. If the actor throws (e.g. concurrency conflict, serialization error), the raw exception propagates to NyxID as a 500 with potentially sensitive details. Consider wrapping in try-catch and returning a generic 500 message, matching the pattern used in the OAuth callback handler.

@@ -0,0 +1,53 @@
using Aevatar.CQRS.Projection.Stores.Abstractions;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[kimi-k2p6] severity=minor, category=test

No tests cover the readiness port's polling loop, timeout path, or the exact-match logic. Add tests for: successful catch-up, timeout expiration, and the scenario where the projection advances past the expected event before the poll observes it.

_agent = new ExternalIdentityBindingGAgent
{
Services = _serviceProvider,
EventSourcingBehaviorFactory =
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[mimo-v2.5-pro] severity=minor, category=test

Tests call HandleCommitBinding directly rather than routing through HandleEventAsync with a properly constructed EventEnvelope. This bypasses the runtime's event routing and [EventHandler] dispatch. Consider adding at least one integration-style test that constructs an EventEnvelope with a packed CommitBindingCommand payload and calls HandleEventAsync to pin the full dispatch path.

…andling

Three consensus issues from the post-c67fe537 review pass:

1. **BLOCKER (3 reviewers, L125):** WaitForEventAsync was passing the
   outer command-envelope id (Guid.NewGuid in the callback) but the
   projector wrote stateEvent.EventId (minted by the event-sourcing
   infrastructure during PersistDomainEventAsync). The two ids never
   match, so the readiness handshake always timed out on the happy
   path. Replace event-id-based waiting with binding-state polling:
   WaitForBindingStateAsync(externalSubject, expectedBindingId, …)
   polls the readmodel until BindingId matches, or until the field
   clears for the revoke case. The interface now describes the
   semantics the callback actually needs rather than the event-store
   primitives it can't reliably observe.

2. **MAJOR (4 reviewers, L131):** Slash-command path hardcoded
   ExternalSubjectRef.Tenant = "" for every platform. Lark's identity
   key includes open_tenant_id and two users with the same
   external_user_id in different tenants are distinct subjects, so
   the empty default collapsed bindings across tenants. New
   ResolveTenant helper:
     1) prefers `open_tenant_id` / `tenant` from
        InboundMessage.Extra (platform adapter populated),
     2) falls back to registration.ScopeId so bindings stay at
        least per-bot-scoped (each bot is registered to one tenant),
     3) refuses the slash command when neither is available rather
        than committing a tenant-collapsed binding.

3. **MAJOR (2 reviewers, L111):** RevokeBindingByIdAsync collapsed
   every 4xx into "binding already gone", masking 401/403 client-auth
   errors and 422 validation failures as successful unbinds. Now only
   404 / 410 short-circuit as idempotent success — anything else
   surfaces body context in the log and re-throws so the caller can
   decide whether to retry or abort.

Tests: 39 Identity-tagged tests still pass, full
`dotnet build aevatar.slnx` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eanzhao
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 29, 2026

Multi-reviewer Consensus Fixes (commit a86d2047)

# Severity / Reviewers 内容 落点
L125 BLOCKER / 3 reviewers (deepseek-v4-pro, codex, kimi-k2p6) WaitForEventAsynccommitEnvelope.Id(callback 里 Guid.NewGuid),projector 写的是 stateEvent.EventId(PersistDomainEventAsync 里 ES infra 生成)。两者永远不等,readiness 永远超时 → happy path 死掉 重写 IProjectionReadinessPort:WaitForBindingStateAsync(externalSubject, expectedBindingId?, timeout, ct) —— 直接 poll readmodel 的 BindingId 字段(null = 等 revoke)。callback 改用新签名,语义明确,不绕 ES infra
L131 MAJOR / 4 reviewers (deepseek-v4-pro, codex, glm-5.1, kimi-k2p6) ExternalSubjectRef.Tenant = "" 硬编码 → Lark 多租户语义崩塌(同 open_user_id 跨租户共享一个 binding actor) ResolveTenant(inbound, registration) 三级降级:① inbound.Extra["open_tenant_id"] / ["tenant"](platform adapter 填) → ② registration.ScopeId(每个 bot 注册到唯一租户)→ ③ 拒绝 slash command(production 必须要排查配置,而不是写一个 tenant-collapsed binding)
L111 MAJOR / 2 reviewers (codex, glm-5.1) RevokeBindingByIdAsync 把所有 4xx 当 idempotent success → 401/403 client-auth、422 validation 全被吞 → 用户看到 /unbind 成功但 NyxID 那边 binding 还活着 只对 404 / 410 short-circuit;其他 4xx / 5xx 把 body 截断后入 log,然后 EnsureSuccessStatusCode() 抛,让调用方决定 retry / abort

验证

  • dotnet build aevatar.slnx — 0 errors
  • dotnet test --filter "FullyQualifiedName~Identity" — 39 passed,0 failed

一些选择理由

  • IProjectionReadinessPort 改 binding-state semantic:event-id 的 ES infra 暴露面我们够不到(PersistDomainEventAsync 没返回 stateEvent),用 version 也得 actor 暴露;直接 poll readmodel 的业务字段反而最直白 —— callback 真正想知道的就是"用户回到 Lark 发消息时 ResolveAsync 能看到这个 binding"。
  • Tenant fallback 到 registration.ScopeId:不是完美的 platform-tenant 映射,但把 binding 退到 per-bot 而不是全局 collapse,production 场景(每个 bot 在一个租户)语义正确,且 implementation 不需要在这个 PR 给所有 platform adapter 都加 tenant extraction —— 那个属于 platform 适配工作。Adapter 填 Extra["open_tenant_id"] 就升级到 platform-级 tenant scope。
  • 404/410 idempotent:DELETE 标准语义 —— "已经没有了" 跟 "成功删除" 等价;其他 4xx 是 caller 的责任。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant