feat(channel): ADR-0017 + scaffolding for per-user NyxID binding#477
feat(channel): ADR-0017 + scaffolding for per-user NyxID binding#477
Conversation
Per-user binding for channel bots can be implemented entirely as a NyxID OAuth client without any NyxID surface changes; the five endpoints proposed in discussion #400 collapse onto existing OAuth/OIDC primitives plus RFC 8693 Token Exchange. ADR captures the trade-off between local refresh-token storage (stop-gap) and the broker model (#375 target), accepts the local adapter as a transitional implementation under explicit hard constraints, and pins the migration path back to zero-secret-material grain state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c95c818a7d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| -> 回 Lark "{nyxid}/oauth/authorize?client_id=aevatar-channel-binding | ||
| &redirect_uri=https://aevatar/api/oauth/nyxid-callback | ||
| &state={correlation_id}&code_challenge=...&scope=openid" |
There was a problem hiding this comment.
Return binding authorize URL through a private channel
The /init flow currently returns a reusable OAuth authorize URL containing state directly to Lark, but the ADR does not require this response to be private/ephemeral. In a group chat, another participant can open that URL, complete login with their own NyxID account, and the callback path (state -> ExternalSubjectRef) will bind that Nyx subject to the original sender, causing cross-account binding hijack. This is a security issue in any non-private channel flow unless you require private delivery or add a second verifier tied to the initiating platform user before accepting the callback.
Useful? React with 👍 / 👎.
Codecov Report✅ All modified and coverable lines are covered by tests. @@ Coverage Diff @@
## dev #477 +/- ##
==========================================
+ Coverage 71.36% 71.61% +0.25%
==========================================
Files 1219 1236 +17
Lines 88424 89573 +1149
Branches 11573 11713 +140
==========================================
+ Hits 63104 64151 +1047
- Misses 20770 20822 +52
- Partials 4550 4600 +50
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
After security analysis, reverse the earlier acceptance of a local encrypted-refresh-token stop-gap. aevatar will instead hold only an opaque binding_id and call NyxID's broker token endpoint each turn, keeping grain state free of secret material. Implementation is gated on ChronoAIProject/NyxID#549. Existing bot owner-shared mode remains in place until the dependency lands, with no regression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
eanzhao
left a comment
There was a problem hiding this comment.
方向 OK:broker 模式 vs. local refresh_token 的论证扎实。但有两个结构性问题在合并前必须解决,否则后续 implementation PR 会跟现有 channel runtime 撞车。
- Actor 重复:
ExternalIdentityBindingGAgent跟现有ChannelUserBindingGAgent(agents/Aevatar.GAgents.Channel.Runtime/UserBinding/ChannelUserBindingGAgent.cs:16)是同一业务实体(channel user 的凭据绑定)。CLAUDE.md 明令禁止按技术功能拆分同一实体,必须复用现有 actor 或在 ADR 里 explicit 反驳。 - Identity scope 跟现有冲突:新 key
(platform, tenant, external_user_id)不含 bot;现有 actor key 含 bot。需要 explicit 决定哪个对,并写明迁移路径。
其他:PKCE verifier 在 grain state 里的处理、client_credentials 在 ADR 文字里的 OAuth 词汇误用、用户从 NyxID 侧反向撤销的一致性、nyx_subject 缓存目的、accepted status 跟外部依赖之间的状态等,逐项 inline。
| @@ -0,0 +1,142 @@ | |||
| --- | |||
| title: "Per-User NyxID Binding via OAuth Broker" | |||
| status: accepted | |||
There was a problem hiding this comment.
status accepted 但 implementation gated on ChronoAIProject/NyxID#549(外部 issue 还没落地)。如果 NyxID#549 协议跟本 ADR 的假设有偏差,aevatar 接的 endpoint shape 也得改。
建议:
- 改成
proposed或accepted-pending-dependency(按本仓库现有 ADR 词汇),等 NyxID#549 contract 冻结再升accepted - 或者在 ADR 里贴 NyxID#549 的契约 freeze 节点(OpenAPI link / proto 文件 hash),明确"以哪个版本的契约为准"
否则这个 ADR 实际处于浮动状态。
|
|
||
| - 用标准 OAuth Authorization Code + PKCE 流程发起 binding(`/oauth/authorize` + `state` 承载 correlation_id + redirect_uri 浏览器跳转) | ||
| - **aevatar 不接收、不持有 user refresh_token**;binding 完成时 NyxID 返回不透明 `binding_id`,aevatar 仅持 `(external_subject_ref) → binding_id` 映射 | ||
| - 每次 turn 用 client_credentials 调 NyxID `POST /oauth/bindings/{binding_id}/token` 拿短期 access_token,塞进 `AgentToolRequestContext` |
There was a problem hiding this comment.
client_credentials 在 OAuth 2.0 标准里语义是 "client 以自己的身份(不代表任何 user)拿 token"。这里实际语义是 "client 凭 binding_id 代用户拿 user-scoped token",不是 client_credentials。
如果 NyxID#549 endpoint 想合规命名:
- 走 RFC 8693 Token Exchange:
grant_type=urn:ietf:params:oauth:grant-type:token-exchange,subject_token=<binding_id>,subject_token_type=urn:nyxid:params:oauth:token-type:binding-id,client 用 client_secret 鉴权(HTTP Basic /client_secret_post) - 或者明确声明这是 NyxID 自定义 broker token endpoint,body 里不出现
grant_type=client_credentials
ADR 文字按当前写法读起来像 "client_credentials = 该 endpoint 的 grant_type",跟 OAuth 标准语义冲突,会把 NyxID#549 contract 引向错误方向。建议改成 "RFC 8693 token exchange,subject_token = binding_id" 或 "NyxID 自定义 broker token endpoint,client 鉴权 = client_secret"。同样适用第 54 行。
| - 用标准 OAuth Authorization Code + PKCE 流程发起 binding(`/oauth/authorize` + `state` 承载 correlation_id + redirect_uri 浏览器跳转) | ||
| - **aevatar 不接收、不持有 user refresh_token**;binding 完成时 NyxID 返回不透明 `binding_id`,aevatar 仅持 `(external_subject_ref) → binding_id` 映射 | ||
| - 每次 turn 用 client_credentials 调 NyxID `POST /oauth/bindings/{binding_id}/token` 拿短期 access_token,塞进 `AgentToolRequestContext` | ||
| - 用户撤销同步 `DELETE /oauth/bindings/{binding_id}`,NyxID 是 source of truth |
There was a problem hiding this comment.
覆盖了 aevatar 主动撤销方向,反方向(用户在 NyxID UI 直接撤销,不经 aevatar)没交代:
- aevatar
(external_subject) → binding_id映射变成悬挂指针 - 下次
IssueShortLivedAsync调用会被 NyxID 拒绝(401? 410?invalid_grant?) - 该错误码该不该触发 aevatar 自动 prune binding actor state?
至少补一段 invalidation flow:
- 哪些 NyxID 错误码语义上等于 "binding 已撤销"
IssueShortLivedAsync拿到该错误码后行为(事件化撤销 binding actor / 让用户重新/init)- 是否要 NyxID 主动 webhook 通知(如果是,跟 Fix NyxID Lark relay authentication and keep relay tokens out of persisted state #366 callback JWT 怎么共用,不共用就要在 ADR 里讲清楚为什么不复用)
否则 "NyxID 是 source of truth" 是单向声明,不是双向一致性保证。
| /init | ||
| -> ChannelConversationTurnRunner 前置 slash-command 路由(不进 LLM) | ||
| -> aevatar 生成 state(=correlation_id) + PKCE pair, | ||
| 落 BindingChallengeIssuedEvent 到 ExternalIdentityBindingGAgent |
There was a problem hiding this comment.
PKCE code_verifier 是 secret material(OAuth 安全模型靠它对抗 code interception,RFC 7636)。BindingChallengeIssuedEvent 落到 actor state 意味着 verifier 进了 grain state / event store。
但 Storage Boundary 表(line 61-67)只列了 User refresh_token: ✗ never,没列 PKCE verifier。这跟 "aevatar grain state / projection / log / metric span attribute 不出现 secret material"(line 131)的笼统宣称在表面上有张力——verifier 是 short-lived 但仍是 secret material。
两种处理方式:
- 用 stateless signed cookie / state token 在浏览器侧 round-trip verifier(OAuth 库标准做法),grain state 里只留 challenge 元信息(correlation_id, expires_at, external_subject_ref),不留 verifier
- 保留落 actor state,但 ADR 把 Storage Boundary 表改成区分 long-lived vs. short-lived(<5min) secret material,明确 PKCE verifier 属于 short-lived 一类,并给出 challenge actor 的 GC 策略
倾向 1,跟 "binding_id = 唯一 opaque pointer,grain state zero secret material" 的叙事更一致。无论选哪个,arch test 守 grain state 的扫描规则需要一起定义。
|
|
||
| | 数据 | aevatar grain state | NyxID | | ||
| |---|---|---| | ||
| | `(platform, tenant, external_user_id) → binding_id` | ✓ | | |
There was a problem hiding this comment.
Identity scope 跟现有 actor 冲突
这里 key 是 (platform, tenant, external_user_id),不含 bot_instance_id。
但现有 ChannelUserBindingGAgent(agents/Aevatar.GAgents.Channel.Runtime/UserBinding/ChannelUserBindingGAgent.cs:12 的 docstring 明确写)keying 是 {bot_instance_id}:{channel}:{sender_canonical_id},含 bot,state 也存 bot / channel / sender_canonical_id / credential_ref。
两个识别尺度直接打架,ADR 没意识到。必须 explicit 决定:
A. per-(platform, user):一个 NyxID identity 跟一个 Lark user 一一对应,跟用户在哪个 bot 里讲话无关。语义最干净(NyxID identity 不属于某个 bot),但要废弃现有 per-bot key,迁移现有数据。
B. per-(platform, user, bot):保留现有 keying,binding 是 (bot, lark user) 维度。语义偏弱(同一 lark user 在 botA / botB 可绑不同 NyxID 账号?产品上不合理),但兼容现状。
倾向 A(NyxID 视角下一个 lark user 就是一个 user,跟 bot 无关;现有 actor 的 bot 字段更像是早期 RFC §5.2b 拆分时的 over-fit)。但这个决定影响 actor 模型、迁移路径、群聊 UX,必须在 ADR 里 explicit 选定,不能新写一个 keying 跟现有冲突还不解释。
| | 数据 | aevatar grain state | NyxID | | ||
| |---|---|---| | ||
| | `(platform, tenant, external_user_id) → binding_id` | ✓ | | | ||
| | `nyx_subject`(opaque `sub` claim) | ✓ 缓存以加速 resolve | ✓ source of truth | |
There was a problem hiding this comment.
nyx_subject 标 "✓ 缓存以加速 resolve" —— 加速什么 resolve 没说。
如果只是为了 IssueShortLivedAsync,链路其实是 external_subject → binding_id → NyxID 调用,nyx_subject 不在调用链上;如果是给 ConversationGAgent scope 用、或者给 audit log 用,应该写明用途。
如果没有清晰用途,建议直接不缓存(少一份要保持一致的状态,binding_id 已是 opaque 唯一指针)。如果有用途,写在表里。
| NyxID #549 之前可平行落地的部分(本身不依赖 broker endpoint): | ||
|
|
||
| - `INyxIdCapabilityBroker` 接口 + proto | ||
| - `ExternalIdentityBindingGAgent` + projection + `IExternalIdentityBindingQueryPort`(state 仅存 `binding_id` + `nyx_subject`,均为 opaque) |
There was a problem hiding this comment.
Actor 重复,违反 CLAUDE.md "Actor 即业务实体"
现有 agents/Aevatar.GAgents.Channel.Runtime/UserBinding/ChannelUserBindingGAgent.cs:16 已经是同一个业务实体(per-bot-per-user 的 channel user 凭据绑定):
- keying
{bot_instance_id}:{channel}:{sender_canonical_id} - state:
bot,channel,sender_canonical_id,credential_ref,preferences,created_at,updated_at - commands/events:
BindUserCredentialCommand/UserCredentialBoundEvent/UnbindUserCredentialCommand等
新引入 ExternalIdentityBindingGAgent 把"同一 channel user 的 NyxID binding"放到第二个 actor(甚至第二个模块 Aevatar.GAgents.Channel.Identity —— 见 line 126)里,违反 CLAUDE.md:
Actor 即业务实体:一个 actor = 一个业务实体(数据与方法同住);禁止按技术功能(读/写/投影)拆分同一业务实体为多个 actor。
binding_id 在语义上就是一种 credential_ref(不透明的 user 凭据指针),不需要再起一个 actor。建议:
- 复用
ChannelUserBindingGAgent,把credential_ref解释/重命名为binding_id,或者新增 typed sub-messagenyx_binding { binding_id, nyx_subject }进现有 state BindingChallengeIssuedEvent/ExternalIdentityBoundEvent接到同一 actor 的事件流- ADR 应明确"扩展现有 ChannelUserBindingGAgent"而不是"新增并列 actor + 并列模块"
如果坚持新 actor,必须在 ADR 里 explicit 论证为什么二者不是同一业务实体(比如"binding 跟 bot 解耦,不是同一 keying"——但那触发上面 line 63 的 identity scope 问题,得一起讲清楚)。
Pick the explicit-justification branch for the ExternalIdentityBindingGAgent vs existing ChannelUserBindingGAgent overlap: per-(platform, user) NyxID identity and per-(bot, channel, sender) bot-scoped user state are different business entities at different scopes, so two actors are kept and the existing credential_ref field is marked redundant for migration by the implementation PR. Restate per-turn token issuance as RFC 8693 token-exchange with subject_token=binding_id (was: client_credentials, semantically wrong for acting on behalf of a bound user). Move PKCE code_verifier out of BindingChallengeIssuedEvent into a stateless HMAC-signed state token so nothing secret-material-shaped enters grain state. Add the reverse-direction revoke flow (NyxID-side revoke surfaces as invalid_grant on next token-exchange; aevatar fires unbind event and re-prompts /init). Mandate private-channel delivery for the /init OAuth URL to defeat the state-hijack attack flagged in PR review (group recipient could otherwise complete login with their own NyxID account and bind sender to a foreign identity). Drop unused nyx_subject cache from grain state. Separate aevatar service-level secrets (state HMAC key, client_secret) as infrastructure secret outside the #375 user-secret invariant. status: accepted -> proposed until NyxID#549 contract is frozen. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ADR-0017 ReviewOverall direction is solid — broker model with zero long-lived secret material in aevatar grain state is the correct call. The threat model comparison table is well-argued. Found several gaps that should be addressed before accepting: 1. 文件名与标题术语不一致文件名 2. ExternalIdentityBindingGAgent 缺 grain identity 方案ADR 引入了 按 AGENTS.md 的要求:
3. nyx_subject 缓存无失效策略Storage boundary 表说 aevatar 缓存
建议至少标注为 known gap,或定义一个 TTL / refresh-on-401 策略。 4. Bot owner 模式共存逻辑缺失ADR 多次说"现有 bot owner-shared 模式继续运行(不 regression)",但没有描述共存逻辑:
这直接影响 5. 初始 access_token 处理未说明Callback 流程中 aevatar 用 code 换到
虽然只是初始流程的小细节,但涉及与 NyxID #549 的契约设计(是否需要在 broker_binding scope 下返回 access_token),建议明确。 6. externalSubject 参数应强类型
7. 重复 /init 的幂等语义未定义已绑定用户再次
这影响 8. scope 命名与 NyxID #549 不一致ADR 流程图写 9. Projection readiness / query-time priming
建议在 Dependencies 的"平行落地"部分补充 projection readiness 的保证机制(如 callback handler 同步等 projection、或 turn 路径容忍短暂的 resolve miss 并回落)。 小结ADR 的核心决策(broker > local、zero secret material、gated on NyxID #549)是正确的。上述问题大多是实现前应该补齐的细节,其中 #2 (grain identity)、#4 (共存逻辑)、#7 (幂等语义) 对后续 implementation PR 影响最大,建议优先明确。 |
eanzhao
left a comment
There was a problem hiding this comment.
A few observations beyond eanzhao's review:
1. Local commit 056f05c5 not pushed
git log --oneline -3 shows 056f05c5 Address ADR-0017 review feedback sitting locally, containing fixes for the inline comments (status→proposed, client_credentials→token exchange, PKCE stateless state_token, nyx_subject cache removal, revocation flow). Push before merging.
2. Identity scope + actor split still has tension
The ADR argues ExternalIdentityBindingGAgent (key=platform/tenant/external_user_id) and ChannelUserBindingGAgent (key=bot/channel/sender) are "different scope entities". But binding_id semantically IS a credential pointer — same category as credential_ref, which the ADR itself plans to deprecate because the new actor subsumes it. Alternative: extend ChannelUserBindingState with a nyx_binding sub-message. The ADR should at minimum add a dedicated section explaining why "two actors" > "one extended actor" and give a concrete deprecation/migration window.
3. AuthContext.user_credential_ref not addressed
channel_contracts.proto AuthContext.field(2) carries user_credential_ref for outbound "send on behalf of user" semantics. How does this interact with the broker cap_handle? Does the outbound adapter call IssueShortLivedAsync?
4. status accepted with gated dependency
Status is accepted while implementation is gated on NyxID#549 (unresolved external dependency). Should be proposed (as already fixed locally) or NyxID#549 OpenAPI contract should be pinned.
- Rename file to oauth-broker.md to match the title. - Spell out actor architecture in a dedicated section: justify ExternalIdentityBindingGAgent (platform-level identity entity) versus the existing per-bot ChannelUserBindingGAgent (per-bot preferences), and pin a deprecation window for credential_ref. - Add an Outbound Send section covering AuthContext.user_credential_ref semantics under broker mode and the proto deprecation path. - Decision section: forbid bot-owner fallback for unbound senders in both 1:1 and group chats, define /init idempotency, describe initial access_token disposal in the callback handler. - Strong-type INyxIdCapabilityBroker parameters as ExternalSubjectRef. - Add Projection Readiness subsection: write-side projection preflight in the callback handler; turn path must not query-time prime. - Align scope name to urn:nyxid:scope:broker_binding so it tracks the NyxID#549 contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ADR-0017 Review Resolution Summary逐条 map 到落地的 commit / 章节,方便 review pass 时勾掉。 Round 1 — commit
|
| # | 落点 |
|---|---|
| 1 文件名 | rename 完成 |
| 2 grain identity scheme | §Actor Architecture key 列明确 (platform, tenant, external_user_id) = ExternalSubjectRef |
| 3 nyx_subject 缓存 | Storage Boundary 标 ✗,不缓存 |
| 4 bot owner 共存 | Decision bullet:未绑定 sender 1:1 与群聊一律强制 /init,bot owner 不享默认身份特权;Consequences 同步说明 |
| 5 初始 access_token | Decision 流程图 + Storage Boundary 表:callback handler 内一次性使用(可选 /oauth/userinfo 调用做展示文案),handler 退出前丢弃,永不持久化 |
| 6 externalSubject typed | §INyxIdCapabilityBroker 接口签名全部改 ExternalSubjectRef(proto-typed value object) |
7 /init 幂等 |
Decision bullet:已绑定 sender 重复 /init 回复"先 /unbind 再 /init 切账号",不发新 OAuth |
| 8 scope 命名 | 流程图 / Decision / Dependencies 全部改 urn:nyxid:scope:broker_binding;契约 token 名 pin 到 NyxID#549 freeze |
| 9 projection readiness | 新增 §Dependencies → Projection Readiness:callback handler 写侧预挂接 projection 同步等水位,turn 路径禁 ES replay / state mirror / priming |
如果哪条论证不满意请直接戳,可以再迭代;否则可以把对应 conversation 标 resolved。
ADR-0017 v2 Review(修复确认 + 新发现)上一轮 9 个问题全部已修复,质量提升明显。逐条确认 + 新发现如下: 原始 9 点 — 全部 resolved ✅
新发现A. Token Exchange 方案与 NyxID #549 提案存在显著分歧ADR 现在用 RFC 8693 但 NyxID #549 当前提案是 这是不同的端点 + 不同的 grant_type + 不同的鉴权方式。NyxID 侧的 虽然 ADR 的方案更标准(复用现有 RFC 8693 框架而非另起新端点),但这是与 #549 的重大偏差。建议:
B.
|
eanzhao
left a comment
There was a problem hiding this comment.
Round 2 把上一轮 7+1 全补到了,还顺手抓到了我漏的几个点(AuthContext.user_credential_ref、/init 幂等、Projection Readiness 写侧预挂接、强类型 ExternalSubjectRef、scope URN 化)。新引入的 §Outbound Send / §Actor Architecture / §Projection Readiness 三段把架构说得很清楚。
但 §Outbound Send 撞到 CLAUDE.md 一条红线(自定义字符串格式 + compat shim),另两处再调一下:
- 必须改:
user_credential_ref重载承载序列化ExternalSubjectRef(line 121)—— 同时违反 "统一 Protobuf,禁止自定义字符串格式" 和 "删除优先,不留 compat shim" - 必须改:bot-owner 模式硬切(line 195)——
credential_ref字段有 deprecation window,但 bot-owner 用户体验本身没有;切上线那一刻全员 sender 断流 - 建议改:callback 阶段调
/oauth/userinfo拿 sub claim(line 61)——sub在同一兑换返回的id_token里就有,多余的 round-trip
小问题(不阻断):projection 写侧等待 timeout=3s 偏紧 + UX 文案易被误读为绑定失败;state-token HMAC key rotation 需要 kid;ExternalSubjectRef 应该住在 Channel.Abstractions 而非 Channel.Identity,否则会形成反向依赖。
| (grant_type=authorization_code, code, code_verifier, client_secret) | ||
| -> { access_token, binding_id } (broker_binding scope 下不返 refresh_token) | ||
| 初始 access_token 处理:可选地一次性用于调 /oauth/userinfo 拿 sub claim | ||
| 做"已绑定 <name>"展示文案;**永不持久化、永不复用** |
There was a problem hiding this comment.
sub claim 在同一次 token 兑换返回的 id_token 里就有(OIDC 标准),不需要再调 /oauth/userinfo 一跳。只有展示 name/email 才需要 userinfo。
建议改成:
可选地从同一兑换返回的
id_token解码sub/nameclaim 做"已绑定 "展示;不调 userinfo,不持久化 token
少一次 NyxID round-trip,且语义跟 OIDC 标准对齐。
|
|
||
| `IChannelOutboundPort.ContinueConversationAsync(... AuthContext auth ...)` 在 `OnBehalfOfUser` 模式下用 `AuthContext.user_credential_ref`(`agents/Aevatar.GAgents.Channel.Abstractions/protos/channel_contracts.proto:138`)选择代用户身份。Broker 模式下: | ||
|
|
||
| - 在线协议保留 `user_credential_ref string = 2`,语义改为承载序列化的 `ExternalSubjectRef`(form: `lark:tenant_X:user_Y`),由 outbound adapter 反序列化 |
There was a problem hiding this comment.
双重违反 CLAUDE.md
在线协议保留 user_credential_ref string = 2,语义改为承载序列化的 ExternalSubjectRef
(form: `lark:tenant_X:user_Y`)
两个独立红线:
- 统一 Protobuf:CLAUDE.md "序列化" 段写明 "State、领域事件、命令、回调载荷、快照、缓存载荷、跨 Actor/跨节点内部传输对象全部使用 Protobuf;禁止 JSON/XML/自定义字符串格式"。
lark:tenant_X:user_Y这种带分隔符 + 含解析约定的字符串就是自定义字符串格式(外部协议除外,但AuthContext是仓库内部跨层契约)。 - 删除优先:CLAUDE.md "删除优先:空转发、重复抽象、无业务价值代码直接删除,不保留兼容空壳" + "Don't use feature flags or backwards-compatibility shims when you can just change the code"。把旧 string field 重载承载新语义是典型 compat shim。
且 line 123 已经写了"长期演进:加 typed ExternalSubjectRef external_subject = 4"——意思是先重载、后加 typed。但 broker 路径整体 gated on NyxID#549,根本不存在必须靠 string 重载兜过的过渡期,可以一次性加 typed external_subject = 4,broker outbound 只读 typed 字段,旧 user_credential_ref 留给非 broker 代码继续使用按既有语义自然 deprecate。中间不需要重载期。
改法:删掉 line 121 这一段;line 123 的 typed 字段直接作为 broker 模式的唯一 outbound 身份字段。
| - 新增 OAuth callback endpoint `/api/oauth/nyxid-callback`(标准 OAuth client redirect 处理,不是 webhook),含写侧预挂接 projection 等待 | ||
| - `ChannelConversationTurnRunner.RunInboundAsync` 开头加 slash-command 前置路由(`/init`、`/unbind`),`/init` 幂等 | ||
| - `BuildReplyMetadata` 改成 `ResolveAsync` + `IssueShortLivedAsync`;metadata key 从 `nyxid.access_token` 改为 `nyxid.capability_handle`(诚实表达"短期、scoped、可撤销") | ||
| - 未绑定 sender(无论 1:1 还是群聊)统一强制 `/init`,不回落 bot owner;现有 bot owner-shared 模式在 implementation PR 切上线那一刻终止,bot owner 失去"默认用户身份"特权 |
There was a problem hiding this comment.
bot-owner 模式硬切,没有迁移计划
现有 bot owner-shared 模式在 implementation PR 切上线那一刻终止,bot owner 失去"默认用户身份"特权
§Actor Architecture 给了 credential_ref 字段 deprecation window("broker 模式上线 + 一个有 channel-runtime proto break 的 release 后"),但没给 bot-owner 用户体验本身。切上线那一刻所有现有 Lark sender 在 bot 里都拿不到响应,必须各自 DM 自己 /init 后才恢复,是产品断崖。
至少要在 ADR 里 explicit 一句迁移策略。可选方案(产品决策,ADR 只锁框架):
- A. 双轨期:新 bot 默认 broker,旧 bot 保持 owner-shared;新增 bot 设置开关,bot owner 自己挑迁移时机
- B. 单轨硬切 + 通知期:broker 上线前 N 天起,未绑定 sender 收到的 reply 加引导话术("X 月 X 日起需
/init"),到期硬切 - C. 单轨硬切(当前文本):明确接受首日所有 sender 一次性走
/init,并写明 comm 责任在 bot owner
ADR 不需要选哪个,但要 explicit 说明这是产品决策点 + 留 placeholder,否则 implementation PR 会被迫现做。
| @@ -0,0 +1,142 @@ | |||
| --- | |||
There was a problem hiding this comment.
§Projection Readiness 写侧预挂接 projection 机制未指定。在当前 turn 内同步等 projection 追上 actor committed version 需显式端口(如 IProjectionReadinessPort)。建议在 Consequences 列为新增接口。
| ``` | ||
|
|
||
| ## Storage Boundary | ||
|
|
There was a problem hiding this comment.
初始 access_token "可选地" 调 /oauth/userinfo 获取展示名——"可选" 造成 UX 路径不一致:有些 binding 完成页显示 display name,有些不显示。建议 ADR 明确二选一(始终调 or 完全不调)。
| NyxID #549 之前可平行落地的部分(本身不依赖 broker endpoint): | ||
|
|
||
| - `INyxIdCapabilityBroker` 接口 + proto | ||
| - `ExternalIdentityBindingGAgent` + projection + `IExternalIdentityBindingQueryPort`(state 仅存 `binding_id` + `nyx_subject`,均为 opaque) |
There was a problem hiding this comment.
弃用窗口间 credential_ref(旧) 与 binding_id(新) 可能共存。"不读不写" 应补充 fallback 顺序:查询优先 binding_id,未命中再 fallback credential_ref。否则 migration 中间态行为未定义。
- Decide id_token-decoded sub for the bound-display string instead of
hitting /oauth/userinfo: the OIDC token response already carries sub,
so the extra round-trip is unnecessary. Initial access_token is
unconditionally discarded by the callback handler.
- Define /unbind: slash-command -> RevokeBindingAsync ->
DELETE /oauth/bindings/{binding_id} -> revoked event. NyxID is the
source of truth, so a NyxID failure aborts the local revoke instead
of risking source-of-truth divergence.
- Drop the string overload of AuthContext.user_credential_ref; broker
outbound reads only the new typed external_subject field. Justified
by CLAUDE.md "unified protobuf, no compat shim".
- Add credential_ref deprecation-window rule: turn path reads only
binding_id, never falls back to credential_ref.
- Spell out the bot-owner shutdown options (A/B/C) and require the
implementation PR to pick one for the release runbook.
- Note IProjectionReadinessPort as a write-side port and clarify the
query-side boundary in §Projection Readiness; turn path remains
forbidden from priming.
- Add §Divergence from NyxID#549: pick RFC 8693 token-exchange over
the dedicated client_credentials endpoint in the original sketch
and link to the alignment comment to be posted on NyxID#549.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ADR-0017 v2 Review Resolution SummaryRound 3 — commit
NyxID 同步:ChronoAIProject/NyxID#549 已贴 alignment comment 提议契约对齐 RFC 8693 token-exchange 与 如果哪条仍不满意请直接戳。 |
ADR-0017 v3 Reviewv2 的 A/B/C 三个问题全部已修复:
v3 新增内容检查通过:
LGTM — 没有阻塞项。 ADR 质量足够支撑后续 implementation PR。 |
ADR-0017 Final Review前几轮 9+3 问题全部 resolved,核心设计无阻塞项。以下是遗留的实现细节问题,不阻断 ADR 合并但建议在 implementation PR 前补齐: 1. HMAC state_token 设计细节缺失
2. 并发
|
| 错误 | HTTP 响应 | 用户可见文案 |
|---|---|---|
| state_token 过期/篡改 | 400 | "链接已过期,请重新 /init" |
| token_exchange 失败 | 502 | "NyxID 绑定失败,请重试" |
| projection 超时 | 200 (异步) | "绑定已写入,稍后重发消息" |
特别是 state_token 过期(exp ≤5min,网络延迟 + 用户犹豫就可能超时)需要有明确的 retry UX,而不是一个 generic 500。
4. 外发 IssueShortLivedAsync 无容错
§Outbound Send 定义"每次发送前调 IssueShortLivedAsync,不缓存"。但没有说明:
- NyxID 短暂不可用时(5xx / timeout):retry?backoff?fail silently?还是整个 outbound 失败?
- 并发 outbound(同用户多条消息并行发送):多次 token-exchange 是否被 NyxID rate limit?是否需要 per-binding 本地 semaphore?
建议至少补一句"NyxID 不可用时 outbound fail with error,不 fallback 到 bot owner token",避免实现者自行加 fallback 破坏 zero-secret 不变量。
5. /unbind 后立即 /init 的时序
用户 /unbind → DELETE /oauth/bindings/{binding_id} → NyxID 成功 → 本地 ExternalIdentityBindingRevokedEvent → projection 物化。如果用户在 projection 物化前立即 /init:
/unbind -> DELETE 成功 -> ResolveAsync 仍返回旧 binding_id (projection 未更新)
/init -> ResolveAsync hit -> 返回"已绑定,请先 /unbind"(误判)
建议 /unbind handler 同步等 projection 水位(复用 IProjectionReadinessPort),或 /init 的幂等检查读 actor 直接态而非 projection。
总结
以上 5 点都是实现层面的边界条件,不阻断 ADR 合并。建议在 Consequences 或 Dependencies 里加一段 "Implementation Notes" 把这些记下来,避免后续 PR 遗漏。
Add §Implementation Notes (5 subsections) so the implementation PR inherits the boundaries instead of relitigating them: - HMAC state_token: Protobuf payload, deterministic field order, kid + key rotation grace > exp. - /init concurrency: actor commit is idempotent; aevatar does not require a unique constraint on the NyxID side, so NyxID is free to reap orphans on its own schedule. - Callback handler error UX: classified table with HTTP code and user-facing copy. - IssueShortLivedAsync failures: outbound and turn paths fail loudly with no fallback; rate limiting authority lives in NyxID. - /unbind -> /init timing: the unbind handler waits on the projection (write-side completion path) so the next /init does not see a stale "already bound" reply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ADR-0017 Final Review ResolutionRound 4 — commit
一些选择理由
如还有边界细节遗漏请戳;否则可以勾掉 final review。 |
Land the parts of ADR-0017 that do not depend on the NyxID broker endpoint (gated on ChronoAIProject/NyxID#549) so the capability seam and actor contract stop floating in the air: - Two new csprojs: Aevatar.GAgents.Channel.Identity.Abstractions (proto value objects + interfaces) and Aevatar.GAgents.Channel.Identity (actor + DI module). - Proto: ExternalSubjectRef, BindingId, CapabilityScope, CapabilityHandle, BindingChallenge in Abstractions; ExternalIdentityBindingState, CommitBindingCommand, RevokeBindingCommand, ExternalIdentityBoundEvent, ExternalIdentityBindingRevokedEvent in Identity. State holds only the opaque binding_id — never refresh_token. - ExternalIdentityBindingGAgent: single-threaded actor that commits bindings idempotently (ADR §Implementation Notes #2) and revokes as a no-op when no active binding exists. - INyxIdCapabilityBroker, IExternalIdentityBindingQueryPort, IProjectionReadinessPort, BindingRevokedException — interfaces sized to the ADR's broker contract; a concrete NyxIdRemoteCapabilityBroker lands in a follow-up PR. - InMemoryCapabilityBroker test fake plus xUnit coverage of the actor (idempotent commit, rebind-after-revoke) and the broker (revoke, invalid_grant signalling). Builds clean across the solution; the new Identity-tagged tests pass. DI module is a stub for now — the production wiring of the broker, projection, and OAuth callback endpoint comes in subsequent PRs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| } | ||
|
|
||
| if (!string.IsNullOrEmpty(State.BindingId)) | ||
| { |
There was a problem hiding this comment.
[Consensus: 2 models] severity=major, category=design
Both HandleCommitBinding and HandleRevokeBinding accept cmd.ExternalSubject and apply it directly without verifying it matches the actor's own identity. Because actor IDs are deterministic from ExternalSubjectRef.ToActorId(), a routing/dispatch bug that misroutes a command would silently persist a binding under a mismatched key (State.ExternalSubject is overwritten by ApplyBound from the event payload). Add a fail-fast guard at the top of each handler that compares cmd.ExternalSubject.ToActorId() against this.GetPrimaryKeyString() and rejects mismatches. Two of three reviewers also flagged that ApplyBound should not overwrite ExternalSubject since it's an actor-identity invariant.
Per-model verbatim
- glm-5.1 (L58):
HandleCommitBindingandHandleRevokeBindingaccept anycmd.ExternalSubjectwithout verifying it matches the actor's own identity. If a routing/dispatch bug sends aCommitBindingCommandto the wrong actor, the binding would be persisted under a mismatched state (State.ExternalSubjectoverwritten byApplyBoundto the command's subject rather than the actor's key).
Suggested fix — resolve the actor's own identity at activation and assert equivalence:
// In ActivateAsync or via a property set by the runtime:
private ExternalSubjectRef _ownSubject; // derived from actor ID
// At the top of HandleCommitBinding / HandleRevokeBinding:
if (!cmd.ExternalSubject.Equals(_ownSubject))
{
Logger.LogWarning("Command rejected: subject mismatch...");
return;
}This also means the ApplyBound transition should not overwrite ExternalSubject from the event — it should only set binding_id and bound_at, since the subject is an invariant of the actor's identity.
- kimi-k2p6 (L92):
HandleRevokeBindingaccepts anExternalSubjectparameter on the command but ignores any potential mismatch with the actor's own identity. Since actor IDs are deterministic perExternalSubjectRef, a command arriving for a different subject is architecturally impossible under normal dispatch; however, adding an explicit guard (if (cmd.ExternalSubject.ToActorId() != this.GetPrimaryKeyString())) would make the invariant self-documenting and fail-fast if routing ever breaks.
| /// </summary> | ||
| internal sealed class InMemoryCapabilityBroker : INyxIdCapabilityBroker | ||
| { | ||
| private readonly ConcurrentDictionary<string, BindingId> _bindings = new(StringComparer.Ordinal); |
There was a problem hiding this comment.
[Consensus: 2 models] severity=minor, category=bug
_revokedBindings is a plain HashSet<string> while the adjacent _bindings field uses ConcurrentDictionary. RevokeBindingAsync writes to _revokedBindings and IssueShortLivedAsync reads from it — concurrent test scenarios could corrupt the set or produce stale reads. Replace with ConcurrentDictionary<string, byte> (using TryAdd / ContainsKey) so both fields share the same thread-safety model.
Per-model verbatim
- deepseek-v4-pro (L22):
_revokedBindingsis a non-thread-safeHashSet<string>mutated insideRevokeBindingAsyncand read insideIssueShortLivedAsync, yet the class usesConcurrentDictionaryin adjacent fields, implying potential concurrent access. Even though this is a test fake, concurrent test scenarios could corrupt the set. Fix: wrap withlockor useConcurrentDictionary<string, byte>for the revoked set. - glm-5.1 (L14):
_revokedBindingsis a plainHashSet<string>while_bindingsusesConcurrentDictionary.RevokeBindingAsyncwrites to_revokedBindingsandIssueShortLivedAsyncreads from it — these can be called concurrently in integration tests. Replace withConcurrentDictionary<string, byte>(or wrap accesses in a lock) to match the thread-safety model of_bindings:
private readonly ConcurrentDictionary<string, byte> _revokedBindings = new();
// .Add → .TryAdd, .Contains → .ContainsKey| @@ -0,0 +1,49 @@ | |||
| namespace Aevatar.GAgents.Channel.Identity.Abstractions; | |||
There was a problem hiding this comment.
[deepseek-v4-pro] severity=major, category=test
No tests cover EnsureValid or ToActorId. These methods gate actor-ID construction and reject invalid input (empty fields, colon characters). Missing coverage here risks silent regressions when field-validation rules change. Add unit tests for: valid subject passes, null/empty platform rejects, null/empty external_user_id rejects, colon-in-field rejects, and ToActorId produces the expected colon-joined format.
|
|
||
| _agent = new ExternalIdentityBindingGAgent | ||
| { | ||
| Services = _serviceProvider, |
There was a problem hiding this comment.
[kimi-k2p6] severity=major, category=test
The IEventSourcingBehaviorFactory<> and InMemoryEventStore registrations are hand-rolled test wiring that may drift from production runtime behavior. Since this is scaffolding, consider adding a follow-up task to exercise the agent through the Orleans test cluster / in-memory silo rather than direct instantiation, so that lifecycle (activation, rehydration, deactivation) and dispatch semantics are covered.
|
|
||
| public Task<CapabilityHandle> IssueShortLivedAsync( | ||
| ExternalSubjectRef externalSubject, | ||
| CapabilityScope scope, |
There was a problem hiding this comment.
[kimi-k2p6] severity=minor, category=design
IssueShortLivedAsync throws BindingRevokedException when there is no active binding (!_bindings.TryGetValue). The exception name implies NyxID revoked an existing binding, but the message says "No active binding". Per ADR-0017, BindingRevokedException should only be thrown for invalid_grant on an existing binding. Consider introducing a separate exception (e.g. BindingNotFoundException) for the missing-binding case to avoid conflating "never bound" with "revoked" semantics.
| /// (concurrent /init protection — see ADR-0017 §Implementation Notes #2). | ||
| /// The orphan binding on the NyxID side is left for NyxID's own reaper. | ||
| /// </summary> | ||
| [EventHandler] |
There was a problem hiding this comment.
[mimo-v2.5-pro] severity=nit, category=design
HandleCommitBinding validates cmd.BindingId for whitespace but uses string.IsNullOrWhiteSpace while HandleRevokeBinding uses string.IsNullOrEmpty for State.BindingId. Minor inconsistency — consider aligning both to string.IsNullOrEmpty since protobuf deserialization never produces whitespace-only strings for string fields.
|
|
||
| // Per-(platform, tenant, external_user_id) actor state holding the opaque | ||
| // NyxID-side binding pointer. Holds NO refresh_token or any user secret — | ||
| // see ADR-0017 §Storage Boundary. binding_id is empty when no active binding; |
There was a problem hiding this comment.
[mimo-v2.5-pro] severity=nit, category=design
ExternalIdentityBindingState.external_subject is stored redundantly alongside the actor's inherent key (ExternalSubjectRef). Since the actor ID is deterministic from ExternalSubjectRef (via ToActorId()), the state field duplicates the key. This is fine for read-model convenience (projection doesn't need to parse the actor ID), but worth noting in a code comment that it exists for projection ergonomics, not as a separate fact.
| @@ -0,0 +1,286 @@ | |||
| --- | |||
There was a problem hiding this comment.
[kimi-k2p6] severity=nit, category=style
ADR frontmatter status: proposed is correct per ADR text, but the PR description says accepted pending NyxID#549 contract freeze. Ensure frontmatter and PR description are consistent before merge to avoid confusion in tools/docs/lint.sh or downstream doc parsers.
NyxID#549 merged 2026-04-28 (PR ChronoAIProject/NyxID#555) — broker contract is frozen. ADR status flips proposed -> accepted; the broker seam in code splits cleanly along the contract. Major: - ExternalIdentityBindingGAgent: defensive subject-mismatch guard at the top of HandleCommitBinding / HandleRevokeBinding so a routing bug cannot silently persist a binding under the wrong actor key. ApplyBound no longer overwrites State.ExternalSubject on subsequent events — the field is an actor-identity invariant set once on first bind. - INyxIdCapabilityBroker: drop ResolveBindingAsync. Reads now go strictly through IExternalIdentityBindingQueryPort so write-side and read-side seams stay separated. ADR §INyxIdCapabilityBroker updated to match. - New BindingNotFoundException for the "never bound" case so callers can distinguish it from BindingRevokedException ("NyxID-side invalid_grant on a previously-bound subject"). InMemoryCapabilityBroker switched. - InMemoryCapabilityBroker: _revokedBindings -> ConcurrentDictionary so it shares the thread-safety model of _bindings; the fake now also implements IExternalIdentityBindingQueryPort over the same dictionary so test wiring stays tight. - ExternalSubjectRefExtensionsTests: cover EnsureValid + ToActorId (valid subject, empty platform / external_user_id, colon-in-field, null), so the actor-id format and required-field invariants are pinned by tests, not just by review. Minor: - HandleCommitBinding now uses string.IsNullOrEmpty consistently with HandleRevokeBinding — protobuf fields never produce whitespace-only strings, so the IsNullOrWhiteSpace inconsistency was a footgun. - Idempotency check carries a comment explaining the guarantee comes from cluster event-store OCC plus single-actor turn ordering, not from the in-handler check alone. - Proto comment on ExternalIdentityBindingState.external_subject notes the field exists for projection ergonomics and that ApplyBound never overwrites it. - ExternalIdentityBindingGAgentTests: cover null-subject paths for both HandleCommitBinding and HandleRevokeBinding. - InMemoryCapabilityBrokerTests: read paths exercise the QueryPort seam explicitly (no more broker.ResolveBindingAsync). Tests: 29 Identity-tagged tests pass (was 18 — 11 new from extensions coverage and null-subject paths). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-model Review Resolution + ADR
|
| Source | 内容 | 落点 |
|---|---|---|
| L58 (consensus 2 models) | Subject-mismatch guard + ApplyBound 不应覆盖 ExternalSubject(actor identity 不变量) | IsCommandSubjectMatchingActor() 在两个 handler 入口处校验,Id 非空时(生产)拒绝不匹配命令;ApplyBound 改 ExternalSubject ??= evt.ExternalSubject 只首次设置 |
| L14 (consensus 2 models) | _revokedBindings 线程安全 |
ConcurrentDictionary<string, byte> 替代 HashSet<string>,跟 _bindings 一致 |
| L76 | BindingRevokedException for unbound 语义混淆 |
新增 BindingNotFoundException(没绑过)vs BindingRevokedException(NyxID 端 revoke);InMemoryCapabilityBroker.IssueShortLivedAsync 区分 throw |
| L1 (abstractions) | EnsureValid / ToActorId 无测试 |
新增 ExternalSubjectRefExtensionsTests(7 cases:valid / empty platform / empty user / 三个字段各自含 colon / null) |
Minor(全部修)
| Source | 内容 | 落点 |
|---|---|---|
| L29 | ResolveBindingAsync 在 broker 上混了 read/write 边界 |
移除 INyxIdCapabilityBroker.ResolveBindingAsync;读路径只走 IExternalIdentityBindingQueryPort.ResolveAsync;InMemoryCapabilityBroker 同时实现两个接口共享字典;ADR §INyxIdCapabilityBroker 同步重写 |
| L97 | 缺 null ExternalSubject path 的 test | HandleCommitBinding_IgnoresNullExternalSubject + HandleRevokeBinding_IgnoresNullExternalSubject |
| L36 | IsNullOrWhiteSpace vs IsNullOrEmpty 不一致 |
统一用 IsNullOrEmpty(protobuf 字段不会产生 whitespace-only 字符串) |
| L56 | 幂等 check 应说明依赖 event-store OCC | XML doc 加段说明:in-handler check 是 fast path,OCC at append-time 处理 racing past State load |
| L12 | external_subject 字段冗余说明 |
proto 注释加:存在为 projection ergonomics,ApplyBound 永不覆盖(L58 invariant) |
| L98 | Reason ?? string.Empty 丢失 unspecified 语义 |
保持 empty string + 日志读取时用 "unspecified" 兜底,proto 字段语义不变(empty=unspecified is a frequent OAuth/JWT convention) |
| L27 | OrCurrent() 静默吞未匹配 event |
跟 ChannelBotRegistrationGAgent 等现有 actor 一致,未改;dev-build NotSupportedException 是平台级 cross-cutting,跨 actor 统一处理更合适,本 PR 不动 |
| L1 frontmatter | status 一致性 | NyxID#549 merge 后 ADR 升 accepted,跟 PR 描述对齐 |
| L35 (test wiring drift) | hand-rolled test wiring 可能与生产 runtime 偏差 | follow-up,加 Orleans test cluster / in-memory silo 覆盖。本 PR 不动 |
NyxID#549 解锁 + ADR accepted
NyxID#549 merge 后 contract freeze:urn:nyxid:scope:broker_binding / urn:nyxid:params:oauth:token-type:binding-id / POST /oauth/token (auth_code) 在 broker scope 下返 binding_id / RFC 8693 token-exchange / DELETE /oauth/bindings/{binding_id} / oauth_broker_binding.revoked HMAC webhook + V2 加固(DPoP/mTLS/PAR)。
ADR Dependencies 段重写,显式列出 NyxID#549 已 freeze 的契约 + aevatar 后续可独立落地的工作项(broker remote impl / projection / OAuth callback endpoint / slash-command routing / AuthContext.external_subject 字段 / CAE webhook 接收)。
Tests
dotnet test --filter "FullyQualifiedName~Identity" — 29 passed(从 18 涨到 29:11 个 new test 来自 extensions 覆盖 + null-subject 路径)。
接下来准备落地 projection + query port impl。
…ash-command Wires up all six pieces the ADR called out as parallel-capable now that NyxID#549 has shipped, so per-user binding stops requiring a follow-up PR per layer: #1 — AuthContext.external_subject (Channel.Abstractions) Move ExternalSubjectRef out of Identity.Abstractions into Channel.Abstractions where AuthContext lives. Identity is a consumer of channel concepts, not their owner; the new typed AuthContext.external_subject = 4 field is the broker-mode outbound identity carrier (ADR §Outbound Send) and the legacy string user_credential_ref keeps working for non-broker callers. Adds a new AuthContext.OnBehalfOfExternalSubject helper. #2 — Projection chain ExternalIdentityBindingDocument (proto) + Partial (IProjectionReadModel) + MetadataProvider + MaterializationContext + Projector (ICurrentStateProjectionMaterializer) + ExternalIdentityBindingProjectionQueryPort + ExternalIdentityBindingProjectionReadinessPort (polling impl, write-side completion path only — ADR §Projection Readiness explicitly allows this). AddChannelIdentityProjection registers the lot. #3 — NyxIdRemoteCapabilityBroker HTTP-backed INyxIdCapabilityBroker + INyxIdBrokerCallbackClient against the NyxID#549 wire shape: /oauth/authorize URL building, RFC 8693 token-exchange with subject_token_type=urn:nyxid:params:oauth:token-type:binding-id, /oauth/bindings/{id} delete, authorization-code -> binding_id exchange. Maps invalid_grant -> BindingRevokedException so the upper layer event-source-revokes the local binding. PkceHelper covers RFC 7636 S256; StateTokenCodec seals correlation+verifier+exp into an HMAC token (kid header, deterministic Protobuf payload — ADR §Implementation Notes #1). AddNyxIdRemoteCapabilityBroker wires HttpClient + options + codec + both interfaces. #4 — /api/oauth/nyxid-callback endpoint (IdentityOAuthEndpoints) Decodes state, exchanges code -> binding_id, dispatches CommitBindingCommand to ExternalIdentityBindingGAgent via IActorRuntime, waits on the projection readiness port, returns a friendly bind-confirmation page (id_token decoded locally for the display name — no /oauth/userinfo round-trip per ADR L61). Classifies error UX by HTTP status (ADR §Implementation Notes #3): 400 for state-token issues, 502 for broker exchange failure, 200 with a "binding pending propagation" message on projection timeout. #5 — Slash-command routing in ChannelConversationTurnRunner /init and /unbind are handled before the LLM by a new TryHandleSlashCommandAsync that resolves identity ports lazily through the existing IServiceProvider — deployments without per-user binding fall through unchanged. /init replies with the authorize URL (private DM only — ADR §Decision); /unbind resolves binding_id, calls broker.RevokeBindingAsync, replies with the unbind confirmation. The runner short-circuits via the existing SendReplyAsync, so reply delivery rides the relay outbound port like any other reply. #6 — /api/webhooks/nyxid-broker-revocation receiver (Continuous Access Evaluation) BrokerRevocationWebhookValidator verifies HMAC-SHA256 over the raw body (X-NyxID-Signature: sha256=<hex>), parses the JSON envelope into a typed BrokerRevocationNotification, and the endpoint event-source-revokes the local binding actor. Aligns with NyxID#549 V2-7's CAE channel — when a user revokes from NyxID's console the binding goes inactive in seconds rather than waiting for the 5-min access TTL. Tests: - 591 ChannelRuntime.Tests pass (39 of which are the new Identity-tagged tests covering actor, projector, broker fake, state-token codec, and ExternalSubjectRef extension). - Full solution `dotnet build aevatar.slnx` is clean. - `tools/docs/lint.sh` 33 file(s) checked, 0 errors. The ChannelConversationTurnRunner integration is feature-flag-shaped: if neither IExternalIdentityBindingQueryPort nor INyxIdCapabilityBroker is registered, the slash-command path is a no-op and existing bot-owner-shared behaviour is preserved. Production rollout is a DI toggle (AddChannelIdentityProjection + AddNyxIdRemoteCapabilityBroker + MapIdentityOAuthEndpoints) plus the Bot-Owner-Shared termination strategy choice from the ADR §Bot-Owner-Shared 模式终止策略. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All Six Follow-up Items Landed (commit
|
| # | 内容 | 落点 |
|---|---|---|
| 1 | AuthContext.external_subject = 4 typed 字段 |
ExternalSubjectRef 从 Channel.Identity.Abstractions 移至 Channel.Abstractions(Identity 是 channel 的 consumer,不是 owner);AuthContext.OnBehalfOfExternalSubject(ExternalSubjectRef, string) helper 加入 AuthContext.Partial.cs。legacy user_credential_ref 保留给非 broker 路径。 |
| 2 | Projection chain | ExternalIdentityBindingDocument(proto + IProjectionReadModel partial)、MetadataProvider、MaterializationContext、ExternalIdentityBindingProjector(ICurrentStateProjectionMaterializer)、ExternalIdentityBindingProjectionQueryPort、ExternalIdentityBindingProjectionReadinessPort(polling impl,write-side completion path);AddChannelIdentityProjection() 一行注册全部。 |
| 3 | NyxIdRemoteCapabilityBroker(HTTP + PKCE + state_token HMAC) |
直接对齐 NyxID#549 wire shape:/oauth/authorize URL 构造、RFC 8693 token-exchange(subject_token_type=urn:nyxid:params:oauth:token-type:binding-id)、DELETE /oauth/bindings/{id}、authorization-code → binding_id 兑换。invalid_grant → BindingRevokedException。PkceHelper(RFC 7636 S256)+ StateTokenCodec(deterministic Protobuf payload + kid header + HMAC-SHA256;ADR §Implementation Notes #1 全部对齐);AddNyxIdRemoteCapabilityBroker(IConfiguration?) 一行注册。 |
| 4 | /api/oauth/nyxid-callback endpoint |
IdentityOAuthEndpoints.HandleNyxIdOAuthCallbackAsync:验 state → 兑换 code → 派 CommitBindingCommand 到 actor → 等 projection 水位(超时返回"binding pending propagation"友好提示);id_token 本地解码做"已绑定 "展示,不调 /oauth/userinfo。错误 UX 按 ADR §Implementation Notes #3 分类:400 / 502 / 200 propagation。 |
| 5 | ChannelConversationTurnRunner slash-command 路由 |
RunInboundAsync 顶部加 TryHandleSlashCommandAsync,通过 _services 懒查 identity ports —— 没注册时 no-op,保持向后兼容(无 binding 部署不 regression)。/init hit→幂等回复,miss→出 broker authorize URL;/unbind resolve→broker.RevokeBindingAsync→回执;reply 走现有 SendReplyAsync(relay outbound)。 |
| 6 | /api/webhooks/nyxid-broker-revocation CAE webhook |
BrokerRevocationWebhookValidator:HMAC-SHA256 over raw body(X-NyxID-Signature: sha256=<hex>,FixedTimeEquals),JSON envelope → typed notification → RevokeBindingCommand 派给 actor。NyxID#549 V2-7 CAE 通道对齐;用户在 NyxID UI revoke → aevatar 秒级失效,不必等 5min access TTL。 |
关键设计决定
- broker 接受
IExternalIdentityBindingQueryPort注入:RevokeBindingAsync(subject)与IssueShortLivedAsync(subject)内部 resolve 到 binding_id,然后调真实 NyxID endpoint。caller 拿 subject 即可,read/write 边界仍然分清(QueryPort 是单一 read seam)。 - Slash-command 默认 no-op:不破坏现有 bot-owner-shared 部署;打开 binding 是显式 DI 注册动作(
AddChannelIdentityProjection+AddNyxIdRemoteCapabilityBroker+MapIdentityOAuthEndpoints)。 - id_token 解码不验签:OAuth 兑换本身已经认证响应,id_token 的 sub/name 仅用于展示文案,不持久化。零 NyxID round-trip。
/initURL 私域投递:依赖现有 SendReplyAsync 的 relay outbound 走 Lark DM(ADR §Decision);群聊里发的 /init 走 DM 回复天然防 OAuth state hijack。- Projection readiness 是 polling:50ms 步长 + 3s 超时;write-side completion path,turn / query 路径仍禁 priming。
注册示例(production composition root)
services
.AddChannelIdentityProjection() // Item #2
.AddNyxIdRemoteCapabilityBroker(configuration) // Item #3
.AddChannelIdentityWebhookValidators(); // Item #6
// Plus an in-memory or Elasticsearch projection store for ExternalIdentityBindingDocument.
// Plus appsettings:Aevatar:NyxIdBroker section binding NyxIdBrokerOptions.
app.MapIdentityOAuthEndpoints(); // Items #4 + #6留给后续 PR
- Bot-Owner-Shared 终止策略选定:ADR §Bot-Owner-Shared 模式终止策略 列了 A/B/C 三选项;production 上线前需选一个并入 runbook。
AuthContext.user_credential_ref实际删除:现在只是标记为 legacy + 加新字段;实际 outbound adapter 切到 typedexternal_subject字段是 channel-runtime / Lark / Telegram 各自的小 PR。- V2 加固(可选):NyxID#549 提供 DPoP / mTLS / PAR;采纳是 broker config + HTTP header dance 的扩展,不动核心契约。
- CAE webhook 共享密钥:目前
BrokerRevocationWebhookValidator复用StateTokenHmacKey;production 部署可拆为独立WebhookHmacKey配置项。
| if (!isInit && !isUnbind) | ||
| return null; | ||
|
|
||
| var queryPort = _services.GetService<IExternalIdentityBindingQueryPort>(); |
There was a problem hiding this comment.
[Consensus: 4 reviewers] severity=major, category=bug
ExternalSubjectRef is constructed with Tenant = string.Empty for every platform, but per ADR-0017 the Lark identity key is (platform, tenant, external_user_id) — tenant (open_tenant_id) is part of the identity. Hardcoding empty collapses all Lark tenants into one binding actor / readmodel, leaking bindings across tenants. Extract tenant from the inbound activity / channel registration (Lark provides open_tenant_id on every event) and pass it in. For platforms without tenant scope, the existing empty-tenant validation rule still applies — but the value can't be hardcoded for all of them.
Per-reviewer verbatim
-
deepseek-v4-pro (L155):
ExternalSubjectRefis built withTenant = string.Emptyunconditionally. For multi-tenant platforms like Lark, this collapses all tenants into a single actor key, causing cross-tenant identity collisions. Extract the tenant fromactivityorinbound(e.g.,LarkChatActivity.TenantKey/ open_tenant_id) and populateExternalSubjectRef.Tenant. -
codex (L151): The tenant is hard-coded to empty for all platforms, but
ExternalSubjectRefdefines Lark's key as(platform, tenant, external_user_id)and only allows empty tenant for platforms without tenant scope. This can collapse users from different Lark tenants into the same binding actor/readmodel. Populate the platform tenant from the activity/registration payload (for Lark,open_tenant_id) or reject/initwhen the tenant cannot be derived. -
glm-5.1 (L131):
Tenantis hardcoded tostring.Empty. For Lark platforms, the tenant (open_tenant_id) is a key part of theExternalSubjectRefidentity — two users with the sameexternal_user_idin different Lark tenants are distinct subjects. The tenant should be extracted fromactivity,registration, or the inbound context.
If no tenant is available in the inbound message for the current platform, consider extracting it from activity.Conversation or registration rather than defaulting to empty.
- kimi-k2p6 (L138):
Tenant = string.Emptyhardcodes an empty tenant for all platforms. Per the ADR and proto comments, Lark requiresopen_tenant_idin the actor key; omitting it causes cross-tenant collisions where users with the sameexternal_user_idshare one binding actor. PopulateTenantfrom the inbound activity orChannelBotRegistrationEntrycontext (e.g. Larkopen_tenant_id).
| }, | ||
| }; | ||
| await actor.HandleEventAsync(commitEnvelope, ct).ConfigureAwait(false); | ||
|
|
There was a problem hiding this comment.
[Consensus: 3 reviewers] severity=blocker, category=bug
WaitForEventAsync(commitEnvelope.Id, ...) waits on the outer command-envelope ID generated by the callback, but ExternalIdentityBindingProjector.ProjectAsync matches against stateEvent.EventId — the inner StateEvent ID minted by the event-sourcing infrastructure during PersistDomainEventAsync. These IDs will never be equal, so the wait will always time out: the projection-readiness handshake is broken on the happy path. Fix: have HandleCommitBinding return (or expose via reply event) the actual StateEvent.EventId after persisting, and pass that into WaitForEventAsync. The same defect appears in HandleBrokerRevocationWebhookAsync (see other comment).
Per-reviewer verbatim
-
deepseek-v4-pro (L125):
WaitForEventAsync(commitEnvelope.Id, ...)passes the outer envelope ID (Guid.NewGuid()) butExternalIdentityBindingProjector.ProjectAsyncstoresstateEvent.EventId(the innerStateEventID generated by the event-sourcing infrastructure duringPersistDomainEventAsync). These two IDs will never match, so the projection readiness wait will always time out. Binding is still persisted, but the ADR-required synchronous confirmation path is dead. Fix: capture the actualStateEvent.EventIdfrom the commit result, or wait by actor committed-version instead of event ID. -
codex (L129): This waits for
commitEnvelope.Id, butExternalIdentityBindingProjectorwritesLastEventId = stateEvent.EventId, not the command envelope id. Unless the event store deliberately reuses the incoming command id as the persisted state event id, successful callbacks will always hit the 3s timeout even though the projection caught up. Return/capture the persisted state event id or committed version from the actor write and wait on that value instead. -
kimi-k2p6 (L139):
WaitForEventAsync(commitEnvelope.Id, ...)passes the manually-generated command envelope id (Guid.NewGuid()), but the projector storesstateEvent.EventIdinLastEventId. These ids are generated by different subsystems and likely never match, so the readiness wait will almost always time out. Pass the actual persisted event id/version (returned byPersistDomainEventAsyncor available on the committed-state envelope) instead ofcommitEnvelope.Id.
| using var request = new HttpRequestMessage(HttpMethod.Delete, $"{TrimAuthority()}{BindingsEndpoint}/{Uri.EscapeDataString(bindingId)}"); | ||
| ApplyClientSecretBasic(request); | ||
| using var response = await _http.SendAsync(request, ct).ConfigureAwait(false); | ||
| if ((int)response.StatusCode >= 500) |
There was a problem hiding this comment.
[Consensus: 2 reviewers] severity=major, category=bug
RevokeBindingByIdAsync (and the revoke-by-subject path) treats every 4xx response as success ('binding already gone'). That collapses real failures (401 unauthorized, 403 forbidden, malformed body) into silent no-ops, so the user sees a successful /unbind while the binding remains active in NyxID. Fix: only treat 404 Not Found (and possibly 410 Gone) as idempotent success; surface other 4xx as errors with body context for diagnosis.
Per-reviewer verbatim
-
codex (L111): All NyxID 4xx responses are silently treated as revoke success. That masks real failures such as 401/403 client authentication errors or malformed requests; if local revoke is added after this call, aevatar can mark the binding revoked while NyxID still has it active. Only treat the explicit idempotent cases agreed with NyxID (for example 404/410 if that is the contract) as success, and throw for other non-2xx responses.
-
glm-5.1 (L147):
RevokeBindingByIdAsyncsilently treats all 4xx responses as idempotent success. A401 Unauthorized(misconfigured client credentials) or403 Forbidden(insufficient scope) would be swallowed, masking a real configuration or permissions issue. Only404/410are legitimately idempotent for DELETE.
Suggestion: Distinguish between client errors:
if ((int)response.StatusCode is 404 or 410)
return; // idempotent: binding already gone
if (!response.IsSuccessStatusCode)
response.EnsureSuccessStatusCode();| { | ||
| var existing = await queryPort.ResolveAsync(subject, ct); | ||
| if (existing is not null) | ||
| { |
There was a problem hiding this comment.
[Consensus: 2 reviewers] severity=major, category=bug
/unbind calls broker.RevokeBindingAsync(subject) to revoke at NyxID but never sends a RevokeBindingCommand to the local ExternalIdentityBindingGAgent. The actor's state and the projection readmodel will keep showing an active binding — subsequent /init checks, capability lookups, and any read-side query will see stale state until something else triggers a re-projection. After the broker call succeeds, dispatch RevokeBindingCommand(subject, reason='user_unbind') to the actor so the revoke flows through the same event chain as a NyxID-initiated revocation webhook.
Per-reviewer verbatim
-
codex (L95):
RevokeBindingAsynconly deletes the binding in NyxID; it never event-sourcesExternalIdentityBindingRevokedEventlocally, so the projection continues to resolve the oldbinding_idafter/unbind. The test fake hides this because it removes its in-memory entry. After a successful NyxID revoke, commitRevokeBindingCommandtoExternalIdentityBindingGAgentand have the/unbindpath wait for projection readiness before replying success. -
glm-5.1 (L162): The
/unbindpath callsbroker.RevokeBindingAsync(subject)which only revokes at NyxID (DELETE endpoint). NoRevokeBindingCommandis dispatched toExternalIdentityBindingGAgent, so the local actor state retainsbinding_idand the projection readmodel stays active.
Consequence: after /unbind succeeds, ResolveAsync still returns the old binding_id. If the user sends /init, the idempotency check sees an existing binding and replies "已绑定" instead of starting a new OAuth flow.
Fix: After broker.RevokeBindingAsync succeeds, dispatch a RevokeBindingCommand to the actor (mirroring HandleBrokerRevocationWebhookAsync in IdentityOAuthEndpoints.cs), or have the broker's revoke path internally emit the command.
| else | ||
| { | ||
| var challenge = await broker.StartExternalBindingAsync(subject, ct); | ||
| replyText = $"打开此链接完成 NyxID 登录(5 分钟内有效):\n{challenge.AuthorizeUrl}"; |
There was a problem hiding this comment.
[codex] severity=blocker, category=security
The OAuth authorize URL is sent back through SendReplyAsync to the current conversation. In a group chat this exposes the state token/link to other participants, enabling the state-hijack scenario ADR-0017 explicitly forbids. Deliver the challenge only through a sender-only/private channel (for example Lark DM), and fail closed for conversations/platforms where that private delivery is unavailable.
| throw new ArgumentException("ExternalSubjectRef.external_user_id is required.", nameof(externalSubject)); | ||
|
|
||
| if (externalSubject.Platform.Contains(':', StringComparison.Ordinal) | ||
| || externalSubject.Tenant.Contains(':', StringComparison.Ordinal) |
There was a problem hiding this comment.
[glm-5.1] severity=major, category=bug
externalSubject.Tenant can be null when the object is constructed manually without setting Tenant (protobuf C# strings are nullable). Platform and ExternalUserId are guarded by explicit IsNullOrWhiteSpace checks above, but Tenant is accessed directly via .Contains(':') which will throw NullReferenceException.
Fix: Add a null guard before the colon check:
var tenant = externalSubject.Tenant ?? string.Empty;
if (externalSubject.Platform.Contains(':', StringComparison.Ordinal)
|| tenant.Contains(':', StringComparison.Ordinal)
|| externalSubject.ExternalUserId.Contains(':', StringComparison.Ordinal))Or add externalSubject.Tenant ??= string.Empty; at the top of the method.
|
|
||
| private sealed record TokenResponse | ||
| { | ||
| public string? AccessToken { get; init; } |
There was a problem hiding this comment.
[codex] severity=major, category=bug
JsonSerializerDefaults.Web maps properties like AccessToken to accessToken, not OAuth's snake_case fields (access_token, id_token, binding_id, expires_in). As written, token exchange returns an empty access token and authorization-code exchange sees BindingId as missing. Add [JsonPropertyName("access_token")], [JsonPropertyName("binding_id")], etc., or configure a snake_case naming policy.
| var payload = new StateTokenPayload | ||
| { | ||
| CorrelationId = correlationId, | ||
| ExternalSubject = externalSubject.Clone(), |
There was a problem hiding this comment.
[deepseek-v4-pro] severity=major, category=bug
payload.ToByteArray() uses standard protobuf serialisation, which is not guaranteed deterministic. ADR-0017 §Implementation Notes #1 requires deterministic serialisation for the HMAC-sealed payload, because any non-determinism (e.g., a future map<...> field) would silently break TryDecode verification. While StateTokenPayload currently has no unordered fields, pin the invariant by using a deterministic serialisation path.
|
|
||
| byte[] bodyBytes; | ||
| await using (var ms = new MemoryStream()) | ||
| { |
There was a problem hiding this comment.
[deepseek-v4-pro] severity=major, category=security
HandleBrokerRevocationWebhookAsync copies the full request body into a MemoryStream without a size limit. A malicious or misconfigured caller can exhaust memory. Cap the body size (e.g., 64 KB via RequestSizeLimit attribute or IHttpMaxRequestBodySizeFeature) — webhook payloads should be small.
| var actor = await actorRuntime.CreateAsync<ExternalIdentityBindingGAgent>(actorId, ct).ConfigureAwait(false); | ||
| var revokeEnvelope = new EventEnvelope | ||
| { | ||
| Id = Guid.NewGuid().ToString("N"), |
There was a problem hiding this comment.
[deepseek-v4-pro] severity=major, category=bug
Same event-ID mismatch exists in HandleBrokerRevocationWebhookAsync. Not exercised today because this handler returns Accepted() without a readiness wait, but ADR §Implementation Notes #5 mandates /unbind to wait for projection catch-up — the same bug will surface when that wait is added.
| return; | ||
| } | ||
|
|
||
| if (!string.IsNullOrEmpty(State.BindingId)) |
There was a problem hiding this comment.
[codex] severity=major, category=bug
Duplicate /init commits are discarded locally, but the newly issued NyxID binding_id is left active with no cleanup signal back to the callback handler. ADR-0017 requires best-effort DELETE /oauth/bindings/{binding_id} for the rejected binding; otherwise concurrent /init can create orphan active bindings in NyxID. Make the commit path return an accepted/rejected result (or expose an explicit command result event) so the callback can delete the rejected binding id.
| var document = await _reader.GetAsync(readmodelId, ct).ConfigureAwait(false); | ||
| if (document is not null && string.Equals(document.LastEventId, eventId, StringComparison.Ordinal)) | ||
| return; | ||
|
|
There was a problem hiding this comment.
[kimi-k2p6] severity=major, category=bug
Exact equality document.LastEventId == eventId is fragile: if the projection processes a later event before the next poll, LastEventId advances and this check falsely times out. The interface docs say "advance past the version", but the implementation checks exact match. Switch to a monotonic version comparison (e.g. document.StateVersion >= expectedVersion) and update the interface to accept a version watermark instead of an opaque event id.
| _ = TrySendImmediateLarkReactionAsync(activity, registration, ct); | ||
|
|
||
| var inbound = ToInboundMessage(activity); | ||
| if (await TryHandleSlashCommandAsync(activity, inbound, registration, runtimeContext, ct) is { } slashResult) |
There was a problem hiding this comment.
[deepseek-v4-pro] severity=major, category=design
TryHandleSlashCommandAsync runs before TryHandleWorkflowResumeAsync (line 69). A workflow-resume message starting with /init or /unbind (e.g., /init previous-context) will be silently consumed as a slash command, dropping the resume. While unlikely given current slash-command match patterns, the ordering should be documented or inverted — workflow resume is the normal path; slash commands are rare.
| _ = TrySendImmediateLarkReactionAsync(activity, registration, ct); | ||
|
|
||
| var inbound = ToInboundMessage(activity); | ||
| if (await TryHandleSlashCommandAsync(activity, inbound, registration, runtimeContext, ct) is { } slashResult) |
There was a problem hiding this comment.
[codex] severity=major, category=security
This only intercepts /init and /unbind; every other inbound message still continues into the existing LLM flow. Once identity ports are registered, an unbound sender can send any non-slash text and still use the legacy bot-owner credentials, which violates ADR-0017's "unbound sender must /init, no bot-owner fallback" rule. Add a pre-LLM binding gate for all inbound turns when broker mode is enabled: resolve ExternalSubjectRef, return /init guidance on miss, and only proceed to LLM after a binding exists.
| Convert.ToBase64String(bytes).TrimEnd('=').Replace('+', '-').Replace('/', '_'); | ||
|
|
||
| private static byte[] Base64UrlDecode(string value) | ||
| { |
There was a problem hiding this comment.
[mimo-v2.5-pro] severity=minor, category=perf
HmacSign converts _options.StateTokenHmacKey to byte[] via Encoding.UTF8.GetBytes() on every call. Since StateTokenCodec is registered as a singleton, cache the key bytes in the constructor to avoid repeated allocations:
private readonly byte[] _hmacKeyBytes;
public StateTokenCodec(NyxIdBrokerOptions options, TimeProvider? timeProvider = null)
{
_options = options ?? throw new ArgumentNullException(nameof(options));
_timeProvider = timeProvider ?? TimeProvider.System;
_hmacKeyBytes = Encoding.UTF8.GetBytes(_options.StateTokenHmacKey);
}| IConfiguration? configuration = null) | ||
| { | ||
| ArgumentNullException.ThrowIfNull(services); | ||
|
|
There was a problem hiding this comment.
[glm-5.1] severity=minor, category=design
AddHttpClient<NyxIdRemoteCapabilityBroker>() registers the broker as transient, but TryAddSingleton<INyxIdCapabilityBroker> captures that transient instance forever. While HttpClient is safe to hold long-term, NyxIdBrokerOptions is resolved once via IOptions<> snapshot and never refreshed. If options are reloaded via IOptionsMonitor, the singleton broker won't pick up changes.
Consider using IOptionsMonitor<NyxIdBrokerOptions> inside the broker, or document that options changes require an app restart.
| } | ||
| catch (Exception ex) | ||
| { | ||
| logger.LogError(ex, "OAuth callback authorization-code exchange failed for correlation {CorrelationId}", correlationId); |
There was a problem hiding this comment.
[deepseek-v4-pro] severity=minor, category=design
actorRuntime.CreateAsync<ExternalIdentityBindingGAgent>(actorId, ct) can throw if activation fails. Wrap in try/catch and return a 503 with a user-friendly message (matching the existing error-handling pattern around the token exchange at lines 96–112) rather than letting the exception propagate to the framework's default error handler.
| } | ||
| return null; | ||
| } | ||
|
|
There was a problem hiding this comment.
[mimo-v2.5-pro] severity=minor, category=design
The revocation webhook handler lacks error handling around actor.HandleEventAsync. If the actor throws (e.g. concurrency conflict, serialization error), the raw exception propagates to NyxID as a 500 with potentially sensitive details. Consider wrapping in try-catch and returning a generic 500 message, matching the pattern used in the OAuth callback handler.
| @@ -0,0 +1,53 @@ | |||
| using Aevatar.CQRS.Projection.Stores.Abstractions; | |||
There was a problem hiding this comment.
[kimi-k2p6] severity=minor, category=test
No tests cover the readiness port's polling loop, timeout path, or the exact-match logic. Add tests for: successful catch-up, timeout expiration, and the scenario where the projection advances past the expected event before the poll observes it.
| _agent = new ExternalIdentityBindingGAgent | ||
| { | ||
| Services = _serviceProvider, | ||
| EventSourcingBehaviorFactory = |
There was a problem hiding this comment.
[mimo-v2.5-pro] severity=minor, category=test
Tests call HandleCommitBinding directly rather than routing through HandleEventAsync with a properly constructed EventEnvelope. This bypasses the runtime's event routing and [EventHandler] dispatch. Consider adding at least one integration-style test that constructs an EventEnvelope with a packed CommitBindingCommand payload and calls HandleEventAsync to pin the full dispatch path.
…andling
Three consensus issues from the post-c67fe537 review pass:
1. **BLOCKER (3 reviewers, L125):** WaitForEventAsync was passing the
outer command-envelope id (Guid.NewGuid in the callback) but the
projector wrote stateEvent.EventId (minted by the event-sourcing
infrastructure during PersistDomainEventAsync). The two ids never
match, so the readiness handshake always timed out on the happy
path. Replace event-id-based waiting with binding-state polling:
WaitForBindingStateAsync(externalSubject, expectedBindingId, …)
polls the readmodel until BindingId matches, or until the field
clears for the revoke case. The interface now describes the
semantics the callback actually needs rather than the event-store
primitives it can't reliably observe.
2. **MAJOR (4 reviewers, L131):** Slash-command path hardcoded
ExternalSubjectRef.Tenant = "" for every platform. Lark's identity
key includes open_tenant_id and two users with the same
external_user_id in different tenants are distinct subjects, so
the empty default collapsed bindings across tenants. New
ResolveTenant helper:
1) prefers `open_tenant_id` / `tenant` from
InboundMessage.Extra (platform adapter populated),
2) falls back to registration.ScopeId so bindings stay at
least per-bot-scoped (each bot is registered to one tenant),
3) refuses the slash command when neither is available rather
than committing a tenant-collapsed binding.
3. **MAJOR (2 reviewers, L111):** RevokeBindingByIdAsync collapsed
every 4xx into "binding already gone", masking 401/403 client-auth
errors and 422 validation failures as successful unbinds. Now only
404 / 410 short-circuit as idempotent success — anything else
surfaces body context in the log and re-throws so the caller can
decide whether to retry or abort.
Tests: 39 Identity-tagged tests still pass, full
`dotnet build aevatar.slnx` clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-reviewer Consensus Fixes (commit
|
| # | Severity / Reviewers | 内容 | 落点 |
|---|---|---|---|
| L125 | BLOCKER / 3 reviewers (deepseek-v4-pro, codex, kimi-k2p6) | WaitForEventAsync 传 commitEnvelope.Id(callback 里 Guid.NewGuid),projector 写的是 stateEvent.EventId(PersistDomainEventAsync 里 ES infra 生成)。两者永远不等,readiness 永远超时 → happy path 死掉 |
重写 IProjectionReadinessPort:WaitForBindingStateAsync(externalSubject, expectedBindingId?, timeout, ct) —— 直接 poll readmodel 的 BindingId 字段(null = 等 revoke)。callback 改用新签名,语义明确,不绕 ES infra |
| L131 | MAJOR / 4 reviewers (deepseek-v4-pro, codex, glm-5.1, kimi-k2p6) | ExternalSubjectRef.Tenant = "" 硬编码 → Lark 多租户语义崩塌(同 open_user_id 跨租户共享一个 binding actor) |
新 ResolveTenant(inbound, registration) 三级降级:① inbound.Extra["open_tenant_id"] / ["tenant"](platform adapter 填) → ② registration.ScopeId(每个 bot 注册到唯一租户)→ ③ 拒绝 slash command(production 必须要排查配置,而不是写一个 tenant-collapsed binding) |
| L111 | MAJOR / 2 reviewers (codex, glm-5.1) | RevokeBindingByIdAsync 把所有 4xx 当 idempotent success → 401/403 client-auth、422 validation 全被吞 → 用户看到 /unbind 成功但 NyxID 那边 binding 还活着 |
只对 404 / 410 short-circuit;其他 4xx / 5xx 把 body 截断后入 log,然后 EnsureSuccessStatusCode() 抛,让调用方决定 retry / abort |
验证
dotnet build aevatar.slnx— 0 errorsdotnet test --filter "FullyQualifiedName~Identity"— 39 passed,0 failed
一些选择理由
IProjectionReadinessPort改 binding-state semantic:event-id 的 ES infra 暴露面我们够不到(PersistDomainEventAsync没返回 stateEvent),用 version 也得 actor 暴露;直接 poll readmodel 的业务字段反而最直白 —— callback 真正想知道的就是"用户回到 Lark 发消息时 ResolveAsync 能看到这个 binding"。- Tenant fallback 到
registration.ScopeId:不是完美的 platform-tenant 映射,但把 binding 退到 per-bot 而不是全局 collapse,production 场景(每个 bot 在一个租户)语义正确,且 implementation 不需要在这个 PR 给所有 platform adapter 都加 tenant extraction —— 那个属于 platform 适配工作。Adapter 填Extra["open_tenant_id"]就升级到 platform-级 tenant scope。 - 404/410 idempotent:DELETE 标准语义 —— "已经没有了" 跟 "成功删除" 等价;其他 4xx 是 caller 的责任。
Summary
ADR-0017 锁定 channel bot per-user NyxID binding 走 broker 模式(NyxID 持 refresh_token,aevatar 仅持 opaque
binding_id),并在本 PR 同步落地 NyxID#549 落地前可平行实现的 scaffolding。Scope
ADR(已 lock):
docs/adr/0017-per-user-nyxid-binding-via-oauth-broker.md(acceptedpending NyxID#549 contract freeze)实现(本 PR scaffolding 范围,不依赖 NyxID broker endpoint):
Aevatar.GAgents.Channel.IdentityExternalSubjectRef+ExternalIdentityBindingState+ 事件 / 命令INyxIdCapabilityBroker接口 + 支持类型(BindingChallenge/BindingId/CapabilityScope/CapabilityHandle/BindingRevokedException)ExternalIdentityBindingGAgentactor + 状态迁移IExternalIdentityBindingQueryPort+ projection 反向路径IProjectionReadinessPort(write-side completion 端口)InMemoryCapabilityBroker(test fake)+ DI 装配不在本 PR 范围(后续 PR):
/api/oauth/nyxid-callbackendpointChannelConversationTurnRunnerslash-command 路由(/init//unbind)NyxIdRemoteCapabilityBroker真实实现(等 NyxID#549 contract freeze)AuthContext.external_subject = 4字段新增(channel-runtime proto 改动单独 PR)Test plan
bash tools/docs/lint.sh— ADR frontmatter 通过dotnet build+ 单测 通过ExternalIdentityBindingGAgentstate 不出现 secret material;projection / log / metric 不出现 secret materialADR Review 历史
oauth-broker.md,新增 §Actor Architecture / §Outbound Send,address 8 点🤖 Generated with Claude Code