Skip to content

fix: make scope script save observation visible after upsert#496

Merged
jason-aelf merged 2 commits intodevfrom
fix/2026-04-28_scope-script-save-observation
Apr 29, 2026
Merged

fix: make scope script save observation visible after upsert#496
jason-aelf merged 2 commits intodevfrom
fix/2026-04-28_scope-script-save-observation

Conversation

@jason-aelf
Copy link
Copy Markdown
Collaborator

Summary

  • Activate script authority read models for both the definition actor and catalog actor before dispatching scope script definition/catalog write commands.
  • Ensure accepted scope script saves can become visible through the existing catalog read model and definition snapshot projection path.
  • Add focused unit and integration coverage for authority read-model activation order and real in-memory projection observation after script save.

Issue

Fixes #488.

Validation

  • dotnet test test/Aevatar.GAgentService.Tests/Aevatar.GAgentService.Tests.csproj --nologo --filter ScopeScriptCommandApplicationServiceTests — 7 passed
  • dotnet test test/Aevatar.GAgentService.Integration.Tests/Aevatar.GAgentService.Integration.Tests.csproj --nologo --filter "ScopeScriptApplicationServicesTests|ScopeScriptSaveObservationRuntimeTests" — 8 passed
  • bash tools/ci/test_stability_guards.sh — passed
  • bash tools/ci/query_projection_priming_guard.sh — passed

Notes

  • The new integration test uses in-memory runtime/projection settings to match the issue investigation environment.
  • The test includes polling because it validates cross-component eventual consistency through the real in-memory projection path; the file is added to tools/ci/test_polling_allowlist.txt.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 71.62%. Comparing base (c51d203) to head (4436c8d).
⚠️ Report is 3 commits behind head on dev.

Files with missing lines Patch % Lines
...on/Scripts/ScopeScriptCommandApplicationService.cs 75.00% 0 Missing and 1 partial ⚠️
@@            Coverage Diff             @@
##              dev     #496      +/-   ##
==========================================
+ Coverage   71.54%   71.62%   +0.08%     
==========================================
  Files        1236     1236              
  Lines       89569    89573       +4     
  Branches    11712    11713       +1     
==========================================
+ Hits        64081    64156      +75     
+ Misses      20903    20820      -83     
- Partials     4585     4597      +12     
Flag Coverage Δ
ci 71.62% <75.00%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...on/Scripts/ScopeScriptCommandApplicationService.cs 93.05% <75.00%> (+0.40%) ⬆️

... and 8 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

Architecture concern (P2)

In ScopeScriptCommandApplicationService.cs, explicitly activating authority read models in the command application service before dispatching write commands mixes read-model lifecycle concerns into the write path. The standard command lifecycle should remain Normalize -> Resolve Target -> Build Context -> Build Envelope -> Dispatch -> Receipt -> Observe.

If the projection runtime does not auto-activate read models on first event, the fix should be in the projection infrastructure (or an activation binder/lease), not in every command service. Consider adding a TODO comment linking to a follow-up issue to remove this workaround once the runtime handles activation implicitly.

Otherwise LGTM

  • Minimal production change (6 lines) - good.
  • Unit test verifies activation order before writes - correct.
  • Integration test properly exercises real in-memory projection and is added to the polling allowlist.
  • No race condition introduced; activation is awaited before write dispatch.

Copy link
Copy Markdown
Contributor

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Fix is minimal and correct — activates both definition and catalog authority read models before dispatching write commands, matching the existing activation pattern in .

Strengths:

  • Clean 6-line production change with clear causal link to the bug
  • Unit test properly validates activation ordering via execution log
  • Integration test covers the full end-to-end observation path with real in-memory projection
  • Polling properly allowlisted per CLAUDE.md test stability rules

One minor note: and are independent — could be parallelized with to avoid sequential overhead. Not blocking since correctness is the priority here.

Copy link
Copy Markdown
Contributor

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Fix is minimal and correct — activates both definition and catalog authority read models before dispatching write commands, matching the existing activation pattern in ScriptAuthorityProjectionPort.

Strengths:

  • Clean 6-line production change with clear causal link to the bug
  • Unit test properly validates activation ordering via execution log
  • Integration test covers the full end-to-end observation path with real in-memory projection
  • Polling properly allowlisted per CLAUDE.md test stability rules

One minor note: ActivateAsync for definition and catalog actors are independent — could be parallelized with Task.WhenAll to avoid sequential overhead. Not blocking since correctness is the priority here.

@eanzhao
Copy link
Copy Markdown
Contributor

eanzhao commented Apr 29, 2026

架构回归:把 authority readmodel activation 重新塞回写路径

这个 PR 直接撤销了 PR #181(closing #158/#159)刚刚收口完的 CQRS 边界。

1. 与既有设计文档冲突

docs/history/2026-04/2026-04-09-scripting-authority-write-path-cqrs-closure.md 已经明确:

  • §6.1 / §6.2:definition / catalog 写路径必须删除对 IScriptAuthorityReadModelActivationPort 的依赖
  • §11.4:query 侧若需要 authority readmodel 已经存在,应通过 read-side 自己的 lifecycle/bootstrap 策略解决,而不是再回流到写路径解决。"authority readmodel activation 若继续保留,应归属于 read-side 专用 activation port / query bootstrap flow / projection lease owner"。
  • §13:禁止 definition/catalog/runtime command path 依赖 authority projection activation;要求 authority readmodel bootstrap 只能由 read-side activation / lease 入口承担,禁止回流到 runtime create / command path

ScopeScriptCommandApplicationService.UpsertAsync 是 scripting authority 的写命令路径(实现 IScopeScriptCommandPort,分发 UpsertDefinitionWithSnapshotAsyncPromoteCatalogRevisionAsync),加 activation 正是上文明确禁止的模式。

2. 绕过既存守卫,但违反守卫意图

现有的 arch test CqrsBoundaryTests.ScriptingWritePath_ShouldNot_DependOn_AuthorityActivationPorts 只枚举了 RuntimeScriptDefinitionCommandService / RuntimeScriptCatalogCommandService / ScriptBehaviorRuntimeCapabilities / ScriptBehaviorRuntimeCapabilityFactory 这四个名字,且仅在 Aevatar.Scripting.(Application|Infrastructure) 命名空间下生效。

本 PR 把 activation 加到了更上层的 Aevatar.GAgentService.Application.Scripts.ScopeScriptCommandApplicationService——CI guard scripting_write_path_cqrs_guard.sh 也只硬编码了那 4 个文件。技术上能过,但这是守卫覆盖盲区,不是架构许可。query_projection_priming_guard.sh 通过的原因是它只扫 *Query*.cs / *ReadPort*.cs——和此处违规无关。

3. 根因诊断方向错了

issue #488 的现象是:本机 in-memory 模式下 catalog readmodel 永远 pending。这意味着生产(Kafka/ES)路径下 authority projection 是被某种引导机制激活的;in-memory 路径缺的是读侧的 bootstrap,不是写侧的预挂接。

正确的修复方向(与 §11.4 / §13 一致):

  • 给 read-side 加显式 activation/lease owner,例如在 ScopeScriptSaveObservationService 之外建一个专门负责 "needed-on-first-read" 的 bootstrap 组件(不是 query 方法体内同步 prime,而是通过显式 activation 协议);或
  • 在 host 启动 / 项目装配时按已知 scope 集合预热 authority projection;或
  • 修复 in-memory projection runtime 的自动启动配置,使其与生产行为一致。

4. 命令侧契约也被偷偷扩展

现在 UpsertAsync 的同步语义除了"两条 command accepted for dispatch"之外,又夹带了"两个 authority projection 已 activate"。设计文档 §6.4 / §11.2 明确要求 accepted 语义只承诺 dispatch,不承诺 read-side 任何状态。两个 sequential ActivateAsync 还把同步路径延迟拉长,且 activation 失败会让命令直接挂掉——这是新增的写路径失败模式。

建议

  • revert 这两行 await _authorityReadModelActivationPort.ActivateAsync(...)
  • 把修复落在 read-side bootstrap(in-memory 启动时的 projection 预热 / 专用 activation owner);
  • 顺手把 ScriptingWritePath_ShouldNot_DependOn_AuthorityActivationPorts 守卫扩到 Aevatar.GAgentService.Application.Scripts.*Command*IScopeScriptCommandPort 的实现类,避免下次再从这个口子回流;
  • 集成测试本身(ScopeScriptSaveObservationRuntimeTests)保留,它在正确的修复方案下也是有价值的回归覆盖。

参考

Copy link
Copy Markdown
Contributor

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architecture Concern: Write-Path Read-Model Activation

This PR reintroduces into production code the exact pattern that was deliberately removed in the previous CQRS closure refactoring (documented in docs/history/2026-04/2026-04-09-scripting-authority-write-path-cqrs-closure.md).

Problem

ScopeScriptCommandApplicationService is a write-path command service. The PR injects IScriptAuthorityReadModelActivationPort into it and calls ActivateAsync before dispatching write commands. The design document explicitly forbids this:

"要求 authority readmodel bootstrap 只能由 read-side activation / lease 入口 承担,禁止回流到 runtime create / command path"

"禁止 definition/catalog/runtime command path 依赖 authority projection activation"

Why This Matters

  1. CQRS boundary violation — The command path now controls projection lifecycle, coupling write success to read-side activation availability. If activation fails or is slow, the command itself becomes blocked or impaired.

  2. Guard coverage gaps — None of the three existing CI guards catch this:

    • query_projection_priming_guard.sh — only scans *Query*.cs / *ReadPort*.cs
    • scripting_write_path_cqrs_guard.sh — only checks 4 specific files in Aevatar.Scripting.*
    • CqrsBoundaryTests.ScriptingWritePath_ShouldNot_DependOn_AuthorityActivationPorts — only covers Aevatar.Scripting.(Application|Infrastructure) namespaces

    ScopeScriptCommandApplicationService sits in src/platform/Aevatar.GAgentService.Application/Scripts/, outside all three guard perimeters. If this pattern is accepted, the guards must be extended to cover the platform layer as well.

  3. Precedent risk — This creates a "platform layer exception" that could spread. Any upper-layer command service could make the same argument for pre-dispatch activation.

Suggested Alternatives

The underlying issue (observation endpoint returning Pending forever because projection never starts) is real, but the fix should be on the read-side:

  • Option A: Activate in ScopeScriptSaveObservationService.ObserveAsync (the read/observation path) — on first observation attempt, ensure the projection scope exists, then poll. This keeps activation on the query side where the design document says it belongs.

  • Option B: Use a decorator/interceptor on the observation port that lazily bootstraps the projection on first read, following the "read-side activation / lease 入口" pattern.

  • Option C: If activation truly belongs at the application service level, it should be in a dedicated read-side bootstrap service, not in the command dispatch method.

Minor Notes

  • The integration test (ScopeScriptSaveObservationRuntimeTests) is well-structured and the polling approach is reasonable for eventual consistency validation.
  • The unit tests for execution order are good.
  • The test_polling_allowlist.txt entry is appropriate.

Copy link
Copy Markdown
Contributor

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. Clean, focused fix with solid test coverage.

What I verified

  • Architecture compliance: activation in command path (not query path), no projection priming violation, no intermediate-layer state maps. ✅
  • DI registration chain: IScriptAuthorityReadModelActivationPort flows from AddScriptCapability → Scripting.Projection DI → ScriptAuthorityProjectionPort. ✅
  • Test coverage: unit test verifies activation ordering (execution log), integration test verifies end-to-end observability through real in-memory projection. ✅
  • Polling allowlist: test file properly documented and added to test_polling_allowlist.txt. ✅

One note (non-blocking)

The two ActivateAsync calls in ScopeScriptCommandApplicationService.UpsertAsync are sequential. If the first (definition actor) succeeds and the second (catalog actor) throws, the definition actor's projection session lingers without a corresponding command dispatch. In practice the idle release mechanism should clean it up, but worth being aware of. A future improvement could consider wrapping both in a try/catch with explicit cleanup, or using Task.WhenAll if the activation service is concurrency-safe.

await _authorityReadModelActivationPort.ActivateAsync(definitionActorId, ct);
await _authorityReadModelActivationPort.ActivateAsync(catalogActorId, ct);

Copy link
Copy Markdown
Contributor

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

发现一个架构边界问题,已在具体代码行评论。

@jason-aelf jason-aelf merged commit d19862b into dev Apr 29, 2026
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Script save observation remains pending after accepted catalog promotion

3 participants