Self-heal EventSourcingBehavior version drift + Lark mirror recovery runbook#503
EventSourcingBehavior previously left `_currentVersion` stale in two scenarios that surfaced together in prod (issue #502):

ConfirmEventsAsync — on EventStoreOptimisticConcurrencyException the catch path only logged and rethrew. The runtime envelope retry policy would replay the same observation, ConfirmEventsAsync would rebuild stateEvents from the unchanged _currentVersion, and Garnet would reject with the same expected-vs-actual conflict forever, wedging the actor until a manual stream reset.

ReplayAsync — _currentVersion was set from events[^1].Version, so any drift between the store's authoritative version key and the events sorted set (interrupted Lua append, snapshot+compaction that wiped events but left the version key, externally seeded store) reactivated the actor with a _currentVersion behind the store, and every subsequent commit hit the same permanent conflict.

The fix is two layers of defense:

- ConfirmEventsAsync now catches EventStoreOptimisticConcurrencyException specifically and refreshes _currentVersion to ex.ActualVersion before rethrowing. _pending is left intact so the runtime retry replays the same logical events with versions recomputed from the refreshed baseline. State stays in its pre-commit shape; the next ReplayAsync reconciles fully.
- ReplayAsync now treats GetVersionAsync(agentId) as the authoritative floor: _currentVersion = max(replayed_last_version, store_version). When events are missing but the version key is ahead, the actor reactivates at the store version with empty/snapshot state and makes forward progress instead of conflict-looping.

Adds three tests that mirror the prod scenarios — out-of-band store writes that leave the in-memory version stale, version-key-ahead-of-events drift after compaction, and the same drift with a snapshot present.

Also adds docs/operations/2026-04-29-lark-mirror-recovery-runbook.md capturing the recovery procedure used during the 2026-04-28 incident (Nyx-side ID lookup via nyxid CLI, registration_id recovery from bot label/api-key name prefixes, channel_registrations action=repair_lark_mirror via NyxidChat agent) so the next operator hitting the symptom doesn't have to rediscover it. Calls out specifically that POST /api/channels/registrations is not idempotent and must not be used as a recovery shortcut.

Refs: #502, #501
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
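For orientation, a minimal sketch of the two recovery layers described above, assuming the store surface named in this PR (`AppendAsync` is an assumed method name; `GetEventsAsync`/`GetVersionAsync` appear in the reviews below):

```csharp
// Layer 1 (ConfirmEventsAsync): adopt the store's authoritative version on
// conflict so the runtime envelope retry re-stamps the still-intact _pending
// events from the correct baseline instead of looping on the same conflict.
try
{
    await _eventStore.AppendAsync(agentId, expectedVersion: _currentVersion, stateEvents, ct);
}
catch (EventStoreOptimisticConcurrencyException ex)
{
    _currentVersion = ex.ActualVersion;
    throw; // rethrow: the envelope retry replays the observation
}

// Layer 2 (ReplayAsync): the store's version key is the authoritative floor,
// so drift between the events sorted set and the version key cannot leave
// the actor behind the store.
var events = await _eventStore.GetEventsAsync(agentId, fromVersion, ct);
var storeVersion = await _eventStore.GetVersionAsync(agentId, ct);
_currentVersion = Math.Max(events.Count > 0 ? events[^1].Version : 0, storeVersion);
```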
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 17f2305232
Codecov Report

❌ Patch coverage is …

```diff
@@            Coverage Diff             @@
##              dev     #503      +/-   ##
==========================================
+ Coverage   71.62%   71.64%   +0.01%
==========================================
  Files        1236     1237       +1
  Lines       89573    89642      +69
  Branches    11713    11720       +7
==========================================
+ Hits        64156    64220      +64
- Misses      20819    20821       +2
- Partials     4598     4601       +3
```

Flags with carried forward coverage won't be shown.

… and 4 files with indirect coverage changes
Direct authenticated HTTP equivalent of the LLM-tool path channel_registrations action=repair_lark_mirror, so recovery from a missing local Lark mirror does not require a working NyxidChat agent or a scope-bound chat session.

The handler reuses the same INyxLarkProvisioningService.RepairLocalMirrorAsync the LLM tool calls — so Nyx-side ownership verification and the ChannelBotRegisterCommand dispatch path are identical. Validation matches existing endpoints: scope_id must equal the JWT scope_id claim if both are provided, an Authorization bearer is required (forwarded to Nyx for api-key ownership checks), and Nyx-side failures map to the same status codes used by POST /api/channels/registrations.

Tests cover the success path, missing nyx_channel_bot_id (400), missing Authorization (401), Nyx-side failure (502), and scope-mismatch attempts (400) — the last is important because a successful mirror repair routes all subsequent relay traffic for the api-key into the local scope, so a mismatched-scope repair is effectively a hijack vector.

Updates the recovery runbook to document the direct HTTP endpoint as the preferred path, with the LLM-tool path kept as a fallback.

Refs: #502
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LGTM ✅ Code review results: EventSourcingBehavior version-drift self-heal

Lark mirror repair endpoint

Verification results

Build succeeded. /Users/chronoai/.paseo/worktrees/32qtkk8z/feeble-chipmunk/test/Aevatar.AI.Tests/Aevatar.AI.Tests.csproj : warning NU1510: PackageReference Microsoft.Extensions.DependencyInjection will not be pruned. Consider removing this package from your dependencies, as it is likely unnecessary. [/Users/chronoai/.paseo/worktrees/32qtkk8z/feeble-chipmunk/aevatar.slnx] Time Elapsed 00:00:13.05 ✅ 0 errors

Risk assessment: the risk analysis in the PR description is comprehensive:

Recommend merging — this resolves the production EventSourcing version-drift problem and provides a reliable Lark mirror recovery path.
eanzhao
left a comment
Review: APPROVE
Both fixes are correctly implemented and well-tested. The PR closes the two gaps that caused the 2026-04-28 incident: permanent version-drift loops and the operational dependency on a working chat session for Lark mirror recovery.
Part 1: EventSourcingBehavior version-drift self-heal
ConfirmEventsAsync catch path (lines 104-124):

- `_pending` is NOT cleared on conflict — the copy on line 64 (`_pending.ToArray()`) is transient, and `RemoveCommittedPendingPrefix` only runs on success (line 94)
- On retry, the same pending events are re-stamped with versions computed from the refreshed `_currentVersion` (= `ex.ActualVersion`)
- Correct: `_currentVersion` recovery + pending-event preservation = clean retry
ReplayAsync version floor (lines 192-214):

- `Math.Max(events[^1].Version, storeVersion)` as the authoritative floor handles compaction that wiped the sorted set while the version key survived
- When `events.Count == 0` and `snapshot == null`, setting `_currentVersion = storeVersion` (line 198) is strictly better than leaving it at 0 (the previous behavior) — it prevents the actor from committing v=1 against a store already at v=N
- The double-guard design (Replay floor + Confirm catch) means even if the version drifts again between replay and first commit, the catch path still recovers
One observation (non-blocking): ReplayAsync now always calls GetVersionAsync even when the events sorted set is healthy and consistent (`events[^1].Version >= storeVersion`). The overhead is negligible (a single key lookup), but if actor activation throughput becomes a concern, this could be gated behind an `if (events.Count == 0 || events[^1].Version < storeVersion)` check. Not worth changing now.
One clarification on the PR's risk statement "State staleness when events.Count==0 but version key is ahead": this is indeed only triggered by a data-loss scenario (compaction that deleted events without rewinding the version key). For the production case (projection scope actors), the materializer reconverges because observations are re-delivered. For domain agents with non-idempotent transitions, the lost events represent information that genuinely cannot be recovered — but the alternative (conflict-looping forever) is worse.
Part 2: Lark mirror repair endpoint
- Scope-hijack protection: `ResolveScopeId` with `required: true` + `body.scope_id == JWT.scope_id` enforcement matches the existing register/rebuild endpoints. Test `HandleRepairLarkMirrorAsync_RejectsScopeMismatch` pins this.
- Bearer token forwarding: the Nyx ownership verification needs the caller's Nyx access token. The handler reads it from the `Authorization` header and passes it through — same pattern as register/rebuild.
- Idempotency: `RepairLocalMirrorAsync` returns `already_registered` if the mirror already exists, so re-running is safe.
- Field validation: all required fields (`nyx_channel_bot_id`, `nyx_agent_api_key_id`, `webhook_base_url`) are checked with separate 400 responses; tests cover each missing-field case (see the sketch after this list).
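To make the validation shape concrete, here is a minimal sketch of the per-field checks. The request record and `Validate` helper below are hypothetical; only the field names and the one-400-per-missing-field behavior come from the review:

```csharp
// Hypothetical request shape; JSON field names match the endpoint's body.
public sealed record RepairLarkMirrorRequest(
    string? NyxChannelBotId,
    string? NyxAgentApiKeyId,
    string? WebhookBaseUrl);

// Sketch: each required field gets its own 400 message so the operator sees
// exactly which input is missing (mirrors the missing-field test cases).
static string? Validate(RepairLarkMirrorRequest body) =>
    string.IsNullOrWhiteSpace(body.NyxChannelBotId) ? "nyx_channel_bot_id is required"
    : string.IsNullOrWhiteSpace(body.NyxAgentApiKeyId) ? "nyx_agent_api_key_id is required"
    : string.IsNullOrWhiteSpace(body.WebhookBaseUrl) ? "webhook_base_url is required"
    : null; // null: proceed to scope check, bearer forwarding, and dispatch
```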
One observation (non-blocking): the endpoint returns 202 Accepted even though `RepairLocalMirrorAsync` may complete synchronously. The runbook correctly warns about a 5-10 second wait for projection materialization, so 202 is semantically appropriate — it signals that the operation was accepted but the read-model may not be visible yet.
Runbook
Comprehensive and actionable. The symptom signature section, the "NOT this runbook" differential diagnosis, and the NyxID CLI commands are all well-written. The runbook correctly distinguishes between the `channel_registrations action=repair_lark_mirror` LLM-tool path (fallback) and the direct HTTP endpoint (preferred).
Verification
- Build: 0 errors
- Aevatar.Foundation.Core.Tests: 170 pass, 0 fail (3 new EventSourcing drift tests)
- Aevatar.GAgents.ChannelRuntime.Tests: 711 pass, 0 fail (5 new repair endpoint tests)
- Test stability guard: pass
- Architecture guards: all pass
Risk: log level change
The PR downgrades conflict logs from Error to Warning. If any monitoring/alerting keys on the previous "Event sourcing commit failed" Error message, the alert won't fire after this lands. The Warning message includes "refreshing _currentVersion so the next retry can recover" which should be searchable instead. Consider adding a note to the ops channel about the log signature change.
eanzhao
left a comment
Review Summary
The core EventSourcingBehavior changes are solid — both the runtime conflict-recovery path and the activation ReplayAsync floor address the exact production wedge. The repair endpoint and runbook are well-scoped operational tooling. Below are two issues worth addressing:
1. _pending not trimmed when the version refreshes — duplicate version numbers on a partial-commit edge case
File: src/Aevatar.Foundation.Core/EventSourcing/EventSourcingBehavior.cs:125-140
```csharp
catch (EventStoreOptimisticConcurrencyException ex)
{
    _currentVersion = ex.ActualVersion;
    throw;
}
```

When the next `ConfirmEventsAsync` runs, `stateEvents` is rebuilt from `_pending`:
```csharp
var stateEvents = pendingEvents.Select((evt, i) => new StateEvent
{
    Version = _currentVersion + i + 1,
    ...
}).ToArray();
```

This works correctly for the full-batch rejection case (which is what `EventStoreOptimisticConcurrencyException` means). But consider a scenario where the store accepted a prefix of the batch before throwing (e.g. a batch where the first N events were appended but the version counter was only partially advanced — this may not happen with the current InMemoryEventStore/Garnet but is a valid concern for future stores).
In that case:

- `_currentVersion` is refreshed to `ex.ActualVersion` (which includes the prefix)
- `_pending` still contains ALL original events, including the already-committed prefix
- The retry rebuilds `stateEvents` with `_currentVersion + 1, + 2, ...` for ALL pending events
- The already-committed prefix events are appended again with new version numbers → duplicate domain events in the stream
The current PR description says this is safe because EventStoreOptimisticConcurrencyException means the entire batch was rejected. This is correct for today's store implementations. But it's fragile — the recovery contract depends on an implementation detail of the event store rather than a structural guarantee.
Recommendation (non-blocking for this PR): consider clearing `_pending` on conflict and requiring the domain layer to re-raise events (if that's the existing runtime retry contract), OR add a defensive comment + assertion that the exception contract guarantees full-batch atomicity:

```csharp
// Contract: EventStoreOptimisticConcurrencyException guarantees the entire
// batch was rejected — no partial append. If this assumption changes, _pending
// must be trimmed to remove any successfully committed prefix.
Debug.Assert(ex.ActualVersion == fromVersion, "Partial-commit detected; _pending cleanup required.");
```

This is non-blocking because the current Garnet + InMemory stores both guarantee full-batch rejection on version mismatch, and the test coverage explicitly validates the retry path.
2. ReplayAsync calls GetVersionAsync before checking events.Count == 0 — unnecessary I/O on the happy path
File: src/Aevatar.Foundation.Core/EventSourcing/EventSourcingBehavior.cs:185-188
```csharp
var events = await _eventStore.GetEventsAsync(agentId, fromVersion, ct);
var storeVersion = await _eventStore.GetVersionAsync(agentId, ct);
if (events.Count == 0)
{
    // ...
```

`GetVersionAsync` is called unconditionally on every replay, including the normal happy path (no drift). For Garnet this is a single Redis GET, but for actors that replay frequently (e.g. short-lived projection scopes), this doubles the I/O per activation.
Since the drift scenario only manifests when `events[^1].Version != storeVersion`, the extra call can be deferred:

```csharp
var events = await _eventStore.GetEventsAsync(agentId, fromVersion, ct);
if (events.Count == 0)
{
    var storeVersion = await _eventStore.GetVersionAsync(agentId, ct);
    if (snapshot == null)
    {
        _currentVersion = storeVersion;
        return null;
    }
    _currentVersion = Math.Max(snapshot.Version, storeVersion);
    return snapshot.State;
}

var state = snapshot?.State ?? new TState();
foreach (var stateEvent in events)
{
    if (stateEvent.EventData != null)
        state = TransitionState(state, stateEvent.EventData);
}
_currentVersion = events[^1].Version;
```
The trade-off: if events exist but their last version is behind the store version (the partial-compaction-with-remaining-events case), the deferred approach would miss it. If that case is a real concern, keep the unconditional call but add a comment documenting the trade-off.
Minor observations (non-blocking)
- Repair endpoint is clean — scope-mismatch protection, proper auth forwarding, idempotent by design. Test coverage is solid.
- `NyxConversationRouteId` is not validated as required in the handler, but the runbook lists it as required. If `RepairLocalMirrorAsync` handles a missing route ID gracefully, this is fine; otherwise the handler should add a null/empty check for consistency with `NyxChannelBotId`/`NyxAgentApiKeyId`.
- Runbook is excellent — the "What you must NOT do" section is particularly valuable for preventing operator mistakes during incident response.
eanzhao
left a comment
Overall
LGTM. This PR correctly closes the two gaps described in #502 with a solid two-layer defense and good operational documentation.
ConfirmEventsAsync self-heal
Catching EventStoreOptimisticConcurrencyException specifically and refreshing _currentVersion = ex.ActualVersion before rethrowing is the right fix. The comment explaining why _pending is intentionally left intact is excellent — it preserves the runtime retry contract without requiring the caller to re-raise events. The log level downgrade from Error to Warning is appropriate because the catch path is now self-healing rather than a terminal failure.
ReplayAsync authoritative floor
Using GetVersionAsync(agentId) as the floor (_currentVersion = max(events[^1].Version, storeVersion)) correctly handles the compaction/drift case. The risk discussion in the PR description about non-idempotent domain GAgents is honest and well-reasoned. For the projection scope actors that hit this in prod, the behavior is correct.
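Concretely, the floor rule amounts to the following sketch; the snapshot fallback mirrors the bullets in the first review, and the exact member layout is an assumption:

```csharp
// After replaying whatever the events sorted set still holds, the store's
// version key wins: _currentVersion can never land behind the store.
var replayedVersion = events.Count > 0
    ? events[^1].Version          // healthy stream: last replayed event
    : snapshot?.Version ?? 0;     // compaction/empty-stream case

_currentVersion = Math.Max(replayedVersion, storeVersion);
```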
repair-lark-mirror endpoint
Direct HTTP equivalent is the right operational call — it removes the dependency on a working NyxidChat agent during an outage. Validation (required fields, bearer token, scope_id == JWT scope_id) matches the existing register/rebuild endpoints. The scope-mismatch test (HandleRepairLarkMirrorAsync_RejectsScopeMismatch) is important and correctly pins the hijack-vector mitigation.
Runbook
Clear, structured, and includes the exact log lines to grep. The "What you must NOT do" section calling out that POST /api/channels/registrations is not idempotent is operationally valuable. Filename timestamp 2026-04-29-... follows the repo convention.
Minor / non-blocking
- Sanity check on ActualVersion: in `ConfirmEventsAsync`, consider adding a defensive check that `ex.ActualVersion >= _currentVersion` before assigning. If a malformed `EventStoreOptimisticConcurrencyException` ever carries a lower version, silently accepting it would move `_currentVersion` backward and likely corrupt the event sequence. The current test coverage doesn't exercise this edge.
- ReplayAsync extra round-trip: the additional `GetVersionAsync` call in `ReplayAsync` adds one store read per actor activation. For high-churn actors this is negligible, but worth keeping in mind if activation latency becomes a concern later.
- Runbook closing paragraph: the "When to stop using this runbook" section still says "Once issue #502's EventSourcingBehavior hardening is deployed AND a direct authenticated HTTP repair endpoint is added..." — both conditions are now met by this PR, so that paragraph could be tightened, but it's not blocking.
Approve.
Three review-driven follow-ups on top of the EventSourcingBehavior hardening:

1. Defensive guard against a malformed EventStoreOptimisticConcurrencyException.ActualVersion. If a future store implementation reports an ActualVersion below the in-memory _currentVersion, silently accepting it would rewind the actor and cause the next commit to assign duplicate event versions. The catch path now keeps Math.Max(_currentVersion, ex.ActualVersion) and logs at Error level with the contract violation surfaced. Added a test covering this edge: a store that fabricates a low ActualVersion must not regress _currentVersion.

2. Comment in the ConfirmEventsAsync catch path documenting the full-batch-atomicity assumption that justifies leaving _pending intact. Garnet's AppendScript and InMemoryEventStore both guarantee the entire batch is rejected on conflict; a future store with partial-commit semantics would need to extend this catch to drop the already-committed prefix, otherwise the retry would duplicate domain events. The current contract is documented inline.

3. Comment in ReplayAsync explaining why the GetVersionAsync probe is unconditional rather than gated behind events.Count == 0. Partial compaction can leave the events sorted set with valid trailing entries while the version key is ahead — events[^1].Version < storeVersion is a real shape that the gated form would miss. The one-extra-Redis-GET cost is negligible relative to the activation envelope.

Refs: #502
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
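A minimal sketch of the guarded catch path from item 1; the logger call and message text are illustrative assumptions, not the shipped code:

```csharp
catch (EventStoreOptimisticConcurrencyException ex)
{
    if (ex.ActualVersion < _currentVersion)
    {
        // Contract violation: a store must never report a version behind the
        // baseline it previously confirmed. Surface loudly instead of rewinding.
        _logger.LogError(
            "Store reported ActualVersion {Actual} below in-memory version {Current}; ignoring regression",
            ex.ActualVersion, _currentVersion);
    }
    // Math.Max keeps the normal (store-ahead) case unchanged while refusing
    // to move _currentVersion backward on a malformed exception.
    _currentVersion = Math.Max(_currentVersion, ex.ActualVersion);
    throw;
}
```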
Three review concerns from PR #503 (issue #502 follow-up):

1. ConfirmEventsAsync_WhenStoreVersionIsAhead — leaving _pending intact across an optimistic-concurrency conflict produced duplicate events on the runtime envelope retry. The handler is re-executed on redelivery and re-raises the same logical event(s), so the next ConfirmEventsAsync would commit both the stale pending entries and the freshly raised ones. Drop the rejected batch from _pending in the catch path so the handler re-execution is the single source of truth for the retry payload. The suffix raised mid-flight (existing WhenNewEventIsRaisedDuringAppend contract) is preserved. The previous "retry without re-raising" test asserted the wrong shape for the production envelope retry; rewrite it to model handler re-execution and add a direct regression (OnConflict_DropsRejectedBatchFromPendingSoEnvelopeRetryDoesNotDuplicate) asserting _pending is empty after the catch path.

2. ReplayAsync version-floor — Math.Max(events[^1].Version, storeVersion) silently turned a missing-committed-events corruption into a healthy actor at a higher version with stale state. Safe for idempotent projection scope actors; unsafe as a default for arbitrary GAgentBase<TState> because future commits, snapshots, and projections would be built from facts that were never applied. New default: throw EventStoreVersionDriftException so the operator decides. Per-actor opt-in via EventSourcingRuntimeOptions.ShouldRecoverFromVersionDriftOnReplay (a predicate evaluated at behavior construction in DefaultEventSourcingBehaviorFactory). Foundation.Runtime.Hosting wires the predicate to recover only projection.{durable,session}.scope:* ids, so the original production wedge still self-heals while domain GAgents surface the drift. Reframe the two existing drift tests as opt-in tests, add throws-by-default tests for both the no-snapshot and with-snapshot shapes, and add a per-agent-predicate test directly against the factory.

3. Runbook updated to call out the new self-heal scope (projection scopes only) and the EventStoreVersionDriftException signal for any other actor.

Refs: #502, #503
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
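A sketch of the opt-in wiring from item 2. `EventSourcingRuntimeOptions`, the predicate name, and the projection id prefixes come from the commit message; the `services.Configure` registration style is an assumption:

```csharp
// Hosting-side wiring: only projection scope actors self-heal version drift
// on replay; every other GAgent surfaces EventStoreVersionDriftException.
services.Configure<EventSourcingRuntimeOptions>(options =>
{
    options.ShouldRecoverFromVersionDriftOnReplay = agentId =>
        agentId.StartsWith("projection.durable.scope:", StringComparison.Ordinal) ||
        agentId.StartsWith("projection.session.scope:", StringComparison.Ordinal);
});
```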
PR #503 review (comment 3158220132): the direct HTTP endpoint skipped the preflight that the LLM-tool path runs against the local mirror, so repeated calls without a registration_id minted a fresh id every time. Worse, the same Nyx api-key could be repaired into multiple distinct scopes — the resolver would then refuse to route relay traffic and the 401 symptom would come back. The body.scope_id == JWT.scope_id check did not block this; it only proved the caller was using the scope they were currently authenticated for.

Inject IChannelBotRegistrationQueryPort into the handler and mirror the LLM-tool path:

- Same-scope mirror match → 200 already_registered, no dispatch
- Different-scope mirror match → 400 reject (api-key hijack vector)
- Empty-scope mirror match → reuse the existing registration id so the backfill path attaches a scope rather than producing a parallel entry

Read-side failure during preflight is logged and falls through to dispatch, so a degraded projection reader does not block operational repair.

Tests:

- HandleRepairLarkMirrorAsync_ShortCircuits_WhenSameScopeMirrorAlreadyExists
- HandleRepairLarkMirrorAsync_RejectsCrossScopeMatch
- HandleRepairLarkMirrorAsync_ReusesEmptyScopeRegistrationId
- HandleRepairLarkMirrorAsync_FallsThroughToDispatch_WhenQuerySideIsUnavailable

Existing repair-endpoint tests updated to inject the query port.

Refs: #502, #503
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
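A sketch of the preflight branching described above. `IChannelBotRegistrationQueryPort` comes from the commit message; the query method name and result shape are assumptions for illustration:

```csharp
// Preflight against the local mirror before dispatching a repair. Only the
// branch outcomes are taken from the commit message; FindByApiKeyAsync and
// the registration shape are assumed names.
ChannelBotRegistration? existing;
try
{
    existing = await _registrationQuery.FindByApiKeyAsync(body.NyxAgentApiKeyId, ct);
}
catch (Exception ex)
{
    // A degraded read side must not block operational repair: log and dispatch.
    _logger.LogWarning(ex, "Registration preflight unavailable; falling through to dispatch");
    existing = null;
}

if (existing is not null)
{
    if (existing.ScopeId == scopeId)
        return Results.Ok(new { status = "already_registered" }); // no dispatch
    if (!string.IsNullOrEmpty(existing.ScopeId))
        return Results.BadRequest("api-key already mirrored into another scope"); // hijack vector
    registrationId = existing.RegistrationId; // empty scope: backfill, don't fork
}
```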
Closes #502
Summary
- `EventSourcingBehavior.ConfirmEventsAsync` now catches `EventStoreOptimisticConcurrencyException` specifically and refreshes `_currentVersion = ex.ActualVersion` before rethrowing. The runtime envelope retry replays with versions recomputed from the refreshed baseline instead of wedging on the same conflict forever.
- `EventSourcingBehavior.ReplayAsync` now treats `GetVersionAsync(agentId)` as the authoritative floor (`_currentVersion = max(events[^1].Version, store_version)`). When the events sorted set and the store's version key drift apart (interrupted Lua append, partial compaction, externally seeded store), the actor reactivates at the store-side version and makes forward progress instead of conflict-looping.
- `POST /api/channels/registrations/repair-lark-mirror` is the direct authenticated HTTP equivalent of the LLM-tool path `channel_registrations action=repair_lark_mirror`. Same `INyxLarkProvisioningService.RepairLocalMirrorAsync` under the hood — Nyx-side ownership verification, ChannelBotRegisterCommand dispatch — so recovery no longer requires a working NyxidChat agent or scope-bound chat session.
- `docs/operations/2026-04-29-lark-mirror-recovery-runbook.md` captures the recovery procedure used during the 2026-04-28 incident, with the direct HTTP endpoint as the preferred path and the LLM tool kept as a fallback.

Why
Production silo on 2026-04-28 hit:
- Permanent loop — the projection scope's `_currentVersion` was 3 while Garnet's version key was at 4 (the events sorted set only contained 1–3, so `ReplayAsync` set `_currentVersion = events[^1].Version = 3` instead of seeing the 4 from the version key). Every retry rebuilt the same `stateEvents` at the same `expectedVersion` and conflicted again. Recovery required manually deleting the three Garnet keys and restarting the silo.

Once the version drift was cleared, a separate problem surfaced:

- `state.Registrations` on `channel-bot-registration-store` was empty (lost during a prior `ChannelRuntime` → `Channel.Runtime` namespace migration cleanup), so the relay still 401'd. Recovering required calling the LLM tool `channel_registrations action=repair_lark_mirror` through `aevatar-cli chat` against a NyxidChat agent — which only worked because the operator happened to have the right scope already.

This PR closes both gaps:
- A single `aevatar-cli api POST` instead of a multi-step chat orchestration.

The runbook covers the full procedure end-to-end so the next operator hitting the symptom doesn't have to rediscover it.
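For reference, a hedged sketch of the direct call. The endpoint path, body field names, bearer requirement, and 202 semantics come from this PR; the host, token, and values are placeholders:

```csharp
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;

// Placeholder host and token; body fields per the repair endpoint's contract.
var client = new HttpClient { BaseAddress = new Uri("https://aevatar.example.com") };
client.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Bearer", "<nyx-access-token>");

var response = await client.PostAsJsonAsync(
    "/api/channels/registrations/repair-lark-mirror",
    new
    {
        nyx_channel_bot_id = "<bot-id>",
        nyx_agent_api_key_id = "<api-key-id>",
        webhook_base_url = "https://aevatar.example.com"
        // scope_id may be omitted; if present it must match the JWT scope_id claim.
    });

// 202 Accepted: repair dispatched; allow 5-10s for projection materialization.
Console.WriteLine((int)response.StatusCode);
```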
Risks
- State staleness when `events.Count == 0` but the version key is ahead. The actor reactivates with empty/snapshot state at the store version. For idempotent projection scopes (the prod case here) this is the correct behavior — observations get re-delivered and the materializer reconverges. For domain GAgents with non-idempotent transitions, missing event replays could leave state inconsistent. Mitigation: this only triggers when events actually went missing from the store, which is already a corruption case where any choice loses information; surfacing as drift-recovered-state-with-correct-version is strictly better than wedge-forever-with-correct-state.
- `_pending` is intentionally NOT cleared on conflict. The next retry re-stamps the same logical events with new versions computed from `ex.ActualVersion`. This preserves the existing "events raised during commit are kept for the next confirm" contract (covered by the existing `ConfirmEventsAsync_WhenNewEventIsRaisedDuringAppend_ShouldKeepUncommittedSuffix` test).
- Conflict logs are downgraded to `Warning` (with a self-heal-applied message) rather than `Error`. Other persistence failures still log at `Error`. If alerts/dashboards key on the previous error log, they'll need to update.
- A mismatched-scope repair is effectively a hijack vector; it is rejected via `body.scope_id == JWT.scope_id` (the same check used by the existing register/rebuild endpoints), and a dedicated test (`HandleRepairLarkMirrorAsync_RejectsScopeMismatch`) pins this.
- `dotnet build aevatar.slnx --nologo` — succeeds, 0 errors.
- `dotnet test test/Aevatar.Foundation.Core.Tests/...` — 170 tests pass (including 3 new EventSourcingBehavior drift tests).
- `dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests/...` — 711 tests pass (including 5 new repair-endpoint tests).
- `dotnet test test/Aevatar.Foundation.Runtime.Hosting.Tests/...` (excluding integration/garnet/kafka) — 117 tests pass.
- `bash tools/ci/test_stability_guards.sh` — passes.
- `bash tools/ci/architecture_guards.sh` — all guards through `workflow_binding_boundary_guard` pass. `playground_asset_drift_guard.sh` fails locally (pre-existing environment issue: frontend node_modules not installed in this worktree, unrelated to this PR — no frontend files touched).
- Post-deploy: watch for `Event sourcing commit hit optimistic concurrency conflict; refreshing _currentVersion` warnings — those indicate the self-heal is engaging in the wild and we should investigate the underlying drift source.
- Manual verification: call `POST /api/channels/registrations/repair-lark-mirror` with the right body, confirm `GET /api/channels/registrations` shows the bot, send a Lark message, confirm the bot replies.

🤖 Generated with Claude Code