davidortinau · davidortinau · Apr 19, 2026 · Apr 20, 2026 · Apr 20, 2026 · Apr 20, 2026
diff --git a/.squad/agents/kaylee/history.md b/.squad/agents/kaylee/history.md
@@ -194,3 +194,22 @@ feat(i18n): Phase 2 Batch N — {area} strings to Korean
 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
 ```
 Never push — Captain runs `/review` first.
+
+---
+
+## 2026-04-20 — Potential Parallel Opportunity: Blazor JS Error Bridge (Mobile App Insights)
+
+**Cross-agent note from Scribe (Wash spawn context)**
+
+Wash's mobile observability memo identifies capturing Blazor WebView JavaScript errors as one of five telemetry hooks for App Insights integration. Current scope: Wash handles the .NET-side wiring (Azure exporter, `MauiExceptions` subscriber, business event extensions).
+
+**Blazor JS error bridge** (separate piece):
+- `wwwroot/js/error-bridge.js`: global `window.onerror` + `unhandledrejection` handler
+- `JsErrorBridge.cs` service: `[JSInvokable]` method to receive errors from JS layer
+- JSInterop registration in DI
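+
+A minimal sketch of what the C# half of the bridge could look like, assuming the JS handler forwards errors through a `DotNetObjectReference`. The method name `ReportJsError` and its parameter list are illustrative assumptions, not taken from the memo.
+
+```csharp
+using Microsoft.Extensions.Logging;
+using Microsoft.JSInterop;
+
+// Hypothetical shape for JsErrorBridge.cs; names and parameters are assumptions.
+public sealed class JsErrorBridge
+{
+    private readonly ILogger<JsErrorBridge> _logger;
+
+    public JsErrorBridge(ILogger<JsErrorBridge> logger) => _logger = logger;
+
+    // Called from JS on a DotNetObjectReference to this instance, e.g.
+    // bridgeRef.invokeMethodAsync('ReportJsError', message, source, line, column, stack).
+    // Line and column are nullable because unhandledrejection events carry no position.
+    [JSInvokable]
+    public void ReportJsError(string message, string? source, int? line, int? column, string? stack)
+    {
+        _logger.LogError(
+            "Blazor WebView JS error: {Message} at {Source}:{Line}:{Column} {Stack}",
+            message, source ?? "(unknown)", line, column, stack ?? "(no stack)");
+    }
+}
+```
+
+Registration could be as small as `builder.Services.AddScoped<JsErrorBridge>()`, with a root component handing `DotNetObjectReference.Create(bridge)` to the script once at startup.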
+
+**If Captain approves parallel work,** Kaylee could own this independently while Wash does the .NET wiring. Minimal merge conflict surface (JS file + one new service class). Leaves Wash free to focus on HTTP instrumentation + `MauiExceptions` plumbing.
+
+**Current status:** Awaiting Captain decision on full 1-day plan vs. 3-hour small-slice PoC, and answers to open questions. Will be documented in `.squad/decisions.md` once merged.
+
+**Reference:** `.squad/decisions/inbox/wash-mobile-observability.md` (now merged into decisions.md as of 2026-04-20).
diff --git a/.squad/agents/wash/history.md b/.squad/agents/wash/history.md
@@ -1157,3 +1157,95 @@ Build NOT attempted — net11 preview SDK + MAUI workload aren't installed local
 
 - 2026-04-18: **Resx Manifest & Culture Identifier Alignment** — <LogicalName> csproj override forces correct embed stream name (Designer hardcodes SentenceStudio.Resources.Strings.AppResources but MSBuild defaults to assembly-qualified path). Culture filename MUST match all five touchpoints: DB (ko), cookie (ko), whitelist (ko), endpoint validator (ko), resx file (ko). Rename ko-KR → ko: ResourceManager fallback walks specific → parent → invariant; ko is neutral (no regional variant needed), satellite resolution via parent fallback handles ko-KR requests. Two hotfixes applied as lockout-honors when Kaylee's code was rejected for revision: Round 1 manifest fix, Round 2 culture rename.
 
+
+---
+
+## 2026-04-19 — Observability Audit (Captain: "Can I see errors in Aspire on Azure?")
+
+**Short answer:** No Aspire dashboard on Azure. OTLP exporter in ServiceDefaults is gated on `OTEL_EXPORTER_OTLP_ENDPOINT`, which is unset in production ACA. No App Insights wired. No `UseExceptionHandler`. No `/health` endpoint mapped.
+
+**What production observability actually is today:**
+- stdout/stderr from each container → Container Apps system logs → Log Analytics workspace `law-3ovvqiybthkb6` in `rg-sstudio-prod` (table: `ContainerAppConsoleLogs_CL`).
+- Default ASP.NET Core console logger picks up `ILogger<T>` writes. `FeedbackEndpoints` does log warnings on AI failures and errors on GitHub API failures via `loggerFactory.CreateLogger("FeedbackEndpoints")`.
+- `/api/v1/ai/chat` returns `Results.Problem(...)` but does NOT log the underlying exception — failures there are invisible unless the ASP.NET Core pipeline logs the unhandled exception.
+
+**Quiz sentence scoring path:** clients POST to `/api/v1/ai/chat` or `/api/v1/ai/chat-messages` with a scoring prompt (River's prompts). No dedicated "score" endpoint. An unhandled exception from these lands in container console logs as the default Kestrel exception log; a failure the handler converts to `Results.Problem(...)` leaves no log line.
+
+**Feedback path:** `/api/v1/feedback/preview` + `/submit`. Logs "FeedbackEndpoints" category. AI enrichment failures log warning + fall back; GitHub failures log error.
+
+**What's missing (and recommended):**
+1. Application Insights wired to API + WebApp containers (cheapest observability gain — request traces, dependencies, exceptions, end-to-end correlation).
+2. `app.UseExceptionHandler()` + `ProblemDetails` so unhandled exceptions are logged in one place and returned as a consistent `ProblemDetails` body, rather than surfacing only as the default Kestrel exception log.
+3. `/api/v1/health` endpoint (live + ready) so ACA probe failures are explicit.
+4. Wrap `/api/v1/ai/chat` handlers in try/catch → `logger.LogError(ex, ...)` so OpenAI failures appear with stack traces, not just 503s.
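+
+Items 2 to 4 can be sketched together in a minimal-API `Program.cs`. This is an illustration under assumptions, not the project's actual code: `ChatRequest`, `IChatBackend`, `FailingChatBackend`, and the `"AiEndpoints"` logger category are hypothetical, a single health endpoint stands in for separate live and ready probes, and the Web SDK's implicit usings are assumed.
+
+```csharp
+using Microsoft.AspNetCore.Mvc;
+
+var builder = WebApplication.CreateBuilder(args);
+builder.Services.AddProblemDetails();   // required by the parameterless UseExceptionHandler()
+builder.Services.AddHealthChecks();
+builder.Services.AddSingleton<IChatBackend, FailingChatBackend>(); // hypothetical stand-in
+
+var app = builder.Build();
+app.UseExceptionHandler();              // unhandled exceptions become ProblemDetails responses
+app.MapHealthChecks("/api/v1/health");  // explicit target for ACA probes
+
+app.MapPost("/api/v1/ai/chat", async (
+    ChatRequest request,
+    [FromServices] IChatBackend backend,
+    ILoggerFactory loggerFactory,
+    CancellationToken ct) =>
+{
+    var logger = loggerFactory.CreateLogger("AiEndpoints");
+    try
+    {
+        var reply = await backend.CompleteAsync(request.Prompt, ct);
+        return Results.Ok(new { reply });
+    }
+    catch (Exception ex) when (ex is not OperationCanceledException)
+    {
+        // Log the exception itself so the stack trace reaches the console logs.
+        logger.LogError(ex, "AI chat call failed");
+        return Results.Problem(
+            title: "AI service unavailable",
+            statusCode: StatusCodes.Status503ServiceUnavailable);
+    }
+});
+
+app.Run();
+
+// Hypothetical types, present only so the sketch is self-contained.
+public record ChatRequest(string Prompt);
+
+public interface IChatBackend
+{
+    Task<string> CompleteAsync(string prompt, CancellationToken ct);
+}
+
+public sealed class FailingChatBackend : IChatBackend
+{
+    public Task<string> CompleteAsync(string prompt, CancellationToken ct) =>
+        Task.FromException<string>(new HttpRequestException("simulated upstream failure"));
+}
+```
+
+With the stand-in backend, a POST to `/api/v1/ai/chat` returns a 503 `ProblemDetails` body and writes an error entry with the stack trace to the console logger.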
+
+**Azure resources from `.azure/sstudio-prod/.env`:**
+- Subscription: `a25bc5f2-e641-47b9-89a8-5e5fd428d9d6`
+- RG: `rg-sstudio-prod`
+- ACA env: `cae-3ovvqiybthkb6` (domain `livelyforest-b32e7d63.centralus.azurecontainerapps.io`)
+- LAW: `law-3ovvqiybthkb6`
+- Container app names follow Aspire resource names: `api`, `webapp`, `marketing`, `workers`.
+
+**Immediate command for Captain** — tail the API container now:
+`az containerapp logs tail -g rg-sstudio-prod -n api --follow --tail 200`
+
+And for retrospective KQL over this morning:
+```kusto
+ContainerAppConsoleLogs_CL
+| where TimeGenerated > ago(12h)
+| where ContainerAppName_s == "api"
+| where Log_s has_any ("error", "Exception", "fail", "Unhandled", "FeedbackEndpoints")
+| project TimeGenerated, Log_s
+| order by TimeGenerated desc
+```
+
+**Decision memo:** `.squad/decisions/inbox/wash-observability.md` — recommend wiring App Insights + exception handler + `/health` in next sprint.
+
+---
+
+**2026-04-19: Observability Audit Note**
+Captain reported intermittent prod errors (quiz scoring, feedback). Decision memo filed: wire App Insights, add exception handler + ProblemDetails, wrap AI endpoint failures with try/catch+LogError, add /health endpoint. Awaiting approval; ~1 day implement + e2e verify.
+
+
+---
+
+## 2026-04-19 — Mobile Observability Plan (Captain: "what are you gonna do to add App Insights to the mobile app?")
+
+**Key finding:** Mobile side is 80% already done. Didn't expect that going in.
+
+**Inventory:**
+- `SentenceStudio.MauiServiceDefaults/Extensions.cs` already calls `ConfigureOpenTelemetry()` with Logging + Metrics (HttpClient, Runtime) + Tracing (HttpClient). OTLP exporter is gated on `OTEL_EXPORTER_OTLP_ENDPOINT` (unset for mobile — works in local Aspire dev only).
+- `MauiExceptions.cs` already handles the platform gauntlet: AppDomain, TaskScheduler, iOS MarshalManagedException with `UnwindNativeCode`, Android `UnhandledExceptionRaiser`, WinUI 3 FirstChance+Application.UnhandledException. But **no subscriber is attached** anywhere → crashes die silently today.
+- `AddEmbeddedAppSettings()` loads invariant + Production/Development JSON from `SentenceStudio.AppLib` assembly manifest resources. Natural home for Azure Monitor connection string.
+- Typed HttpClients (`AiApiClient`, `FeedbackApiClient`, `SpeechApiClient`, `PlansApiClient`) already flow through `AddStandardResilienceHandler` + service discovery. OTel HttpClient instrumentation already captures them.
+- Zero `Microsoft.ApplicationInsights.*` refs anywhere. Clean slate.
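+
+Attaching the missing subscriber could look roughly like the sketch below. It assumes `MauiExceptions.UnhandledException` is a static event of type `UnhandledExceptionEventHandler`; the stand-in class and the `CrashLogging` helper are hypothetical and exist only so the example compiles on its own.
+
+```csharp
+using System;
+using Microsoft.Extensions.Logging;
+
+// Stand-in for the project's MauiExceptions class (assumed event shape).
+public static class MauiExceptions
+{
+    public static event UnhandledExceptionEventHandler? UnhandledException;
+
+    // Hook for this sketch only; the real class raises the event from platform handlers.
+    public static void Raise(object sender, Exception ex) =>
+        UnhandledException?.Invoke(sender, new UnhandledExceptionEventArgs(ex, isTerminating: true));
+}
+
+// Hypothetical helper that routes unhandled exceptions into ILogger.
+public static class CrashLogging
+{
+    public static void Attach(ILoggerFactory loggerFactory)
+    {
+        var logger = loggerFactory.CreateLogger("MauiExceptions");
+        MauiExceptions.UnhandledException += (sender, args) =>
+        {
+            if (args.ExceptionObject is Exception ex)
+            {
+                logger.LogCritical(ex, "Unhandled exception (terminating: {IsTerminating})", args.IsTerminating);
+            }
+            else
+            {
+                logger.LogCritical("Unhandled non-Exception object: {ExceptionObject}", args.ExceptionObject);
+            }
+        };
+    }
+}
+```
+
+One caveat worth checking during implementation: batching exporters send asynchronously, so a terminating crash may exit the process before the record leaves the device.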
+
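+**Plan delivered (memo):** Add `Azure.Monitor.OpenTelemetry.Exporter` 1.3.0 (NOT the classic AI SDK; this is the MS-recommended path for .NET 10+) and plug it into the existing OTel pipeline via `AddOpenTelemetry().UseAzureMonitor(...)`. Caveat to verify at implementation time: `UseAzureMonitor` ships in the `Azure.Monitor.OpenTelemetry.AspNetCore` distro, while the exporter package exposes `AddAzureMonitorTraceExporter`, `AddAzureMonitorMetricExporter`, and `AddAzureMonitorLogExporter`. Then subscribe an `ILogger` sink to `MauiExceptions.UnhandledException`, add a tiny `wwwroot/js/error-bridge.js` + `[JSInvokable] JsErrorBridge` for BlazorWebView JS errors, and DEBUG-guard the connection string load so dev/simulator builds emit nothing.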
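+
+A hedged sketch of the exporter wiring, using the per-signal extension methods that the `Azure.Monitor.OpenTelemetry.Exporter` package provides. The helper name `AzureMonitorWiring` and the configuration key `AzureMonitor:ConnectionString` are assumptions for illustration; the memo does not name them.
+
+```csharp
+using Azure.Monitor.OpenTelemetry.Exporter;
+using Microsoft.Extensions.Configuration;
+using Microsoft.Extensions.DependencyInjection;
+using Microsoft.Extensions.Logging;
+using OpenTelemetry.Logs;
+using OpenTelemetry.Metrics;
+using OpenTelemetry.Trace;
+
+// Hypothetical helper, meant to be called from the existing ConfigureOpenTelemetry().
+public static class AzureMonitorWiring
+{
+    public static void AddAzureMonitorExporters(
+        IServiceCollection services,
+        ILoggingBuilder logging,
+        IConfiguration configuration)
+    {
+#if DEBUG
+        // Dev and simulator builds emit nothing.
+        return;
+#else
+        // Assumed key; the value would come from the embedded appsettings.Production.json.
+        var connectionString = configuration["AzureMonitor:ConnectionString"];
+        if (string.IsNullOrWhiteSpace(connectionString))
+        {
+            return;
+        }
+
+        services.AddOpenTelemetry()
+            .WithTracing(t => t.AddAzureMonitorTraceExporter(o => o.ConnectionString = connectionString))
+            .WithMetrics(m => m.AddAzureMonitorMetricExporter(o => o.ConnectionString = connectionString));
+
+        logging.AddOpenTelemetry(o =>
+            o.AddAzureMonitorLogExporter(x => x.ConnectionString = connectionString));
+#endif
+    }
+}
+```
+
+Keeping the guard inside one helper means the existing instrumentation setup stays untouched, and a missing connection string degrades to today's behavior instead of failing at startup.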
+
+**Correlation:** Automatic via W3C `traceparent` header injection by OTel HttpClient instrumentation — works end-to-end once API side also emits OTel → App Insights. **Therefore server memo must ship first or in parallel** for correlation to be real.
+
+**iOS gotchas to remember for implementation:**
+- Full-link Release builds will strip `Azure.Monitor.OpenTelemetry.Exporter` reflection targets → need `Properties/LinkerConfig.xml` preserve directive.
+- `PrivacyInfo.xcprivacy` needs "Crash Data" + "Performance Data" entries for App Store — not needed for DX24 sideload.
+- Exporter has built-in 24h local file cache; don't disable it (handles offline).
+
+**PII discipline:** UserProfileId (GUID) yes. Email/display name/Korean user text NO. Scrub at log sites, not via a processor (easier). OTel doesn't capture HTTP bodies by default — don't opt in.
+
+**Effort:** ~1 day total. Recommended first-increment slice is ~3 hours: exporter + MauiExceptions subscriber only, Mac Catalyst first, prove the pipe works before investing in JS bridge / custom events / iOS AOT work.
+
+**Memo filed:** `.squad/decisions/inbox/wash-mobile-observability.md`.
+
+**Rule of thumb learned:** Before proposing new infrastructure, always inventory what's already wired. `MauiServiceDefaults` had the whole OTel pipeline sitting there, gated on an env var. The real gap was an exporter + a subscriber, not a rebuild.
+
+
+---
+
+## Learnings — 2026-04-20 (Mobile App Insights follow-up: TinyInsights eval + security stance)
+
+**Connection string is write-only; embed it.** InstrumentationKey authorizes ingestion push only — can't read telemetry or touch other Azure resources. Microsoft's own docs tell mobile/desktop/JS clients to ship it in the app bundle. Worst case is fake-telemetry spam, bounded by daily ingestion cap ($5/day) + sampling. All the "secure" alternatives (fetch from API at startup, per-user keys, Key Vault) are **strictly worse** for a mobile app — chicken-and-egg (no telemetry if API is down, which is exactly when you need it), massive complexity for zero security gain, or require an Azure identity the app doesn't have. Rule: write-only keys with bounded blast radius belong in the client. Read-capable secrets never do.
+
+**TinyInsights.Maui evaluated: REJECTED for this project.** Active project (Daniel Hindrikes, MVP, net10 support Jan 2026, crash improvements Apr 2026), nice developer ergonomics. BUT it depends on the **legacy `Microsoft.ApplicationInsights` 2.23.0** SDK, not OpenTelemetry. Our `SentenceStudio.MauiServiceDefaults` already has an OTel pipeline, and the API side is planning the Azure Monitor OTel exporter. Mixing SDK families breaks W3C `traceparent` correlation between MAUI and API, which is the whole reason we want ONE App Insights resource in the first place. It would also duplicate telemetry (double HttpClient tracking, double exporters, double cost). Sticking with `Azure.Monitor.OpenTelemetry.Exporter` 1.3.0.
+
+**Rule of thumb: SDK family consistency > convenience.** When the server tier commits to OpenTelemetry, the client tier has to stay on OpenTelemetry too — or correlation is theater. Check the `<PackageReference>` before adopting any MAUI observability library: if it pulls `Microsoft.ApplicationInsights.*` (classic SDK) and your server uses `Azure.Monitor.OpenTelemetry.Exporter`, walk away no matter how good the DX looks.
+
+**First-increment plan UNCHANGED.** TinyInsights rejection doesn't alter the 3-hour Mac Catalyst slice: add exporter package, wire `UseAzureMonitor` in `ConfigureOpenTelemetry` guarded on Release+connection-string-present, subscribe `ILogger` to `MauiExceptions.UnhandledException`, embed connection string in `appsettings.Production.json`. Ship in parallel with server memo's PR so day-one traces already span both tiers.