Stage 0 — tripwire telemetry + auto-batched hourly drift runner (LLM-fallback baseline)

Stage 0 of the Stage 5b LLM-fallback plan — the always-on baseline every AgentKeys caller gets regardless of MCP support. Plan: `docs/spec/plans/` (see plan doc referenced in PR #52 follow-ups).

## Goal

When a deterministic scraper hits an unrecognized provider-side UI change (a *tripwire*), fail the current caller fast with a structured "known-issue, fix pending" outcome **and** feed the drift into a demand-driven pipeline that opens a GitHub issue within ~1 hour — without burning accounts when providers haven't actually changed anything.

Replaces the weekly full-matrix runner cadence (kept as a manual owner command) with an auto-batched, hourly-conditional default.

## Scope

1. **Scraper-side tripwire emission** — [openrouter-cdp.ts](../blob/main/provisioner-scripts/src/scrapers/openrouter-cdp.ts) + [openai-cdp.ts](../blob/main/provisioner-scripts/src/scrapers/openai-cdp.ts):
   - New terminal event type `{"type":"tripwire","kind":"selector-missing|unexpected-nav|timeout","step":"<last>","url":"<current>","screenshot_b64":"<png>","dom_digest":"<a11y-tree>","resume_token":"<uuid>"}`.
   - Exit 0 (intentional handoff, not a crash). Guard: tripwire only fires for shape-changes — bad creds / wrong OTP keep emitting `error`.
   - Reuse `logAction` / `snap` from `provisioner-scripts/src/workflow-recorder/artifacts.ts` for screenshot + DOM capture.
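
The event shape and the shape-change guard above can be sketched as follows. This is a minimal illustration, not the scrapers' actual code: `Failure`, `CapturedState`, `toTerminalEvent`, and `serializeEvent` are hypothetical names, and the `message` field on the `error` event is an assumption (the text only says bad creds / wrong OTP keep emitting `error`).

```typescript
// Sketch only: field names mirror the tripwire event described above.
type TripwireKind = "selector-missing" | "unexpected-nav" | "timeout";

interface TripwireEvent {
  type: "tripwire";
  kind: TripwireKind;
  step: string;           // last step reached before the drift
  url: string;            // page URL when the tripwire fired
  screenshot_b64: string; // base64-encoded PNG
  dom_digest: string;     // accessibility-tree digest
  resume_token: string;   // UUID, supplied by the caller in this sketch
}

// Assumed shape: only the event type "error" comes from the spec.
interface ScraperErrorEvent {
  type: "error";
  message: string;
}

type TerminalEvent = TripwireEvent | ScraperErrorEvent;

// Hypothetical classification of why a scraper step failed.
type Failure =
  | { cause: "shape-change"; kind: TripwireKind }
  | { cause: "bad-credentials" | "wrong-otp"; message: string };

interface CapturedState {
  step: string;
  url: string;
  screenshotB64: string;
  domDigest: string;
  resumeToken: string;
}

// Guard: only provider-side shape changes become tripwires;
// credential and OTP problems stay ordinary errors.
function toTerminalEvent(failure: Failure, state: CapturedState): TerminalEvent {
  if (failure.cause === "shape-change") {
    return {
      type: "tripwire",
      kind: failure.kind,
      step: state.step,
      url: state.url,
      screenshot_b64: state.screenshotB64,
      dom_digest: state.domDigest,
      resume_token: state.resumeToken,
    };
  }
  return { type: "error", message: failure.message };
}

// One JSON object per line, matching the event format shown above.
function serializeEvent(event: TerminalEvent): string {
  return JSON.stringify(event);
}
```

A scraper would write the serialized tripwire line to stdout and then exit with code 0, since a tripwire is an intentional handoff rather than a crash.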

2. **Rust orchestrator** — `crates/agentkeys-provisioner/src/{subprocess.rs,orchestrator.rs}`:
   - Parse `tripwire` as a distinct outcome (not an error).
   - Persist `resume_token` + captured state to `~/.agentkeys/fallback/<token>.json`.
   - Default behavior: return `NeedsFallback { token, tier: "none_configured" }` to the caller with a "drift reported, fix pending" message.
   - Extend `ProvisionMetric::TripWireFired` with DOM-state payload.
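
The outcome classification described above can be illustrated with a short sketch. It is written in TypeScript only to stay consistent with the scraper-side examples; the real orchestrator is Rust and its types will differ. `ProvisionOutcome`, `classifyTerminalLine`, and `fallbackStatePath` are hypothetical names, and the handling of unparseable or unknown events is an assumption.

```typescript
// Logic sketch only; not the Rust orchestrator's actual types.
type ProvisionOutcome =
  | { outcome: "NeedsFallback"; token: string; tier: "none_configured"; message: string }
  | { outcome: "Error"; message: string }
  | { outcome: "Other"; eventType: string };

// Classify the scraper's terminal JSON line. A tripwire is a distinct
// outcome, not an error.
function classifyTerminalLine(line: string): ProvisionOutcome {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return { outcome: "Error", message: "unparseable terminal event" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { outcome: "Error", message: "terminal event is not an object" };
  }
  const event = parsed as Record<string, unknown>;
  if (event.type === "tripwire" && typeof event.resume_token === "string") {
    return {
      outcome: "NeedsFallback",
      token: event.resume_token,
      tier: "none_configured", // default when no fallback tier is configured
      message: "drift reported, fix pending",
    };
  }
  if (event.type === "error") {
    const message = typeof event.message === "string" ? event.message : "unknown error";
    return { outcome: "Error", message };
  }
  return { outcome: "Other", eventType: String(event.type) };
}

// Where the captured state for a given resume token would be persisted.
function fallbackStatePath(token: string): string {
  return `~/.agentkeys/fallback/${token}.json`;
}
```

The point of the separate `NeedsFallback` variant is that callers can distinguish "the provider changed its UI" from "provisioning failed" without parsing message strings.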

3. **Telemetry client** (daemon side):
   - Opt-in via `AGENTKEYS_TELEMETRY=1` (default off for privacy).
   - Payload (PII-stripped): `{service, tripwire.kind, tripwire.step, dom_digest, url_host, scraper_version}`. It carries no signup email, no screenshots in the default tier, and no full DOM; only the structural digest hash is sent.
   - Transport: single `POST /drift/report` to the AgentKeys backend.
   - Failure isolation: telemetry post failures never block provisioning. Queue + retry on next invocation.
   - Structured JSONL audit always written to `~/.agentkeys/audit/` regardless of telemetry opt-in.
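
A minimal sketch of the PII stripping, opt-in check, and failure isolation described above. The daemon's implementation language is not stated here, so this only illustrates the logic: `LocalCapture`, `DriftReport`, `toDriftReport`, `telemetryEnabled`, and `reportDrift` are hypothetical names, nesting `kind` and `step` under `tripwire` is an assumed reading of the payload notation, and the in-memory queue stands in for whatever persistence the real client uses.

```typescript
// What the daemon holds locally after a tripwire (assumed field names).
interface LocalCapture {
  service: string;
  kind: string;
  step: string;
  url: string;
  domDigest: string;
  screenshotB64: string;
  signupEmail: string;
  scraperVersion: string;
}

// PII-stripped payload sent to POST /drift/report.
interface DriftReport {
  service: string;
  tripwire: { kind: string; step: string };
  dom_digest: string;
  url_host: string;
  scraper_version: string;
}

// Keep only structural fields: no email, no screenshot, host instead of full URL.
function toDriftReport(c: LocalCapture): DriftReport {
  return {
    service: c.service,
    tripwire: { kind: c.kind, step: c.step },
    dom_digest: c.domDigest,
    url_host: new URL(c.url).host,
    scraper_version: c.scraperVersion,
  };
}

// Telemetry is off unless AGENTKEYS_TELEMETRY is exactly "1".
function telemetryEnabled(env: Record<string, string | undefined>): boolean {
  return env.AGENTKEYS_TELEMETRY === "1";
}

// Failure isolation: this never throws. Reports that fail to post are
// queued and retried on the next invocation.
async function reportDrift(
  report: DriftReport,
  env: Record<string, string | undefined>,
  post: (r: DriftReport) => Promise<void>, // injected transport
  queue: DriftReport[],
): Promise<void> {
  if (!telemetryEnabled(env)) return;
  const pending = [...queue, report]; // earlier failures first
  queue.length = 0;
  for (const r of pending) {
    try {
      await post(r);
    } catch {
      queue.push(r); // keep for the next attempt
    }
  }
}
```

Passing the transport in as `post` keeps the sketch independent of any particular HTTP client and makes the failure path easy to exercise in a test.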

4. **Backend drift-tracker endpoint** (new service, TBD stack — simplest viable: Supabase / Cloudflare KV + Lambda):
   - `POST /drift/report` receives the PII-stripped payload.
   - **Auto-batches on ingest** by `(service, step_hash, day)`. Duplicate reports within the same day coalesce into a single entry with an incremented `count` and the latest `screenshot_ref` (when a report carries one; the default telemetry tier sends no screenshots). No manual grouping step: batching is a side effect of the handler.
   - Exposes a query API the runner uses: "list services with ≥1 fresh report in the last hour".
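
Batch-on-ingest and the freshness query can be sketched with an in-memory store. This is an illustration under stated assumptions, since the backend stack is still TBD: `BatchStore`, `BatchEntry`, and `IncomingReport` are hypothetical names, `step_hash` is taken as an input because its derivation is not specified here, and the "day" is assumed to be the UTC calendar day.

```typescript
interface IncomingReport {
  service: string;
  step_hash: string;       // derivation not specified; treated as given
  screenshot_ref?: string; // absent in the default telemetry tier
}

interface BatchEntry {
  service: string;
  step_hash: string;
  day: string;             // YYYY-MM-DD, UTC (assumption)
  count: number;
  last_seen_ms: number;
  screenshot_ref?: string;
}

class BatchStore {
  private entries = new Map<string, BatchEntry>();

  // Coalescing happens as a side effect of ingest: same key, same entry.
  ingest(report: IncomingReport, nowMs: number): BatchEntry {
    const day = new Date(nowMs).toISOString().slice(0, 10);
    const key = `${report.service}|${report.step_hash}|${day}`;
    const existing = this.entries.get(key);
    if (existing) {
      existing.count += 1;
      existing.last_seen_ms = nowMs;
      if (report.screenshot_ref !== undefined) {
        existing.screenshot_ref = report.screenshot_ref; // keep the latest
      }
      return existing;
    }
    const entry: BatchEntry = {
      service: report.service,
      step_hash: report.step_hash,
      day,
      count: 1,
      last_seen_ms: nowMs,
      screenshot_ref: report.screenshot_ref,
    };
    this.entries.set(key, entry);
    return entry;
  }

  // "List services with at least one fresh report in the last hour."
  servicesWithFreshReports(nowMs: number, windowMs: number = 60 * 60 * 1000): string[] {
    const fresh = Array.from(this.entries.values())
      .filter((e) => nowMs - e.last_seen_ms <= windowMs)
      .map((e) => e.service);
    return Array.from(new Set(fresh)).sort();
  }
}
```

Taking `nowMs` as a parameter rather than calling `Date.now()` inside the store keeps the behavior deterministic, which is convenient for the synthetic batching check in the acceptance list.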

5. **Hourly conditional drift runner** — new `.github/workflows/drift-runner.yml`:
   - `schedule: cron: "0 * * * *"`.
   - Queries the batch store for services with ≥1 fresh tripwire report in the past hour.
   - For each matching service (and only those): runs the live-test runner for that single service, reproduces on a fresh account, attaches screenshots + DOM to an auto-opened issue titled `drift(<service>): <step> failing as of <ts>`, and marks the batch entry as "handled".
   - Services with zero reports: skipped entirely — cron wakes, sees empty batch, exits.
   - **Coalescing**: if a second tripwire lands within 1 hour of an already-open issue for the same `(service, step_hash)`, bump the issue's count + attach the new screenshot. Don't spawn a second runner until the first issue closes.
   - Existing `provisioner-scripts/scripts/weekly-live-test.sh` stays as a manual owner command (not scheduled).
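
The runner's per-hour decision (run, coalesce, or do nothing) can be sketched as a pure planning function. This is a hedged illustration, not the workflow itself: `FreshBatch`, `OpenIssue`, `RunnerAction`, and `planRun` are hypothetical names, and how the workflow fetches fresh batches and open issues is left out.

```typescript
interface FreshBatch {
  service: string;
  step_hash: string;
  step: string;
  count: number;
}

interface OpenIssue {
  service: string;
  step_hash: string;
}

type RunnerAction =
  | { action: "run"; service: string; issueTitle: string }
  | { action: "coalesce"; service: string; step_hash: string; count: number };

// Decide what to do for each service with fresh reports. Services with
// no reports never appear in `fresh`, so they are skipped entirely.
function planRun(fresh: FreshBatch[], openIssues: OpenIssue[], ts: string): RunnerAction[] {
  return fresh.map((b): RunnerAction => {
    const alreadyOpen = openIssues.some(
      (i) => i.service === b.service && i.step_hash === b.step_hash,
    );
    if (alreadyOpen) {
      // Bump the existing issue instead of spawning a second runner.
      return { action: "coalesce", service: b.service, step_hash: b.step_hash, count: b.count };
    }
    return {
      action: "run",
      service: b.service,
      issueTitle: `drift(${b.service}): ${b.step} failing as of ${ts}`,
    };
  });
}
```

An empty `fresh` list yields an empty plan, which corresponds to the "cron wakes, sees empty batch, exits" case.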

## Acceptance

- Synthetic injection: add a fake `<button>Accept new ToS</button>` to OpenAI's signup page via a test harness. Run `node src/scrapers/openai-cdp.ts`. Expect `{"type":"tripwire","kind":"selector-missing","step":"signup-form","screenshot_b64":"..."}` and exit 0.
- CLI caller with no fallback config: expects `NeedsFallback { tier: "none_configured" }` outcome + a "drift reported, fix pending" message, NOT a crash.
- Backend-side batching: POST two synthetic `drift/report` payloads for the same `(service, step_hash)` within an hour; inspect the batch store and confirm they coalesce into a single entry with `count=2`.
- Hourly runner catches the synthetic drift: trigger `drift-runner.yml` manually (workflow_dispatch) with a seeded batch → confirm it fires the live-test for the one reported service only (skips the others) and auto-opens a GitHub issue with screenshot + DOM digest attached.
- Empty-batch run: trigger the workflow with zero reports in the store → confirm it exits cleanly with no runner invocation and no issue opened.
- Audit JSONL written to `~/.agentkeys/audit/<ts>.jsonl` regardless of telemetry opt-in.

## Out of scope

- MCP-caller handoff / `resume_provision` tool — tracked in the Stage 1 sibling issue.
- Daemon-hosted LLM fallback for non-MCP callers is explicitly excluded (preserves Stage 5b's "no second API key" rule).
- Automated script-update PRs from observed fallback sessions — tracked in #51 (`/agentkeys-ship-scraper` skill).
