Stage 0 of the Stage 5b LLM-fallback plan — the always-on baseline every AgentKeys caller gets regardless of MCP support. Plan: docs/spec/plans/ (see plan doc referenced in PR #52 follow-ups).
Goal
When a deterministic scraper hits an unrecognized provider-side UI change (a tripwire), fail the current caller fast with a structured "known-issue, fix pending" outcome and feed the drift into a demand-driven pipeline that opens a GitHub issue within ~1 hour — without burning accounts when providers haven't actually changed anything.
Replaces the weekly full-matrix runner cadence (kept as a manual owner command) with an auto-batched, hourly-conditional default.
Scope
- Scraper-side tripwire emission (`openrouter-cdp.ts` + `openai-cdp.ts`):
  - New terminal event type `{"type":"tripwire","kind":"selector-missing|unexpected-nav|timeout","step":"<last>","url":"<current>","screenshot_b64":"<png>","dom_digest":"<a11y-tree>","resume_token":"<uuid>"}`.
  - Exit 0 (intentional handoff, not a crash). Guard: tripwire only fires for shape changes; bad creds / wrong OTP keep emitting `error`.
  - Reuse `logAction` / `snap` from `provisioner-scripts/src/workflow-recorder/artifacts.ts` for screenshot + DOM capture.
- Rust orchestrator (`crates/agentkeys-provisioner/src/{subprocess.rs,orchestrator.rs}`):
  - Parse `tripwire` as a distinct outcome (not an error).
  - Persist `resume_token` + captured state to `~/.agentkeys/fallback/<token>.json`.
  - Default behavior: return `NeedsFallback { token, tier: "none_configured" }` to the caller with a "drift reported, fix pending" message.
  - Extend `ProvisionMetric::TripWireFired` with DOM-state payload.
- Telemetry client (daemon side):
  - Opt-in via `AGENTKEYS_TELEMETRY=1` (default off for privacy).
  - Payload (PII-stripped): `{service, tripwire.kind, tripwire.step, dom_digest, url_host, scraper_version}`. No signup email, no screenshots in default tier, no full DOM; just the structural digest hash.
  - Transport: single `POST /drift/report` to the AgentKeys backend.
  - Failure isolation: telemetry post failures never block provisioning. Queue + retry on next invocation.
  - Structured JSONL audit always written to `~/.agentkeys/audit/` regardless of telemetry opt-in.
- Backend drift-tracker endpoint (new service, TBD stack; simplest viable: Supabase / Cloudflare KV + Lambda):
  - `POST /drift/report` receives the PII-stripped payload.
  - Auto-batches on ingest by `(service, step_hash, day)`. Duplicate reports within the same day coalesce into a single entry with an incremented `count` and latest `screenshot_ref`. No manual grouping step: batching is a side effect of the handler.
  - Exposes a query API the runner uses: "list services with ≥1 fresh report in the last hour".
- Hourly conditional drift runner (new `.github/workflows/drift-runner.yml`):
  - `schedule: cron: "0 * * * *"`.
  - Queries the batch store for services with ≥1 fresh tripwire report in the past hour.
  - For each matching service (and only those): runs the live-test runner for that single service, reproduces on a fresh account, attaches screenshots + DOM to an auto-opened issue titled `drift(<service>): <step> failing as of <ts>`, marks the batch entry as "handled".
  - Services with zero reports: skipped entirely. The cron wakes, sees an empty batch, exits.
  - Coalescing: if a second tripwire lands within 1 hour of an already-open issue for the same `(service, step_hash)`, bump the issue's count + attach the new screenshot. Don't spawn a second runner until the first issue closes.
  - Existing `provisioner-scripts/scripts/weekly-live-test.sh` stays as a manual owner command (not scheduled).
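
The ingest-time batching and the runner's "fresh in the last hour" query described in Scope can be sketched roughly as below. This is a minimal in-memory illustration under stated assumptions, not the backend's implementation: `BatchStore`, `DriftReport`, `BatchEntry`, and their camelCase field names are hypothetical; the step name stands in for `step_hash`, whose derivation the plan does not define; the `day` bucket is assumed to be a UTC date; and skipping entries already marked handled is an assumption about the query.

```typescript
type TripwireKind = "selector-missing" | "unexpected-nav" | "timeout";

// Hypothetical server-side view of one PII-stripped report.
interface DriftReport {
  service: string;
  kind: TripwireKind;
  step: string;
  domDigest: string;
  urlHost: string;
  scraperVersion: string;
  receivedAt: Date;
}

// One coalesced entry per (service, step_hash, day).
interface BatchEntry {
  service: string;
  stepHash: string;
  day: string; // assumed UTC date, YYYY-MM-DD
  count: number;
  lastSeen: Date;
  handled: boolean;
}

const HOUR_MS = 60 * 60 * 1000;

class BatchStore {
  private entries = new Map<string, BatchEntry>();

  // Batching is a side effect of ingest: a duplicate key bumps the count.
  ingest(report: DriftReport): BatchEntry {
    const day = report.receivedAt.toISOString().slice(0, 10);
    const stepHash = report.step; // assumption: stand-in for the real step_hash
    const key = [report.service, stepHash, day].join("|");
    const existing = this.entries.get(key);
    if (existing !== undefined) {
      existing.count += 1;
      if (report.receivedAt.getTime() > existing.lastSeen.getTime()) {
        existing.lastSeen = report.receivedAt;
      }
      return existing;
    }
    const entry: BatchEntry = {
      service: report.service,
      stepHash,
      day,
      count: 1,
      lastSeen: report.receivedAt,
      handled: false,
    };
    this.entries.set(key, entry);
    return entry;
  }

  // The query the hourly runner would call: services with at least one
  // unhandled report seen within the past hour.
  servicesWithFreshReports(now: Date): string[] {
    const fresh = new Set<string>();
    this.entries.forEach((entry) => {
      const ageMs = now.getTime() - entry.lastSeen.getTime();
      if (!entry.handled && ageMs >= 0 && ageMs <= HOUR_MS) {
        fresh.add(entry.service);
      }
    });
    return Array.from(fresh).sort();
  }
}
```

An empty result from `servicesWithFreshReports` corresponds to the "cron wakes, sees empty batch, exits" path; a real store (Supabase, KV, or similar) would replace the `Map` but could keep the same key shape.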
Acceptance
- Synthetic injection: add a fake `<button>Accept new ToS</button>` to OpenAI's signup page via a test harness. Run `node src/scrapers/openai-cdp.ts`. Expect `{"type":"tripwire","kind":"selector-missing","step":"signup-form","screenshot_b64":"..."}` and exit 0.
- CLI caller with no fallback config: expects `NeedsFallback { tier: "none_configured" }` outcome + a "drift reported, fix pending" message, NOT a crash.
- Backend-side batching: POST two synthetic `drift/report` payloads for the same `(service, step_hash)` within an hour; inspect the batch store and confirm they coalesce into a single entry with `count=2`.
- Hourly runner catches the synthetic drift: trigger `drift-runner.yml` manually (`workflow_dispatch`) with a seeded batch → confirm it fires the live-test for the one reported service only (skips the others) and auto-opens a GitHub issue with screenshot + DOM digest attached.
- Empty-batch run: trigger the workflow with zero reports in the store → confirm it exits cleanly with no runner invocation and no issue opened.
- Audit JSONL written to `~/.agentkeys/audit/<ts>.jsonl` regardless of telemetry opt-in.
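
The synthetic-injection check above could be automated with a small harness helper along these lines. It is a sketch only: `parseTripwire` and `passesTripwireCheck` are hypothetical names, and it assumes the scraper writes newline-delimited JSON events to stdout with the terminal event on the last non-empty line, which the plan does not state. Field names follow the tripwire event listed in Scope.

```typescript
type TripwireKind = "selector-missing" | "unexpected-nav" | "timeout";

interface TripwireEvent {
  type: "tripwire";
  kind: TripwireKind;
  step: string;
  url: string;
  screenshot_b64: string;
  dom_digest: string;
  resume_token: string;
}

const KINDS: string[] = ["selector-missing", "unexpected-nav", "timeout"];
const STRING_FIELDS: string[] = [
  "step",
  "url",
  "screenshot_b64",
  "dom_digest",
  "resume_token",
];

// Returns the terminal tripwire event, or null if the last non-empty
// stdout line is not a well-formed tripwire (for example an `error` event).
function parseTripwire(stdout: string): TripwireEvent | null {
  const lines = stdout.split("\n").filter((line) => line.trim() !== "");
  if (lines.length === 0) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(lines[lines.length - 1]);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const event = parsed as Record<string, unknown>;
  if (event["type"] !== "tripwire") return null;
  const kind = event["kind"];
  if (typeof kind !== "string" || KINDS.indexOf(kind) === -1) return null;
  for (const field of STRING_FIELDS) {
    if (typeof event[field] !== "string") return null;
  }
  return parsed as TripwireEvent;
}

// The acceptance condition: a tripwire is a handoff, so exit code 0
// plus a valid tripwire event at the expected step and kind.
function passesTripwireCheck(
  exitCode: number,
  stdout: string,
  expectedStep: string,
  expectedKind: TripwireKind
): boolean {
  const event = parseTripwire(stdout);
  return (
    exitCode === 0 &&
    event !== null &&
    event.step === expectedStep &&
    event.kind === expectedKind
  );
}
```

A harness would capture the exit code and stdout of `node src/scrapers/openai-cdp.ts` and call `passesTripwireCheck(code, out, "signup-form", "selector-missing")`. Requiring every field from the event definition is stricter than the abbreviated example in the acceptance bullet; relax `STRING_FIELDS` if some fields turn out to be optional.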
Out of scope
- … `resume_provision` tool, tracked in the Stage 1 sibling issue.
- … `/agentkeys-ship-scraper` skill).