Stage 0 — tripwire telemetry + auto-batched hourly drift runner (LLM-fallback baseline) #54

@hanwencheng

Description

Stage 0 of the Stage 5b LLM-fallback plan: the always-on baseline that every AgentKeys caller gets regardless of MCP support. Plan: docs/spec/plans/ (see the plan doc referenced in the PR #52 follow-ups).

Goal

When a deterministic scraper hits an unrecognized provider-side UI change (a tripwire), fail fast for the current caller with a structured "known-issue, fix pending" outcome. Then feed the drift into a demand-driven pipeline that opens a GitHub issue within ~1 hour, without burning accounts when providers haven't actually changed anything.

Replaces the weekly full-matrix runner cadence (kept as a manual owner command) with an auto-batched, hourly-conditional default.
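
The caller-facing contract in the goal can be sketched in TypeScript. This is a minimal, hypothetical sketch. It assumes the scraper prints its terminal event as the last JSON line on stdout. `classify` and the `Outcome` shape are illustrative stand-ins for the Rust orchestrator's handling of `NeedsFallback`, and the shape of the `error` event is assumed. The sketch shows only the intended mapping:

- A tripwire with exit 0 becomes a "fix pending" handoff.
- An ordinary error stays an error.
- Anything else is treated as a crash.

```typescript
// Hypothetical TypeScript mirror of the scraper's terminal stdout events.
// Tripwire fields follow this issue; the "error" event shape is assumed.
type TripwireKind = "selector-missing" | "unexpected-nav" | "timeout";

interface TripwireEvent {
  type: "tripwire";
  kind: TripwireKind;
  step: string;
  url: string;
  screenshot_b64: string;
  dom_digest: string;
  resume_token: string;
}

interface ScraperErrorEvent {
  type: "error";
  message: string;
}

type TerminalEvent = TripwireEvent | ScraperErrorEvent;

// Illustrative caller outcomes; the real ones live in the Rust orchestrator.
type Outcome =
  | { kind: "needs_fallback"; token: string; tier: "none_configured"; message: string }
  | { kind: "failed"; message: string }
  | { kind: "crashed"; exitCode: number };

// Map the scraper's exit code and last stdout line to a caller outcome.
function classify(exitCode: number, lastLine: string): Outcome {
  let parsed: unknown;
  try {
    parsed = JSON.parse(lastLine);
  } catch {
    return { kind: "crashed", exitCode };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { kind: "crashed", exitCode };
  }
  const event = parsed as TerminalEvent;
  if (event.type === "tripwire" && exitCode === 0) {
    // Intentional handoff: the drift is reported and the caller is told a fix is pending.
    return {
      kind: "needs_fallback",
      token: event.resume_token,
      tier: "none_configured",
      message: `drift reported at step "${event.step}", fix pending`,
    };
  }
  if (event.type === "error") {
    // Bad credentials or a wrong OTP stay on the ordinary error path.
    return { kind: "failed", message: event.message };
  }
  return { kind: "crashed", exitCode };
}
```

A tripwire exits 0, so a caller that checks only the exit code would read drift as success. For that reason the sketch keys on the event type rather than the exit code alone.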

Scope

  1. Scraper-side tripwire emission (openrouter-cdp.ts + openai-cdp.ts):

    • New terminal event type {"type":"tripwire","kind":"selector-missing|unexpected-nav|timeout","step":"<last>","url":"<current>","screenshot_b64":"<png>","dom_digest":"<a11y-tree>","resume_token":"<uuid>"}.
    • Exit 0 (intentional handoff, not a crash). Guard: the tripwire fires only for shape changes; bad credentials or a wrong OTP still emit error.
    • Reuse logAction / snap from provisioner-scripts/src/workflow-recorder/artifacts.ts for screenshot + DOM capture.
  2. Rust orchestrator (crates/agentkeys-provisioner/src/{subprocess.rs,orchestrator.rs}):

    • Parse tripwire as a distinct outcome (not an error).
    • Persist resume_token + captured state to ~/.agentkeys/fallback/<token>.json.
    • Default behavior: return NeedsFallback { token, tier: "none_configured" } to the caller with a "drift reported, fix pending" message.
    • Extend ProvisionMetric::TripWireFired with DOM-state payload.
  3. Telemetry client (daemon side):

    • Opt-in via AGENTKEYS_TELEMETRY=1 (default off for privacy).
    • Payload (PII-stripped): {service, tripwire.kind, tripwire.step, dom_digest, url_host, scraper_version}. It carries no signup email, no screenshots in the default tier and no full DOM, only the structural digest hash.
    • Transport: single POST /drift/report to the AgentKeys backend.
    • Failure isolation: telemetry post failures never block provisioning. Queue + retry on next invocation.
    • Structured JSONL audit always written to ~/.agentkeys/audit/ regardless of telemetry opt-in.
  4. Backend drift-tracker endpoint (new service, TBD stack; simplest viable: Supabase / Cloudflare KV + Lambda):

    • POST /drift/report receives the PII-stripped payload.
    • Auto-batches on ingest by (service, step_hash, day). Duplicate reports within the same day coalesce into a single entry with an incremented count and the latest screenshot_ref. There is no manual grouping step; batching is a side effect of the handler.
    • Exposes a query API the runner uses: "list services with ≥1 fresh report in the last hour".
  5. Hourly conditional drift runner (new .github/workflows/drift-runner.yml):

    • schedule: cron: "0 * * * *".
    • Queries the batch store for services with ≥1 fresh tripwire report in the past hour.
    • For each matching service (and only those), the workflow:
        ◦ runs the live-test runner for that single service and reproduces the drift on a fresh account;
        ◦ attaches screenshots + DOM to an auto-opened issue titled drift(<service>): <step> failing as of <ts>;
        ◦ marks the batch entry as "handled".
    • Services with zero reports are skipped entirely: the cron job wakes, sees an empty batch and exits.
    • Coalescing: if a second tripwire lands within 1 hour of an already-open issue for the same (service, step_hash), bump the issue's count + attach the new screenshot. Don't spawn a second runner until the first issue closes.
    • Existing provisioner-scripts/scripts/weekly-live-test.sh stays as a manual owner command (not scheduled).
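
The backend batching (item 4) and the runner's hourly query (item 5) can be sketched together as an in-memory store. Everything here is hypothetical:

- The stack is still TBD, and a real handler would persist entries rather than keep them in a `Map`.
- `DriftBatchStore`, `stepHash` (FNV-1a over the step name) and the camelCase field names are illustrative assumptions.

The sketch shows two things: coalescing on ingest by (service, step_hash, day), and the "≥1 fresh report in the last hour" query.

```typescript
// Hypothetical in-memory stand-in for the drift batch store; the real stack is TBD.
// Reports coalesce on ingest by (service, step_hash, day), as in POST /drift/report.

interface DriftReport {
  service: string;
  tripwireKind: string;
  tripwireStep: string;
  domDigest: string; // structural digest hash only, never the full DOM
  urlHost: string;
  scraperVersion: string;
  screenshotRef?: string; // assumed optional: absent in the default telemetry tier
}

interface BatchEntry {
  service: string;
  stepHash: string;
  day: string; // UTC date, YYYY-MM-DD
  count: number;
  lastSeenMs: number;
  latestScreenshotRef?: string;
  handled: boolean;
}

// Assumed step_hash: 32-bit FNV-1a over the step name, as 8 hex digits.
function stepHash(step: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < step.length; i++) {
    h ^= step.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

class DriftBatchStore {
  private entries = new Map<string, BatchEntry>();

  // Body of the POST /drift/report handler: batching is a side effect of ingest.
  ingest(report: DriftReport, nowMs: number): BatchEntry {
    const hash = stepHash(report.tripwireStep);
    const day = new Date(nowMs).toISOString().slice(0, 10);
    const key = `${report.service}|${hash}|${day}`;
    const existing = this.entries.get(key);
    if (existing) {
      // Same (service, step_hash, day): bump the count and keep the latest screenshot ref.
      existing.count += 1;
      existing.lastSeenMs = nowMs;
      if (report.screenshotRef !== undefined) existing.latestScreenshotRef = report.screenshotRef;
      return existing;
    }
    const entry: BatchEntry = {
      service: report.service,
      stepHash: hash,
      day,
      count: 1,
      lastSeenMs: nowMs,
      latestScreenshotRef: report.screenshotRef,
      handled: false,
    };
    this.entries.set(key, entry);
    return entry;
  }

  // Runner query: services with at least one unhandled entry seen within the window.
  servicesWithFreshReports(nowMs: number, windowMs: number = 60 * 60 * 1000): string[] {
    const services = new Set<string>();
    this.entries.forEach((e) => {
      if (!e.handled && nowMs - e.lastSeenMs <= windowMs) services.add(e.service);
    });
    return Array.from(services).sort();
  }

  // Called after the runner opens an issue; marks every entry for the service as handled.
  markHandled(service: string): void {
    this.entries.forEach((e) => {
      if (e.service === service) e.handled = true;
    });
  }
}
```

Coalescing happens inside `ingest`, so no separate grouping job is needed. The runner only ever reads entries that are already batched.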

Acceptance

  • Synthetic injection: add a fake <button>Accept new ToS</button> to OpenAI's signup page via a test harness. Run node src/scrapers/openai-cdp.ts. Expect {"type":"tripwire","kind":"selector-missing","step":"signup-form","screenshot_b64":"..."} and exit 0.
  • CLI caller with no fallback config: expects a NeedsFallback { tier: "none_configured" } outcome plus a "drift reported, fix pending" message, NOT a crash.
  • Backend-side batching: POST two synthetic /drift/report payloads for the same (service, step_hash) within an hour. Inspect the batch store and confirm they coalesce into a single entry with count=2.
  • Hourly runner catches the synthetic drift: trigger drift-runner.yml manually (workflow_dispatch) with a seeded batch → confirm it fires the live-test for the one reported service only (skips the others) and auto-opens a GitHub issue with screenshot + DOM digest attached.
  • Empty-batch run: trigger the workflow with zero reports in the store → confirm it exits cleanly with no runner invocation and no issue opened.
  • Audit JSONL written to ~/.agentkeys/audit/<ts>.jsonl regardless of telemetry opt-in.
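
Several of the runner-side checks above reduce to a small planning function:

- one live test per reported service;
- coalescing onto an already-open issue;
- a clean exit on an empty batch.

This is a hypothetical sketch. `planDriftRun`, `FreshEntry`, `OpenIssue` and the action names are illustrative. A real workflow would get open issues from the GitHub API and fresh entries from the batch store query.

```typescript
// Hypothetical planning step for drift-runner.yml: turn fresh batch entries into
// actions, launching at most one live test per service and coalescing onto open issues.

interface FreshEntry {
  service: string;
  stepHash: string;
  screenshotRef?: string;
}

interface OpenIssue {
  number: number;
  service: string;
  stepHash: string;
}

type Action =
  | { kind: "run-live-test"; service: string }
  | { kind: "bump-issue"; issue: number; screenshotRef?: string };

function planDriftRun(fresh: FreshEntry[], openIssues: OpenIssue[]): Action[] {
  const actions: Action[] = [];
  const launched = new Set<string>();
  for (const entry of fresh) {
    const issue = openIssues.find(
      (i) => i.service === entry.service && i.stepHash === entry.stepHash,
    );
    if (issue) {
      // An issue is already open for this (service, step_hash): bump it, no second runner.
      actions.push({ kind: "bump-issue", issue: issue.number, screenshotRef: entry.screenshotRef });
      continue;
    }
    if (!launched.has(entry.service)) {
      // Only services with fresh reports get a live-test run, once each.
      launched.add(entry.service);
      actions.push({ kind: "run-live-test", service: entry.service });
    }
  }
  // An empty batch yields no actions, so the workflow exits without a run or a new issue.
  return actions;
}
```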

Out of scope

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
