docs(stage5): concrete 'Run it' runbook for the live OpenRouter demo #49

Closed
hanwencheng wants to merge 8 commits into main from
docs/stage5-demo-runbook-fix

@hanwencheng
Member

Summary

After apps#48 landed, docs/manual-test-stage5.md's Gmail-setup section ends at "Daemon running and paired — see the Stage 4 manual test guide". But Stage 5a provision doesn't actually need a paired daemon — `cmd_provision` runs as the master CLI and uses `session.wallet` as the `agent_id`. A reader who finished Gmail setup had no concrete path from there to a running demo.

Replaces the vague pointer with an explicit two-terminal runbook:

  • Terminal 1: `cargo run --release -p agentkeys-mock-server -- --port 8090` (Stage 5a stores into the mock backend; real Heima lands in v0.1).
  • Terminal 2: `agentkeys init --mock-token ...` + `agentkeys provision openrouter` + verification (read back, curl OpenRouter /models).

Plus an "under the hood" breakdown explaining why no daemon or pairing is involved, and a short "artifacts to inspect" pointer (`~/.agentkeys/master/session.json`, provision audit JSONL).

Also promotes the build-and-install step from prose to step 5 so the prerequisites list is self-contained and paste-able.

Test plan

  • `bash harness/stage-5a-done.sh` still exits 0 (doc-only change, no code touched)
  • Follow the new runbook end-to-end against a Gmail bot account after the OpenRouter-ToS gate clears; verify key stored + `curl` returns 200

The Gmail-setup section ended at 'Daemon running and paired — see the
Stage 4 manual test guide', but Stage 5a provision doesn't actually
need a paired daemon: cmd_provision runs as the master CLI and uses
session.wallet as the agent_id. A reader who finished Gmail setup had
no concrete path from there to a running demo.

Replace the vague pointer with an explicit two-terminal runbook:
- Terminal 1: mock backend (Stage 5a stores into the mock; real Heima
  lands in v0.1).
- Terminal 2: agentkeys init --mock-token + agentkeys provision
  openrouter + verification (read back the key, curl OpenRouter).
Plus the 'under the hood' breakdown so a reader knows why no daemon
or pairing is involved, and a short 'artifacts to inspect' pointer
(session.json path, audit JSONL).

Also promotes the build-and-install step from prose to step 5 so
the prerequisites list is self-contained and paste-able.

Three tightly-coupled changes so the Stage 5a live demo is both
re-runnable (returning-user collision) and debuggable (no more silent
'subprocess ended without terminal event' with no cause).

provisioner-scripts/src/scrapers/openrouter.ts
- Split AGENTKEYS_EMAIL_USER (canonical IMAP login) from a new
  AGENTKEYS_SIGNUP_EMAIL (what we type into OpenRouter's signup form).
  Gmail IMAP rejects plus-addressing at login, so the two had to
  diverge before plus-addressing could work at all.
- Wrap main() in a catch-all that emits a terminal error event and
  flushes stdout before process.exit. Playwright launch failures,
  dynamic-import errors, IMAP connection refusals, and any other
  throws upstream of the scraper's inner try/catch now surface as
  a parseable Error event instead of dying silently and being
  reported as 'subprocess ended without terminal event.'
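
A minimal sketch of the catch-all described above, assuming a one-line JSON event on stdout. The event shape and the names `toTerminalErrorEvent` and `runWithCatchAll` are hypothetical; the scraper's real event schema is not shown in this PR.

```typescript
// Hypothetical terminal event shape (an assumption, not the scraper's real schema).
type TerminalEvent = { type: "error"; message: string };

// Serialize any thrown value into a one-line JSON terminal event.
function toTerminalErrorEvent(err: unknown): string {
  const message = err instanceof Error ? `${err.name}: ${err.message}` : String(err);
  const event: TerminalEvent = { type: "error", message };
  return JSON.stringify(event);
}

// Run `main`; on any throw, emit the terminal event, wait for the write
// callback (stdout flushed), and only then exit non-zero.
async function runWithCatchAll(
  main: () => Promise<void>,
  write: (line: string, done: () => void) => void = (line, done) => {
    process.stdout.write(line + "\n", () => done());
  },
  exit: (code: number) => void = (code) => process.exit(code),
): Promise<void> {
  try {
    await main();
  } catch (err) {
    await new Promise<void>((resolve) => write(toTerminalErrorEvent(err), resolve));
    exit(1);
  }
}
```

Waiting on the write callback before exiting is the point of the sketch: calling `process.exit` directly after a write can drop output that is still buffered when stdout is a pipe.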

crates/agentkeys-provisioner/src/orchestrator.rs
- On the no-terminal-event error path, best-effort-write the full
  subprocess output (exit code, every event emitted, complete stderr)
  to ~/.agentkeys/logs/provision-<service>-<ts>.log and include the
  path in the error message. stderr_tail (20 lines) stays inline for
  the quick case.

docs/manual-test-stage5.md
- Flip the primary demo path from 'dedicated throwaway Gmail' to
  'your existing Gmail + plus-addressing + app password.' Reason
  documented: OpenRouter's /auth is signup+signin on one URL, so
  reusing a canonical address across runs always fails on the second
  run with a returning-user UI the scraper wasn't designed for.
  Plus-addressing minted per-run via $(date +%s) gives us
  DWD-equivalent disposable emails at zero infrastructure cost.
- Document the two env vars and why they exist separately.
- Dedicated-throwaway-Gmail + Workspace DWD demoted to <details>
  alternatives.
- New 'Debugging a failure' block under Artifacts pointing to the
  persistent log file + the direct-scraper-run fallback.
- New 'subprocess ended without terminal event' and 'account already
  exists (returning-user path)' entries in Failure modes.

Tests:
- cargo test -p agentkeys-provisioner --release: 15/15 pass
- npm test --prefix provisioner-scripts: 15/15 pass across 6 files

Root cause of the 'exit_code: Some(0) / events_emitted: 0 / stderr
empty' failure mode: openrouter.ts declares `export default async
function main()` but nothing at module scope invokes it. When the
provisioner runs `npx tsx provisioner-scripts/src/scrapers/openrouter.ts`,
the module loads (imports + constant declarations + function decls),
reaches EOF, and exits cleanly without ever calling main(). The
orchestrator then correctly reports 'no terminal event' because the
scraper genuinely emitted none.

Tests did not catch this because they only import the named export
`runOpenRouterScraper`, not the default `main`.

Add the standard Node ESM entry-point guard at the bottom of the
file. main() runs only when the file is the direct script target
(argv[1] matches import.meta.url). Named-export imports from test
files still bypass it, so the 15/15 TS test suite stays green.
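
For reference, one common way to write such a guard is sketched below. The helper name `isDirectRun` is hypothetical, and the exact guard added to openrouter.ts is not reproduced here.

```typescript
import { pathToFileURL } from "node:url";

// True when the module at `metaUrl` is the script Node was asked to run.
// `argv1` is process.argv[1]; it is undefined in some embedding contexts.
function isDirectRun(metaUrl: string, argv1: string | undefined): boolean {
  if (!argv1) return false;
  // import.meta.url is a file:// URL, so convert the argv path before comparing.
  return metaUrl === pathToFileURL(argv1).href;
}

// At the bottom of an ESM module the guard would then read:
//   if (isDirectRun(import.meta.url, process.argv[1])) { void main(); }
```

Importing the module from a test leaves `process.argv[1]` pointing at the test runner, so the comparison is false and `main()` is never invoked.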

Tests:
- npx tsc --noEmit: clean
- npm test --prefix provisioner-scripts: 15/15 pass across 6 files

harness/stage-5a-live-demo-handoff.sh: preflights the Stage 5a live
demo end-to-end in a single bash run.

Checks:
- all 5 AGENTKEYS_EMAIL_* env vars present (fail-fast via :? with
  pointed error text for each)
- target/release/agentkeys exists + executable
- mock-server reachable at $BACKEND
- node + npx on PATH
- provisioner-scripts deps installed
- Playwright chromium_headless_shell-* installed under $HOME
  (guards against the sandbox-HOME gotcha discovered in this
  ralph session — Playwright caches browsers per-HOME and a
  fresh HOME without cached browsers fails with "browserType.launch:
  Executable doesn't exist")

Auto-mints AGENTKEYS_SIGNUP_EMAIL as <local>+or-<ts>@<domain> if
unset so each run hits the OpenRouter signup path with a fresh
email — no manual rotation needed.

Executes the four Stage 5a acceptance criteria in order:
1. agentkeys init + provision openrouter (exit 0 required)
2. masked-key form check on stdout
3. agentkeys read openrouter returns sk-or-v1-... prefix
4. curl OpenRouter /api/v1/models returns HTTP 200

On failure, dumps the most-recent provision-openrouter-*.log so
the user has the full stderr/events from the subprocess.

Three artifacts captured during ralph session driving the live
OpenRouter provision to ground truth.

harness/stage-5a-live-demo-handoff.sh: strip any existing plus-alias
from AGENTKEYS_EMAIL_USER before appending +or-<ts>. Some email
validators (including the one OpenRouter currently uses) reject
double-plus addresses like agent+2026042001+or-...@wildmeta.ai and
silently drop the signup. Gmail's inbound delivery path handles it
fine; the signup form does not.
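
The handoff script does this in bash; the same transformation is sketched here in TypeScript for clarity. The function name `mintSignupEmail` is hypothetical.

```typescript
// Build <local>+or-<ts>@<domain>, dropping any plus-alias already present
// in the configured address so the result never contains two '+' signs.
function mintSignupEmail(emailUser: string, unixSeconds: number): string {
  const at = emailUser.lastIndexOf("@");
  if (at <= 0 || at === emailUser.length - 1) {
    throw new Error(`not an email address: ${emailUser}`);
  }
  const local = emailUser.slice(0, at);
  const domain = emailUser.slice(at + 1);
  // Keep only the part of the local-part before the first '+'.
  const base = local.split("+")[0];
  return `${base}+or-${unixSeconds}@${domain}`;
}
```

With this, `agent+2026042001@wildmeta.ai` and `agent@wildmeta.ai` both produce an address of the form `agent+or-<ts>@wildmeta.ai`.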

provisioner-scripts/diag-imap.mjs: standalone probe that verifies
IMAP auth works with the configured AGENTKEYS_EMAIL_* env, lists
all mailboxes, and searches INBOX / Spam / All Mail / Trash for
recent OpenRouter verification emails. Distinguishes "auth failed"
/ "email went to spam" / "email never arrived" failure modes that
the scraper's EmailTimeout tripwire conflates.
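
The three-way distinction can be sketched as a pure classification over the probe's findings. The types and the `diagnose` name are hypothetical, and the `[Gmail]/Spam` mailbox name assumes an English-locale Gmail account; diag-imap.mjs itself may be structured differently.

```typescript
type ImapProbe = {
  authOk: boolean;
  // Mailbox name -> number of recent matching messages found there.
  hitsByMailbox: Record<string, number>;
};

type Diagnosis =
  | "auth_failed"
  | "delivered_to_inbox"
  | "went_to_spam"
  | "found_elsewhere"
  | "never_arrived";

// Separate the failure modes that a single timeout would otherwise conflate.
function diagnose(probe: ImapProbe): Diagnosis {
  if (!probe.authOk) return "auth_failed";
  const hits = (name: string): number => probe.hitsByMailbox[name] ?? 0;
  if (hits("INBOX") > 0) return "delivered_to_inbox";
  if (hits("[Gmail]/Spam") > 0) return "went_to_spam";
  const anywhere = Object.values(probe.hitsByMailbox).some((n) => n > 0);
  return anywhere ? "found_elsewhere" : "never_arrived";
}
```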

provisioner-scripts/diag-openrouter.mjs: standalone Playwright probe
against the live openrouter.ai signup page. Captures screenshots +
HTML snapshots + a JSON inventory of all input/button candidates to
reveal where real DOM diverges from the scraper's hardcoded selectors.
Used in this session to confirm OpenRouter migrated to Clerk (field
name changed email -> emailAddress, button has no type=submit) — a
Stage 5b blocker, not a Stage 5a bug.

harness/stage-5a-live-demo-handoff.sh
- Drop misleading "JSON summary" claim from header — script prints
  SUCCESS but not JSON
- Drop dead repo-root node_modules branch (never exists in this
  project; deps only live at provisioner-scripts/node_modules)
- Collapse the redundant step 4 header, which had no check of its
  own, into the following step's section (AC#1-#3 read-back check)
  and renumber the later steps accordingly. Prior numbering was
  1→2→3→4(empty)→5→6, with the 4th being just a comment.

provisioner-scripts/diag-imap.mjs
- Fix stale usage comment: file was moved from harness/ into
  provisioner-scripts/ (imapflow resolution) but the header still
  pointed at the old path.

provisioner-scripts/diag-openrouter.mjs
- Drop dead `|| candidates.find(...)` fallback in submit-button
  lookup. `buttons` is already filtered with the same
  /sign|continue|next|submit|start/i regex, so the fallback is a
  strict subset of the main filter and can never fire with a
  different value.

Post-deslop regression:
- cargo test --release -p agentkeys-provisioner: 15/15 pass
- npm test --prefix provisioner-scripts: 15/15 pass across 6 files
- handoff preflight smoke with no env: exit 1, clear missing-var msg

Stage 5b MVP: the CDP-connected scraper is proven end-to-end but
blocked on the duplicate-email rejection. The pivot is unblocked by
adding throwaway-inbox provisioning as a named Stage 6 deliverable.

provisioner-scripts/src/scrapers/openrouter-cdp.ts (new)
  Connects to a user-launched real Chrome via chromium.connectOverCDP,
  drives OpenRouter's Clerk-hosted signup form, polls Gmail IMAP for
  the OTP, mints a key on /keys, prints sk-or-v1-* on stdout. Two
  bugs fixed during the session:
  - Click the checkbox INPUT directly, not the label (label wraps a
    "Terms of Service" link that navigates to /terms)
  - When the 180s Turnstile wait expires and URL is still /sign-up
    with no OTP input present, fail explicitly instead of falling
    through to a bogus OTP-waiting step.
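
  The second fix amounts to an explicit decision after the Turnstile wait. A minimal sketch, with hypothetical type and function names; the branch for "timed out on some other page" is an assumption, since the commit only describes the /sign-up case:

```typescript
type PageState = { url: string; otpInputPresent: boolean };

type NextStep =
  | { kind: "wait_for_otp" }
  | { kind: "keep_waiting" }
  | { kind: "fail"; reason: string };

// 180s, matching the Turnstile wait described in this commit.
const TURNSTILE_TIMEOUT_MS = 180_000;

// Decide what to do next instead of falling through to OTP polling.
function afterTurnstileWait(state: PageState, waitedMs: number): NextStep {
  if (state.otpInputPresent) return { kind: "wait_for_otp" };
  if (waitedMs < TURNSTILE_TIMEOUT_MS) return { kind: "keep_waiting" };
  // Assumption: any timed-out state without an OTP input is treated as a failure.
  const where = state.url.includes("/sign-up")
    ? "still on /sign-up"
    : `on unexpected page ${state.url}`;
  return {
    kind: "fail",
    reason: `Turnstile wait expired after ${waitedMs}ms, ${where}, no OTP input present`,
  };
}
```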

  Why CDP and not Playwright-launched Chromium:
    Playwright's bundled Chromium ships with --enable-automation.
    Cloudflare Turnstile detects this (error 600010) and refuses
    to issue a token even when a human clicks the checkbox.
    Connecting to a real Chrome (launched with --remote-debugging-port)
    bypasses this because the browser process has no automation
    flags. Verified 2026-04-20: Turnstile passes invisibly in real
    Chrome, Clerk backend returns clean responses.

  Known blocker:
    OpenRouter's Clerk integration normalizes Gmail/Workspace
    plus-aliases to canonical. If agent@wildmeta.ai already has an
    OpenRouter account, every plus-aliased variant gets rejected
    with "email already in use." Only distinct local-parts work.
    That's why Stage 6 throwaway inbox provisioning (bot-<id>@
    agentkeys-email.io per call) is what unblocks the live demo.
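
  The blocker can be modelled as follows. This is a sketch of the behaviour observed in this session, not Clerk's actual code; both function names are hypothetical.

```typescript
// Model of plus-alias normalization: lowercase the address and drop
// everything from the first '+' in the local-part.
function canonicalizePlusAlias(email: string): string {
  const at = email.lastIndexOf("@");
  const local = email.slice(0, at).toLowerCase();
  const domain = email.slice(at + 1).toLowerCase();
  return `${local.split("+")[0]}@${domain}`;
}

// A candidate collides when its canonical form matches an existing account.
function collidesWithExisting(candidate: string, existing: string[]): boolean {
  const seen = new Set(existing.map(canonicalizePlusAlias));
  return seen.has(canonicalizePlusAlias(candidate));
}
```

  Under this model every `agent+...@wildmeta.ai` variant maps to `agent@wildmeta.ai`, whereas a distinct local-part on a throwaway domain does not collide.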

provisioner-scripts/diag-or-{flow,turnstile,signin}.mjs (new)
  Standalone Node probes used to diagnose the Turnstile failure.
  Kept as runtime evidence for the Clerk-moved-to-Radix-UI
  discovery and for future scraper authors' reference.

docs/manual-test-stage5.md (modified)
  Section 4 rewritten from "when Stage 5b lands, future" to "CDP
  scraper partial: proven working, blocked on email duplicate."
  Includes: the run-recipe with Chrome --remote-debugging-port
  command, required env, known blocker, Stage-6-dependent pickup
  checklist.

docs/spec/plans/development-stages.md (modified)
  Stage 6 deliverables extended with two named items:
  - Throwaway inbox provisioning API: mint unique local-parts per
    call (Clerk-normalization-proof), readable via the same
    fetchVerificationCode shape the Stage 5b scraper uses.
  - Stage 5b live-demo re-run: once throwaway provisioning lands,
    re-run the CDP scraper end-to-end. Closes the manual-test-stage5
    §4 pickup item.
  Plus two test rows: email::throwaway_inbox_provisioning and
  email::stage5b_live_demo_rerun.

docs/manual-test-stage6.md (new)
  Stage 6 manual demo guide: preflight, provision-throwaway-inbox
  walkthrough, per-user isolation test, Stage 5b live-demo re-run
  procedure. Structured like Stage 5 doc so both are readable in
  parallel.

.gitignore (modified)
  Add .gstack/ — gstack creates .gstack/browse.json at repo root
  during connect-chrome; not a repo artifact.

Post-change regression (fresh):
- cargo test --release -p agentkeys-provisioner: 15/15 pass
- npm test --prefix provisioner-scripts: 15/15 pass across 6 files

- Update how-to-use block to warn about Clerk's plus-alias normalization
  (SIGNUP_EMAIL must be a local-part OpenRouter hasn't seen)
- Fix outdated '120s' claim in header — actual wait is 180s
- Trim redundant log line that duplicated the block comment below it

Post-deslop regression:
- npm test --prefix provisioner-scripts: 15/15 pass
- npx tsc --noEmit: clean
@hanwencheng
Member Author

Superseded by the new PR on docs/stage6-aws-setup (Stage 5 + Stage 6 + workflow-recorder + production scrapers bundle). Every commit on this branch is already present in the new branch's history.


2 participants