docs(stage5): concrete 'Run it' runbook for the live OpenRouter demo #49
Closed
hanwencheng wants to merge 8 commits into main
The Gmail-setup section ended at 'Daemon running and paired — see the Stage 4 manual test guide', but Stage 5a provision doesn't actually need a paired daemon: cmd_provision runs as the master CLI and uses session.wallet as the agent_id. A reader who finished Gmail setup had no concrete path from there to a running demo.

Replace the vague pointer with an explicit two-terminal runbook:
- Terminal 1: mock backend (Stage 5a stores into the mock; real Heima lands in v0.1).
- Terminal 2: agentkeys init --mock-token + agentkeys provision openrouter + verification (read back the key, curl OpenRouter).

Plus the 'under the hood' breakdown so a reader knows why no daemon or pairing is involved, and a short 'artifacts to inspect' pointer (session.json path, audit JSONL).

Also promotes the build-and-install step from prose to step 5 so the prerequisites list is self-contained and paste-able.
Three tightly-coupled changes so the Stage 5a live demo is both re-runnable (returning-user collision) and debuggable (no more silent 'subprocess ended without terminal event' with no cause).

provisioner-scripts/src/scrapers/openrouter.ts
- Split AGENTKEYS_EMAIL_USER (canonical IMAP login) from a new AGENTKEYS_SIGNUP_EMAIL (what we type into OpenRouter's signup form). Gmail IMAP rejects plus-addressing at login, so the two had to diverge before plus-addressing could work at all.
- Wrap main() in a catch-all that emits a terminal error event and flushes stdout before process.exit. Playwright launch failures, dynamic-import errors, IMAP connection refusals, and any other throws upstream of the scraper's inner try/catch now surface as a parseable Error event instead of dying silently and being reported as 'subprocess ended without terminal event.'

crates/agentkeys-provisioner/src/orchestrator.rs
- On the no-terminal-event error path, best-effort-write the full subprocess output (exit code, every event emitted, complete stderr) to ~/.agentkeys/logs/provision-<service>-<ts>.log and include the path in the error message. stderr_tail (20 lines) stays inline for the quick case.

docs/manual-test-stage5.md
- Flip the primary demo path from 'dedicated throwaway Gmail' to 'your existing Gmail + plus-addressing + app password.' Reason documented: OpenRouter's /auth is signup+signin on one URL, so reusing a canonical address across runs always fails on the second run with a returning-user UI the scraper wasn't designed for. Plus-addressing minted per-run via $(date +%s) gives us DWD-equivalent disposable emails at zero infrastructure cost.
- Document the two env vars and why they exist separately.
- Dedicated-throwaway-Gmail + Workspace DWD demoted to <details> alternatives.
- New 'Debugging a failure' block under Artifacts pointing to the persistent log file + the direct-scraper-run fallback.
- New 'subprocess ended without terminal event' and 'account already exists (returning-user path)' entries in Failure modes.

Tests:
- cargo test -p agentkeys-provisioner --release: 15/15 pass
- npm test --prefix provisioner-scripts: 15/15 pass across 6 files
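The catch-all around main() described for openrouter.ts could look roughly like the sketch below. The event shape (`type`, `message`) and the helper names `runWithTerminalEvent` and `emitToStdout` are assumptions made for illustration; the scraper's real event schema is not shown in this PR.

```typescript
// Hypothetical terminal event shape; the real scraper schema may differ.
interface TerminalErrorEvent {
  type: "error";
  message: string;
}

type Emit = (line: string) => Promise<void>;

// Runs `main` and guarantees that a thrown error becomes one parseable
// JSON line before the caller exits. Returns the exit code to use.
export async function runWithTerminalEvent(
  main: () => Promise<void>,
  emit: Emit,
): Promise<number> {
  try {
    await main();
    return 0;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    const event: TerminalErrorEvent = { type: "error", message };
    await emit(JSON.stringify(event));
    return 1;
  }
}

// Writes one line to stdout and resolves only after the write callback
// fires, so a following process.exit() does not truncate the event.
export function emitToStdout(line: string): Promise<void> {
  return new Promise((resolve, reject) => {
    process.stdout.write(line + "\n", (err) => (err ? reject(err) : resolve()));
  });
}

// Usage at the script entry point (not executed here):
//   runWithTerminalEvent(main, emitToStdout).then((code) => process.exit(code));
```

Passing the emitter in as a parameter keeps the wrapper testable without touching the real stdout.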
Root cause of the 'exit_code: Some(0) / events_emitted: 0 / stderr empty' failure mode: openrouter.ts declares `export default async function main()` but nothing at module scope invokes it. When the provisioner runs `npx tsx provisioner-scripts/src/scrapers/openrouter.ts`, the module loads (imports + constant declarations + function decls), reaches EOF, and exits cleanly without ever calling main(). The orchestrator then correctly reports 'no terminal event' because the scraper genuinely emitted none.

Tests did not catch this because they only import the named export `runOpenRouterScraper`, not the default `main`.

Add the standard Node ESM entry-point guard at the bottom of the file. main() runs only when the file is the direct script target (argv[1] matches import.meta.url). Named-export imports from test files still bypass it, so the 15/15 TS test suite stays green.

Tests:
- npx tsc --noEmit: clean
- npm test --prefix provisioner-scripts: 15/15 pass across 6 files
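A minimal sketch of the entry-point guard described above, assuming a POSIX path in argv[1]. The helper name `isDirectRun` is hypothetical; the actual guard in openrouter.ts may be written inline.

```typescript
import { pathToFileURL } from "node:url";

// True when the module identified by `moduleUrl` (import.meta.url) is the
// file Node was asked to run (process.argv[1]). Test files that import the
// module get a different argv[1], so the guard stays closed for them.
export function isDirectRun(moduleUrl: string, argv1: string | undefined): boolean {
  if (argv1 === undefined) return false;
  return moduleUrl === pathToFileURL(argv1).href;
}

// At the bottom of an ESM scraper module the guard would be used like this
// (shown as a comment because import.meta is only valid in ESM output):
//
//   if (isDirectRun(import.meta.url, process.argv[1])) {
//     main().catch((err) => { console.error(err); process.exit(1); });
//   }
```

Comparing URLs rather than raw paths avoids mismatches from characters that are percent-encoded in `import.meta.url`.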
harness/stage-5a-live-demo-handoff.sh: preflights the Stage 5a live demo end-to-end in a single bash run.

Checks:
- all 5 AGENTKEYS_EMAIL_* env vars present (fail-fast via :? with pointed error text for each)
- target/release/agentkeys exists + executable
- mock-server reachable at $BACKEND
- node + npx on PATH
- provisioner-scripts deps installed
- Playwright chromium_headless_shell-* installed under $HOME (guards against the sandbox-HOME gotcha discovered in this ralph session: Playwright caches browsers per-HOME and a fresh HOME without cached browsers fails with "browserType.launch: Executable doesn't exist")

Auto-mints AGENTKEYS_SIGNUP_EMAIL as <local>+or-<ts>@<domain> if unset so each run hits the OpenRouter signup path with a fresh email; no manual rotation needed.

Executes the four Stage 5a acceptance criteria in order:
1. agentkeys init + provision openrouter (exit 0 required)
2. masked-key form check on stdout
3. agentkeys read openrouter returns sk-or-v1-... prefix
4. curl OpenRouter /api/v1/models returns HTTP 200

On failure, dumps the most-recent provision-openrouter-*.log so the user has the full stderr/events from the subprocess.
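The handoff script itself is bash; the sketch below restates two of its checks in TypeScript to make the logic explicit. The function names are hypothetical, and the list of required variable names is left to the caller because this PR names only some of them.

```typescript
// Mirrors bash `${VAR:?message}`: a variable counts as missing when it is
// unset or set to the empty string.
export function missingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[],
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value === "";
  });
}

// Acceptance criterion 3 expects the read-back key to carry this prefix.
export function hasOpenRouterKeyPrefix(key: string): boolean {
  return key.trim().startsWith("sk-or-v1-");
}

// Usage (not executed here):
//   const missing = missingEnvVars(process.env, ["AGENTKEYS_EMAIL_USER"]);
//   if (missing.length > 0) {
//     console.error(`missing env vars: ${missing.join(", ")}`);
//     process.exit(1);
//   }
```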
Three artifacts captured during ralph session driving the live OpenRouter provision to ground truth.

harness/stage-5a-live-demo-handoff.sh
Strip any existing plus-alias from AGENTKEYS_EMAIL_USER before appending +or-<ts>. Some email validators (including the one OpenRouter currently uses) reject double-plus addresses like agent+2026042001+or-...@wildmeta.ai and silently drop the signup. Gmail's inbound delivery path handles it fine; the signup form does not.

provisioner-scripts/diag-imap.mjs
Standalone probe that verifies IMAP auth works with the configured AGENTKEYS_EMAIL_* env, lists all mailboxes, and searches INBOX / Spam / All Mail / Trash for recent OpenRouter verification emails. Distinguishes "auth failed" / "email went to spam" / "email never arrived" failure modes that the scraper's EmailTimeout tripwire conflates.

provisioner-scripts/diag-openrouter.mjs
Standalone Playwright probe against the live openrouter.ai signup page. Captures screenshots + HTML snapshots + a JSON inventory of all input/button candidates to reveal where real DOM diverges from the scraper's hardcoded selectors. Used in this session to confirm OpenRouter migrated to Clerk (field name changed email -> emailAddress, button has no type=submit), which is a Stage 5b blocker, not a Stage 5a bug.
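The alias handling in the handoff script (strip any existing plus-alias, then append +or-<ts>) can be sketched as follows. The script does this in bash; `mintSignupEmail` is a hypothetical name used here only to illustrate the string logic.

```typescript
// Builds <local>+or-<ts>@<domain>, dropping any existing "+alias" from the
// local part first so the result never contains two plus signs.
export function mintSignupEmail(emailUser: string, timestampSeconds: number): string {
  const at = emailUser.lastIndexOf("@");
  if (at <= 0 || at === emailUser.length - 1) {
    throw new Error(`not an email address: ${emailUser}`);
  }
  const local = emailUser.slice(0, at);
  const domain = emailUser.slice(at + 1);
  const plus = local.indexOf("+");
  const baseLocal = plus === -1 ? local : local.slice(0, plus);
  if (baseLocal === "") {
    throw new Error(`empty local part: ${emailUser}`);
  }
  return `${baseLocal}+or-${timestampSeconds}@${domain}`;
}

// Usage (not executed here):
//   mintSignupEmail("agent+2026042001@wildmeta.ai", Math.floor(Date.now() / 1000));
```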
harness/stage-5a-live-demo-handoff.sh
- Drop misleading "JSON summary" claim from header: the script prints SUCCESS but not JSON
- Drop dead repo-root node_modules branch (never exists in this project; deps only live at provisioner-scripts/node_modules)
- Collapse redundant step 4 header that had no check into step 4's section (AC#1-#3 read-back check); renumber step 5 accordingly. Prior numbering was 1→2→3→4(empty)→5→6 with the 4th being just a comment.

provisioner-scripts/diag-imap.mjs
- Fix stale usage comment: file was moved from harness/ into provisioner-scripts/ (imapflow resolution) but the header still pointed at the old path.

provisioner-scripts/diag-openrouter.mjs
- Drop dead `|| candidates.find(...)` fallback in submit-button lookup. `buttons` is already filtered with the same /sign|continue|next|submit|start/i regex, so the fallback is a strict subset of the main filter and can never fire with a different value.

Post-deslop regression:
- cargo test --release -p agentkeys-provisioner: 15/15 pass
- npm test --prefix provisioner-scripts: 15/15 pass across 6 files
- handoff preflight smoke with no env: exit 1, clear missing-var msg
Stage 5b MVP CDP-connected scraper proven end-to-end, blocked on
email-duplicate. Pivot unblocked by adding throwaway-inbox
provisioning as a named Stage 6 deliverable.
provisioner-scripts/src/scrapers/openrouter-cdp.ts (new)
Connects to a user-launched real Chrome via chromium.connectOverCDP,
drives OpenRouter's Clerk-hosted signup form, polls Gmail IMAP for
the OTP, mints a key on /keys, prints sk-or-v1-* on stdout. Two
bugs fixed during the session:
- Click the checkbox INPUT directly, not the label (label wraps a
"Terms of Service" link that navigates to /terms)
- When the 180s Turnstile wait expires and URL is still /sign-up
with no OTP input present, fail explicitly instead of falling
through to a bogus OTP-waiting step.
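The second fix can be read as a small decision made once the 180s Turnstile wait has expired. The sketch below is illustrative only: the type and function names are hypothetical, and the real scraper presumably inspects the live Playwright page rather than a plain object.

```typescript
export type PostWaitStep = "continue" | "fail-turnstile-timeout";

export interface PageState {
  url: string;              // current page URL after the wait expired
  otpInputPresent: boolean; // whether an OTP input was found on the page
}

// Called only after the Turnstile wait has timed out. If the browser is
// still on /sign-up and no OTP input exists, waiting for an OTP email would
// be pointless, so the scraper should fail with an explicit cause.
export function stepAfterTurnstileWait(state: PageState): PostWaitStep {
  const stillOnSignUp = new URL(state.url).pathname.startsWith("/sign-up");
  if (stillOnSignUp && !state.otpInputPresent) {
    return "fail-turnstile-timeout";
  }
  return "continue";
}
```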
Why CDP and not Playwright-launched Chromium:
Playwright's bundled Chromium ships with --enable-automation.
Cloudflare Turnstile detects this (error 600010) and refuses
to issue a token even when a human clicks the checkbox.
Connecting to a real Chrome (launched with --remote-debugging-port)
bypasses this because the browser process has no automation
flags. Verified 2026-04-20: Turnstile passes invisibly in real
Chrome, Clerk backend returns clean responses.
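Attaching to an already-running Chrome might be wired up as in the sketch below. The connect function is injected so the block runs without Playwright installed; with Playwright it would be `(url) => chromium.connectOverCDP(url)`. The helper names and the port used in the usage note are assumptions, not taken from openrouter-cdp.ts.

```typescript
// Builds the HTTP endpoint of a Chrome started with --remote-debugging-port.
export function cdpEndpoint(port: number, host: string = "127.0.0.1"): string {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${port}`);
  }
  return `http://${host}:${port}`;
}

// Attaches to the user-launched browser instead of launching a bundled one.
// `connect` is whatever actually speaks CDP; it receives the endpoint URL.
export async function attachOverCdp<B>(
  connect: (endpointUrl: string) => Promise<B>,
  port: number,
): Promise<B> {
  return connect(cdpEndpoint(port));
}

// Usage with Playwright (not executed here; 9222 is an assumed port):
//   import { chromium } from "playwright";
//   const browser = await attachOverCdp((url) => chromium.connectOverCDP(url), 9222);
```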
Known blocker:
OpenRouter's Clerk integration normalizes Gmail/Workspace
plus-aliases to canonical. If agent@wildmeta.ai already has an
OpenRouter account, every plus-aliased variant gets rejected
with "email already in use." Only distinct local-parts work.
That's why Stage 6 throwaway inbox provisioning
(bot-<id>@agentkeys-email.io per call) is what unblocks the live demo.
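To illustrate why every plus-aliased variant collides, the sketch below assumes the normalization simply drops the "+suffix" from the local part. That rule is an assumption for illustration; Clerk's actual normalization is not documented in this PR, and the function names are hypothetical.

```typescript
// Assumed normalization: drop everything from the first "+" in the local part.
export function normalizeForSignup(email: string): string {
  const at = email.lastIndexOf("@");
  if (at <= 0) {
    throw new Error(`not an email address: ${email}`);
  }
  const local = email.slice(0, at);
  const domain = email.slice(at + 1);
  const plus = local.indexOf("+");
  return `${plus === -1 ? local : local.slice(0, plus)}@${domain}`;
}

// Two addresses collide when they normalize to the same canonical form.
export function collides(a: string, b: string): boolean {
  return normalizeForSignup(a) === normalizeForSignup(b);
}
```

Under this assumption a plus-alias of an existing account is rejected, while an address with a distinct local-part on a separate domain is not.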
provisioner-scripts/diag-or-{flow,turnstile,signin}.mjs (new)
Standalone Node probes used to diagnose the Turnstile failure.
Kept as runtime evidence for the Clerk-moved-to-Radix-UI
discovery and for future scraper authors' reference.
docs/manual-test-stage5.md (modified)
Section 4 rewritten from "when Stage 5b lands, future" to "CDP
scraper partial: proven working, blocked on email duplicate."
Includes: the run-recipe with Chrome --remote-debugging-port
command, required env, known blocker, Stage-6-dependent pickup
checklist.
docs/spec/plans/development-stages.md (modified)
Stage 6 deliverables extended with two named items:
- Throwaway inbox provisioning API: mint unique local-parts per
call (Clerk-normalization-proof), readable via the same
fetchVerificationCode shape the Stage 5b scraper uses.
- Stage 5b live-demo re-run: once throwaway provisioning lands,
re-run the CDP scraper end-to-end. Closes the manual-test-stage5
§4 pickup item.
Plus two test rows: email::throwaway_inbox_provisioning and
email::stage5b_live_demo_rerun.
docs/manual-test-stage6.md (new)
Stage 6 manual demo guide: preflight, provision-throwaway-inbox
walkthrough, per-user isolation test, Stage 5b live-demo re-run
procedure. Structured like Stage 5 doc so both are readable in
parallel.
.gitignore (modified)
Add .gstack/ — gstack creates .gstack/browse.json at repo root
during connect-chrome; not a repo artifact.
Post-change regression (fresh):
- cargo test --release -p agentkeys-provisioner: 15/15 pass
- npm test --prefix provisioner-scripts: 15/15 pass across 6 files
- Update how-to-use block to warn about Clerk's plus-alias normalization (SIGNUP_EMAIL must be a local-part OpenRouter hasn't seen)
- Fix outdated '120s' claim in header: actual wait is 180s
- Trim redundant log line that duplicated the block comment below it

Post-deslop regression:
- npm test --prefix provisioner-scripts: 15/15 pass
- npx tsc --noEmit: clean
Superseded by the new PR on |
Merged
8 tasks
Summary
After apps#48 landed, docs/manual-test-stage5.md's Gmail-setup section ends at "Daemon running and paired — see the Stage 4 manual test guide". But Stage 5a provision doesn't actually need a paired daemon — `cmd_provision` runs as the master CLI and uses `session.wallet` as the `agent_id`. A reader who finished Gmail setup had no concrete path from there to a running demo.
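Replaces the vague pointer with an explicit two-terminal runbook:
- Terminal 1: mock backend (Stage 5a stores into the mock; real Heima lands in v0.1).
- Terminal 2: `agentkeys init --mock-token` + `agentkeys provision openrouter` + verification (read back the key, curl OpenRouter).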
Plus an "under the hood" breakdown explaining why no daemon or pairing is involved, and a short "artifacts to inspect" pointer (`~/.agentkeys/master/session.json`, provision audit JSONL).
Also promotes the build-and-install step from prose to step 5 so the prerequisites list is self-contained and paste-able.
Test plan