Context
The agentkeys-workflow-collection skill's recorder, together with src/scrapers/openrouter-cdp.ts and src/scrapers/openai-cdp.ts (shipped in #66ac92d), proved the architecture: hand-written production scrapers + recorder as iteration scaffolding. The current pipeline:
1. /agentkeys-workflow-collection drives the recorder → iterates until signup mints a real key → flows.ts accumulates proven fixes.
2. Human ports the working flow from flows.ts into a new src/scrapers/<service>-cdp.ts by hand.
Step 2 is the gap. We do NOT want to fix it by making the emitter magically produce production-ready scrapers (the string-template approach is fragile — every flows.ts change needs the template re-synced, and service-specific knowledge will always leak through). Instead, keep the hand-written scrapers as the source of truth and make the recorder produce artifacts + tooling that shrink the porting step to a near-mechanical transcription.
Scope
1. Generalize the manifest interface + helper functions
src/workflow-recorder/artifacts.ts::Manifest is currently cobbled together around the OpenRouter/OpenAI happy path. Make it general:
- Flow-shape agnostic: signup AND login both serializable to the same manifest (today the manifest has flow: "signup" | "login", but many selectors/outcomes are signup-specific).
- Generic step-outcome vocabulary ("fill-email", "click-continue", "wait-verification", "extract-key") instead of the current mix of flow-specific labels.
- Typed "detected" fields: regexes (from/subject/URL), selectors (email/password/TOS/Continue/Create), timings (per-step ms), captcha-kind encountered (turnstile / hcaptcha / none / PoW-custom).
manifest.json becomes the contract, not the debug dump. Any consumer (recorder, ship-scraper skill, drift-detector) reads the same shape.
Files: src/workflow-recorder/artifacts.ts, src/workflow-recorder/flows.ts, src/workflow-recorder/email-analyzer.ts (now in src/lib/).
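As a sketch, the generalized manifest shape could look like the following. All type and field names here are illustrative assumptions, not the current artifacts.ts definitions:

```typescript
// Hypothetical sketch of a flow-agnostic Manifest. Names are illustrative,
// not the shipped artifacts.ts types.
type StepOutcome =
  | "fill-email"
  | "fill-password"
  | "click-continue"
  | "wait-verification"
  | "extract-key";

type CaptchaKind = "turnstile" | "hcaptcha" | "pow-custom" | "none";

interface ManifestStep {
  outcome: StepOutcome;
  selector?: string;   // CSS selector proven during recording
  durationMs: number;  // per-step timing
  notes?: string;      // service-specific caveats for the human porter
}

interface Manifest {
  service: string;            // e.g. "openrouter"
  flow: "signup" | "login";   // both flows serialize to this one shape
  state: "completed" | "failed";
  steps: ManifestStep[];      // ordered step sequence
  detected: {
    emailRegexes: { from?: string; subject?: string; url?: string };
    captcha: CaptchaKind;
  };
}

// Example: a short login-flow manifest, shaped identically to signup.
const example: Manifest = {
  service: "openrouter",
  flow: "login",
  state: "completed",
  steps: [{ outcome: "fill-email", selector: "#email", durationMs: 1200 }],
  detected: { emailRegexes: {}, captcha: "none" },
};
```

The point of the single `Manifest` type is that every consumer (recorder, ship-scraper skill, drift-detector) can type-check against the same contract instead of sniffing flow-specific fields.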
2. Improve agentkeys-workflow-collection skill — emitter uses the manifest interface
The current emitDraftScraper string-templates a scraper inline. Replace it with a composition step: read the manifest, then build the scraper from:
- A stable shell (argv parsing, env-var read, CDP connect, JSON-event emit, exit-cleanly) — identical across services, copied verbatim.
- A service-specific body generated by walking the manifest's step sequence and emitting the corresponding lib/playwright-patterns calls.
- Inherited behavior from the lib (no inlining of humanType / clickOuterCreate / etc.) — matches the hand-written scraper structure.
Minimal changes per service: the emitter should produce a file that is >80% identical to a hand-written scraper. Service-specific helpers (OpenRouter's dismissOpenRouterOnboardingModals, OpenAI's completeOpenAIPostVerifyProfile) still require human input (recorded as "notes" fields in the manifest).
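The composition step above could be sketched roughly like this. The step-to-code table and helper names (humanType, waitForVerificationEmail, extractKey) are hypothetical stand-ins for the real lib/playwright-patterns surface:

```typescript
// Hypothetical sketch: compose a scraper body by walking the manifest's
// step sequence and emitting calls into the shared pattern library.
interface Step {
  outcome: string;
  selector?: string;
  notes?: string;
}

// Illustrative step-to-code table; the real emitter would map onto the
// actual lib/playwright-patterns exports.
const STEP_TEMPLATES: Record<string, (s: Step) => string> = {
  "fill-email": s =>
    `await humanType(page, ${JSON.stringify(s.selector)}, email);`,
  "click-continue": s =>
    `await page.click(${JSON.stringify(s.selector)});`,
  "wait-verification": () =>
    `const link = await waitForVerificationEmail(inbox);`,
  "extract-key": s =>
    `const apiKey = await extractKey(page, ${JSON.stringify(s.selector)});`,
};

function emitBody(steps: Step[]): string {
  return steps
    .map(s => {
      const tmpl = STEP_TEMPLATES[s.outcome];
      // Unknown outcomes surface as TODOs for the human reviewer instead of
      // silently producing a broken scraper.
      return tmpl ? tmpl(s) : `// TODO (manual): ${s.outcome} (${s.notes ?? "no notes"})`;
    })
    .join("\n");
}
```

Anything the table cannot express (e.g. a service-specific modal-dismissal helper recorded as a manifest note) lands in the output as an explicit TODO, preserving the human-in-the-loop step rather than hiding it.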
Update ~/.claude/skills/agentkeys-workflow-collection/SKILL.md Phase 4 to document the new emitter output + how to review it.
3. New skill: /agentkeys-ship-scraper
Takes the last-successful recording for a service and ships a production scraper. Flow:
- `--service <slug>` argument.
- Find the most recent manifest with state: completed under provisioner-scripts/recordings/<slug>-*-reference/ (reference) or <slug>-<ts>/ (latest).
- Emit via the new manifest-driven emitter → src/scrapers/<slug>-cdp.ts.
- Run tsc --noEmit on the emitted file; surface errors as human decisions.
- Run the scraper once (live) to prove minting; write outcome into manifest.
- Stage the new file for PR.
Works for both login and signup flows:
- Signup flow: emits full create-account → email-verify → API-key-mint path.
- Login flow: emits login-with-credentials → API-key-mint (shorter; no email verify).
Skill lives at ~/.claude/skills/agentkeys-ship-scraper/SKILL.md.
Acceptance criteria
- Manifest interface + helper functions reviewed for flow-agnosticism; a login-path recording produces a manifest shaped identically to signup.
- emitDraftScraper rewritten to compose from lib calls (not string-templates). Output for the OpenRouter recording matches the hand-written openrouter-cdp.ts to within ~20 lines (docstring / ordering allowed to differ; behavior identical).
- /agentkeys-ship-scraper runs end-to-end: invoked on OpenRouter's reference recording → the emitted scraper mints a real key live → no regression vs the current src/scrapers/openrouter-cdp.ts.
Out of scope
Why this architecture
Keeping production scrapers hand-written means every service gets a clear, auditable file with service-specific logic visible. The emitter's job is to produce a starting point, not the final artifact. This matches the way we actually debug: when OpenRouter's modal chain changes, you edit openrouter-cdp.ts (not the emitter template), re-record, regenerate, diff, ship.
Related commit: #66ac92d `feat(scrapers): deterministic OpenRouter + OpenAI production scrapers`