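What is AgentKeys?
-Think of it as a "key custodian" dedicated to managing API keys for AI agents. A human stores the keys first, and an agent fetches them on demand within its granted permissions. There is no need to stuff a pile of keys into `.env`, and access can be revoked at any time if something goes wrong.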
diff --git a/.gitignore b/.gitignore
index 5ec43f5..bccabfb 100644
--- a/.gitignore
+++ b/.gitignore
@@ -11,6 +11,20 @@ AWSCLIV2.pkg
 # Local developer secrets — template is checked in as .env.example.
 agentkeys-secrets.env
 
+# Stage 6 runbook one-shot JSON artifacts. CLAUDE.md mandates the
+# `jq -n --arg` → `$(...)` pattern piped directly into the AWS CLI call
+# (no file on disk). If any of these reappear, someone reverted to the
+# heredoc anti-pattern — delete and fix the runbook usage.
+/bucket-policy*.json
+/daemon-user-inline.json
+/dns-change.json
+/probe*.mjs
+/probe*.js
+
+# tsx double-nested cwd artefact — scrapers launched from the wrong
+# working dir land here. Harmless but noisy; ignore unconditionally.
+provisioner-scripts/provisioner-scripts/
+
 # agentkeys-workflow-collection: per-run recordings (~50MB each, binary
 # trace.zips don't delta-compress). Keep locally; commit only curated
 # reference recordings via explicit negations below.
diff --git a/CLAUDE.md b/CLAUDE.md
index cbe02dd..3de7907 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -5,6 +5,7 @@ Rust monorepo with Cargo workspace. See `docs/spec/architecture.md` for componen
 See `docs/spec/credential-backend-interface.md` for the CredentialBackend trait contract (15 methods).
 See `docs/spec/plans/development-stages.md` for the 8-stage build plan.
 See `docs/spec/plans/execution-plan.md` for the orchestration runbook (ralph, team, ultraqa).
+Do not read folder `docs/archived`
 
 ## Version Control
 Use `jj` (Jujutsu) for all version control. Never use raw `git` commands.
@@ -51,3 +52,4 @@ cargo test -p agentkeys-daemon -p agentkeys-mcp
 cargo test -p agentkeys-provisioner
 npm test --prefix provisioner-scripts
 ```
+
diff --git a/docs/agentkeys-overview-cn.html b/docs/agentkeys-overview-cn.html
deleted file mode 100644
index a9562a3..0000000
--- a/docs/agentkeys-overview-cn.html
+++ /dev/null
@@ -1,645 +0,0 @@
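-If you have a technical background, you can first think of it as: a GUI management console + an agent sidecar + a backend credential service. Agents fetch keys on demand over MCP, but MCP is only the access interface, not the product itself.
-The point is not "storing passwords". It is letting an agent obtain an API key when it needs one, and only the one key it is entitled to.
-Typical users include people working in Claude Code, OpenClaw, and Codex-style environments, as well as engineering teams that run their own agent sandboxes or MCP toolchains.
-It standardizes three chores: issuing keys to agents, separating the permissions of different agents, and quickly revoking and restoring access when a sandbox is rebuilt or an agent misbehaves.
-It is not simply an "MCP server + client". More precisely, it is a distributed credential system, and MCP is just one protocol interface through which agents access that system.
-Humans exercise control through the management console, agents read through the daemon + MCP, and the backend handles permission limits, auditing, and recovery. Today the management console is mainly a CLI, but it is ultimately better suited to a GUI.
-Initialize identity, store existing keys, approve pairings, revoke access, view the audit log. These actions currently go through the CLI, but a GUI is the better long-term fit.
-Does not touch `.env` directly; instead obtains scope-restricted credentials through the daemon's MCP tools.
-Responsible for sessions, child sessions, credential storage, audit, rendezvous, and auth requests.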
-agentkeys store my-agent openrouter sk-...
-Agent --MCP--> agentkeys.get_credential("openrouter")
-agentkeys revoke my-agent
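 agentkeys-daemon --recover my-agent -...displays Pair Code... -agentkeys approve ABCD-EFGH-
-This section separates "capabilities that already exist in code" from "capabilities that so far exist only in design documents", so that the vision is not mistaken for the current state.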
-`) ships in **Stage 4**. Reviewers should verify that the daemon works with this test seam, **NOT** that this test seam is the intended operational model. Any code that hard-depends on `AGENTKEYS_SESSION` being pre-set (rather than obtained via pairing) is a bug in Stage 4+.
+
+```bash
+# Prerequisite: Stage 1 mock backend running
+# Prerequisite: A session exists (from Stage 2 CLI: agentkeys init + store)
+
+# Start daemon (TEST SEAM — see note above)
+AGENTKEYS_BACKEND=http://localhost:8090 \
+AGENTKEYS_SESSION= \
+agentkeys-daemon --stdio
+
+# In a separate terminal, use an MCP client (or Claude Code) to:
+# 1. List tools → should show agentkeys.get_credential, agentkeys.list_credentials
+# 2. Call agentkeys.get_credential(service: "openrouter") → returns the stored key
+# 3. Revoke the session via CLI: agentkeys revoke my-agent
+# 4. Call agentkeys.get_credential(service: "openrouter") → DENIED
+
+# Hardening verification (run inside the daemon process or check /proc):
+cat /proc//status | grep -E 'Dumpable|NoNewPrivs|Seccomp|CapEff|VmLck'
+# Expected: Dumpable: 0, NoNewPrivs: 1, Seccomp: 2, CapEff: 0, VmLck > 0
+```
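The hardening check above is a manual `grep`. As a minimal sketch of how the same expectations could be asserted in a test, the helper below parses status-style text and reports which of the checklist fields deviate. The function names are hypothetical, the sample input in the test is illustrative rather than captured from a real daemon, and the sketch assumes `CapEff` is printed as a hex bitmask and `VmLck` as a size in kB.

```rust
use std::collections::HashMap;

/// Parse `Key:<whitespace>value` lines into a map (hypothetical helper).
fn parse_status(text: &str) -> HashMap<String, String> {
    text.lines()
        .filter_map(|line| line.split_once(':'))
        .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
        .collect()
}

/// Look up a field, treating a missing field as an empty string.
fn field<'a>(fields: &'a HashMap<String, String>, key: &str) -> &'a str {
    fields.get(key).map(String::as_str).unwrap_or("")
}

/// Return the checklist fields that do not match the expected values:
/// Dumpable: 0, NoNewPrivs: 1, Seccomp: 2, CapEff: 0, VmLck > 0.
fn hardening_failures(text: &str) -> Vec<&'static str> {
    let fields = parse_status(text);
    let mut failed = Vec::new();
    if field(&fields, "Dumpable") != "0" {
        failed.push("Dumpable");
    }
    if field(&fields, "NoNewPrivs") != "1" {
        failed.push("NoNewPrivs");
    }
    if field(&fields, "Seccomp") != "2" {
        failed.push("Seccomp");
    }
    // Assumption: CapEff is a hex bitmask; zero means no effective capabilities.
    let caps_clear = u64::from_str_radix(field(&fields, "CapEff"), 16)
        .map_or(false, |caps| caps == 0);
    if !caps_clear {
        failed.push("CapEff");
    }
    // Assumption: VmLck looks like "64 kB"; any positive amount passes.
    let locked_kb = field(&fields, "VmLck")
        .split_whitespace()
        .next()
        .and_then(|n| n.parse::<u64>().ok())
        .unwrap_or(0);
    if locked_kb == 0 {
        failed.push("VmLck");
    }
    failed
}
```

A missing field counts as a failure, so the helper fails closed on platforms or kernels that do not report one of the values.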
+
+### Stage Contract
+- **Inputs:** Stage 0 crates + Stage 1 running mock backend + a valid session token
+- **Outputs:** `agentkeys-daemon` binary with MCP server and kernel hardening
+- **Done when:** All 13 tests pass. MCP tools are discoverable and functional. Hardening checks pass on Linux (macOS: hardening tests skip gracefully with warnings).
+
+---
+
+## Stage 4: Pair/Approve Flow
+
+**Goal:** The full child-initiates rendezvous pairing flow. A daemon can pair with a master session without any direct network connection.
+
+### Crates Modified
+- `agentkeys-daemon` — add pair-on-startup flow (open_auth_request, register_rendezvous, poll, display pair code)
+- `agentkeys-cli` — add `agentkeys approve ` command (fetch_auth_request by pair code, display details + OTP, confirm, approve_auth_request)
+
+### Deliverables
+- [ ] Daemon startup pair flow:
+ 1. Generate Ed25519 keypair
+ 2. Call `open_auth_request(Pair, {daemon_pubkey, scope})`
+ 3. Call `register_rendezvous(daemon_pubkey, pair_code)`
+ 4. Display: "Pair code: ABCD-EFGH. Approve on your Master device."
+ 5. Long-poll `poll_rendezvous` until payload arrives or timeout
+ 6. Decrypt child session from payload → store in memfd_secret + at-rest file
+- [ ] `agentkeys approve `:
+ 1. Call `fetch_auth_request(session, pair_code)` → display request type, scope, OTP
+ 2. Prompt user: "OTP is XXXXXX. Does this match? [y/N]"
+ 3. On confirm: call `approve_auth_request(session, request_id)`
+- [ ] Recovery flow: `agentkeys-daemon --recover `
+ 1. Same as pair but with `AuthRequestType::Recover { agent_identity, new_daemon_pubkey }`
+ 2. Backend resolves AgentIdentity → WalletAddress via identity graph
+ 3. Backend re-encrypts existing credentials to new daemon pubkey
+- [ ] Identity linking: `agentkeys link --alias/--email` already implemented in Stage 2
+
+### Unit Tests
+```
+cargo test -p agentkeys-daemon -p agentkeys-cli -- pair
+```
+
+| Test | What it validates |
+|---|---|
+| `pair::full_loop` | Daemon opens request + registers → CLI approves → daemon receives session |
+| `pair::otp_matches` | OTP displayed by daemon matches OTP shown by CLI `approve` |
+| `pair::timeout_retry` | Daemon times out on poll → generates fresh pair code → second attempt succeeds |
+| `pair::wrong_pair_code` | `agentkeys approve XXXX-YYYY` with unknown code → clear error |
+| `pair::expired_code` | Approve after 5-min TTL → EXPIRED error |
+| `pair::replay_resistance` | Approve same code twice → ALREADY_CONSUMED |
+| `pair::wrong_user_approve` | Different user's session tries to approve → UNAUTHORIZED |
+| `recover::full_loop` | Daemon `--recover agent-A` → CLI approves → daemon receives existing wallet + credentials |
+| `recover::unknown_identity` | `--recover nonexistent` → AGENT_NOT_FOUND with guidance |
+| `recover::old_pubkey_revoked` | After recovery, old daemon's pubkey is revoked |
+| `recover::credentials_intact` | After recovery, `get_credential` returns the same key that was stored before the old daemon died |
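The four rejection cases in the table (`wrong_pair_code`, `expired_code`, `replay_resistance`, `wrong_user_approve`) can be read as one approval gate. The sketch below is an illustration under stated assumptions, not the backend's actual logic: the `PairRequest` struct, its `owner` field, the `UnknownCode` variant, and the order of the checks are all hypothetical. Only the 5-minute TTL and the EXPIRED / ALREADY_CONSUMED / UNAUTHORIZED outcomes come from the table.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum ApproveError {
    UnknownCode,     // hypothetical name for the "clear error" on an unknown code
    Expired,         // EXPIRED
    AlreadyConsumed, // ALREADY_CONSUMED
    Unauthorized,    // UNAUTHORIZED
}

/// Hypothetical in-memory view of a pending auth request.
struct PairRequest {
    pair_code: String,
    owner: String, // assumption: the user allowed to approve this request
    created: Instant,
    consumed: bool,
}

/// The 5-minute TTL from the `pair::expired_code` test.
const PAIR_TTL: Duration = Duration::from_secs(5 * 60);

/// Approve a request by pair code, marking it consumed on success.
fn approve(
    requests: &mut [PairRequest],
    pair_code: &str,
    approver: &str,
    now: Instant,
) -> Result<(), ApproveError> {
    let req = requests
        .iter_mut()
        .find(|r| r.pair_code == pair_code)
        .ok_or(ApproveError::UnknownCode)?;
    if req.owner != approver {
        return Err(ApproveError::Unauthorized);
    }
    if now.duration_since(req.created) > PAIR_TTL {
        return Err(ApproveError::Expired);
    }
    if req.consumed {
        return Err(ApproveError::AlreadyConsumed);
    }
    req.consumed = true;
    Ok(())
}
```

Passing `now` as a parameter keeps the TTL check testable without sleeping for five minutes.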
+
+### Reviewer E2E Checklist
+```bash
+# Prerequisite: Stage 1 mock backend running
+# Prerequisite: Master session exists (agentkeys init)
+
+# === PAIR FLOW ===
+
+# Terminal 1: start daemon (it will display a pair code)
+AGENTKEYS_BACKEND=http://localhost:8090 agentkeys-daemon
+# Output: "Pair code: ABCD-EFGH. Approve on your Master device."
+
+# Terminal 2: approve on Mac
+agentkeys approve ABCD-EFGH
+# Output: "Request: Pair new agent. OTP: 123456. Confirm? [y/N]"
+# Type: y
+# Output: "Approved. Agent paired successfully."
+
+# Terminal 1 should now show: "Paired. Session received. Daemon ready."
+
+# Test credential flow through the paired daemon:
+agentkeys store openrouter sk-test # store via CLI
+# Then via MCP: agentkeys.get_credential("openrouter") → sk-test
+
+# === RECOVER FLOW ===
+
+# Link an identity first
+agentkeys link --alias my-bot
+
+# Kill daemon (Ctrl+C)
+# Start new daemon in recover mode
+AGENTKEYS_BACKEND=http://localhost:8090 agentkeys-daemon --recover my-bot
+# Output: "Recovery code: WXYZ-1234. Approve on your Master device."
+
+# Approve recovery
+agentkeys approve WXYZ-1234
+# Output: "Request: Recover agent 'my-bot'. OTP: 654321. Confirm? [y/N]"
+# Type: y
+
+# Verify same credentials survived:
+# MCP: agentkeys.get_credential("openrouter") → sk-test (same key, no re-store needed)
+```
+
+### Stage Contract
+- **Inputs:** Stages 0-3 (all crates + running backend + CLI + daemon with MCP)
+- **Outputs:** Working pair + recover flows via rendezvous
+- **Done when:** All 11 tests pass. The pair E2E flow works across two terminals. The recover flow preserves credentials.
+
+---
+
+## Stage 5a: Provisioner — Deterministic + Patterns (v0 critical path)
+
+**Goal:** An agent with browser control can call `agentkeys.provision(service: "openrouter")` via MCP, a deterministic Playwright script (composing a reusable pattern) creates a real OpenRouter account, and a mandatory verification step confirms the returned API key actually works against the target service before the credential is stored.
+
+**Architectural context (2026-04-16 CEO review).** Stage 5 was restructured into a 4-tier runtime architecture. Stage 5a ships Tier 1 (patterns) and Tier 2 (scripts). Stage 5b ships Tier 0 (dev-time script generator) and Tier 3 (runtime agentic fallback).
+
+```
+ TIER 0 (dev tool, 5b) LLM-generated script via agentkeys-scripts-gen
+ ↓ produces a draft .ts file for human review
+ TIER 1 (5a) Pattern library: signupEmailOtp (v0),
+ OAuth-Google / OAuth-GitHub / magic-link / password+verify (5b)
+ ↓ scripts compose patterns
+ TIER 2 (5a) Script registry: provisioner-scripts/scrapers/*.ts
+ ↓ runtime tries this first
+ TIER 3 (5b) Claude-Chrome agentic fallback via MCP browser primitives
+ ↓ engages on trip-wire (selector miss, CAPTCHA, no script)
+```
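One way to read the diagram above is as a decision taken after the Tier 2 script has been tried. The sketch below is a rough illustration only; the `TripWire`, `Decision`, and `after_tier2` names are hypothetical, and the three trip-wire kinds are the ones named in the diagram (selector miss, CAPTCHA, no script). In Stage 5a the fallback flag would be `false`, since Tier 3 ships in 5b.

```rust
#[derive(Debug, PartialEq)]
enum TripWire {
    SelectorMiss,
    Captcha,
    NoScript,
}

#[derive(Debug, PartialEq)]
enum Decision {
    /// A tier produced a candidate key (still subject to verification).
    Done { tier: u8, api_key: String },
    /// Hand over to the Tier 3 agentic fallback.
    Escalate(TripWire),
    /// No fallback available: report the trip-wire to the caller.
    Fail(TripWire),
}

/// Decide what happens after the Tier 2 attempt.
/// `script_result` is `None` when no script exists for the service.
fn after_tier2(
    script_result: Option<Result<String, TripWire>>,
    tier3_enabled: bool,
) -> Decision {
    let trip = match script_result {
        Some(Ok(api_key)) => return Decision::Done { tier: 2, api_key },
        Some(Err(trip)) => trip,
        None => TripWire::NoScript,
    };
    if tier3_enabled {
        Decision::Escalate(trip)
    } else {
        Decision::Fail(trip)
    }
}
```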
+
+### Crates / Packages
+- `agentkeys-provisioner` — Rust library, spawns Playwright subprocess, handles IPC, runs verification
+- `provisioner-scripts/` — TypeScript + Playwright:
+ - `scrapers/openrouter.ts` — OpenRouter signup flow (composes `signup_email_otp` pattern)
+ - **`patterns/signup_email_otp.ts`** — reusable pattern: email signup with OTP verification. Takes `{ url, emailBackend, submitButton, otpSelector, successKeySelector }` and drives the flow. Extracted from the OpenRouter script so v0.1 services can compose it without reimplementing the signup-with-OTP shape.
+ - **`lib/email.ts`** — ephemeral email integration. Reads verification codes from the chosen burner email backend (Gmail plus-addressing for v0; SimpleLogin / mail.tm / AnonAddy in v0.1). Patterns call this; individual scrapers never call email directly.
+ - **`lib/verify.ts`** — post-provision credential verification helper. Takes `{ key, service }` and makes one authenticated API call against the target. Returns `true` only if the call succeeds. This is the only defense against silent-corrupt-credential (a string that looks like an API key but isn't).
+
+### Deliverables
+- [ ] MCP tool: `agentkeys.provision(service: "openrouter")` exposed on the daemon
+- [ ] Rust orchestrator: receives MCP call → spawns `npx tsx provisioner-scripts/scrapers/openrouter.ts` → passes parameters via stdin/env → receives API key via stdout JSON → **calls `lib/verify.ts` to confirm the key works against the live API** → encrypts to shielding key → calls `store_credential`. If verification fails, abort with a clear error; `store_credential` is NOT called.
+- [ ] **Mandatory post-provision verification step.** Every tier's success output must be verified by one authenticated API call against the target service. This is non-negotiable: without it, script drift or LLM hallucination can return a page label or session ID that passes the "string was extracted" bar but is not a working credential. For OpenRouter: `GET https://openrouter.ai/api/v1/models` with `Authorization: Bearer ` → 200 is real, 401 is phantom.
+- [ ] `patterns/signup_email_otp.ts` — reusable email-signup-with-OTP pattern extracted from the OpenRouter flow. Patterns are plain functions rather than a DSL: scripts compose them by calling pattern functions with service-specific selectors.
+- [ ] `scrapers/openrouter.ts` — OpenRouter signup composes `signupEmailOtp` with OpenRouter-specific selectors + success-page key extraction.
+- [ ] `lib/email.ts` — IMAP for Gmail plus-addressing in v0. Config via env: `AGENTKEYS_EMAIL_BACKEND`, `AGENTKEYS_EMAIL_USER`, `AGENTKEYS_EMAIL_PASSWORD` or `AGENTKEYS_EMAIL_API_KEY`.
+- [ ] Structured error reporting per trip-wire type: selector timeout (15s default), unexpected navigation, HTTP 5xx from target, email timeout, verification failure. Each trip-wire reports `{ stage, trigger, service, elapsed_ms }` to the MCP caller. No generic "something failed."
+- [ ] Observability (mandatory, per Section 8 of CEO review): emit `provision_tier_used{service,tier}`, `provision_duration_seconds{service}`, `provision_trip_wire_fired{service,trip_wire}`, `provision_verification_result{service,result}` metrics per run.
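The verification gate in the deliverables above comes down to one ordering rule: `store_credential` is reached only after verification succeeds. A minimal sketch of that rule follows, with the verifier and the store passed in as closures so the gate can be tested without a network or backend. The function names are hypothetical; the two error variants mirror the `VERIFICATION_FAILED` and `PROVISION_STORE_FAILED` codes, and the status mapping follows the OpenRouter example (200 is real, anything else is treated as not verified).

```rust
#[derive(Debug, PartialEq)]
enum ProvisionError {
    VerificationFailed,   // VERIFICATION_FAILED
    ProvisionStoreFailed, // PROVISION_STORE_FAILED
}

/// Map the HTTP status of the authenticated probe call to a verdict.
/// Assumption: only 200 counts as a working key.
fn status_means_valid(status: u16) -> bool {
    status == 200
}

/// Run verification, and call `store` only if the key verified.
fn gate_and_store<V, S>(api_key: &str, verify: V, store: S) -> Result<(), ProvisionError>
where
    V: FnOnce(&str) -> bool,
    S: FnOnce(&str) -> Result<(), ProvisionError>,
{
    if !verify(api_key) {
        // Abort before the store step is ever reached.
        return Err(ProvisionError::VerificationFailed);
    }
    store(api_key)
}
```

In the phantom-key scenario the verifier would see a 401 for the decoy string, so the store closure never runs.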
+
+### Unit Tests
+```
+cargo test -p agentkeys-provisioner # orchestrator IPC + trip-wire + verification gating
+npm test --prefix provisioner-scripts # patterns + scrapers + email + verify
+```
+
+| Test | What it validates |
+|---|---|
+| `provisioner::spawn_and_receive` | Orchestrator spawns a mock TS subprocess, receives JSON on stdout |
+| `provisioner::subprocess_timeout` | Subprocess hangs → orchestrator times out after 120s with clear error |
+| `provisioner::subprocess_error` | Subprocess returns error JSON → orchestrator surfaces it to MCP caller |
+| `provisioner::verification_failure_aborts` | Script returns a key, `lib/verify` returns false → provision aborts, `store_credential` NOT called |
+| `provisioner::stores_credential` | After successful provision + verification, `read_credential` returns the obtained key |
+| `provisioner::duplicate_provision` | Provision when already provisioned → return existing credential (no new signup) |
+| `provisioner::phantom_key_caught` | **Chaos test.** Decoy page returns a string shaped like `sk-or-v1-XXXXX` that isn't a real key → verification catches it → provision aborts with clear error |
+| `patterns::signup_email_otp_happy` | Pattern runs against HAR fixture of OpenRouter signup, completes flow, returns extracted key |
+| `patterns::signup_email_otp_selector_timeout` | Pattern hits missing selector → returns structured trip-wire error (not a hang) |
+| `email::fetch_code_gmail_plus` | `lib/email.ts` connects to Gmail IMAP with plus-addressed account, retrieves test email within 30s |
+| `email::fetch_code_timeout` | No matching email → clean timeout with structured error |
+| `email::fetch_code_wrong_pattern` | Email arrives but doesn't match sender/subject → NOT_FOUND, not the wrong code |
+| `verify::valid_key_returns_true` | Valid OpenRouter key → `GET /api/v1/models` 200 → returns true |
+| `verify::invalid_key_returns_false` | Random string → 401 → returns false |
+| `openrouter::smoke` | (CI weekly, non-blocking) Live openrouter.ai end-to-end provision with verification. Auto-files issue on failure; does not block merges. |
+
+### Reviewer E2E Checklist
+```bash
+# Prerequisite: Stages 0-4 complete, daemon paired and running
+
+# Happy path:
+# Call via MCP: agentkeys.provision(service: "openrouter")
+# Expected: Playwright opens browser, creates account via signup_email_otp pattern,
+# extracts key, verifies key against openrouter.ai/api/v1/models,
+# stores credential. Returns success.
+# Verify: agentkeys.get_credential(service: "openrouter") → returns a real sk-or-v1-... key
+
+# Phantom-key defense:
+# Deploy a decoy HTTP server returning a page with a fake sk-or-v1-FAKE string
+# Point the script at the decoy URL
+# Expected: script "succeeds" extracting FAKE; verification calls openrouter.ai with FAKE;
+# gets 401; provision aborts; store_credential NOT called.
+
+# Trip-wire: selector change
+# Monkey-patch an OpenRouter selector in the script to a non-existent element
+# Expected: clean structured error within 15s, not a hang. Error reports which selector failed.
+```
+
+### Stage Contract
+- **Inputs:** Stages 0-4 + Node.js + Chrome/Chromium + Gmail IMAP creds (or equivalent burner-email backend)
+- **Outputs:** Working `agentkeys.provision(openrouter)` MCP tool with pattern library (1 pattern) + mandatory verification + observability metrics
+- **Done when:** All unit tests pass (including the phantom-key chaos test). At least one successful live provision of a real OpenRouter account, with verification confirming the key works against `GET /api/v1/models`. All observability metrics emitted.
+
+### Stage 5a explicitly does NOT ship
+- Claude-Chrome agentic fallback (→ Stage 5b)
+- Fallback audit trail (→ Stage 5b)
+- LLM script-generator dev tool (→ Stage 5b)
+- Fallback→PR loop (→ Stage 5b)
+- Additional patterns beyond `signupEmailOtp` (→ Stage 5b, extracted from the 2nd/3rd service as it's added)
+
+### Open item to resolve before first live provision
+- [ ] **OpenRouter ToS check:** confirm that scripted account creation does not violate the target service's ToS. Repeat this check for every new service added to Tier 2. Noted in TODOS.md per 2026-04-16 CEO review.
+
+### CLI UX Specifications (2026-04-16 plan-design-review)
+
+User-facing surfaces for Stage 5a — decisions locked to avoid "we'll figure out the output format later":
+
+- **Success output masks the key.** Stdout on success prints exactly one line: `sk-or-v1-****...AB3F` (first 8 chars + `****...` + last 4 chars). Never the full key. Full key is retrieved via `agentkeys read openrouter` or injected into child processes via `agentkeys run`. Rationale: AgentKeys's whole pitch is "credentials don't leak" — printing a full key to stdout contradicts it (shell history, log aggregators, screen recordings all capture stdout).
+- **Progress to stderr during long-running provision.** One plain-text line per phase: `Creating account...`, `Waiting for email verification...`, `Extracting API key...`, `Verifying key against openrouter.ai...`, `Stored.` To stderr, not stdout — so piping / MCP daemon callers can ignore cleanly. No spinners, no TUI animations. Renders correctly under `agentkeys run -- ...` wrappers.
+- **Duplicate provision flow.** When a credential for the service already exists, verify the existing key with one `lib/verify.ts` call. If valid: stderr prints `openrouter already provisioned, key valid (provisioned ).`, no re-signup happens, and stdout prints the masked key. If invalid (revoked/expired): stderr prints `existing key invalid, re-provisioning...` and the full flow runs. The `--force` flag re-provisions regardless of any existing credential.
+- **Error message format.** All new error codes (`PROVISION_IN_PROGRESS`, `TRIPWIRE_SELECTOR_TIMEOUT`, `EMAIL_TIMEOUT`, `VERIFICATION_FAILED`, `PROVISION_STORE_FAILED`, `AUDIT_DEGRADED`) follow the Stage 2 DX spec: `problem + cause + fix + docs link`. Example for `VERIFICATION_FAILED`: `Problem: Provision succeeded but the returned key did not authenticate. Cause: The target service may have rate-limited signup, or the script extracted the wrong element. Fix: Retry in 5 minutes; if persistent, file an issue at with provision audit log. Docs: https://agentkeys.dev/docs/errors#verification-failed`
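Two of the rules above are small enough to sketch directly: key masking and the duplicate-provision decision. The helper names below are hypothetical. Note that the masking example `sk-or-v1-****...AB3F` shows nine leading characters while the prose says eight, so the sketch takes the prefix length as a parameter rather than picking one. Fully masking very short keys is an added assumption.

```rust
/// Mask a key as `<prefix>****...<last 4>`.
/// Assumption: keys too short to mask safely are hidden entirely.
fn mask_key(key: &str, prefix_len: usize) -> String {
    let chars: Vec<char> = key.chars().collect();
    if chars.len() <= prefix_len + 4 {
        return "****".to_string();
    }
    let head: String = chars[..prefix_len].iter().collect();
    let tail: String = chars[chars.len() - 4..].iter().collect();
    format!("{head}****...{tail}")
}

#[derive(Debug, PartialEq)]
enum DuplicateAction {
    /// Existing key verified: print the masked key, no re-signup.
    ReuseExisting,
    /// No key, invalid key, or `--force`: run the full provision flow.
    RunFullFlow,
}

/// `existing_key_valid` is `None` when nothing is stored for the service,
/// otherwise the result of the single verification call.
fn on_provision_request(existing_key_valid: Option<bool>, force: bool) -> DuplicateAction {
    if force {
        return DuplicateAction::RunFullFlow;
    }
    match existing_key_valid {
        Some(true) => DuplicateAction::ReuseExisting,
        Some(false) | None => DuplicateAction::RunFullFlow,
    }
}
```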
+
+### CLI UX Specifications for 5b (2026-04-16 plan-design-review)
+
+- **TTY detection for fallback→PR prompt.** Use `atty::is(Stream::Stdin) && atty::is(Stream::Stdout)` in Rust. The prompt is shown only when BOTH are TTYs. MCP daemon context (pipes), redirected output (`> log.txt`), and scripted execution all skip the prompt automatically. No environment variable needed. This is a common Rust idiom for "is this interactive?"
+- **TUI prompt text (verbatim).** `Captured a new script from this fallback session. Submit as a draft PR to provisioner-scripts/? [y/N]` — default on Enter is No (capital-N convention). On `y`: write to `/tmp/agentkeys-proposed--.ts` and print `Draft written to . Review, then run: gh pr create --title "add script" --body-file .md`.
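The prompt rules above (both streams must be TTYs, default answer is No) can be sketched without external crates. This sketch swaps in `std::io::IsTerminal`, stable since Rust 1.70, in place of the `atty` calls named in the spec, purely so the block is self-contained. The function names are hypothetical, and accepting an uppercase `Y` is an assumption; the spec only mentions `y`.

```rust
use std::io::{self, IsTerminal};

/// The prompt is shown only when BOTH stdin and stdout are terminals.
fn should_prompt(stdin_is_tty: bool, stdout_is_tty: bool) -> bool {
    stdin_is_tty && stdout_is_tty
}

/// Check the real process streams (std-based stand-in for the atty calls).
fn interactive_session() -> bool {
    should_prompt(io::stdin().is_terminal(), io::stdout().is_terminal())
}

/// Interpret the user's reply to a `[y/N]` prompt.
/// Anything other than an explicit yes, including a bare Enter, means No.
fn parse_answer(line: &str) -> bool {
    line.trim().eq_ignore_ascii_case("y")
}
```

Keeping `should_prompt` separate from the stream lookup lets the pipe and redirect cases be unit-tested with plain booleans.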
+
+### Eng Review Implementation Notes (2026-04-16 plan-eng-review)
+
+Locked architectural decisions to prevent implementation drift:
+
+- **IPC contract between Rust orchestrator and TS subprocess.** Line-delimited JSON, each line tagged with `type`. Schema defined in `agentkeys-types` as `ProvisionEvent` enum. Tags: `progress` `{step}`, `tripwire` `{kind, step, elapsed_ms}`, `success` `{api_key}`, `error` `{code, details}`. TS side imports the schema via hand-sync (per CLAUDE.md typed-parameters principle — no opaque JSON parsing).
+- **Concurrency.** Daemon holds a single `Mutex