From 36ca5e7b6e55d91a85dc4db1da65e88a9994dd53 Mon Sep 17 00:00:00 2001
From: wildmeta-agent
Date: Sun, 19 Apr 2026 22:17:39 +0800
Subject: [PATCH 1/5] docs(stage6): federated-email roadmap + broker-not-proxy rule #4 + architecture specs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Architectural core
------------------
- wiki/blockchain-tee-architecture.md §6 retitled to "Summary: the four
  rules"; adds Rule #4 (credential broker, not operation proxy). Our
  infrastructure mints ephemeral credentials; daemons call remote services
  directly. No per-operation compute on our side. This rule is why the
  email, knowledge-base, and OIDC-federation designs refuse to build
  SaaS-shape proxies.
- wiki/Home.md rewritten as a tree-structured index: four rules up front,
  foundations / credential-lifecycle / service-architecture groupings, and
  a reading-order matrix by role.

Stage 5–7 roadmap
-----------------
- docs/spec/plans/development-stages.md: new "Stage 5–7 roadmap update"
  section near the top.
  - Stage 5 (quick email demo) stays current
  - Stage 6 = federated own-email on @agentkeys-email.io (hosted SES +
    TEE-held Ed25519 DKIM + ES256 OIDC issuer + PrincipalTag isolation)
  - Stage 7 = generalized OIDC provider (oidc.agentkeys.dev federates into
    AWS/GCP/Azure/Ali/K8s)
  - Old Stage 6/7/8/9 sections renamed POSTPONED, preserved inline for
    reference

Architecture specs (new)
------------------------
- docs/spec/ses-email-architecture.md (368 lines): Stage 6 email spec
  under the broker-not-proxy shape. SES → S3 direct drop for inbound (no
  Lambda); daemon calls SES via minted creds for outbound; PrincipalTag
  isolation on a shared bucket; TEE-derived Ed25519 DKIM + ES256 OIDC
  keys; data model trimmed to just Inbox + Domain on our side.
- docs/spec/email-signing-backends.md (525 lines): generalized three-layer
  backend comparison (Google Workspace DWD / AgentKeys TEE / AgentMail
  SaaS), with AWS OIDC docs verified for algorithm support (ES256, not
  Ed25519).
- docs/stage5-workspace-email-setup.md (471 lines): BYO Google Workspace
  DWD runbook, prominently flagged as ADVANCED / deferred past Stage 7 for
  enterprise deployments that want to reuse an existing Workspace
  subscription.

Stage 5 demo updated
--------------------
- docs/manual-test-stage5.md §1 rewritten: dedicated-personal-Gmail + TOTP
  + app password is now the recommended quickstart (~10 min, zero code
  changes, reuses existing imapflow path). Workspace DWD and
  plus-addressing preserved as collapsed alternatives.
- harness/stage-5a-done.sh reworked: 8-step non-live gate with colored
  banners, covers Rust + TS unit tests + phantom-chaos + grep guard +
  typecheck + clippy + MCP registration + observability metrics, ~90s
  end-to-end.

Session-scratchpad not in this commit
-------------------------------------
The project-local `.omc/wiki/` architecture pages (overview, hosted-first,
tag-based-access, oidc-federation, email-system, knowledge-storage) that
fed this spec round are git-ignored and stay local. They are informally
referenced from the committed docs and wiki Home.md, but are not shipped.
--- docs/manual-test-stage5.md | 471 ++++++----------------- docs/spec/email-signing-backends.md | 525 ++++++++++++++++++++++++++ docs/spec/plans/development-stages.md | 229 ++++++++++- docs/spec/ses-email-architecture.md | 368 ++++++++++++++++++ docs/stage5-workspace-email-setup.md | 471 +++++++++++++++++++++++ harness/stage-5a-done.sh | 85 ++++- wiki/Home.md | 90 +++-- wiki/blockchain-tee-architecture.md | 8 +- 8 files changed, 1862 insertions(+), 385 deletions(-) create mode 100644 docs/spec/email-signing-backends.md create mode 100644 docs/spec/ses-email-architecture.md create mode 100644 docs/stage5-workspace-email-setup.md diff --git a/docs/manual-test-stage5.md b/docs/manual-test-stage5.md index d8d1cf9..42916da 100644 --- a/docs/manual-test-stage5.md +++ b/docs/manual-test-stage5.md @@ -7,359 +7,97 @@ > `/agentkeys-record-scraper` skill usage) is not yet shipped and is tested > separately once 5b lands. Stage 6 (npm packaging) is deferred to v0.1. -> **Hermetic vs live.** Stage 5a tests fall into two groups: -> - **Hermetic** — Playwright runs against local HTML fixtures via `page.route()`. -> No real network, no real Gmail, no real OpenRouter. These are the *unit -> and chaos tests* and can run on any machine with Node + Playwright. -> - **Live provision** — creates a real OpenRouter account via a real Chromium -> session, real Gmail IMAP, real HTTP call to openrouter.ai. Requires -> Gmail plus-addressing creds **and** a ToS compliance check (tracked in -> `TODOS.md`) before running. The live test is documented here but *do not* -> run it until the ToS check completes. - -All manual tests target the workspace layout: -``` -crates/agentkeys-{types,provisioner,mcp,cli} -provisioner-scripts/{src,tests} -harness/ -``` - ---- - -## 1. Fast gate (30 seconds, no external deps) - -The quickest way to verify Stage 5a is intact. Run this first after any change -that touches Stage 5a files. 
- -```bash -cd ~/Projects/agentkeys -bash harness/stage-5a-done.sh -``` +Stage 5a has two tests that matter: -**Expected output ends with:** -``` -STAGE 5a PASSED -``` - -This script runs: -1. `cargo test -p agentkeys-types -p agentkeys-provisioner -p agentkeys-mcp -p agentkeys-cli` -2. `npm test --prefix provisioner-scripts` -3. `grep -iE "openrouter|brave|jina|groq|anthropic|gemini|twitter|instagram" provisioner-scripts/src/patterns/` (must be empty) -4. Isolated phantom-key chaos test (hermetic) - -Exit 0 = everything green. Exit non-zero = stage broken, do not merge. +1. **The live demo** — a real OpenRouter signup, real Gmail, real API key stored and verified. This is what you run to show Stage 5a actually works end-to-end. Written up top; this is the centerpiece. +2. **Everything else** — Rust + TS unit tests, phantom-key chaos, grep guard, typecheck, clippy, MCP registration, observability metrics. All of it runs in one command: `bash harness/stage-5a-done.sh`. No per-section prose required. --- -## 2. Setup (one-time, for the deeper manual tests below) +## 1. The demo — live OpenRouter provision -```bash -cd ~/Projects/agentkeys +> ⛔ **DO NOT RUN YET — blocked on ToS check.** The OpenRouter ToS compliance item in `TODOS.md` must clear first. Running this before the check may violate OpenRouter's terms and create a real account tied to your email. -# Build all binaries + install TS deps -cargo build --workspace --release -npm install --prefix provisioner-scripts -npx playwright install chromium --with-deps # downloads the headless browser - -# Convenience aliases -alias agentkeys="./target/release/agentkeys-cli" -alias agentkeys-daemon="./target/release/agentkeys-daemon" -alias agentkeys-mock-server="./target/release/agentkeys-mock-server" -``` - ---- - -## 3. Hermetic tests — run these any time - -### 3a. Rust unit tests (67 tests) +This is the end-to-end test that actually creates a real OpenRouter account. 
One command, and by the end you have a verified API key stored in `agentkeys`: ```bash -cargo test -p agentkeys-types # 8 tests — includes ProvisionEvent serde roundtrips -cargo test -p agentkeys-provisioner # 15 tests — subprocess IPC + mutex + orchestrator -cargo test -p agentkeys-mcp # 3 tests — agentkeys.provision tool registration -cargo test -p agentkeys-cli # 41 tests — includes 4 new provision tests -``` - -All 4 crates should exit 0 with no failures. - -### 3b. TypeScript unit tests (15 tests) - -```bash -npm install --prefix provisioner-scripts -npm test --prefix provisioner-scripts -``` - -**Expected:** -``` -Test Files 6 passed (6) - Tests 15 passed (15) -``` - -Breakdown: -- `src/types.test.ts` (3) — ProvisionEvent emit + roundtrip -- `src/lib/email.test.ts` (3) — IMAP happy/timeout/wrong-pattern -- `src/lib/verify.test.ts` (3) — 200/401/503 status mapping -- `tests/scrapers/openrouter.test.ts` (3) — scraper happy/selector-timeout/verification-failure -- `tests/patterns/signup_email_otp.test.ts` (2) — pattern happy/selector-timeout -- `tests/scrapers/openrouter.phantom.test.ts` (1) — phantom-key chaos - -### 3c. Phantom-key chaos test in isolation - -The key defense against silent-corrupt credentials. Fake-shaped key → verify() returns 401 → Error event, no Success. - -```bash -cd provisioner-scripts -npx vitest run tests/scrapers/openrouter.phantom.test.ts -cd - -``` - -**Expected ending:** -``` - ✓ tests/scrapers/openrouter.phantom.test.ts (1) ... - ✓ scraper (1) ... - ✓ phantom_key_caught ... - - Test Files 1 passed (1) - Tests 1 passed (1) -``` - -You will **not** see an `{"type":"error",...}` line in the terminal — the test intercepts stdout via a `process.stdout.write` proxy (`captureEmittedEvents` in the test file) and asserts programmatically that an Error event was emitted and no Success event was. The `✓ phantom_key_caught` is the signal the gate held. 
- -If this test ever fails, or a variant starts passing with a Success event present, **stop** — the verification gate is broken and a real phantom key could be stored in production. File an issue immediately. - -### 3d. Pattern grep guard - -Patterns must never reference service-specific strings. Enforce: - -```bash -grep -riE "openrouter|brave|jina|groq|anthropic|gemini|twitter|instagram" \ - provisioner-scripts/src/patterns/ -``` - -**Expected:** empty (no output). Any match means a pattern has leaked service-specific selectors or copy — extract them back into `scrapers/.ts` parameters. - -### 3e. Typecheck - -```bash -npm run typecheck --prefix provisioner-scripts -``` - -**Expected:** exit 0, no TypeScript errors. - -### 3f. Clippy (Rust lints) - -```bash -cargo clippy -p agentkeys-types -p agentkeys-provisioner -p agentkeys-mcp -p agentkeys-cli --all-targets -``` - -**Expected:** zero warnings in the Stage 5a crates. (Warnings in other crates like `agentkeys-mock-server` or `agentkeys-core` are pre-existing and out of scope.) - ---- - -## 4. Scraper walkthrough — inspect what it does without running live - -This is a read-only tour of how a provision actually works, useful when debugging -a failing scraper or onboarding a new service. - -### 4a. Inspect the Rust ↔ TS wire format - -Every line the TS subprocess emits is a tagged JSON event. Open two terminals. - -Terminal 1 — show the schema: -```bash -cat crates/agentkeys-types/src/provision.rs | grep -A 20 "enum ProvisionEvent" -``` - -Terminal 2 — show the TS mirror: -```bash -cat provisioner-scripts/src/types.ts | grep -A 15 "ProvisionEvent" -``` - -Fields match. JSON snake_case. `type` is the discriminator. This is the IPC contract. - -### 4b. Run the scraper against the hermetic fixture only - -The OpenRouter scraper can run with the local HTML fixture served via Playwright `page.route()`. No real network, no real OpenRouter. 
- -```bash -cd provisioner-scripts -npx vitest run tests/scrapers/openrouter.test.ts --reporter=verbose -cd - -``` - -Three scenarios run: -- `scraper::happy_path` — scraper walks the fixture, emits Progress events, extracts the fixture key, verify() returns valid, Success event fires -- `scraper::selector_timeout` — fixture served without the email input; scraper emits a Tripwire event within 15s -- `scraper::verification_failure` — mock verifier returns `{valid:false, reason:"phantom"}`; scraper emits an Error event - -Watch the `console.log` output for the emitted events in each test. - ---- - -## 5. MCP tool registration check - -Verify `agentkeys.provision` is discoverable through the daemon's MCP interface. - -### 5a. Start the daemon in a scratch environment - -Terminal 1: -```bash -cd ~/Projects/agentkeys - -# Start the mock backend (needed by daemon for credential backend wiring) -cargo run -p agentkeys-mock-server -- --port 8090 & -MOCK_PID=$! - -# Give it a second to bind -sleep 1 - -# Run the daemon with a test session seam (per Stage 3 test-seam pattern). -# AGENTKEYS_SESSION injects a pre-built session and bypasses the pair flow — -# without it the daemon blocks on master-device approval before serving MCP -# (see crates/agentkeys-daemon/src/main.rs and src/session.rs). Any string -# works for the token value; `test-token` is a convention. -AGENTKEYS_BACKEND=http://localhost:8090 \ - AGENTKEYS_SESSION=test-token \ - cargo run -p agentkeys-daemon -- --stdio -``` - -The daemon is now listening for MCP JSON-RPC on stdin/stdout. You should see `daemon ready, session wallet=local` on stderr and **no** `Pair code:` prompt. - -### 5b. List tools (Terminal 2, via a scratch stdin pipe) - -The daemon reads JSON-RPC from stdin. Easiest way to exercise it without an MCP client is a one-shot. The same `AGENTKEYS_SESSION` test-seam is required here, otherwise the daemon sits in the pair flow and never reads the piped JSON. 
- -```bash -cd ~/Projects/agentkeys -echo '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | \ - AGENTKEYS_BACKEND=http://localhost:8090 \ - AGENTKEYS_SESSION=test-token \ - cargo run -p agentkeys-daemon -- --stdio 2>/dev/null -``` - -**Expected:** the response JSON includes an entry with `"name":"agentkeys.provision"` and the schema `{"service":"string","force":"boolean (optional)"}`. - -If you see `Pair code: ... Approve on your Master device. OTP: ...` on the terminal, the `AGENTKEYS_SESSION` env var didn't propagate — double-check it's on the same shell line as the `cargo run` invocation. - -### 5c. Confirm the in-progress sentinel - -(Advanced — requires sending a provision call then immediately a second one. Easier via unit tests: `mcp::provision_in_progress_error` in `crates/agentkeys-mcp/src/lib.rs`.) - -```bash -cargo test -p agentkeys-mcp -- provision_in_progress_error --nocapture +agentkeys provision openrouter ``` -**Expected:** test passes; the output confirms a second concurrent call returns an MCP error with `code: "PROVISION_IN_PROGRESS"`. +### Prerequisites (once the ToS check clears) -### 5d. Cleanup +For the demo-only purpose of Stage 5, the goal is the **shortest path to a running provisioner** with an inbox the agent fully controls. Use a dedicated personal Gmail below — reuses our existing IMAP code path, ~10 minutes total setup, no Workspace subscription required. -```bash -kill $MOCK_PID -``` +> **This is a temporary demo solution.** For production (v0.1), the agent mailbox moves to SES-hosted `*@bots.wildmeta.ai` under the three-layer `TokenAuthority` abstraction. See the [email-system wiki page](../.omc/wiki/email-system.md) for the full architecture and why we're running demo-and-production on different backends deliberately. ---- +#### 🚀 Demo path: dedicated personal Gmail + TOTP + app password -## 6. 
CLI UX walkthrough +Why dedicated (not your personal inbox with plus-addressing): the agent gets a clean inbox it fully controls, no personal mail pollution, cleanup is a single account-delete. -All CLI provision tests can run without any real signup. They use the mock backend and a test-seam provisioner. +**1. Create a fresh Gmail account for the bot.** -### 6a. Masked key output format +Sign up at [accounts.google.com](https://accounts.google.com) with a name like `wildmeta-stage5-demo@gmail.com`. Google will ask for a recovery phone — use your personal phone; you only need it once for step 2. -```bash -cargo test -p agentkeys-cli --test cli_tests -- cli_provision_masked_output --nocapture -``` - -**Expected:** test passes. The test calls `run_provision()` in-process (it does not spawn the `agentkeys-cli` binary), feeds a scripted success event with raw key `sk-or-v1-realkey12345abcdefgh`, and asserts the returned `obtained_key_masked` field satisfies four properties: +**2. Enable 2-Step Verification and enroll TOTP as the second factor.** -1. Does **not** contain the raw key substring `realkey12345abcdefgh`. -2. Contains `****` as the mask marker. -3. Starts with the first 8 raw chars (`sk-or-v1`). -4. Ends with the last 4 raw chars (`efgh`). +Gmail IMAP access chain: `app password` requires `2FA enabled` requires `second factor enrolled`. Using an authenticator app as that second factor makes the account non-interactive after this one-time enrollment. -Because the assertions run on the in-memory struct, you will **not** see a masked-key line in stdout — only the `provision_metric` JSON lines tracing emits and the usual `test ... ok` banner. To observe the real stdout masking behavior of the CLI binary, run a full hermetic provision separately (§8 once unblocked, or a scripted provision against the mock backend). 
+- Open [myaccount.google.com](https://myaccount.google.com) → **Security** +- **Turn on 2-Step Verification.** Google sends an SMS to your recovery phone to start enrollment. +- Under 2-Step Verification settings, add **Authenticator app** as a second step. Google shows a QR code and a secret. +- Scan into Google Authenticator / Authy / 1Password / Bitwarden / whatever TOTP client you already use. You now own the second factor. +- (Optional) once TOTP is active, you can drop SMS as a 2FA method — Google keeps the phone for account recovery but stops using it as a live second factor. -### 6b. `--force` flag re-provisions +**3. Generate an app password for IMAP.** -```bash -cargo test -p agentkeys-cli --test cli_tests -- cli_provision_force_flag --nocapture -``` +- Visit [myaccount.google.com/apppasswords](https://myaccount.google.com/apppasswords). +- Create one named "agentkeys-stage5". Google gives you a 16-character password. +- Copy it immediately — it's shown once. Revoke anytime from the same page. -**Expected:** test passes. With an existing credential present, `--force` triggers a fresh subprocess call (not the verify-and-return shortcut). - -### 6c. Duplicate provision verify-and-report - -```bash -cargo test -p agentkeys-cli --test cli_tests -- cli_provision_duplicate_verified --nocapture -``` - -**Expected:** test passes. With an existing credential, no `--force`, the CLI prints to stderr `openrouter already provisioned, key valid`, prints the masked existing key on stdout, and does NOT re-run the subprocess. - -### 6d. Error message format (problem + cause + fix + docs) +**4. 
Export the five env vars.** ```bash -cargo test -p agentkeys-cli --test cli_tests -- cli_provision_error_format --nocapture +export AGENTKEYS_EMAIL_BACKEND=gmail +export AGENTKEYS_EMAIL_USER="wildmeta-stage5-demo@gmail.com" # the bot account from step 1 +export AGENTKEYS_EMAIL_PASSWORD="xxxx xxxx xxxx xxxx" # 16-char app password from step 3 +export AGENTKEYS_EMAIL_HOST="imap.gmail.com" +export AGENTKEYS_EMAIL_PORT="993" ``` -**Expected:** test passes. Error output to stderr contains (in order): -- `Problem: ...` -- `Cause: ...` -- `Fix: ...` -- `Docs: https://...` - -This is the CLAUDE.md-specified error format. Verify manually by triggering any known-bad state (e.g. missing AGENTKEYS_BACKEND) and checking the stderr shape. - ---- - -## 7. Observability check — structured metrics +Once the app password is set, the demo sees **zero 2FA prompts**. App passwords bypass 2FA by design: they're Google's non-interactive credential for legacy clients such as IMAP, revocable anytime. An app password is not restricted to IMAP, so store it as carefully as the account password. -The orchestrator emits JSON log lines to stderr for each metric. Easiest to see via a subprocess run in a test: +**5. Daemon running and paired**: see the Stage 4 manual test guide. -```bash -cargo test -p agentkeys-provisioner -- stores_credential --nocapture 2>&1 | \ - grep "provision_metric" -``` - -**Expected:** at least three log lines of the form: -``` -{"level":"info","event":"provision_metric","name":"tier_used","service":"openrouter","tier":2} -{"level":"info","event":"provision_metric","name":"duration_seconds","service":"openrouter","seconds":0.123} -{"level":"info","event":"provision_metric","name":"verification_result","service":"openrouter","result":"valid"} -``` +
+Alternative: Google Workspace DWD (for operators with an existing Workspace subscription) -The metric names are stable (`tier_used`, `duration_seconds`, `trip_wire_fired`, `verification_result`). Prometheus/OTel exporters come in v0.1. +See [`docs/stage5-workspace-email-setup.md`](stage5-workspace-email-setup.md). That path mints a throwaway `stage5test-@wildmeta.ai` per run, reads its inbox via the Gmail API (no app password, no interactive OAuth), and deletes the user at the end. One-time ~20-minute admin setup + currently 3-5 days of code work to replace the `imapflow` fetcher with a Gmail-API fetcher that uses DWD impersonation. Longer upfront cost than the dedicated-Gmail demo path, but the right choice for enterprise deployments that already run Workspace. ---- +
-## 8. Live provision (DO NOT RUN YET — blocked on ToS check) +
+Alternative: plus-addressed personal Gmail (shared-inbox quick demo) -This is the end-to-end test that actually creates a real OpenRouter account. -**Do not run until** the TODOS.md OpenRouter ToS compliance check completes. -Running this test before the ToS check may violate OpenRouter's terms and -create a real account tied to your email. +If you don't want to create a dedicated account and are OK with one-off OpenRouter mail landing in your real inbox, plus-addressing on your existing Gmail works for a single demo run. -### Prerequisites (when ToS check clears) - -1. **Your existing personal Gmail account** — do **not** create a new Gmail account for this demo. Plus-addressing is a Gmail-native feature: mail sent to `you+anything@gmail.com` is delivered to `you@gmail.com` without any configuration, so a single personal inbox already supports unlimited test aliases (e.g. `you+stage5test-20260418@gmail.com`). Creating a fresh Gmail account for automation purposes risks Google flagging it as a bot account and could itself violate Google's ToS; the whole point of plus-addressing is to avoid that. -2. Gmail app password (not your regular password) — generate at https://myaccount.google.com/apppasswords. This is scoped to IMAP access only; revoke it after the demo. -3. Environment: +1. **Your existing personal Gmail account**: plus-addressing is a Gmail-native feature, so mail sent to `you+anything@gmail.com` is delivered to `you@gmail.com` without any configuration. A single inbox supports unlimited test aliases (`you+stage5test-20260418@gmail.com`). +2. **Gmail app password** (not your regular password): generate at https://myaccount.google.com/apppasswords. It is not limited to IMAP, so treat it like the account password and revoke it after the demo. +3. 
**Environment:** ```bash export AGENTKEYS_EMAIL_BACKEND=gmail - export AGENTKEYS_EMAIL_USER="you@gmail.com" # your real Gmail; Stage 5a appends the +alias at signup time - export AGENTKEYS_EMAIL_PASSWORD="" # from step 2, NOT your normal Google password - export AGENTKEYS_EMAIL_HOST="imap.gmail.com" # default, set explicitly if overriding + export AGENTKEYS_EMAIL_USER="you@gmail.com" # your real Gmail; Stage 5a appends +alias at signup + export AGENTKEYS_EMAIL_PASSWORD="" # NOT your normal Google password + export AGENTKEYS_EMAIL_HOST="imap.gmail.com" export AGENTKEYS_EMAIL_PORT="993" ``` -4. Daemon running and paired (see Stage 4 manual test guide) -### Run the provision +Downside: the agent doesn't fully control the inbox (shared with the human), and the OpenRouter confirmation email lingers in your personal mail until you delete it. -```bash -agentkeys provision openrouter -``` +
### Expected behavior -1. Stderr shows step lines (currently single-shot; real-time streaming ships in 5b): +1. Stderr shows single-shot step lines (real-time streaming ships in 5b): ``` Creating account... Waiting for email verification... @@ -367,68 +105,115 @@ agentkeys provision openrouter Verifying key against openrouter.ai... Stored. ``` -2. Stdout shows the masked key, e.g.: +2. Stdout shows the masked key: ``` sk-or-v1-abcd1234****...WXYZ ``` 3. Exit code 0. -4. A new OpenRouter account exists at `you+stage5test-@gmail.com`. +4. A new OpenRouter account exists at the email address you configured in `$AGENTKEYS_EMAIL_USER` (e.g. `wildmeta-stage5-demo@gmail.com` for the dedicated-Gmail path, or `you+stage5test-@gmail.com` for the plus-addressing fallback). 5. `agentkeys read openrouter` returns the full key. -6. Manually calling `curl -H "Authorization: Bearer $(agentkeys read openrouter)" https://openrouter.ai/api/v1/models` returns HTTP 200. +6. `curl -H "Authorization: Bearer $(agentkeys read openrouter)" https://openrouter.ai/api/v1/models` returns HTTP 200. ### Failure modes to watch for -- **CAPTCHA / Cloudflare challenge** — the Tier 2 script does not solve CAPTCHAs. Expect a Tripwire event with `kind: selector_timeout`. This is the signal that Stage 5b's agentic fallback is needed (human or LLM drives the browser through the challenge). Until 5b ships, just abort and retry from a different IP. -- **Email didn't arrive within 60s** — check spam folder, check plus-addressing is actually forwarding. Tripwire `email_timeout` indicates the IMAP fetch exhausted its polling window. -- **Key verification fails with `phantom`** — the scraper extracted something key-shaped that isn't a real API key. Inspect the page at the success-step selector; OpenRouter may have changed its DOM. File an issue with the HAR dump. -- **Store fails after verify** — the error message will include the obtained (masked) key. 
Run `agentkeys store openrouter ` manually to recover, then investigate why the backend rejected. +- **CAPTCHA / Cloudflare challenge** — the Tier 2 script does not solve CAPTCHAs. Expect a Tripwire event with `kind: selector_timeout`. This is the signal that Stage 5b's agentic fallback is needed. Until 5b ships, abort and retry from a different IP. +- **Email didn't arrive within 60 s** — check spam, check plus-addressing forwarding. Tripwire `email_timeout` means the IMAP fetch exhausted its polling window. +- **Key verification fails with `phantom`** — the scraper extracted something key-shaped that isn't a real API key. OpenRouter may have changed its DOM; inspect the page at the success-step selector and file an issue with the HAR dump. +- **Store fails after verify** — the error message includes the obtained (masked) key. Run `agentkeys store openrouter ` manually to recover, then investigate why the backend rejected. --- -## 9. Troubleshooting +## 2. Everything else — one command -### `npm test` hangs +Runs every non-live check in a single script. Use this before merging anything that touches Stage 5a crates or `provisioner-scripts/`. + +```bash +cd ~/Projects/agentkeys +bash harness/stage-5a-done.sh +``` + +**Expected last line:** +``` +STAGE 5a PASSED +``` + +Exit 0 = everything green. Non-zero = the failing step number and a red `✗` line. Do not merge on red. 
+
+### What the script runs
+
+| # | Step | Asserts |
+|---|---|---|
+| 1 | Rust unit tests (`agentkeys-types`, `-provisioner`, `-mcp`, `-cli`) | 67 tests pass |
+| 2 | TS install + `npm test` | 15 tests pass across 6 files |
+| 3 | Phantom-key chaos test in isolation | silent-corrupt defense holds: Error event fires, no Success |
+| 4 | Grep guard over `provisioner-scripts/src/patterns/` | zero service-specific strings leaked into patterns |
+| 5 | TS typecheck | no TypeScript errors |
+| 6 | `cargo clippy` on Stage 5a crates with `-D warnings` | zero clippy warnings |
+| 7 | MCP `tools/list` on the daemon (with the `AGENTKEYS_SESSION` test seam) | `agentkeys.provision` advertised |
+| 8 | Observability | orchestrator emits `tier_used`, `duration_seconds`, `verification_result` as JSON metric lines |
+
+### Setup (one-time)
+
+If any step fails with a "command not found" or "workspace member not found" error, you probably haven't installed dependencies yet:
 
-Playwright might be waiting for a browser that isn't installed.
 ```bash
+cd ~/Projects/agentkeys
+cargo build --workspace --release
+npm install --prefix provisioner-scripts
 npx playwright install chromium --with-deps
 ```
 
-### `cargo test` complains about missing `agentkeys-provisioner`
+---
+
+## 3. Troubleshooting
+
+### `npm test` hangs
 
-The workspace member might not be listed in the top-level `Cargo.toml`. Check `[workspace]/members` contains `crates/agentkeys-provisioner`.
+Playwright is waiting for a browser that isn't installed:
+
+```bash
+npx playwright install chromium --with-deps
+```
 
 ### Grep guard fails
 
-A pattern in `provisioner-scripts/src/patterns/` has a service-specific string. Find it:
+A pattern under `provisioner-scripts/src/patterns/` has a service-specific string. Find it:
+
 ```bash
 grep -rniE "openrouter|brave|jina|groq|anthropic|gemini|twitter|instagram" \
   provisioner-scripts/src/patterns/
 ```
+
 Extract the offender into a parameter in the calling scraper under `scrapers/`.
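Editor's illustration (not part of the committed file): extracting a service-specific string into a scraper parameter might look roughly like the sketch below. Every name here (`SignupParams`, `signupSteps`, `exampleServiceParams`, `leaksServiceName`) is hypothetical, and the URL and selectors are placeholders, since the real pattern and scraper modules are not shown in this patch. The regular expression reuses the word list from the grep guard.

```typescript
// Hypothetical sketch: these names are illustrative and do not come from the repository.

// Parameters a generic signup pattern needs. Everything service-specific enters here.
interface SignupParams {
  signupUrl: string;
  emailSelector: string;
  submitSelector: string;
}

// Would live under patterns/: describes the steps without naming any service.
function signupSteps(params: SignupParams, email: string): string[] {
  return [
    `goto ${params.signupUrl}`,
    `fill ${params.emailSelector} with ${email}`,
    `click ${params.submitSelector}`,
  ];
}

// Would live under scrapers/: the only place a service name may appear.
// The URL and selectors are placeholders, not a real signup page.
const exampleServiceParams: SignupParams = {
  signupUrl: "https://service.example/signup",
  emailSelector: "input[type=email]",
  submitSelector: "button[type=submit]",
};

// Same word list as the grep guard, applied to a source string instead of a directory.
const SERVICE_NAMES = /openrouter|brave|jina|groq|anthropic|gemini|twitter|instagram/i;

function leaksServiceName(patternSource: string): boolean {
  return SERVICE_NAMES.test(patternSource);
}

console.log(signupSteps(exampleServiceParams, "bot@example.com"));
// A hard-coded service URL inside a pattern trips the guard.
console.log(leaksServiceName('page.goto("https://openrouter.ai/signup")'));
// The parameterized form does not.
console.log(leaksServiceName("page.goto(params.signupUrl)"));
```

The design point is that the pattern only ever sees `params`, so the grep guard can stay a plain string match over `patterns/` without needing to understand the code.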
### Phantom chaos test passes with a Success event -**Critical.** The verification gate is broken. Check: -1. `provisioner-scripts/src/lib/verify.ts` — the fetch function actually returns 401 from the mock? -2. `provisioner-scripts/src/scrapers/openrouter.ts` — the Success event is only emitted AFTER verify returns `{valid:true}`? -3. The phantom test's `route.fulfill()` — the mock verify endpoint is actually being intercepted? +**Critical.** The verification gate is broken. Check, in order: + +1. `provisioner-scripts/src/lib/verify.ts` — does the fetch actually return 401 from the mock? +2. `provisioner-scripts/src/scrapers/openrouter.ts` — is Success only emitted AFTER verify returns `{valid:true}`? +3. The phantom test's `route.fulfill()` — is the mock verify endpoint actually being intercepted? Fix before merging anything. Silent-corrupt-credential is the primary threat this defends against. -### Clippy says "useless_vec" or "useless_format" +### MCP step reports `Pair code: …` + +The `AGENTKEYS_SESSION=test-token` test seam didn't reach the daemon — the env var needs to be on the same shell line as the binary invocation. The script already does this; if you're re-running step 7 by hand, make sure you keep the env vars on the same line. -These are slop markers. Apply the suggested `cargo clippy --fix` or replace `vec![...]` with `[...]` arrays / `format!("literal")` with `.to_string()`. Deslop passes catch these. +### Clippy says `useless_vec` / `useless_format` + +These are slop markers. Apply the suggested `cargo clippy --fix` or hand-replace `vec![...]` with `[...]` arrays and `format!("literal")` with `.to_string()`. Deslop passes catch these. --- -## What to do when Stage 5b lands +## 4. 
What to do when Stage 5b lands + +When Stage 5b ships (agentic fallback, `/agentkeys-record-scraper` skill, script-generation loop), this document will grow: -When Stage 5b ships (agentic fallback, `/agentkeys-record-scraper` skill, script generation loop), this document will grow new sections for: -- Triggering the agentic fallback via a failing Tier 2 script (expected Tripwire → Tier 3 engagement) -- Inspecting the audit JSONL at `~/.agentkeys/logs/provision-.jsonl` -- Running the `/agentkeys-record-scraper` skill to add a new service (Brave, Jina, etc.) -- Verifying the fallback→PR loop does NOT auto-submit for agent-driven callers (non-TTY) +- A new demo path that triggers the agentic fallback via a failing Tier 2 script (expected Tripwire → Tier 3 engagement). +- A step for inspecting the audit JSONL at `~/.agentkeys/logs/provision-.jsonl`. +- A `/agentkeys-record-scraper` walkthrough for adding a new service (Brave, Jina, etc.). +- An assertion that the fallback→PR loop does **not** auto-submit for agent-driven callers (non-TTY). For now, Stage 5a with OpenRouter as the only deterministic scraper is the full surface. 
@@ -436,15 +221,5 @@ For now, Stage 5a with OpenRouter as the only deterministic scraper is the full ## Summary checklist -- [ ] `bash harness/stage-5a-done.sh` exits 0 -- [ ] All 67 Rust tests pass across 4 crates -- [ ] All 15 TypeScript tests pass -- [ ] Phantom-key chaos test aborts with Error event (no Success) -- [ ] Pattern grep guard returns empty -- [ ] `npm run typecheck` exits 0 -- [ ] `cargo clippy` has zero warnings in Stage 5a crates -- [ ] `agentkeys.provision` appears in MCP `tools/list` response -- [ ] CLI masked-key output never contains the full raw key -- [ ] CLI error output follows problem + cause + fix + docs format -- [ ] Orchestrator emits all four metric names to stderr -- [ ] (Live, after ToS check) `agentkeys provision openrouter` creates a real account and stores a verified key +- [ ] `bash harness/stage-5a-done.sh` exits 0 (covers tests 1–8 above) +- [ ] (Once ToS cleared) `agentkeys provision openrouter` creates a real account, stores a verified key, `curl` against `/api/v1/models` returns 200 diff --git a/docs/spec/email-signing-backends.md b/docs/spec/email-signing-backends.md new file mode 100644 index 0000000..fb61723 --- /dev/null +++ b/docs/spec/email-signing-backends.md @@ -0,0 +1,525 @@ +# Email-Signing Backends: GCP-managed vs AgentKeys TEE + +**Date:** 2026-04-18 +**Status:** Design +**Stage:** 5a (alternative backend) → v0.1 (canonical) +**Related:** [#11 biometric gate](https://github.com/litentry/agentKeys/issues/11), +`docs/spec/credential-backend-interface.md`, `wiki/session-token.md`, +`wiki/blockchain-tee-architecture.md`, `docs/stage5-workspace-email-setup.md` + +--- + +## 1. What this doc decides + +When a child agent needs to read email (e.g. the OpenRouter OTP during a live +provision), *who signs the JWT that impersonates the target Workspace user*? + +Two answers: + +- **Backend A — GCP-managed SA key + `iamcredentials.signJwt`.** Google holds the + private key, never downloadable. 
AgentKeys calls Google's IAM API to sign + DWD JWTs on demand. +- **Backend B — AgentKeys TEE.** The RSA private key is sealed inside our own + enclave. The enclave signs DWD JWTs on demand, same wire format, same Google + token-exchange flow. + +Both back the same trait contract. The CLI and daemon don't know which one is +active. Switching backends is a config change. + +**Decision.** Stage 5 ships Backend A as an alternative path. v0.1 migrates to +Backend B (AgentKeys TEE) and Backend A stays available as a permanent +jurisdiction/deployment variant — the same way the existing `CredentialBackend` +spec already supports `MockBackend` (v0), `HeimaBackend` (v0.1), and +`CentralizedBackend` (regulated environments). + +## 2. Why an abstraction is required, not optional + +AgentKeys already has a clear architectural rule about credential signing +(`wiki/blockchain-tee-architecture.md` §6 rule #2): + +> **The TEE holds all private keys and does all computation.** The TEE holds the +> shielding key, the RSA JWT signing key, and per-user custodial wallet keys +> (per `pallet-bitacross` pattern). These are generated independently (not +> derived from a single master seed) and sealed inside the enclave. [...] +> No private key ever leaves the TEE. + +In v0, the "TEE" is a mock (a SQLite-backed process that holds what a real TEE +would hold). In v0.1, the TEE is Heima's enclave. In a Stage 5 prototype, we need +a signing authority for Gmail JWTs *today*, before v0.1 lands. A GCP-managed SA +key is a drop-in "TEE-equivalent" for signing: Google holds the key, we never +download it, and we call an API when we need a signature. It's an operator-trust +model rather than hardware-attestation model, but behind the `CredentialBackend` +trait the caller can't tell. + +The goal is to specify the contract so Stage 5's GCP implementation and v0.1's +TEE implementation are interchangeable. + +## 3. 
The trait contract (additions to `CredentialBackend`) + +Two new methods + one new `AuthRequestType` variant. Both backends implement +them; the CLI, daemon, and provisioner-scripts code call through the trait. + +```rust +pub enum AuthRequestType { + // existing variants: Pair, Recover, ScopeChange, HighValueRelease, KeyRotate ... + + /// Grant a child the ability to read/write mail on a set of Workspace + /// users. Biometric-gated on the master CLI (see §7). TTL = 30 days to + /// match the AgentKeys session-key policy (wiki/session-token.md §1). + EmailImpersonate { + user_pattern: EmailUserPattern, // exact, prefix, or /Automation OU + scopes: Vec, // Read, Modify, Send + ttl_seconds: u64, // ≤ 30 * 86400 + }, +} + +pub enum EmailUserPattern { + Exact(String), // "stage5test-20260419@wildmeta.ai" + Prefix(String), // "stage5test-*@wildmeta.ai" + OrgUnit(String), // "/Automation" — only our throwaway OU +} + +#[async_trait] +pub trait CredentialBackend /* existing */ { + /// Child side: mint a short-lived email access token, bounded by the + /// scope previously granted via AuthRequestType::EmailImpersonate. + /// Returns an opaque access token + its expiration. Both backends cap + /// ttl ≤ 3600s (Google's max DWD access-token lifetime). + /// + /// Authorization: the child's session token proves identity; the backend + /// checks the stored EmailImpersonate grant matches target_user and scope. + async fn mint_email_access_token( + &self, + session: &Session, + target_user: &str, + scope: EmailScope, + ) -> Result; + + /// Child side: perform a single email action. Backend mints a token + /// internally (never returns it to the child) and executes the call. + /// Preferred entry point for one-shot operations — leaves no token in + /// the child process at all. + /// + /// Authorization: same as mint_email_access_token. Backend audit-logs + /// every call regardless of backend type. 
+ async fn email_operation( + &self, + session: &Session, + target_user: &str, + op: EmailOperation, + ) -> Result; +} + +pub enum EmailOperation { + ListMessages { query: String }, + GetMessage { id: String, format: MessageFormat }, + Send { raw_mime: Vec }, + Modify { id: String, add_labels: Vec, remove_labels: Vec }, + Trash { id: String }, +} + +pub struct EmailAccessToken { + pub access_token: String, // Bearer for api.gmail.com (≤ 1h) + pub expires_at_unix: i64, + pub target_user: String, // echoed for logging / sanity check + pub scope: EmailScope, +} +``` + +Callers never see a private key, JSON key file, or signed JWT. The wire is the +trait. + +## 4. Backend A — GCP-managed SA key + `iamcredentials.signJwt` + +### What Google holds + +- **SA private key** — created at SA creation, held inside Google's KMS-class + infrastructure, **never downloadable**. The `gcloud iam service-accounts keys + create` step in the current setup doc is *optional* and actually counterproductive + for this design (§6): the Google-managed key exists with or without a local + download. +- **DWD authorization** — the policy object created at step B4: "this SA can + impersonate any `wildmeta.ai` user with scopes `gmail.readonly, gmail.modify`." + +### What AgentKeys holds + +- **Nothing cryptographic.** The only credential is the OAuth token that + `agent@wildmeta.ai` has already authenticated with (refresh token in + `~/.config/gws/`), which grants the `roles/iam.serviceAccountTokenCreator` IAM + permission on the SA. + +### `mint_email_access_token` flow + +``` +1. child → backend: mint_email_access_token(session, target_user, scope=Read) +2. backend verifies session (JWT signature + expiration) ← standard AgentKeys path +3. backend checks stored EmailImpersonate grant: + - grant exists for this child? ✓ + - target_user matches grant's user_pattern? ✓ + - scope subset of grant's scopes? ✓ + - grant's TTL not expired (≤ 30 days from issuance)? ✓ +4. 
backend builds DWD JWT payload (iss=SA, sub=target_user, scope=gmail.readonly, ttl=1h) +5. backend → Google IAM: POST /v1/projects/-/serviceAccounts//signJwt + body: { payload: "" } + auth: agent@wildmeta.ai's OAuth token with roles/iam.serviceAccountTokenCreator + → returns signedJwt (string) +6. backend → Google OAuth: POST oauth2.googleapis.com/token + grant_type=urn:ietf:params:oauth:grant-type:jwt-bearer, assertion= + → returns access_token (1h TTL) +7. backend → child: EmailAccessToken { access_token, expires_at, target_user, scope } +8. backend → chain (async): audit extrinsic + { child, target_user, scope, backend=GCP, signjwt_request_id, timestamp } +``` + +### Trust assumptions + +- **Google operates the KMS correctly.** Same assumption as anyone using GCP + service accounts. Strong in practice; billions of dollars of infrastructure + run on this. +- **GCP IAM is the policy gate.** Only principals with + `roles/iam.serviceAccountTokenCreator` can call `signJwt`. We grant that role + exactly to `agent@wildmeta.ai`, scoped to this one SA, logged to Cloud Audit + Logs. +- **DWD authorization is the domain gate.** The SA can only impersonate users + in `wildmeta.ai` (the domain B4 authorized). Cross-domain impersonation is + structurally impossible. + +### What this backend cannot do that Backend B can + +- **Immutable audit** — GCP audit logs are strong, but they're operator- + controlled (Google is the operator). They're not chain-immutable. This is the + same "operator-verifiable vs publicly verifiable" tradeoff described in + `wiki/blockchain-tee-architecture.md` §5 under the pure-TEE-backend column. +- **"Leak of agent@wildmeta.ai is fully bounded"** — if `agent@wildmeta.ai`'s + OAuth refresh token is stolen, the attacker can mint JWTs for any + `wildmeta.ai` user (within the DWD scopes) for the token's lifetime. We + mitigate by narrow IAM, wrapper service with `/Automation` allow-list, and + revocation of the token on suspicion. 
But the ceiling is "any user in the + domain", not "just the granted child's user". + +## 5. Backend B — AgentKeys TEE + signJwt + +### What the TEE holds + +Same shape as the existing TEE-held primitives (`wiki/blockchain-tee-architecture.md` §1): + +- **RSA signing key for DWD JWTs** — generated inside the TEE, sealed storage, + never extractable. Distinct from the TEE's JWT *session-token* signing key + (which mints bearer tokens for master/children). Two RSA keys, two purposes. +- **DWD authorization registration** — the TEE registers its DWD public key + with Google Workspace admin console (same one-click flow as B4 today; the + "service account client ID" is replaced by the TEE's DWD pubkey fingerprint). + +### What AgentKeys holds + +- **Child session bearer tokens** (existing, 30-day) +- **No signing material** — identical posture to Backend A in that respect. + +### `mint_email_access_token` flow + +``` +1. child → TEE: mint_email_access_token(session, target_user, scope=Read) +2. TEE verifies session token (RSA signature + expiration) +3. TEE reads chain state: + - EmailImpersonate grant extrinsic for this child? ✓ + - target_user matches grant's user_pattern? ✓ + - scope subset of grant's scopes? ✓ + - grant's on-chain TTL not expired? ✓ +4. TEE builds DWD JWT payload (iss=TEE_sa_identity, sub=target_user, scope=..., ttl=1h) +5. TEE signs the JWT locally with its sealed DWD private key +6. TEE → Google OAuth: POST oauth2.googleapis.com/token with the signed JWT + → returns access_token (1h TTL) +7. TEE → child: EmailAccessToken { access_token, expires_at, target_user, scope } +8. TEE → chain (async): audit extrinsic + { child, target_user, scope, backend=TEE, jwt_nonce, timestamp } + signed by user's wallet key (TEE-held), submitted via paymaster +``` + +### Trust assumptions + +- **Intel SGX / AMD SEV / equivalent attestation is correct.** Code inside the + enclave is the code we signed. 
+- **Google honors the DWD registration.** Same as Backend A — DWD is still a + Google-side policy object. +- **Chain is the policy gate.** The `EmailImpersonate` grant is an on-chain + extrinsic. Revocation is an on-chain extrinsic. Scope changes are on-chain + extrinsics. + +### What this backend gives that Backend A cannot + +- **Chain-immutable audit** — every email access is a signed, block-included + extrinsic. Auditable by anyone with a Heima node. +- **Attacker-compromising-AgentKeys-can't-sign** — even if an attacker roots + the machine running the AgentKeys backend, they can't extract the DWD + signing key. The TEE is a mandatory signing gateway, and the enclave refuses + to sign outside the policy. +- **Per-child blast-radius bound** — if a child's bearer token leaks, the + attacker gets that child's scope (e.g. one email user pattern), not the + whole domain. The DWD key itself never leaves the enclave. +- **Revocation via on-chain list** — same ~6s propagation as every other + revocation in the system. A revoked child immediately fails at step 3. + +## 6. 
Side-by-side + +| Property | Backend A (GCP-managed) | Backend B (AgentKeys TEE) | +|---|---|---| +| DWD signing key location | Google KMS (never downloadable) | Our TEE (sealed) | +| Signing operator | Google | AgentKeys TEE operator | +| Policy gate | GCP IAM (`roles/iam.serviceAccountTokenCreator`) + DWD scope list | On-chain `EmailImpersonate` grant + on-chain revocation list | +| Audit log | GCP Cloud Audit Logs | On-chain extrinsic (publicly verifiable) | +| Audit immutability | Operator-controlled (Google) | Chain-finality (validator-attested) | +| Max access-token TTL | 3600 s (Google constraint) | 3600 s (same Google constraint) | +| Grant TTL (our layer) | 30 days (AgentKeys policy) | 30 days (AgentKeys policy) | +| Revocation latency | ~0 s (delete grant from our store) | ~6 s (on-chain list propagation) | +| Leak blast radius (AgentKeys side) | Any user in Workspace domain (DWD is domain-wide) | Only the granted child's user pattern | +| Leak blast radius (key material) | None — key is in Google KMS | None — key is in TEE | +| Infrastructure required | GCP project + one SA | Heima chain + TEE worker + DWD reg | +| Setup time | ~20 min (current `docs/stage5-workspace-email-setup.md`) | Weeks (TEE build + enclave deployment) | +| Appropriate stage | Stage 5 (now) | v0.1 (target) | + +Both deliver: no private key in memory or disk, per-child audit attribution, +30-day grant lifetime, 1-hour access-token lifetime, same wire format to Google. + +The honest difference: Backend A's audit is operator-trustable; Backend B's is +chain-verifiable. That's exactly the tradeoff the architecture doc +(`blockchain-tee-architecture.md` §5) chose for the credential layer overall. + +## 7. How Touch ID gates both backends (issue #11) + +This was explicit in the constraints list, and maps cleanly onto the existing +`AuthRequestType` pattern. 
+ +### What's gated + +| Action | On which side | Gate | +|---|---|---| +| Grant `EmailImpersonate` to a child for the first time | **Master CLI** | **Touch ID required** (per #11 rule — creates credential-access privilege) | +| Change scope (e.g. add `Send` to an existing grant) | **Master CLI** | **Touch ID required** (same #11 rule applies — `ScopeChange` is already biometric-gated) | +| Revoke a grant | **Master CLI** | **Touch ID required** (#11 rule on `revoke`) | +| Mint an email access token within an existing grant | **Child/daemon** | **Silent** (#11 rule — normal ops stay silent) | +| Execute an email operation (`list_messages`, `send`, etc.) | **Child/daemon** | **Silent** (#11 rule — same as `store`/`read`) | + +In short: **Touch ID gates the grant, not the use.** Once the master has +approved "child-X may impersonate stage5test-*@wildmeta.ai for gmail-read for +30 days", child-X can mint access tokens and call Gmail APIs silently for 30 +days. If the grant expires or is revoked, the next `mint_email_access_token` +call fails. + +### Wire-level: where the Touch ID prompt fires + +Backend-agnostic. The biometric check sits in the master CLI, **before** the +CLI sends `approve_auth_request(request_id)` to the backend: + +``` +user types: agentkeys approve + ↓ +master CLI fetches AuthRequest (type = EmailImpersonate, details = {user, scopes, ttl}) + ↓ +master CLI displays: "Allow child X to read mail for stage5test-*@wildmeta.ai, 30 days?" + ↓ +master CLI calls require_biometric("grant email impersonation") ← #11 checkpoint + ↓ +Touch ID / Windows Hello / fprintd prompt → user confirms + ↓ +master CLI calls approve_auth_request(request_id) + ↓ +backend (A or B) persists the grant +``` + +Backend A stores the grant in its own datastore (SQLite for v0, whatever Stage 5 +uses). Backend B stores it as an on-chain extrinsic. 
Either way, the biometric +check fired *before* the backend ever saw the approval — so the backend doesn't +know or care which ceremony was used to obtain the master's consent. That means +Backend A and Backend B inherit #11's gate for free via the existing +`AuthRequestType` pipeline. + +### What's *not* gated by Touch ID — explicit list + +- `mint_email_access_token` — silent, agent-side, must work unattended. Gated + by the prior grant + session-token verification only. +- `email_operation` — same. +- Token refresh when the 1-hour access token expires — silent; the next child + call triggers a fresh mint. + +This exactly parallels how `agentkeys read openrouter` is silent today, while +`agentkeys approve ` (which grants the openrouter scope) is +biometric-gated. New backend, same rule. + +## 8. The 30-day constraint — how it maps + +`wiki/session-token.md` §1: *AgentKeys policy: 30-day TTL for session/bearer +tokens.* The constraint here maps to **the grant**, not to the email access +token. Three nested lifetimes: + +``` +┌──────────────────────────────────────────────────────────────┐ +│ │ +│ Child bearer token = 30 days (existing #10/#11) │ +│ │ │ +│ └── EmailImpersonate grant = 30 days (this spec) ──────────│ +│ │ │ +│ └── Email access token = 1 hour (Google constraint) │ +│ │ +└──────────────────────────────────────────────────────────────┘ +``` + +The child's bearer token authorizes its identity ("I am child-X"). The grant +authorizes its scope ("child-X may impersonate user pattern P for scope S"). +The access token is the ephemeral artifact actually accepted by Gmail's API. + +All three are independent: + +- **Access token expires first (1 h).** Child re-requests silently from the + backend. No user interaction. +- **Grant expires at 30 days.** Child's calls start failing + `GRANT_EXPIRED`. Master must re-approve (with Touch ID) to extend. 
This + re-consent matches the security model: a 30-day standing authorization to + impersonate arbitrary Workspace users should not be auto-renewable forever. +- **Bearer token expires at 30 days** (existing policy). Child re-authenticates + through the normal AgentKeys re-auth path. Independent of the grant. + +Typically the grant is *shorter* than 30 days for one-off demos (e.g. +24 h for a single stage5 run), but the ceiling of 30 days aligns with the +session-token spec. + +## 9. Practicality check (both paths must actually work) + +### Backend A — practical today + +| Requirement | Status | +|---|---| +| GCP project + SA created | ✅ Done (`wildmeta-agent-provisioner` / `stage5a-sa`) | +| DWD authorized for gmail.readonly, gmail.modify | ✅ Done (B4 of the setup doc) | +| `agent@wildmeta.ai` has `roles/iam.serviceAccountTokenCreator` on the SA | ⚠️ Needs grant (explicit, one `gcloud` command) | +| Custom admin role `stage5a-provisioner` assigned | ✅ Done | +| `/Automation` OU exists for throwaway users | ✅ Done | +| `CredentialBackend` Rust impl for Backend A | ❌ Not yet — needs `GcpEmailBackend` | +| `provisioner-scripts/src/lib/email.ts` reads `EmailAccessToken` from backend (replacing `imapflow`) | ❌ Not yet — tracked as Stage 5 follow-up | + +**Verdict: buildable in ~1 week of engineering**, nothing external blocks. 
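
The grant check that a `GcpEmailBackend` would have to run before signing (step 3 of the §4 flow, plus the lifetimes from §8) can be sketched roughly as below. This is a minimal illustration, not the planned implementation: `Grant`, `GrantError`, `pattern_matches`, and `check_grant` are hypothetical names, the `OrgUnit` pattern is left out because it needs a directory lookup, and the `Prefix` pattern is assumed to be written as `local-prefix*@domain`.

```rust
use std::collections::HashSet;

/// Scope names taken from the spec's comment (Read, Modify, Send).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EmailScope { Read, Modify, Send }

/// Two of the three patterns from §3; OrgUnit is omitted in this sketch.
#[derive(Debug, Clone)]
pub enum EmailUserPattern {
    Exact(String),
    Prefix(String), // assumed form: "stage5test-*@wildmeta.ai"
}

/// Hypothetical stored form of an approved EmailImpersonate grant.
#[derive(Debug, Clone)]
pub struct Grant {
    pub user_pattern: EmailUserPattern,
    pub scopes: HashSet<EmailScope>,
    pub issued_at_unix: i64,
    pub ttl_seconds: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum GrantError { UserNotCovered, ScopeNotGranted, GrantExpired }

pub const MAX_GRANT_TTL_SECONDS: u64 = 30 * 86_400; // 30-day policy ceiling
pub const ACCESS_TOKEN_TTL_SECONDS: i64 = 3_600; // Google's 1-hour cap

pub fn pattern_matches(pattern: &EmailUserPattern, user: &str) -> bool {
    match pattern {
        EmailUserPattern::Exact(addr) => addr.as_str() == user,
        EmailUserPattern::Prefix(p) => match p.split_once('*') {
            // head = "stage5test-", tail = "@wildmeta.ai"
            Some((head, tail)) => {
                user.len() >= head.len() + tail.len()
                    && user.starts_with(head)
                    && user.ends_with(tail)
                    // the wildcard part must not contain another '@'
                    && !user[head.len()..user.len() - tail.len()].contains('@')
            }
            None => false,
        },
    }
}

/// Returns the expiry (unix seconds) of the access token that may be minted,
/// or the reason the mint must be refused.
pub fn check_grant(
    grant: &Grant,
    target_user: &str,
    scope: EmailScope,
    now_unix: i64,
) -> Result<i64, GrantError> {
    if !pattern_matches(&grant.user_pattern, target_user) {
        return Err(GrantError::UserNotCovered);
    }
    if !grant.scopes.contains(&scope) {
        return Err(GrantError::ScopeNotGranted);
    }
    // Clamp the stored TTL to the 30-day ceiling before computing expiry.
    let ttl = grant.ttl_seconds.min(MAX_GRANT_TTL_SECONDS) as i64;
    if now_unix >= grant.issued_at_unix + ttl {
        return Err(GrantError::GrantExpired);
    }
    Ok(now_unix + ACCESS_TOKEN_TTL_SECONDS)
}
```

Keeping the check as a pure function of the grant, the request, and the current time means the same logic could sit behind either backend: Backend A would load the `Grant` from its local store, Backend B from chain state.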
+ +### Backend B — path to practical + +| Requirement | Status | +|---|---| +| Heima TEE worker operational for credential signing | In progress (Heima integration TODO list) | +| DWD registration for the TEE's signing identity with Google Workspace | Unblocked technically (same admin-console flow as Backend A); open policy question whether Google accepts a TEE-attested public key as a DWD client ID | +| `EmailImpersonate` pallet extension for on-chain grants | Pallet work — deferred to v0.1 Heima integration | +| `HeimaEmailBackend` Rust impl | v0.1 | +| Attestation pipeline proves the TEE isn't modified | Standard TEE deployment work | +| Revocation list extension for `EmailImpersonate` grants | Pallet work — v0.1 | + +**Verdict: aligned with the existing v0.1 Heima work.** No new primitive needed +beyond what the Heima integration already ships (shielding key, JWT signing +key, pallet extensibility). The main open question is the Google-side policy +on DWD with an attested-TEE identity; if Google won't accept it, we fall back +to a hybrid where the TEE operator holds a Google-managed SA key and the TEE +calls `iamcredentials.signJwt` (essentially Backend A, but with the policy +gate shifted to the TEE). That's still a cleaner posture than raw Backend A +because the policy check is chain-authoritative. + +## 10. Migration plan + +``` +now ──────────────── Stage 5 ────────────────── v0.1 ──────────────── v0.2+ + ┌───────────────────────┐ ┌──────────────────────┐ + │ Backend A │ │ Backend B (primary) │ + │ GCP-managed SA key │ │ AgentKeys TEE │ + │ `signJwt` API │ │ sealed DWD key │ + │ IAM + DWD as gate │ │ on-chain grant │ + └───────────────────────┘ │ on-chain audit │ + └──────────────────────┘ + Backend A stays available + as a jurisdictional / deployment + variant (same pattern as + CentralizedBackend). +``` + +**Stage 5 — Backend A only** + +1. 
Add `GcpEmailBackend` implementing the new trait methods, backed by + `iamcredentials.signJwt` + a small in-process grant store (SQLite or a + file, parallel to `MockBackend`'s session store). +2. Extend `AuthRequestType` with `EmailImpersonate`. Wire the master CLI's + Touch ID check into `approve_auth_request` handler for this variant + (per §7). +3. Replace `provisioner-scripts/src/lib/email.ts`'s `imapflow` fetcher with a + caller that reads an access token from the backend (via `email_operation` + or `mint_email_access_token`). +4. The per-demo workflow in `docs/stage5-workspace-email-setup.md` shifts + from "export the JSON and set env vars" to "run `agentkeys approve` once, + confirm with Touch ID, then all subsequent demos run silently." + +**v0.1 — Backend B primary, Backend A optional** + +5. Build `HeimaEmailBackend` implementing the same trait. DWD registration for + the TEE's identity, or hybrid-via-GCP if the direct route doesn't fly with + Google. +6. The `EmailImpersonate` grant becomes an on-chain extrinsic; revocation + joins the standard on-chain revocation list. +7. Config chooses the backend: + + ```toml + [backend.email] + type = "heima" # default in v0.1 + # type = "gcp" # alternative for environments without Heima access + # type = "centralized" # future; for regulated jurisdictions + ``` + +**Always-true invariants** + +- The CLI, daemon, and provisioner-scripts code never import GCP or Heima + libraries directly. They speak the trait. +- `approve_auth_request(EmailImpersonate{…})` is Touch-ID-gated master-side + regardless of backend. +- `mint_email_access_token` and `email_operation` are silent agent-side + regardless of backend. +- `audit_event { child, target_user, scope, backend_type, ... }` is emitted + for every call; the storage layer differs, the event shape doesn't. + +## 11. Open questions / follow-ups + +1. 
**DWD with TEE-attested identity** — does Google Workspace admin console + accept a public key / DCAP attestation as a DWD client ID? If yes, Backend + B is clean; if no, Backend B proxies through Backend A's signing flow and + the "no key in Google's hands" property weakens. Track as + [#TBD — Heima DWD registration feasibility]. +2. **Per-child user provisioning still hits the SA-key abstraction** — the + `users.insert` / `users.delete` calls for throwaway accounts are Admin SDK + calls that don't go through DWD. Today they're authed by + `agent@wildmeta.ai`'s OAuth. Backend B inherits the same posture (or builds + its own TEE-held admin credential), which is an orthogonal problem from + the Gmail signing path. +3. **Refresh-token rotation for `agent@wildmeta.ai`** — Backend A depends on + that refresh token being valid. Should be rotated on a schedule and on any + suspected compromise. Add to `Rotation` section of + `docs/stage5-workspace-email-setup.md` once Backend A ships. +4. **Cross-scope grant compilation** — can one grant cover both `Read` and + `Send`? §3 says yes (scopes is a `Vec`), but the corresponding DWD scope + list in Google admin-console has to be pre-populated with both. Already + set in B4 today (gmail.readonly + gmail.modify). +5. **Backend A audit export** — Google Cloud Audit Logs can be routed to + Pub/Sub and then to BigQuery or an external SIEM. Add a section to the + setup doc with the `gcloud logging sinks create` command for operators who + want audit off-Google. Not a blocker. + +## 12. Cross-references + +- `docs/spec/credential-backend-interface.md` — the existing trait we're + extending. §3's `AuthRequestType` and the replay-resistance invariants + apply here unchanged. +- `wiki/blockchain-tee-architecture.md` §5 — the same + "stateless-TEE-plus-chain vs pure-TEE-backend" tradeoff, one layer down. + Backend B is the stateless-TEE-plus-chain choice; Backend A is the pure- + operator-backed choice. 
+- `wiki/session-token.md` §1 — 30-day TTL policy this spec inherits for + grants. +- `wiki/key-security.md` §1 — two-tier storage model; the `EmailAccessToken` + returned by `mint_email_access_token` is tier-1 (ephemeral bearer, handled + like a session token in memory) and `EmailImpersonate` grants are tier-2 + analog (long-lived, persisted). +- [#11](https://github.com/litentry/agentKeys/issues/11) — biometric gate. + §7 maps every new action onto its classification. +- `docs/stage5-workspace-email-setup.md` — operator setup for Backend A. A + pointer to this doc for the design rationale is added there alongside this + commit. diff --git a/docs/spec/plans/development-stages.md b/docs/spec/plans/development-stages.md index 0b00b27..2d94ac6 100644 --- a/docs/spec/plans/development-stages.md +++ b/docs/spec/plans/development-stages.md @@ -99,6 +99,35 @@ CEO plan with full decision record: `~/.gstack/projects/litentry-agentKeys/ceo-p --- +## Stage 5–7 roadmap update (2026-04-19) + +After the Stage 5a demo path landed and the email-system architecture + TEE-as-OIDC-provider design work matured, the post-Stage-5 roadmap is reordered. 
The new order is:
+
+| Stage | Title | Status |
+|---|---|---|
+| **5** | Provisioner: deterministic + patterns + quick-email demo (dedicated personal Gmail) | **Current**: stays as-is, ships the live OpenRouter-provision demo on the simplest email path |
+| **6** | **Federated own-email**: `xxxxx@agentkeys-email.io` hosted on our infrastructure (AWS SES + TEE-derived Ed25519 DKIM + ES256 OIDC issuer + PrincipalTag-based per-user isolation) | **Next** |
+| **7** | **Generalized OIDC provider**: expose `https://oidc.agentkeys.dev` as a universal federation target; any cloud that accepts external OIDC (AWS, GCP, Azure, Snowflake, Ali Cloud, K8s) trusts us once; bring-your-own domain/Workspace/GitHub paths become available | **After 6** |
+| 6 (old) | npm Package + DX Polish | **Postponed** (preserved below for reference) |
+| 7 (old) | Full E2E Integration + MCP Auth Demo | **Postponed** (preserved below for reference) |
+| 8 (old) | Production Hardening | **Postponed** (preserved below for reference) |
+| 9 (old) | v0.1 Heima Migration Holding Pen | **Postponed** (preserved below for reference) |
+
+### Why this reorder
+
+The three architectural wiki pages on our email/OIDC design surfaced a coherent v0.1 milestone that does more for product-and-user value than packaging or late-stage hardening:
+
+1. **Hosted-first default**: non-developer users get `xxxxx@agentkeys-email.io` with zero configuration, parallel to how AgentMail mints default-domain inboxes. See [[hosted-first]] (wiki).
+2. **TEE holds all signing keys natively**: the Ed25519 DKIM key and ES256 OIDC-issuer key join the existing shielding/JWT/wallet derivation paths, all under `blockchain-tee-architecture.md` rule #2. See [[oidc-federation]] (wiki).
+3. **Per-user isolation without per-user IAM**: JWT claim `agentkeys_user_wallet` → AWS session tag → `aws:PrincipalTag` in bucket/role policy = one bucket, N users, cryptographic separation. See [[tag-based-access]] (wiki).
+4.
**Knowledge-base decision deferred** — Stage 6/7 deliver the mechanism; which backend (GitHub / AWS S3 / Google Drive / Ali Cloud OSS) we ship as default is decided later per user segment. See [[knowledge-storage]] (wiki). + +**Broker-not-proxy principle.** Stages 6 and 7 both adhere to the principle that AgentKeys infrastructure mints ephemeral credentials and the daemon talks to remote services directly via MCP. Our backend never proxies per-user reads/writes. This keeps compute cost flat with user count (scales with sign-up rate, not operation frequency) and aligns with `blockchain-tee-architecture.md` rules #2–#3. + +Full stage contracts for 6 and 7 appear below in their own sections, right after Stage 5b and before the postponed ex-6/7/8/9 sections. + +--- + ## Stage 0: Foundation — Types + Core Trait **Goal:** Define the shared types and the `CredentialBackend` trait that every other crate depends on. @@ -825,7 +854,199 @@ Locked architectural decisions for 5b: --- -## Stage 6: npm Package + DX Polish +## Stage 6: Federated Own Email (`@agentkeys-email.io` hosted default) + +**Status (2026-04-19):** next stage after 5a/5b. + +**Goal:** Every AgentKeys user (non-developer default) gets a working agent email inbox at `xxxxx@agentkeys-email.io` with zero setup — no DNS, no admin console, no Workspace subscription, no custom domain. Hosted on our AWS SES infrastructure with TEE-held signing keys, per-user isolation enforced via PrincipalTag from JWT claims, chain-immutable audit. + +**Why this is Stage 6:** it replaces the Stage 5 "dedicated personal Gmail" quick demo with production infrastructure that scales to every AgentKeys user without per-user setup friction. Moves AgentKeys from "demo email works" to "every agent has an email inbox the moment it exists." + +### Architecture summary + +See `docs/spec/ses-email-architecture.md` for the full spec. High-level: + +1. 
**We operate `agentkeys-email.io`** — domain registered to AgentKeys, MX pointing at AWS SES `inbound-smtp.us-east-1.amazonaws.com`, DKIM records pointing at TEE-held keys. +2. **TEE-derived keys, both sealed:** + - `derive("dkim/agentkeys-email.io/v1")` → **Ed25519** DKIM key (RFC 8463) — signs outbound DKIM header + - `derive("oidc/issuer/v1")` → **ES256** OIDC-issuer key — signs JWTs for AWS STS federation +3. **Inbound path:** SES receives → writes raw MIME to S3 `agentkeys-mail///.eml` → no Lambda, no per-email compute on our side. +4. **Outbound path:** agent's daemon asks AgentKeys for temp SES send creds → TEE mints OIDC JWT → `sts:AssumeRoleWithWebIdentity` → daemon calls `SendRawEmail` directly. Our backend does zero work per-send. +5. **Read path:** daemon asks for temp S3 read creds → minted with `PrincipalTag/agentkeys_user_wallet=` → daemon calls S3 directly → bucket policy conditions ensure daemon can only read its own user's prefix. +6. **Audit:** every credential mint emits an on-chain extrinsic attributed to the calling child wallet. + +### Crates / Packages + +- `agentkeys-email-auth` (new Rust crate) — handler for the TokenAuthority covering email-related operations: mint S3/SES temp creds, sign DKIM headers, emit audit extrinsics. +- `agentkeys-mail-receive-stack` (Terraform / CDK module) — one-shot deploy of the SES receipt rule, S3 bucket with PrincipalTag-conditioned policy, IAM role with OIDC trust. Not an AgentKeys crate — shipped as operator infrastructure. +- Daemon updates — new MCP tools: `email.list`, `email.get`, `email.send`. Each unwraps into `agentkeys mint ` + direct SES/S3 call. +- `provisioner-scripts/src/lib/email.ts` — replace the `imapflow`-based fetcher with an S3-direct fetcher backed by minted creds. 
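
As a rough local model of the per-prefix isolation described in the read path above, the sketch below mimics what a bucket-policy resource of the form `agentkeys-mail/${aws:PrincipalTag/agentkeys_user_wallet}/*` is meant to enforce. It is an illustration under stated assumptions, not the real enforcement point (that is IAM, not our code): the key layout `wallet/agent/message-id.eml` is inferred from the test table, and `inbound_key` and `tag_allows_key` are hypothetical helper names.

```rust
/// Assumed object-key layout: "<wallet>/<agent>/<message-id>.eml".
pub fn inbound_key(wallet: &str, agent: &str, message_id: &str) -> String {
    format!("{}/{}/{}.eml", wallet, agent, message_id)
}

/// Local stand-in for the policy condition: a principal whose session tag
/// `agentkeys_user_wallet` equals `wallet_tag` may touch only keys under
/// "<wallet_tag>/".
pub fn tag_allows_key(wallet_tag: &str, object_key: &str) -> bool {
    // An empty tag, or one carrying path or wildcard characters, never matches.
    if wallet_tag.is_empty() || wallet_tag.contains(|c| c == '/' || c == '*' || c == '?') {
        return false;
    }
    match object_key.strip_prefix(wallet_tag) {
        // Require the separator so the tag "0xA" cannot read under "0xAB/".
        Some(rest) => rest.starts_with('/') && rest.len() > 1,
        None => false,
    }
}
```

A helper like this could back unit tests that mirror `email::daemon_reads_own_prefix` and `email::daemon_blocked_from_other_prefix` without touching AWS. The live tests would still need to exercise the real bucket policy.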
+ +### Deliverables + +- [ ] `agentkeys-email.io` domain registered, SES domain verified +- [ ] MX + Ed25519 DKIM CNAME + SPF + DMARC published in our DNS +- [ ] S3 bucket `agentkeys-mail` + receipt rule configured +- [ ] IAM OIDC provider `oidc.agentkeys.dev` registered in our AWS account +- [ ] IAM role `agentkeys-agent` with trust policy conditioning on `mrenclave` + non-empty `agentkeys_user_wallet` tag +- [ ] Bucket policy with `${aws:PrincipalTag/agentkeys_user_wallet}` per-prefix isolation +- [ ] TEE-side JWT minter with ES256 derived key at `oidc/issuer/v1` +- [ ] TEE-side Ed25519 DKIM signing (`dkim/agentkeys-email.io/v1`) with locally-signed MIME before SES delivery +- [ ] Thin HTTPS proxy at `https://oidc.agentkeys.dev` serving `/.well-known/openid-configuration` + `/.well-known/jwks.json` (Let's Encrypt) +- [ ] Chain extrinsic pallet for `CredentialMinted` audit events +- [ ] Daemon MCP tools wired to real minted creds +- [ ] Stage 5's `provisioner-scripts` updated to read OTPs from the hosted inbox + +### Tests + +| Test | What it validates | +|---|---| +| `email::inbox_create_allocates_address` | New agent gets a unique `@agentkeys-email.io` deterministically derived from its wallet | +| `email::inbound_lands_in_user_prefix` | SES receives to `agent-X@agentkeys-email.io` → raw MIME in `s3://agentkeys-mail/0xX/agent-X/...` | +| `email::daemon_reads_own_prefix` | Daemon with `agentkeys_user_wallet=0xA` tag → S3 list/get on `0xA/*` succeeds | +| `email::daemon_blocked_from_other_prefix` | Daemon with `0xA` tag → S3 get on `0xB/*` returns AccessDenied | +| `email::dkim_verifies_at_recipient` | Send test message to a Gmail inbox → receiver sees `DKIM-Signature ed25519` header, Gmail reports `dkim=pass` | +| `email::jwt_without_wallet_claim_denied` | JWT missing `agentkeys_user_wallet` → `sts:AssumeRoleWithWebIdentity` fails per role trust policy | +| `email::audit_emitted_on_mint` | Every SES/S3 credential mint emits a chain extrinsic with `(child, scope, 
operation, timestamp)` | +| `email::grant_revocation_propagates` | Revoke user's email grant → next mint attempt fails within ≤6s | + +### Reviewer E2E Checklist + +```bash +# Create an agent; it has an email address +agentkeys agent create my-agent +# → prints: "my-agent has inbox abc123@agentkeys-email.io" + +# Send mail to it from outside +echo "test body" | mail -s "hello" abc123@agentkeys-email.io + +# Agent reads its inbox +agentkeys run my-agent -- \ + claude-mcp-client email.list | jq +# → shows the hello message + +# Agent sends mail +agentkeys run my-agent -- \ + claude-mcp-client email.send \ + --to me@example.com --subject "reply" --text "hi" + +# Audit trail on chain +agentkeys usage my-agent --filter email +# → shows mint events for s3.read and ses.send +``` + +### Stage Contract + +- **Inputs:** Stages 0-5a complete; TEE integration available (chain read/write + sealed key derivation); TokenAuthority trait stable. +- **Outputs:** Every AgentKeys user has a working email inbox on `agentkeys-email.io`. No user-side setup required. Per-user isolation enforced cryptographically. +- **Done when:** All 8 tests pass. An agent created via the CLI has a functioning inbox that can send/receive mail with real MTAs. Inbound deliverability verified against at least Gmail + Outlook. + +### Deferred to Stage 7+ (not blocking Stage 6) + +- Bring-your-own custom domain (`bots.theircompany.com`) — same architecture, different domain id in the DKIM derivation path +- Bring-your-own Workspace (DWD path) — existing `docs/stage5-workspace-email-setup.md` becomes the runbook; not the default +- Email drafts as HITL primitive (daemon-side, per our revised broker-not-proxy thesis) +- Advanced features: labels, threads, allow/block lists — implemented daemon-side in MCP; not server features + +--- + +## Stage 7: Generalized OIDC Provider (universal federation) + +**Status (2026-04-19):** follows Stage 6. 
+ +**Goal:** `https://oidc.agentkeys.dev` is publicly documented as a universal OIDC identity provider. Any cloud or service that accepts external OIDC federation (AWS IAM, GCP Workload Identity Federation, Azure AD, Snowflake External OAuth, Ali Cloud RAM, Kubernetes, etc.) trusts our TEE-signed JWTs. Advanced bring-your-own paths (custom AWS account, custom GCP project, custom GitHub org) become possible by registering our issuer once per user. + +**Why this is Stage 7:** Stage 6 delivers the hosted-default path using OIDC federation inside our own AWS account. Stage 7 generalizes that capability as a public primitive — the same TEE-derived ES256 issuer key now federates into any user's or organization's cloud account without additional key material. + +### Architecture summary + +See [[oidc-federation]] (wiki) for the full design. High-level: + +1. **OIDC issuer endpoint** — stable HTTPS URL `https://oidc.agentkeys.dev` with Let's Encrypt cert, static `/.well-known/openid-configuration` and `/.well-known/jwks.json` served by a thin proxy. +2. **One signing key** — ES256 at `derive("oidc/issuer/v1")`, reused from Stage 6. No new key material. +3. **Per-consumer trust registration** — each user / org registers our OIDC issuer once in their cloud account (AWS `CreateOpenIDConnectProvider`, GCP `WorkloadIdentityPool`, Ali RAM `CreateOIDCProvider`, etc.) and sets up an IAM role trust policy. +4. **JWT format is consistent across consumers** — same `sub` format and same `agentkeys_*` claim set for tag-based isolation; only `aud` varies per consumer. +5. **Consumer-side per-user isolation** — each consumer's trust policy conditions on `PrincipalTag` / attribute-mapping from the JWT's `agentkeys_user_wallet` claim. + +### Crates / Packages + +- Primarily **operator/documentation work** — the TEE signing path already exists from Stage 6. 
Stage 7 adds: +- `agentkeys-oidc-registration-cli` (new) — CLI commands that emit ready-to-paste configuration snippets for each major cloud: + - `agentkeys oidc register aws --account --region ` → prints the AWS CLI commands + JSON trust policy + - `agentkeys oidc register gcp --project ` → prints the gcloud commands for Workload Identity Pool + - `agentkeys oidc register alicloud --account ` → prints ali CLI commands + - `agentkeys oidc register github-app` → registers a new GitHub App installation path using derived ECDSA app key + +### Deliverables + +- [ ] `https://oidc.agentkeys.dev` publicly reachable, stable, documented +- [ ] Discovery doc + JWKS published and cacheable; rotation procedure documented +- [ ] AWS IAM OIDC registration runbook (for operators' own AWS accounts) +- [ ] GCP Workload Identity Federation registration runbook +- [ ] Ali Cloud RAM OIDC provider registration runbook +- [ ] Azure AD Federated Credential registration runbook +- [ ] `agentkeys-oidc-registration-cli` with four `register` subcommands +- [ ] Integration tests: end-to-end credential mint for each of the four clouds +- [ ] GitHub App (`AgentKeys Memory`) registered, ECDSA app key derived at `derive("github-app/v1")`, installation-token minting path +- [ ] Public documentation: "How to connect your AWS account to AgentKeys" + +### Tests + +| Test | What it validates | +|---|---| +| `oidc::discovery_doc_valid` | `curl https://oidc.agentkeys.dev/.well-known/openid-configuration` returns valid OIDC metadata | +| `oidc::jwks_served` | JWKS endpoint returns current ES256 public key with correct `kid` | +| `oidc::aws_federation_end_to_end` | TEE-minted JWT exchanged at AWS STS → usable temp creds → target S3 op succeeds | +| `oidc::gcp_federation_end_to_end` | Same flow via GCP Workload Identity Federation → GCS op succeeds | +| `oidc::alicloud_federation_end_to_end` | Same via Ali Cloud RAM → OSS op succeeds | +| `oidc::azure_federation_end_to_end` | Same via Azure AD Federated 
Credential | +| `oidc::key_rotation_dual_key_window` | Both v1 and v2 keys in JWKS during rotation window; JWTs signed by either accepted | +| `oidc::tag_claim_required_for_tagged_role` | JWT without `agentkeys_user_wallet` claim → role assumption denied where bucket policy requires tag | +| `github_app::installation_token_mint` | TEE signs app-level JWT with derived ECDSA → GitHub returns installation token | +| `registration_cli::aws_commands_executable` | `agentkeys oidc register aws ...` output runs on a fresh AWS account and registers successfully | + +### Reviewer E2E Checklist + +```bash +# Register our OIDC provider in a fresh test AWS account +agentkeys oidc register aws --account 999999999999 --region us-east-1 +# → prints commands; run them; IAM provider `oidc.agentkeys.dev` shows up + +# Create a role in that AWS account trusting our provider with PrincipalTag condition +# (commands included in the CLI output) + +# Demonstrate federation from an agent +agentkeys run test-agent -- \ + aws s3 ls s3://their-bucket/ +# → succeeds; CloudTrail shows the assumed role with the session tag + +# Rotate the issuer key; verify zero-downtime +agentkeys oidc rotate --window 24h +# → JWKS now has both v1 and v2; new JWTs signed with v2 + +# Install the GitHub App on a test org +# (via GitHub UI) +agentkeys run test-agent -- \ + claude-mcp-client github.list_repos --owner testorg +# → succeeds; mint shows in our audit log +``` + +### Stage Contract + +- **Inputs:** Stage 6 complete (OIDC issuer key and endpoint exist but are only used internally for our own AWS account). +- **Outputs:** Our OIDC provider is a publicly documented federation target. Users can plug their own AWS / GCP / Azure / Ali Cloud / GitHub accounts into AgentKeys without giving us static credentials. +- **Done when:** All 10 tests pass. End-to-end federation verified against at least AWS, GCP, and Ali Cloud. Registration CLI tested by a fresh external operator. 
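The dual-key rotation window exercised by `oidc::key_rotation_dual_key_window` can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the `Jwk` and `Jwks` types, their fields, and the method names are hypothetical and not part of the Stage 7 crates, and `public_key` stands in for the real P-256 coordinates an ES256 JWKS entry would carry.

```rust
use std::collections::HashMap;

/// One published verification key (hypothetical shape, not a real JWKS entry).
#[derive(Debug, Clone, PartialEq)]
struct Jwk {
    kid: String,          // key id, e.g. "v1" or "v2" from the derivation path
    alg: &'static str,    // "ES256" for the OIDC issuer key
    public_key: Vec<u8>,  // placeholder bytes; real entries carry x/y coordinates
}

/// In-memory view of the key set served at /.well-known/jwks.json.
#[derive(Debug, Default)]
struct Jwks {
    keys: HashMap<String, Jwk>,
    signing_kid: Option<String>, // kid used for newly minted JWTs
}

impl Jwks {
    /// Publish `next` and start signing with it; older keys stay published
    /// so JWTs they signed keep verifying during the window.
    fn start_rotation(&mut self, next: Jwk) {
        self.signing_kid = Some(next.kid.clone());
        self.keys.insert(next.kid.clone(), next);
    }

    /// Close the window by unpublishing a retired key.
    /// The key that is still signing is never removed.
    fn finish_rotation(&mut self, retired_kid: &str) {
        if self.signing_kid.as_deref() != Some(retired_kid) {
            self.keys.remove(retired_kid);
        }
    }

    /// Verifiers select the key by the `kid` in the JWT header.
    fn verification_key(&self, kid: &str) -> Option<&Jwk> {
        self.keys.get(kid)
    }
}
```

During the window both `v1` and `v2` resolve through `verification_key`, while only `v2` signs new JWTs. How long the window must stay open depends on consumer-side JWKS caching, which this sketch does not model.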
+ +### Deferred past Stage 7 + +- Enterprise-specific advanced integrations (SAML federation, SCIM provisioning) +- On-chain record of active OIDC-issuer pubkey fingerprint for external auditors +- Per-tenant OIDC issuer URLs (`oidc.agentkeys.dev/tenant//`) with isolated issuer keys per tenant +- Workload Identity Federation into consumer clouds like Cloudflare, Fly, or others AgentKeys users may prefer + +--- + +## Stage 6 (POSTPONED; original scope: npm Package + DX Polish) > **Status (2026-04-16 CEO review): POSTPONED past v0.** v0 ships at Stage 7 with `cargo install` and GH-release prebuilt binaries as the distribution path. npm packaging, `install.sh`, README polish, and the remaining DX artifacts move to the v0.1 milestone alongside Stage 5b and Stage 8. Stage 6 content below is preserved as-is for v0.1 execution — no scope change to Stage 6 itself, only a dependency relaxation. > @@ -881,7 +1102,7 @@ npx @agentkeys/daemon # → starts daemon, shows pair code --- -## Stage 7: Full E2E Integration + MCP Auth Demo +## Stage 7 (POSTPONED; original scope: Full E2E Integration + MCP Auth Demo) **Goal:** The complete system works end-to-end across all components. Includes the MCP auth demo (wrapping MCP servers with `agentkeys run`). @@ -1010,7 +1231,7 @@ Additionally verify: --- -## Stage 8: Production Hardening (Post-MVP) +## Stage 8 (POSTPONED; original scope: Production Hardening, Post-MVP) **Goal:** Close the daemon-side memory hygiene gaps not covered by Stage 3 kernel hardening, plus CLI defensive features and credential lifecycle controls. Stage 3 protects against external probes (ptrace, `/proc/pid/mem`, swap, core dumps); Stage 8 protects against internal bugs and reduces the in-memory exposure window for credential bytes that flow through the daemon between backend fetch and agent delivery. 
@@ -1156,7 +1377,7 @@ agentkeys read $WALLET openrouter --- -## Stage 9: v0.1 Heima Migration Design Decisions (Holding Pen) +## Stage 9 (POSTPONED; original scope: v0.1 Heima Migration Design Decisions Holding Pen) **Purpose:** Capture v0.1-specific design decisions that were resolved during v0 planning so they don't have to be rediscovered when migration begins. This is **not a formal stage** in the sense of Stages 0-8 — no harness deliverables, no unit tests, no stage-done script. It is a design notes section for things that were decided now but will be executed later. diff --git a/docs/spec/ses-email-architecture.md b/docs/spec/ses-email-architecture.md new file mode 100644 index 0000000..f4d917e --- /dev/null +++ b/docs/spec/ses-email-architecture.md @@ -0,0 +1,368 @@ +# SES-Based Email Architecture for AgentKeys + +**Date:** 2026-04-18 (updated 2026-04-19 with hosted-default + PrincipalTag isolation) +**Status:** Design +**Stage:** **Stage 6** primary email backend — hosted `xxxxx@agentkeys-email.io` is the default for all users; BYO custom domain deferred to Stage 7+ (alternative: Google Workspace DWD for enterprise, see `docs/stage5-workspace-email-setup.md`) +**Related:** +- `docs/spec/email-signing-backends.md` — generalized backend comparison +- `docs/spec/credential-backend-interface.md` — the trait we're extending +- `wiki/email-system.md` — high-level wrap-up + usage isolation rules +- `wiki/blockchain-tee-architecture.md` §5 — audit model this spec inherits +- Issue [#11](https://github.com/litentry/agentKeys/issues/11) — biometric gate + +--- + +## 1. Why a dedicated spec + +Email is the **dominant human-in-the-loop channel** every external API signup, OTP verification, and password-reset path runs through. If agents are going to provision credentials at machine speed, the email primitive has to be: + +1. **Per-agent isolated** — each agent's inbox is independent; compromising one doesn't leak others. +2. 
**Chain-immutable audit** — matches AgentKeys' headline security claim. +3. **Fast to provision** — inbox ready for first inbound mail within milliseconds. +4. **Cheap to scale** — thousands of throwaway inboxes per month without a seat-license model. +5. **No foreign admin-console step per inbox** — one-time domain onboarding only. +6. **Zero user setup in the default path** — Stage 6 target is "inbox exists the moment the agent is created; no DNS, no admin console, no Workspace subscription on the user side." +7. **Broker-not-proxy** — our backend mints credentials; the daemon calls SES and S3 directly via MCP. Per-operation compute on our side is zero. See [[hosted-first]] for the user-segmentation framework and [[knowledge-storage]] for the parallel deferred decision on knowledge storage. + +Gmail Workspace with DWD satisfies 1 but fails 2–7. AgentMail (SaaS) satisfies 1, 3, 4, 6 but fails 2 and adds vendor lock. **AWS SES with our own thin inbox-abstraction layer satisfies all seven.** This spec defines that layer. + +## 2. How we relate to AgentMail + +AgentMail is a SaaS built on AWS SES; verified by DNS (`agentmail.to` MX → `inbound-smtp.us-east-1.amazonaws.com`) and by their open Zod schemas exposing `dkim_signing_type: 'AWS_SES' | 'BYODKIM'`. They **proxy** per-operation on the user's behalf: their servers parse MIME, compute threads, manage drafts/labels/webhooks. Compute cost scales with operation frequency. + +We use the same SES primitives (inbound-to-S3, `SendRawEmail`, domain DKIM/MX/SPF) but **do not adopt the SaaS feature surface**. Per the broker-not-proxy principle (rule #4 in `wiki/blockchain-tee-architecture.md`), threading, labels, drafts, allow/block lists, webhook fan-out, and per-operation events live daemon-side (via MCP) or are absent until a real use case forces them in. Our backend is a credential broker + audit layer. Per-operation compute on our side is zero. 
+ +**One shape we kept:** `inbox_id` IS the email-address string (`abc123@agentkeys-email.io`), not an opaque uuid. Saves an ID↔address lookup on every call. That's it — everything else from AgentMail's model stays in AgentMail's backend. + +## 3. Architectural goals + +Three invariants this spec preserves, derived directly from the existing AgentKeys specs: + +1. **The TEE is the sole holder of signing material** (`blockchain-tee-architecture.md` rule #2). In v0.1 the TEE holds the ES256 OIDC-issuer key that mints short-lived SES/S3 credentials (no static IAM credentials are sealed; see §10.5) and, when `BYODKIM` is enabled, the DKIM signing key. +2. **The chain is the sole source of persistent truth** (rule #1). All email grants, inbox ownership records, and audit events are on-chain extrinsics. +3. **Clients hold only bearer tokens, never keys** (rule #3). Children call a broker over the wire; the broker — colocated with the authority — enforces policy. + +The SES layer is plumbing under those invariants, not an exception. + +## 4. Data model (minimal — broker-not-proxy) + +Only two entities live on our side. Everything else — messages, threads, drafts, labels, attachments — lives daemon-side (parsed from raw MIME in S3 on demand) or not at all. + +```rust +// Inbox: on-chain row tying a user wallet to an email address they own. +pub struct Inbox { + pub inbox_address: EmailAddress, // "abc123@agentkeys-email.io" — address IS the ID + pub user_wallet: WalletAddress, // owner; maps to PrincipalTag for isolation + pub agent_wallet: WalletAddress, // which child agent uses this inbox + pub domain_id: String, // foreign key to Domain + pub created_at: SystemTime, + pub deleted_at: Option<SystemTime>, // soft delete; S3 lifecycle prunes raw mail +} + +// Domain: operator-level configuration for a custom domain (hosted default: agentkeys-email.io). 
+pub struct Domain { + pub domain_id: String, // "agentkeys-email.io" + pub dkim_mode: DkimMode, // AwsSes (default) | TeeByoDkim (future) + pub dkim_selector: Option<String>, // only for TeeByoDkim + pub status: DomainStatus, // NotStarted | Pending | Verifying | Verified | Failed + pub created_at: SystemTime, + pub updated_at: SystemTime, +} +``` + +That's it on our side. No `Message`, `Thread`, `Draft`, `AttachmentMetadata`, `EmailListEntry`, `Webhook` structs — if the daemon wants threading or labels or drafts, it stores that in its local state or in a JSON manifest file under the user's S3 prefix. Our backend never touches those concerns. + +## 5. Events (minimal — on-chain extrinsics) + +Three events cover everything our backend is responsible for. Per-message events (received, sent, delivered, bounced) are SES-native SNS notifications the daemon subscribes to directly; they are not our chain events. + +```rust +pub enum EmailEvent { + InboxCreated { inbox_address: EmailAddress, user_wallet: WalletAddress, agent_wallet: WalletAddress, timestamp: SystemTime }, + InboxDeleted { inbox_address: EmailAddress, timestamp: SystemTime }, + CredsMinted { child_wallet: WalletAddress, inbox_address: EmailAddress, scope: Scope, expires_at: SystemTime, timestamp: SystemTime }, + // ^ the audit record for every SES/S3 credential handed to a daemon +} +``` + +`CredsMinted` is the auditable event that backs "every credential access is public" for email — fires per mint, not per underlying SES/S3 call, which is the right granularity for the broker-not-proxy shape. + +## 6. Receive pipeline (inbound) + +One SES receipt rule writes raw MIME directly to S3. No Lambda. No MIME parsing. No metadata DB write. No per-email compute on our side. 
+ +``` +External SMTP → SES MX (inbound-smtp.us-east-1.amazonaws.com) + → receipt rule: recipient matches *@agentkeys-email.io + → S3Action: put raw MIME at s3://agentkeys-mail///.eml + (done — ~200 ms end-to-end, no AgentKeys compute) +``` + +Daemon side (not our concern, but for completeness): the daemon polls its S3 prefix or subscribes to an SNS topic SES writes to; it lists/gets objects with minted creds and parses MIME locally. + +## 7. Send pipeline (outbound) + +Daemon mints credentials once; uses them to call SES directly. Our backend's involvement per-send: zero after the mint. + +``` +daemon → AgentKeys: mint SES send creds for this inbox (OIDC JWT exchange at STS) + → gets temp AWS creds (≤1h) + +daemon → assemble MIME locally with the right From: + → SES SendRawEmail with temp creds + (IAM role condition pins ses:FromAddress to this daemon's inbox address; + AWS_SES DKIM signs outbound with the domain's SES-managed key) + → SES delivers + +(delivery / bounce / complaint notifications come from SES → SNS topic the daemon + can subscribe to directly; we do not proxy them) +``` + +On our side, the only work per send is the initial credential mint, which is amortized across many sends within the 1-hour temp-cred lifetime. TEE-held BYODKIM is a future option (§15 open questions) if operator trust in SES-managed DKIM becomes insufficient; AWS_SES DKIM is the default. + +## 8. Auth, scope, and TTL + +Per the general `TokenAuthority` abstraction (see `docs/spec/email-signing-backends.md` §3 for the full trait): + +``` +Child bearer token 30 days (AgentKeys policy) + └── EmailImpersonate grant 30 days (master approves under Touch ID) + └── Live per-call policy check in the broker — no short-lived + email access token needed, because SES authorizes AgentKeys' + backend identity (the IAM role), not per-user identity. 
+``` + +The `grant.allowed_subjects` for an SES inbox grant is typically a pattern: + +- `Exact("bot-42@bots.wildmeta.ai")` — one specific inbox +- `Prefix("bot-*@bots.wildmeta.ai")` — all inboxes with that prefix (owned by this child) +- `DomainWildcard("bots.wildmeta.ai")` — any inbox on this domain (master-only pattern) + +`grant.allowed_scopes` is a small enum (more scopes are daemon-side concerns, not ours): + +- `Read` — mint S3 read creds scoped to the inbox's prefix +- `Send` — mint SES send creds scoped to the inbox's `FromAddress` +- `Admin` — create/delete the inbox itself (master-only grant) + +Touch ID gates the *creation* of a grant (master CLI, via the existing `approve_auth_request` path per #11). All subsequent `mint_creds` calls from the child are silent. + +Higher-level concerns like drafts-with-human-approval, per-message reply/forward semantics, or HITL gating of high-value sends are **daemon-side**: the daemon's MCP can offer `draft_create` / `draft_approve` tools that store drafts in S3 and only call `mint Send creds` after approval. No server-side draft state. + +## 9. AWS SES primitives we use + +| Primitive | Use | +|---|---| +| **Verified identity (domain)** | `bots.wildmeta.ai` verified once; all inboxes live under it. | +| **DKIM records** | 3 CNAMEs if `AwsSes` type; 1 CNAME to our key fingerprint if `ByoDkim` (v0.1 TEE-held; **Ed25519 per RFC 8463**, derived from TEE master seed — see §10.5). | +| **MX** | `10 inbound-smtp.us-east-1.amazonaws.com` — single record, catches all inbound to the domain. | +| **Receipt rule** | One rule matching `*@agentkeys-email.io` → `S3Action` writes raw MIME directly to the bucket. No Lambda. | +| **SES SendRawEmail** | Outbound. IAM access is via OIDC federation from the TEE — no static access keys held anywhere. See §10.5. | +| **SES event destinations** (SNS) | Delivery / bounce / complaint notifications. Subscribed to by the daemon directly, not proxied by us. 
| +| **Mail-from subdomain** (optional) | `bounce.agentkeys-email.io` for bounce handling — adds 2 records. | +| **S3 for raw MIME** | `s3://agentkeys-mail///.eml`. Bucket policy with `aws:PrincipalTag/agentkeys_user_wallet` enforces per-user isolation (§10.4). Lifecycle rule prunes > 90 days. | + +## 10. Domain setup (one-time per custom domain) + +> **Stage 6 default — `agentkeys-email.io`.** We (AgentKeys) operate this domain; users do not configure DNS. The records below are the one-time setup on AgentKeys' side and are preserved here so the pattern is reproducible for Stage 7+ bring-your-own-domain users. + +DNS records needed for a fresh domain (AgentKeys-hosted default: `agentkeys-email.io`; user-owned BYO example shown as `bots.wildmeta.ai`): + +| Record | Value | Purpose | +|---|---|---| +| `bots.wildmeta.ai. MX` | `10 inbound-smtp.us-east-1.amazonaws.com.` | Inbound | +| `_amazonses.bots.wildmeta.ai. TXT` | `` | SES domain verification | +| `._domainkey.bots.wildmeta.ai. CNAME` | `.dkim.amazonses.com.` | DKIM selector 1 (AWS_SES mode) | +| `._domainkey.bots.wildmeta.ai. CNAME` | `.dkim.amazonses.com.` | DKIM selector 2 | +| `._domainkey.bots.wildmeta.ai. CNAME` | `.dkim.amazonses.com.` | DKIM selector 3 | +| `bots.wildmeta.ai. TXT` (SPF) | `v=spf1 include:amazonses.com ~all` | Outbound authorization | +| `_dmarc.bots.wildmeta.ai. TXT` | `v=DMARC1; p=quarantine; rua=mailto:dmarc@wildmeta.ai` | DMARC | + +For `BYODKIM` (v0.1, TEE-held key): one CNAME pointing to the TEE's registered DKIM pubkey fingerprint, replacing the three AWS-provided CNAMEs. DKIM-signing moves into the enclave. + +Optionally, a MAIL FROM subdomain `bounce.bots.wildmeta.ai` with its own MX + SPF for bounce handling (adds 2 records). + +State machine per domain: `NotStarted → Pending → Verifying → Verified` (or `Invalid` / `Failed`). Our backend polls SES's `GetIdentityVerificationAttributes` every ~60 s during `Verifying` and transitions the state. 
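The per-domain state machine can be sketched as a pure transition function. This is a minimal sketch under assumptions: `DomainStatus` uses the five states listed on the `Domain` struct in §4, while `DomainEvent`, `SesVerification`, and the triggers for the first two transitions are hypothetical names introduced here, not part of the spec.

```rust
/// Status values as listed on the `Domain` struct in §4.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DomainStatus {
    NotStarted,
    Pending,
    Verifying,
    Verified,
    Failed,
}

/// Hypothetical simplification of what one verification poll reports.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SesVerification {
    Pending,
    Success,
    Failed,
}

/// Hypothetical events that drive the state machine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DomainEvent {
    RecordsIssued,                // DNS records generated for the operator
    DnsPublished,                 // operator reports the records are live
    SesReported(SesVerification), // result of one ~60 s poll
}

/// Pure transition function; any unlisted (state, event) pair is a no-op.
fn transition(state: DomainStatus, event: DomainEvent) -> DomainStatus {
    use DomainStatus::*;
    match (state, event) {
        (NotStarted, DomainEvent::RecordsIssued) => Pending,
        (Pending, DomainEvent::DnsPublished) => Verifying,
        (Verifying, DomainEvent::SesReported(SesVerification::Success)) => Verified,
        (Verifying, DomainEvent::SesReported(SesVerification::Failed)) => Failed,
        // Still pending, or an event that does not apply: stay put.
        (current, _) => current,
    }
}
```

Keeping the transition pure means the polling loop only has to translate each SES response into a `DomainEvent` and persist the returned status, which makes the `Verifying` phase easy to unit-test without AWS access.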
+ +We **deliver these records as a BIND zone file download** (same UX as AgentMail) so operators can drop into Route 53, Cloudflare, etc. with one import. + +## 10.4. Per-user isolation on the shared `agentkeys-mail` bucket — PrincipalTag pattern + +Stage 6 hosts every user's inbox in one AWS account, one S3 bucket, one IAM role. Per-user isolation is cryptographically enforced by AWS using the **PrincipalTag-from-JWT-claim** pattern. See [[tag-based-access]] for the full mechanics. + +### Summary of the mechanism + +1. TEE mints OIDC JWT with `agentkeys_user_wallet: ` claim +2. `sts:AssumeRoleWithWebIdentity` (with `sts:TagSession` allowed on the role) maps the claim to a session tag +3. Bucket policy on `agentkeys-mail`: + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Sid": "AllowListOwnPrefix", + "Effect": "Allow", + "Principal": { "AWS": "arn:aws:iam:::role/agentkeys-agent" }, + "Action": "s3:ListBucket", + "Resource": "arn:aws:s3:::agentkeys-mail", + "Condition": { "StringLike": { "s3:prefix": "${aws:PrincipalTag/agentkeys_user_wallet}/*" } } + }, + { + "Sid": "AllowCrudOwnPrefix", + "Effect": "Allow", + "Principal": { "AWS": "arn:aws:iam:::role/agentkeys-agent" }, + "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"], + "Resource": "arn:aws:s3:::agentkeys-mail/${aws:PrincipalTag/agentkeys_user_wallet}/*" + }, + { + "Sid": "DenyEverythingElse", + "Effect": "Deny", + "Principal": { "AWS": "arn:aws:iam:::role/agentkeys-agent" }, + "NotAction": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"], + "Resource": "*" + } + ] +} +``` + +4. Role trust policy requires the claim to be non-empty (`StringNotEquals aws:RequestTag/agentkeys_user_wallet ""`), so JWTs missing the claim cannot assume the role. 
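The effect of the bucket policy above can be modelled locally, which is handy when writing unit tests that mirror `email::daemon_reads_own_prefix` and `email::daemon_blocked_from_other_prefix`. This is a sketch of the intended semantics only, not AWS's policy evaluator; `key_allowed` and its parameters are hypothetical names.

```rust
/// Local model of the per-prefix rule: a session tagged with wallet `W`
/// may touch only object keys under "W/". An empty tag never matches,
/// mirroring the trust-policy requirement that the claim be non-empty.
fn key_allowed(session_wallet_tag: &str, object_key: &str) -> bool {
    if session_wallet_tag.is_empty() {
        return false;
    }
    // Require the separator right after the tag so that "0xA" does not
    // accidentally match keys under "0xAB/".
    object_key
        .strip_prefix(session_wallet_tag)
        .map_or(false, |rest| rest.starts_with('/'))
}
```

The explicit check for the `/` separator is the detail worth noticing: it corresponds to the `/*` that follows the policy variable in the `Resource` ARN, and it is what stops one wallet's tag from matching a longer wallet string that merely begins with it.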
+ +### Why this is the right primitive + +Without PrincipalTag isolation, Stage 6 would force one of three compromises: + +- **Per-user IAM role** — scales only to a few thousand users (AWS role quotas) and creates per-user operator state we have to manage +- **Per-user S3 bucket** — expensive at scale and the same quota problem +- **Our backend proxies every read/write** — violates rule #4 (broker-not-proxy), grows compute cost with operation frequency + +PrincipalTag is the one path that keeps a single shared bucket, zero per-user state on our side, and cryptographic per-user isolation enforced by AWS itself. + +--- + +## 10.5. Key derivation and cloud credentials (no static secrets) + +Two TEE-derived key families serve the email backend, both sealed inside the enclave and derived deterministically from the TEE master seed: + +``` +TEE master seed (sealed; disaster-recovery root; one per enclave) + ├── derive("dkim//") → Ed25519 (DKIM signing, per custom domain) + │ purpose: sign DKIM-Signature header on every outbound MIME before SES carries it + │ why Ed25519: RFC 8463; fast; we control both sides of the signing contract + │ + └── derive("oidc/issuer/") → ES256 (ECDSA P-256; OIDC-issuer JWT signing) + purpose: mint short-lived JWTs that AWS STS (+ GCP, Azure, …) exchange for temp creds + why ES256: AWS OIDC accepts only RSA (RS256/384/512) and ECDSA (ES256/384/512) — NOT Ed25519 +``` + +Both derivation algorithms are SLIP-0010 / BIP-32-style — the same primitive the TEE already uses for custodial wallet keys. Both keys survive TEE restart from master seed alone. Both rotate via version-bump in the path. + +### Why the OIDC-issuer key exists + +AWS SES API calls require IAM authentication. Rather than seal a long-lived IAM access key inside the TEE, we use the **AWS IAM OIDC federation** path: + +1. TEE exposes itself as a conforming OIDC identity provider at `https://oidc.agentkeys.dev` +2. 
A thin HTTPS proxy in front of the TEE serves static `/.well-known/openid-configuration` + `/.well-known/jwks.json` (containing the TEE's public ES256 key); proxy holds no private material +3. When the backend needs to call SES, TEE mints a 5-minute JWT signed with the ES256 key, containing claims like `{iss, sub, aud=sts.amazonaws.com, exp, agentkeys_operation=ses.send}` +4. Backend calls `sts:AssumeRoleWithWebIdentity` with the JWT → AWS returns temp SES credentials (≤1h) +5. Backend makes SES API calls with temp creds; discards them after use + +Net: **no static AWS credentials at rest anywhere in AgentKeys.** TEE compromise = all federated creds compromised (same as before). Anything short of TEE compromise = zero blast radius. + +The same OIDC provider federates into GCP Workload Identity, Azure AD, Snowflake, Kubernetes, and any other external-OIDC consumer. One issuer, N clouds. See `.omc/wiki/oidc-federation.md` for the generalization. + +## 11. How this plugs into the three-layer abstraction + +Recap from `docs/spec/email-signing-backends.md`: + +| Layer | v0 (mock) | v0.1 (TEE + chain) | +|---|---|---| +| `TokenAuthority` | Mock backend holds static SES IAM keys + local AES key | TEE-derived **Ed25519 DKIM key** (per domain) + TEE-derived **ES256 OIDC-issuer key** (singleton) + temp SES creds minted per-request via `sts:AssumeRoleWithWebIdentity`; **no static cloud credentials sealed at rest** | +| `TokenBroker` | Mock backend (verifies session, checks grant, calls SES) | TEE (same, but chain-reads for grants; includes JWT minting for SES federation) | +| `GrantStore` | SQLite `email_grants` table | `pallet-email-grants` on chain | +| Audit sink | SQLite `ses_events` table | `pallet-email-audit` on chain | + +Same daemon code for both. A `config.toml` toggle picks the backend. + +## 12. 
v0 vs v0.1 specifics + +| Concern | v0 | v0.1 | +|---|---|---| +| Inbox table | SQLite row per inbox | Chain pallet entry per inbox | +| Message metadata | (none — daemon-side only; no server metadata store) | (same — no backend metadata) | +| Raw MIME blob | S3 (AWS account: test) | S3 (AWS account: production) | +| DKIM key | SES-managed (AWS_SES) | SES-managed by default; **TEE-derived Ed25519** at `dkim//v1` (BYODKIM) as a future option | +| OIDC-issuer key | n/a — mock uses static IAM keys | **TEE-derived ES256** at `oidc/issuer/v1`; published via HTTPS JWKS endpoint | +| SES IAM credentials | Static access key in `.env` on mock server | **Federated: no static creds.** Temp creds minted per-request via `sts:AssumeRoleWithWebIdentity`; creds live ≤1h | +| Audit events | SQLite table (operator-queryable) | On-chain extrinsics (publicly verifiable) | +| Scope grant | SQLite row | On-chain extrinsic | +| Grant revocation | `UPDATE row SET revoked_at=...` | On-chain revocation list (≤6s propagation) | + +## 13. Comparison with AgentMail's SaaS model (why we don't just use them) + +Both stacks run on SES under the hood. 
The divergence is at the *trust* layer: + +| | AgentMail (SaaS proxy) | Our SES backend (broker, not proxy) | +|---|---|---| +| Shape | SaaS proxy — they parse MIME, store threads, run webhooks on your behalf | Credential broker — mint SES/S3 creds; daemon does every operation itself | +| Compute cost scaling | O(user operation frequency) | O(user count) — flat per user | +| Signing identity for DKIM | AWS_SES-managed or BYODKIM | AWS_SES (default) → TEE-held BYODKIM (future, §15) | +| Who reads inbox contents | AgentMail operators (for support) | Our TEE only at mint time; daemon reads S3 directly afterward | +| Who sees audit events | AgentMail dashboards | Anyone with a Heima node (on-chain `CredsMinted` extrinsics) | +| Outage domain | AgentMail + AWS | AWS alone | +| Per-inbox credential | Long-lived scoped API key | 30-day AgentKeys session token + grant; ephemeral SES/S3 creds per mint | +| Revocation latency | Unspecified (their SaaS) | ≤6s via chain revocation list | +| Attribution | Per-API-key in their logs | Per-child-wallet on-chain, publicly verifiable | +| Drafts, labels, threading, allow-block lists, per-operation webhooks, `client_id` idempotency, reply/forward semantics | Server-side, first-class in their API | **Daemon-side** via MCP, or absent until needed. Not our backend's concern. | +| Multi-tenancy unit | `Pod` (infrastructure-level) | Per-user + per-child grants (policy-level) | + +AgentMail is a **good reference for the SES underpinnings** — but structurally they sit on the opposite side of the broker-not-proxy line. Their model accepts compute cost scaling with operation frequency in exchange for a richer server-side feature surface. Our model refuses that tradeoff: compute stays flat with user count, and per-operation concerns live daemon-side via MCP (where they don't pressure our backend and don't require us to see content). + +## 14. 
Concrete build plan (1-2 weeks) + +| Day | Milestone | +|---|---| +| 1 | Register `agentkeys-email.io`. SES domain verification. DNS: MX, DKIM (AWS_SES managed), SPF, DMARC. Request SES production access. | +| 2 | S3 bucket `agentkeys-mail` with per-user-prefix structure + `aws:PrincipalTag/agentkeys_user_wallet` bucket policy + lifecycle rules. SES receipt rule with `S3Action` writing raw MIME directly to the bucket (no Lambda). | +| 3 | IAM OIDC provider `oidc.agentkeys.dev` registered in our AWS account. IAM role `agentkeys-agent` with trust policy pinned to TEE enclave + requiring non-empty `agentkeys_user_wallet` claim. Role permissions for `s3:GetObject`/`s3:ListBucket` (per prefix) and `ses:SendRawEmail` (with `ses:FromAddress` condition). | +| 4 | TEE-side ES256 OIDC-issuer key derivation at `oidc/issuer/v1` + JWT minter. Thin HTTPS proxy at `oidc.agentkeys.dev` serving static discovery doc + JWKS (Let's Encrypt). | +| 5 | `SesEmailAuthority` Rust impl: implements `mint_read_creds(inbox) -> STS response` and `mint_send_creds(inbox) -> STS response` via `sts:AssumeRoleWithWebIdentity`. Emits `CredsMinted` audit extrinsic per call. | +| 6 | Daemon MCP tools: `email.list` (S3 list), `email.get` (S3 get + MIME parse locally), `email.send` (assemble MIME + SES SendRawEmail). Each unwraps into `mint` + direct AWS call. | +| 7 | Replace `provisioner-scripts/src/lib/email.ts` imapflow-based fetcher with an S3-direct fetcher that uses minted read creds. | +| 8 | End-to-end test: create agent → SES delivers to `@agentkeys-email.io` → daemon reads from S3 via minted creds → warmup-verify deliverability on Gmail + Outlook. | +| 9 | `EmailImpersonate` grant type in GrantStore. Wire master CLI's Touch-ID gate to this grant type. | +| 10 | Operator runbook, migration notes, and zone-file download endpoint for Stage 7 BYO domains. | + +Total: ~2 weeks. 
No Lambda, no DynamoDB, no server-side MIME parsing — the broker-not-proxy shape cuts roughly a week of work vs a SaaS-style build. + +## 15. Open questions / follow-ups + +1. **BYODKIM in v0.1 — how does the TEE register its Ed25519 DKIM pubkey with DNS?** Proposed: the TEE signs an attestation; the backend publishes the DKIM record to DNS (Route 53 API); the `Domain` row tracks `dkim_selector` pointing at the TEE-held key. No per-inbox rotation, but key rotation is a per-domain operation via path-version bump. + +2. **OIDC-issuer hostname.** `oidc.agentkeys.dev`? `tee.agentkeys.io/oidc/`? Needs to be a stable HTTPS URL we control with a public-CA cert (Let's Encrypt works, satisfies AWS's default cert validation). Suggested: `oidc.agentkeys.dev` as a dedicated subdomain never repurposed. + +3. **OIDC `sub` claim format.** Proposed: `enclave:::agent:`. Consumer trust policies (AWS role trust policy, GCP workload identity provider) condition on `sub` patterns to pin a specific enclave build. To finalize once the Heima TEE's attestation format is confirmed. + +4. **Multi-region.** First v0.1 cut is `us-east-1` only. Global deployment means either (a) SES inbound in multiple regions with per-region DNS routing, or (b) one global ingress + cross-region S3 replication. Daemon-side threading/label state goes wherever the daemon runs. + +5. **Abuse handling.** A child whose throwaway inbox starts receiving spam en masse should be disposable cheaply. Plan: cold-delete inboxes on explicit agent request; warm-delete (soft mark with retention) for everything else; S3 lifecycle prunes old messages. + +6. **Inbox TTL.** Should inboxes auto-expire if unused for N days? Default proposal: 90d soft-delete, 180d hard-delete. Master can override per-grant. + +7. **Agent-to-agent email.** Two AgentKeys-provisioned agents can email each other; both sides go through SES. Worth looking at a short-circuit (direct S3 → S3) in v0.2 if the volume justifies it. + +8. 
**Disaster recovery.** S3 is durable; chain state is self-healing; TEE master seed is the root of all derived keys. No stateful middle tier to back up — the broker-not-proxy shape eliminates the mid-write-crash recovery problem entirely. + +9. **User's personal Gmail integration.** Confirmed: we **do not** OAuth into users' Gmail. User's Gmail is a send-only target from our SES for identity + notifications + optional 2FA approvals. See `wiki/email-system.md` §usage-isolation. + +## 16. Cross-references + +- **`.omc/wiki/oidc-federation.md`** — the generalized OIDC-provider design that §10.5 references; explains how the same ES256 key federates into AWS, GCP, Azure, Snowflake, K8s +- `docs/spec/email-signing-backends.md` — the generalized trait (needs an SES section added; this spec supplies the content) +- `docs/spec/credential-backend-interface.md` — the parent trait this extends +- `docs/stage5-workspace-email-setup.md` — alternative: Google DWD operator runbook (preserved for enterprise deployments) +- `docs/manual-test-stage5.md` §1 — demo path (currently uses dedicated personal Gmail; will migrate to SES once built) +- `wiki/email-system.md` — high-level architecture wrap-up + usage isolation +- `wiki/blockchain-tee-architecture.md` §5 — stateless-TEE-plus-chain rationale +- `wiki/session-token.md` §1 — 30-day TTL policy +- Issue [#11](https://github.com/litentry/agentKeys/issues/11) — biometric gate +- AWS docs consulted for §10.5: [`IAM OIDC provider`](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_oidc.html), [`AssumeRoleWithWebIdentity`](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRoleWithWebIdentity.html) — signing algorithm list (RSA + ECDSA only) verified verbatim diff --git a/docs/stage5-workspace-email-setup.md b/docs/stage5-workspace-email-setup.md new file mode 100644 index 0000000..7d3851e --- /dev/null +++ b/docs/stage5-workspace-email-setup.md @@ -0,0 +1,471 @@ +# Google Workspace email path — 
ADVANCED / BYO (deferred past Stage 7) + +> **⚠️ Deferred (2026-04-19).** This runbook is now an **advanced bring-your-own path**, not the default for Stage 5 or Stage 6. The Stage 6 default is hosted `xxxxx@agentkeys-email.io` on AgentKeys infrastructure — zero setup for non-developers. See `docs/spec/ses-email-architecture.md` and [[hosted-first]] (wiki) for the hosted default, and `docs/spec/plans/development-stages.md` for the revised stage roadmap. +> +> **Preserved here** for operators who specifically want to run AgentKeys email inside their existing Google Workspace (enterprise / regulated / data-residency reasons). The architecture is parallel to the hosted SES path — same three-layer abstraction, same Touch-ID gate, same chain audit — just a different cloud. + +**Purpose.** An alternative to the plus-addressed-personal-Gmail + app-password flow +described in `docs/manual-test-stage5.md` § 1. Use this path when you run a Google +Workspace domain and want the live OpenRouter provision demo (and every future +Stage 5 automation) to spin up throwaway identities inside that domain, read their +mail via the Gmail API, and tear them down at the end. No personal inbox involved, +no app password, no human-interactive consent per run. + +**Scope.** One-time super-admin setup + per-run workflow. Assumes you already own +a Google Workspace subscription for your domain (e.g. `wildmeta.ai`). + +> **Design rationale and backend comparison.** This doc is the *operator runbook* +> for the GCP-managed-key variant. For the *design decision* — why we have two +> email-signing backends (GCP-managed vs AgentKeys TEE), how they plug into the +> existing `CredentialBackend` trait, how the 30-day session-key policy and the +> Touch ID gate (#11) apply to both — see +> [`docs/spec/email-signing-backends.md`](spec/email-signing-backends.md). 
+> Short version: this GCP path is Stage 5's alternative backend; v0.1 migrates to +> the AgentKeys TEE path; both satisfy the same trait so the CLI and daemon code +> never change. + +--- + +## When to use which path + +| Your situation | Use | +|---|---| +| Personal `@gmail.com`, one-off manual test | `manual-test-stage5.md` § 1 (plus-addressing + app password) | +| Google Workspace domain, CI-friendly automation, recurring demos | **this doc** | +| Company Workspace, one-off manual test | either, but this doc once the 20-minute admin setup is done — it pays for itself after ~2 demo runs | + +The Workspace path has strictly better properties for automation: + +- **No personal inbox pollution** — throwaway users are isolated, visible in a + dedicated `/Automation` OU, and deleted after each run. +- **No human-held secrets in env vars** — the only secret is a GCP service-account + JSON key, which lives in a secret manager and is rotated by CI, not by a person. +- **Non-interactive** — zero OAuth consent screens per run; `agent@wildmeta.ai` + uses its admin role for user CRUD, the service account impersonates the + throwaway user for Gmail reads. +- **Auditable** — every user create/delete shows up in the Workspace admin audit + log; every Gmail read shows up in the service account's GCP audit log. + +--- + +## What `@gmail.com` *cannot* do (and why this matters) + +Google does **not** expose an API to create personal `@gmail.com` accounts. The +public signup flow is reCAPTCHA-gated and browser-only. This means: + +- Plus-addressed personal Gmail (`you+stage5test@gmail.com`) is the only "new + identity per run" option you get without a Workspace subscription — but every + OpenRouter confirmation still lands in your real inbox. +- With a Workspace domain, you can genuinely mint a fresh identity per demo + (`stage5test-20260418@yourdomain`) and delete it when done. 
+ +If the long-term plan is "every Stage 5 provision run spins up a fresh identity", +this path is the only one that actually scales. + +--- + +## Architecture + +``` + ┌─────────────────────────┐ + │ agent@wildmeta.ai │ + │ (non-admin Workspace │ + │ user, assigned the │ + │ stage5a-provisioner │ + │ custom admin role) │ + └────────────┬────────────┘ + │ gws auth login (OAuth) + │ scope: admin.directory.user + ▼ + ┌────────────────────────────────────┐ + │ gws admin users.insert/.delete │ + │ (in /Automation OU only) │ + └────────────────────────────────────┘ + │ + │ creates + ▼ + stage5test-@wildmeta.ai + + ┌─────────────────────────┐ + │ stage5a-sa@... │ + │ GCP service account │ + │ + Domain-Wide Delegation│ + │ scopes: gmail.readonly, │ + │ gmail.modify │ + └────────────┬────────────┘ + │ impersonates any wildmeta.ai user + ▼ + ┌────────────────────────────────────┐ + │ gws gmail users.messages.list/get │ + │ (reads throwaway user's inbox for │ + │ the OpenRouter OTP) │ + └────────────────────────────────────┘ +``` + +Two identities do two jobs: + +1. **`agent@wildmeta.ai`** — a regular Workspace user with a narrow custom admin + role that lets it create and delete users in one OU only. Humans log into this + account; OAuth handles its scopes. +2. **`stage5a-sa@...` (service account)** — not a user. Used purely to impersonate + throwaway users over the Gmail API via Domain-Wide Delegation. Its JSON key + lives in a secret manager. + +Neither identity can substitute for the other: the service account cannot create +or delete users (that would need Admin SDK domain-wide delegation, which is a +deliberate decision not to grant), and `agent@wildmeta.ai` cannot read other +users' mail (no user ever can — only DWD service accounts can). + +--- + +## What "non-interactive" means — a concrete trace + +"Non-interactive" means **no human ever sees a browser, a consent screen, or a +password prompt after one-time setup**. 
The one-time setup spends some interactive +effort (super-admin clicks through A3 and B4; `agent@wildmeta.ai` runs `gws auth +login` once). After that, a demo run involves zero UI. + +### Example: reading the OpenRouter OTP from a throwaway user's inbox + +**Preconditions (all already true by the time the demo runs):** + +- `~/stage5a-sa.json` exists on disk (from B5) +- `stage5test-20260419@wildmeta.ai` exists (created 60 s earlier by this same run) +- That throwaway user has **never logged in** — it has no browser session, no + saved password on any device, and no human who knows its password +- `agent@wildmeta.ai`'s local `~/.config/gws/` already holds a refresh token + from the one-time `gws auth login` + +**The command the script runs:** + +```bash +GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE=~/stage5a-sa.json \ +GOOGLE_WORKSPACE_CLI_IMPERSONATE=stage5test-20260419@wildmeta.ai \ +gws gmail users.messages.list \ + --params '{"userId":"me","q":"from:noreply@openrouter.ai"}' +``` + +**What happens under the hood — zero human involvement, ~300 ms end to end:** + +``` +1. gws loads stage5a-sa.json → gets (client_email, private_key) +2. gws builds a JWT: + iss: stage5a-sa@wildmeta-agent-provisioner.iam.gserviceaccount.com + sub: stage5test-20260419@wildmeta.ai ← the impersonation target + scope: https://www.googleapis.com/auth/gmail.readonly + aud: https://oauth2.googleapis.com/token + iat: , exp: + signed: RSA-SHA256 with the SA's private key +3. POST https://oauth2.googleapis.com/token + grant_type = urn:ietf:params:oauth:grant-type:jwt-bearer + assertion = + + Google's token endpoint validates: + - SA exists + key signature is valid + - SA is authorized for DWD on wildmeta.ai with gmail.readonly ← from B4 + - sub user (stage5test-20260419@wildmeta.ai) exists in wildmeta.ai + + → returns an access_token scoped to "read stage5test-20260419's Gmail" + +4. GET https://gmail.googleapis.com/gmail/v1/users/me/messages?q=... 
+ Authorization: Bearer + + → Gmail returns the message list as JSON. +``` + +**Zero browser windows opened. Zero consent screens. Zero password prompts.** +The access token is minted, used, and thrown away inside one shell command. + +### Contrast: what *would* be interactive + +If we didn't have the DWD service account, the equivalent Gmail read would need +one of these instead, all of which break automation: + +| Alternative | Why it's interactive | +|---|---| +| OAuth as the throwaway user | Needs a browser → "Sign in as stage5test-20260419" → consent screen → "Allow". Per run. There's no one at the keyboard for an ephemeral user. | +| Gmail IMAP with app password | App password generation requires the user to be logged in to myaccount.google.com (browser) and have 2FA enrolled (requires phone). Throwaway users can't do either. | +| Gmail IMAP with OAuth XOAUTH2 | Same OAuth consent dance as row 1, just for IMAP instead of the REST API. | +| `gws auth login` as the throwaway user | `gws auth login` opens a browser by design. Throwaway users aren't staffed. | + +DWD sidesteps all of them because the consent was granted **once**, at B4, by +the super-admin: *"this service account is allowed to wear any wildmeta.ai +user's Gmail hat with these scopes."* After that, per-user impersonation is a +signed JWT away — no user-side consent required, because the domain itself +already consented on their behalf. + +### What about `agent@wildmeta.ai`'s OAuth? + +`agent@wildmeta.ai` does go through a consent screen **exactly once** — the +first time it runs `gws auth login`. That writes a refresh token to +`~/.config/gws/`. From then on: + +- Every `gws admin users.insert` / `users.delete` call silently exchanges the + refresh token for a fresh access token against Google's token endpoint. +- No browser opens. +- Refresh tokens for Workspace OAuth apps are long-lived (don't expire unless + revoked or unused for 6 months). 
+ +So: one browser visit at setup time, then a quiet lifetime of scripted calls. + +--- + +## One-time setup + +The full checklist. Steps A1–A3 grant user-CRUD privileges to `agent@wildmeta.ai`. +Steps B1–B5 create the service account and authorize it for Gmail impersonation. +Only **A3** and **B4** require super-admin; everything else is delegable. + +### A1. Create the `/Automation` OU + +Admin Console → Directory → Organizational units → Create child unit. + +- Name: `Automation` +- Parent: `/` + +This OU holds throwaway users and bounds the custom admin role below. + +### A2. Create the `stage5a-provisioner` custom admin role + +Admin Console → Account → Admin roles → Create new role. + +- Name: `stage5a-provisioner` +- Privileges: + - Users → **Create** ✓ + - Users → **Delete** ✓ + - Users → **Update** ✓ + - Organizational Units → **Read** ✓ + +Do not grant any other privilege. No super-admin, no Gmail admin, no security +center, no groups admin. + +### A3. Assign the role to `agent@wildmeta.ai`, scoped to `/Automation` + +*(super-admin action)* + +Admin Console → Account → Admin roles → `stage5a-provisioner` → Assign admin → +`agent@wildmeta.ai`. + +- Scope: **Only selected organizational units** → add `/Automation`. + +Do **not** leave the default "All organizational units" selected — that would +give `agent@wildmeta.ai` create/delete rights over real employees. + +### B1. Create the GCP project + +```bash +gcloud projects create wildmeta-agent-provisioner \ + --name="Agent Provisioner" +``` + +Grant `agent@wildmeta.ai` **Owner** on this project (or the tighter +**Service Account Token Creator** if you want to prevent it from editing project +settings): + +```bash +gcloud projects add-iam-policy-binding wildmeta-agent-provisioner \ + --member=user:agent@wildmeta.ai \ + --role=roles/owner +``` + +### B2. 
Enable the required APIs + +```bash +gcloud services enable \ + admin.googleapis.com \ + gmail.googleapis.com \ + --project=wildmeta-agent-provisioner +``` + +### B3. Create the service account + +```bash +gcloud iam service-accounts create stage5a-sa \ + --display-name="Stage 5a provisioner" \ + --project=wildmeta-agent-provisioner +``` + +Record the email: `stage5a-sa@wildmeta-agent-provisioner.iam.gserviceaccount.com`. + +### B4. Authorize domain-wide delegation + +*(super-admin action — and the **only** DWD step per service account; see +"Is B4 one-time?" below)* + +Admin Console → Security → Access and data control → API controls → Domain-wide +delegation → **Add new**. + +- **Client ID**: the service account's numeric unique ID. Find it with: + ```bash + gcloud iam service-accounts describe \ + stage5a-sa@wildmeta-agent-provisioner.iam.gserviceaccount.com \ + --format='value(oauth2ClientId)' + ``` +- **OAuth scopes** (comma-separated, one field): + ``` + https://www.googleapis.com/auth/gmail.readonly, + https://www.googleapis.com/auth/gmail.modify + ``` + +### B5. Mint the key and hand it off + +```bash +gcloud iam service-accounts keys create ~/stage5a-sa.json \ + --iam-account=stage5a-sa@wildmeta-agent-provisioner.iam.gserviceaccount.com +# created key [f720…] of type [json] as [~/stage5a-sa.json] +``` + +Put `~/stage5a-sa.json` in a secret manager. Never commit. Share with the +principal(s) that will run the demo — GCP Secret Manager with +`roles/secretmanager.secretAccessor` granted to `agent@wildmeta.ai` is the +cleanest channel, 1Password shared-vault works for manual flows. + +--- + +## Is B4 one-time? + +Yes — and this is the property that makes the whole scheme worth the setup cost. + +| Action | Re-do B4? 
| +|---|---| +| Create `stage5test-@wildmeta.ai` and read its Gmail | ❌ No | +| Create any number of new users (`b4-s@wildmeta.ai`, `ci-bot@wildmeta.ai`, …) | ❌ No | +| Delete and recreate a user with the same email | ❌ No | +| Rotate the **service-account key** (JSON file) | ❌ No — DWD binds to the SA's client ID, not its key material | +| Add a new scope (e.g. `gmail.compose`) | ⚠️ Edit the existing DWD entry and append the scope — super-admin action | +| Create a **second** service account | ✅ Yes — DWD is per-SA | +| Move to a different Workspace domain | ✅ Yes — DWD is per-domain | + +Day-to-day: new throwaway users cost one `gws admin users.insert` call and zero +admin involvement. + +--- + +## Per-demo-run workflow + +Assumes one-time setup is complete and `~/stage5a-sa.json` is readable by the +principal running the demo. + +### Log in to `gws` as the agent user (once per session) + +```bash +gws auth login -s admin.directory.user +``` + +### Mint a throwaway user + +```bash +EMAIL="stage5test-$(date +%s)@wildmeta.ai" +PASSWORD="$(openssl rand -base64 24)" + +gws admin users.insert --json "{ + \"primaryEmail\": \"$EMAIL\", + \"name\": {\"givenName\": \"Stage5\", \"familyName\": \"Test\"}, + \"password\": \"$PASSWORD\", + \"changePasswordAtNextLogin\": false, + \"orgUnitPath\": \"/Automation\" +}" +``` + +Workspace takes 30–60 seconds to replicate the new user across services (Gmail, +Directory); give it a sleep before the demo hits the inbox: + +```bash +sleep 60 +``` + +### Run the demo pointed at this user + +```bash +export AGENTKEYS_EMAIL_USER="$EMAIL" +export GOOGLE_WORKSPACE_CLI_CREDENTIALS_FILE="$HOME/stage5a-sa.json" +export GOOGLE_WORKSPACE_CLI_IMPERSONATE="$EMAIL" + +agentkeys provision openrouter +``` + +The Gmail fetcher uses the SA key + `IMPERSONATE` to read the throwaway user's +inbox over the Gmail API — no password, no IMAP, no app password. 
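The impersonation request that a Gmail-API fetcher would make follows the trace in "What non-interactive means": a JWT whose `iss` is the service account and whose `sub` is the throwaway user. Below is a minimal sketch of that claim set only, with no signing and no network call. The helper name `buildDwdClaims`, the `DwdClaims` interface, and the one-hour lifetime are assumptions for illustration; none of them exist in `provisioner-scripts` today.

```typescript
// Hypothetical helper (not part of provisioner-scripts): builds the claim set
// for the service-account JWT used in domain-wide-delegation impersonation.
interface DwdClaims {
  iss: string;   // service account email
  sub: string;   // Workspace user to impersonate
  scope: string; // space-delimited OAuth scopes
  aud: string;   // Google's OAuth token endpoint
  iat: number;   // issued-at, seconds since epoch
  exp: number;   // expiry, seconds since epoch
}

const TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token";

// Assumption: one-hour lifetime. The trace in this doc does not state a value.
const ASSUMED_LIFETIME_SECONDS = 3600;

function buildDwdClaims(
  serviceAccountEmail: string,
  impersonatedUser: string,
  scopes: string[],
  nowSeconds: number,
): DwdClaims {
  if (scopes.length === 0) {
    throw new Error("at least one scope is required");
  }
  return {
    iss: serviceAccountEmail,
    sub: impersonatedUser,
    scope: scopes.join(" "),
    aud: TOKEN_ENDPOINT,
    iat: nowSeconds,
    exp: nowSeconds + ASSUMED_LIFETIME_SECONDS,
  };
}

// Example with the identities used throughout this runbook.
const claims = buildDwdClaims(
  "stage5a-sa@wildmeta-agent-provisioner.iam.gserviceaccount.com",
  "stage5test-20260419@wildmeta.ai",
  ["https://www.googleapis.com/auth/gmail.readonly"],
  Math.floor(Date.now() / 1000),
);
console.log(JSON.stringify(claims, null, 2));
```

A real fetcher would then sign these claims with the service account's private key and post the result as the `assertion` of a `urn:ietf:params:oauth:grant-type:jwt-bearer` request, as the trace describes. Keeping claim construction as a pure function makes the impersonation target easy to unit-test without touching the key file.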
+ +> **Code change required.** The current `provisioner-scripts/src/lib/email.ts` +> uses `imapflow` against `imap.gmail.com:993` — it has no Gmail-API backend. +> To make this path work end-to-end, replace the IMAP fetcher with a Gmail-API +> fetcher that reads from `gws gmail users.messages.list/get` (or the +> `googleapis` npm package directly) using the service-account credentials above. +> Tracked as a follow-up; the setup in this doc is a prerequisite for that +> change but stands on its own as the documented admin path. + +### Teardown + +```bash +gws admin users.delete --params "{\"userKey\":\"$EMAIL\"}" +``` + +--- + +## Secret management + +| Secret | Where it lives | Who can read | +|---|---|---| +| Service account JSON key | GCP Secret Manager (preferred) or 1Password shared vault | `agent@wildmeta.ai` + CI runner | +| `agent@wildmeta.ai` OAuth tokens | `~/.config/gws/` on whichever machine ran `gws auth login` | that machine's local user | +| Throwaway user password | Emitted to stdout at creation, discarded — we never log in interactively as the throwaway user, so it doesn't need to be stored | — | + +Do not check `stage5a-sa.json` into git, not even encrypted. `agentkeys` already +has `.gitignore` coverage for `*.json` under config paths; extend it if your +working copy of the key lives somewhere surprising. 
+ +--- + +## Rotation + +Quarterly (or after any key exposure): + +```bash +# Mint a new key +gcloud iam service-accounts keys create ~/stage5a-sa-new.json \ + --iam-account=stage5a-sa@wildmeta-agent-provisioner.iam.gserviceaccount.com + +# Update the secret manager / CI runner to use the new file + +# List existing keys to confirm both are active +gcloud iam service-accounts keys list \ + --iam-account=stage5a-sa@wildmeta-agent-provisioner.iam.gserviceaccount.com + +# After confirming the new key works in one demo run, delete the old key (replace OLD_KEY_ID with its ID from the list above) +gcloud iam service-accounts keys delete OLD_KEY_ID \ + --iam-account=stage5a-sa@wildmeta-agent-provisioner.iam.gserviceaccount.com +``` + +No admin console involvement — DWD is bound to the SA's client ID, not to the +specific key file, so rotating keys is invisible to the Workspace side. + +--- + +## Teardown (full retire-the-setup) + +If you decide this path was a mistake and want to back out cleanly: + +1. `gcloud iam service-accounts delete stage5a-sa@…` — removes the SA and + implicitly invalidates all its keys. +2. Admin Console → Domain-wide delegation → remove the `stage5a-sa` entry. +3. Admin Console → Admin roles → `stage5a-provisioner` → Delete role. +4. Admin Console → Directory → Organizational units → delete any remaining + users in `/Automation`, then delete the OU itself. +5. `gcloud projects delete wildmeta-agent-provisioner` — deletes the GCP project. + +After step 1 the blast radius is already fully contained (no live credential can +reach Workspace). Steps 2–5 are cleanup. 
+ +--- + +## Cross-references + +- `docs/manual-test-stage5.md` § 1 — the plus-addressing alternative (keep as-is; + this doc is the alternative for Workspace users) +- `provisioner-scripts/src/lib/email.ts` — currently IMAP-only; needs a Gmail-API + backend for this path to be end-to-end usable +- `crates/agentkeys-provisioner/` — the Rust orchestrator is unchanged; only the + TS email fetcher needs the new backend +- `TODOS.md` — OpenRouter ToS compliance check still blocks the *first* live run + on either path diff --git a/harness/stage-5a-done.sh b/harness/stage-5a-done.sh index 0632235..b6e55e7 100755 --- a/harness/stage-5a-done.sh +++ b/harness/stage-5a-done.sh @@ -1,20 +1,87 @@ #!/usr/bin/env bash +# Stage 5a completion gate — runs every non-live check in one shot. +# +# What this covers: +# 1. Rust unit tests across the four Stage 5a crates +# 2. TS install + unit tests (provisioner-scripts) +# 3. Phantom-key chaos test in isolation (silent-corrupt defense) +# 4. Pattern grep guard (patterns must have zero service strings) +# 5. TS typecheck +# 6. Clippy on Stage 5a crates, warnings treated as errors +# 7. MCP `tools/list` advertises agentkeys.provision +# 8. Observability — orchestrator emits the three core provision_metric names +# +# What this does NOT cover (by design): +# - The live OpenRouter signup demo. See §1 of docs/manual-test-stage5.md. +# +# Exit 0 = Stage 5a is intact. Non-zero = stage broken, do not merge. 
set -euo pipefail cd "$(git rev-parse --show-toplevel)" -echo "=== Stage 5a: Rust tests ===" +GREEN='\033[0;32m' +RED='\033[0;31m' +BOLD='\033[1m' +NC='\033[0m' +banner() { printf "\n${BOLD}=== %s ===${NC}\n" "$1"; } +ok() { printf "${GREEN}✓${NC} %s\n" "$1"; } +fail() { printf "${RED}✗${NC} %s\n" "$1" >&2; exit 1; } + +banner "1/8 Rust tests (types, provisioner, mcp, cli)" cargo test -p agentkeys-types -p agentkeys-provisioner -p agentkeys-mcp -p agentkeys-cli +ok "Rust tests passed" -echo "=== Stage 5a: TS tests ===" +banner "2/8 TS install + unit tests" +npm install --prefix provisioner-scripts --silent npm test --prefix provisioner-scripts +ok "TS tests passed" + +banner "3/8 Phantom-key chaos test (isolated)" +( cd provisioner-scripts && npx vitest run tests/scrapers/openrouter.phantom.test.ts ) +ok "phantom chaos held" + +banner "4/8 Pattern grep guard — zero service strings" +if grep -riE "openrouter|brave|jina|groq|anthropic|gemini|twitter|instagram" \ + provisioner-scripts/src/patterns/ 2>/dev/null; then + fail "service-specific string leaked into provisioner-scripts/src/patterns/" +fi +ok "grep guard empty" + +banner "5/8 TS typecheck" +npm run typecheck --prefix provisioner-scripts +ok "typecheck clean" + +banner "6/8 Clippy (Stage 5a crates, warnings as errors, --no-deps)" +# --no-deps so pre-existing lints in out-of-scope crates (e.g. agentkeys-core) +# don't fail this gate. Only Stage 5a crates are linted under -D warnings. 
+cargo clippy --no-deps \ + -p agentkeys-types -p agentkeys-provisioner -p agentkeys-mcp -p agentkeys-cli \ + --all-targets -- -D warnings +ok "clippy clean" -echo "=== Stage 5a: grep guard — patterns have zero service strings ===" -if grep -riE "openrouter|brave|jina|groq|anthropic|gemini|twitter|instagram" provisioner-scripts/src/patterns/ 2>/dev/null; then - echo "FAIL: service-specific string found in patterns/" >&2 - exit 1 +banner "7/8 MCP tools/list — agentkeys.provision registered" +cargo build --release -q -p agentkeys-mock-server -p agentkeys-daemon +./target/release/agentkeys-mock-server --port 8090 >/tmp/stage5a-mock.log 2>&1 & +MOCK_PID=$! +trap 'kill $MOCK_PID 2>/dev/null || true' EXIT +sleep 1 +MCP_RESPONSE=$(echo '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | \ + AGENTKEYS_BACKEND=http://localhost:8090 \ + AGENTKEYS_SESSION=test-token \ + ./target/release/agentkeys-daemon --stdio 2>/dev/null | head -1) +if ! echo "$MCP_RESPONSE" | grep -q '"name":"agentkeys.provision"'; then + fail "agentkeys.provision missing from MCP tools/list" fi +ok "agentkeys.provision registered" +kill $MOCK_PID 2>/dev/null || true +trap - EXIT -echo "=== Stage 5a: phantom chaos test isolated ===" -cd provisioner-scripts && npx vitest run tests/scrapers/openrouter.phantom.test.ts && cd - +banner "8/8 Observability — three core provision_metric names emitted" +METRICS=$(cargo test -p agentkeys-provisioner -- stores_credential --nocapture 2>&1 | \ + grep "provision_metric" || true) +for name in tier_used duration_seconds verification_result; do + echo "$METRICS" | grep -q "\"name\":\"$name\"" || \ + fail "missing provision_metric name=$name" +done +ok "tier_used, duration_seconds, verification_result all emitted" -echo "STAGE 5a PASSED" +printf "\n${GREEN}${BOLD}STAGE 5a PASSED${NC}\n" diff --git a/wiki/Home.md b/wiki/Home.md index 5a335a6..26559db 100644 --- a/wiki/Home.md +++ b/wiki/Home.md @@ -1,30 +1,78 @@ # AgentKeys — Wiki -> **This wiki is auto-generated from the 
`wiki/` folder in the main repo.** -> Edit the source files there, not through the web UI — direct edits will be -> overwritten on the next push to `main`. The canonical source is -> [`wiki/` in `litentry/agentKeys`](https://github.com/litentry/agentKeys/tree/main/wiki). +> **This wiki is auto-generated from the `wiki/` folder in the main repo.** Edit the source files there, not through the web UI — direct edits will be overwritten on the next push to `main`. The canonical source is [`wiki/` in `litentry/agentKeys`](https://github.com/litentry/agentKeys/tree/main/wiki). -AgentKeys is a credential custody service: a TEE-backed vault that issues long-lived bearer tokens for per-agent credential access, with on-chain audit. +AgentKeys is a credential custody service: a TEE-backed vault that issues long-lived bearer tokens for per-agent credential access, with on-chain audit. **We mint ephemeral credentials; daemons use them to call remote services directly.** Credential broker, not operation proxy. -## Pages +--- -- [Home](Home) — you are here. -- [Blockchain TEE Architecture](blockchain-tee-architecture) — how AgentKeys rides Heima TEE for signing + credential custody. -- [Credential Usage](credential-usage) — lifecycle of a credential from store → run/read → revoke. -- [Data Classification](data-classification) — what each data class is, where it lives, how long it stays. -- [Key Security](key-security) — TEE keys, master session key (MSK), storage tiers, threat model. -- [Serve and Audit](serve-and-audit) — Pattern-4 per-read audit flow. -- [Session Token](session-token) — 30-day bearer credential — what it is, how it's protected. 
+## The four rules -## Related documents in the main repo +Every spec and every service on top of AgentKeys preserves these four invariants (details in [Blockchain TEE Architecture §6](blockchain-tee-architecture#6-summary-the-four-rules)): -- [`README.md`](https://github.com/litentry/agentKeys/blob/main/README.md) -- [`docs/spec/plans/development-stages.md`](https://github.com/litentry/agentKeys/blob/main/docs/spec/plans/development-stages.md) — 8-stage build plan. -- [`docs/spec/architecture.md`](https://github.com/litentry/agentKeys/blob/main/docs/spec/architecture.md) -- [`docs/manual-test-stage4.md`](https://github.com/litentry/agentKeys/blob/main/docs/manual-test-stage4.md) — human-reviewable end-to-end walkthrough. -- [`docs/contradictions.md`](https://github.com/litentry/agentKeys/blob/main/docs/contradictions.md) — living tracker of cross-doc contradictions and their resolutions. -- [`docs/field-name-translation.md`](https://github.com/litentry/agentKeys/blob/main/docs/field-name-translation.md) — "translate at the layer closest to the human" design note. +1. **Chain stores everything persistent** — single source of truth. +2. **TEE holds all private keys and does all computation** — no key leaves the enclave. +3. **Clients hold only a JWT, not private keys** — bearer tokens, short blast radius. +4. **AgentKeys brokers credentials, not operations** — daemons call remote services directly; our compute scales with user count, not operation frequency. 
+ +--- + +## Wiki tree + +### Foundations (canonical, published wiki) + +- **[Blockchain TEE Architecture](blockchain-tee-architecture)** — chain + TEE + clients; the four rules in §6 +- **[Session Token](session-token)** — 30-day JWT bearer; issuance, storage, revocation +- **[Key Security](key-security)** — TEE keys, master session key, storage tiers, threat model +- **[Data Classification](data-classification)** — data classes, where each lives, retention policy + +### Credential lifecycle (canonical, published wiki) + +- **[Credential Usage](credential-usage)** — store → run/read → revoke +- **[Serve and Audit](serve-and-audit)** — Pattern-4 per-read audit flow + +### Service architectures (project-local scratchpad, `.omc/wiki/` — not published) + +Design docs for specific services built on top of the foundations. Short high-level names, kept project-local because they evolve fast: + +- **overview** — tree + reading order for the service-architecture pages +- **hosted-first** — Stage 6 default (`xyz@agentkeys-email.io` on our infra) vs bring-your-own (advanced) +- **tag-based-access** — `agentkeys_user_wallet` JWT claim → AWS PrincipalTag → per-user isolation on shared buckets +- **oidc-federation** — TEE as a conforming OIDC issuer; one ES256 key federates into AWS / GCP / Azure / Ali / K8s +- **email-system** — Stage 6 email architecture on AWS SES; broker-not-proxy; zero per-operation compute +- **knowledge-storage** — deferred backend decision between GitHub / AWS S3 / Google Drive / Ali Cloud OSS, by user segment + +--- + +## Reading order by role + +| Role | Start here | Then | Then | +|---|---|---|---| +| New engineer | [Blockchain TEE Architecture](blockchain-tee-architecture) | [Session Token](session-token) | `.omc/wiki/email-system.md` | +| Product / roadmap | This page, §Wiki tree | `docs/spec/plans/development-stages.md` | `.omc/wiki/hosted-first.md` | +| Operator / infra | [Key Security](key-security), [Serve and Audit](serve-and-audit) | 
`docs/spec/ses-email-architecture.md` | `.omc/wiki/oidc-federation.md` §Consumer-registration recipes | +| Security reviewer | [Blockchain TEE Architecture](blockchain-tee-architecture) §6 (four rules) | [Data Classification](data-classification) | `.omc/wiki/tag-based-access.md` §Security and attacker surface | + +--- + +## Specs (outside the wiki) + +Canonical design records live in `docs/spec/`: + +- **`docs/spec/plans/development-stages.md`** — build plan. Stages 0–5 shipped; **Stage 6 = federated own email**; **Stage 7 = generalized OIDC provider**; remaining stages postponed. +- **`docs/spec/ses-email-architecture.md`** — Stage 6 SES email spec. +- **`docs/spec/email-signing-backends.md`** — generalized backend comparison (SES / DWD / SaaS). +- **`docs/spec/credential-backend-interface.md`** — the `CredentialBackend` trait. +- **`docs/spec/architecture.md`** — 13-component system architecture. + +Demo / operator docs: + +- **`docs/manual-test-stage4.md`** — Stage 4 end-to-end walkthrough +- **`docs/manual-test-stage5.md`** — Stage 5 demo (dedicated-Gmail quick path) +- **`docs/stage5-workspace-email-setup.md`** — advanced BYO Workspace runbook (deferred past Stage 7) +- **`docs/contradictions.md`** — living tracker of cross-doc contradictions + +--- ## How to edit this wiki @@ -33,6 +81,6 @@ AgentKeys is a credential custody service: a TEE-backed vault that issues long-l 3. Merge to `main`. 4. The `Publish wiki` GitHub Action mirrors `wiki/**` to the wiki repo. -A maintainer can also trigger the mirror manually from the repo's Actions tab — the workflow exposes `workflow_dispatch` for re-runs against an unchanged `wiki/` tree. +A maintainer can also trigger the mirror manually from the repo's Actions tab — the workflow exposes `workflow_dispatch`. See `.github/workflows/publish-wiki.yml` for the implementation. 
diff --git a/wiki/blockchain-tee-architecture.md b/wiki/blockchain-tee-architecture.md index b2c8600..f1965e3 100644 --- a/wiki/blockchain-tee-architecture.md +++ b/wiki/blockchain-tee-architecture.md @@ -515,17 +515,19 @@ This gets the per-read latency down to pure-TEE-backend levels for hot-path read --- -## 6. Summary: the three rules +## 6. Summary: the four rules +> **Updated 2026-04-19** to add rule #4 (credential broker, not operation proxy) after the email, knowledge-base, and OIDC-federation design rounds. > **Corrected 2026-04-12** after verifying against the actual Heima source code (`litentry/heima` on GitHub). The previous version of rule #3 stated "clients hold only their own private keys" — this was wrong. Clients hold JWTs (bearer tokens), not private keys. All private keys live inside the TEE. -The entire AgentKeys v0.1 architecture follows three rules: +The entire AgentKeys v0.1 architecture follows four rules: 1. **Chain stores everything persistent.** Account records, credential blobs (encrypted), pair requests, approvals, audit events, wallet balances, revocation lists. The chain is the single source of truth. If the TEE restarts, if the daemon crashes, if the user switches devices — chain state is always there. 2. **TEE holds all private keys and does all computation.** The TEE holds the shielding key, the RSA JWT signing key, and per-user custodial wallet keys (per `pallet-bitacross` pattern). These are generated independently (not derived from a single master seed) and sealed inside the enclave. The TEE decrypts credential blobs, issues and verifies JWTs, signs on-chain extrinsics using the user's wallet key, and enforces scope + rate limits. No private key ever leaves the TEE. 3. **Clients hold only a JWT (bearer token), not private keys.** The master CLI and agent daemon each hold a JWT string issued by the TEE upon authentication. The JWT is a signed bearer token (`AuthTokenClaims { sub, typ, exp, aud }`), not a private key. 
However, it IS still a bearer credential — anyone with the string can impersonate the user until it expires. **OS keychain is the recommended default** for the master CLI (provides app-level ACL against malware-as-same-user). Plain file (mode 0600) is an acceptable fallback for daemon/sandbox/CI where keychain isn't available. If the JWT leaks, the blast radius is bounded by its expiration time (~24h) and the on-chain revocation list (~6s). If the JWT expires, the client re-authenticates and gets a new one.
+4. **AgentKeys brokers credentials, not operations.** Our infrastructure mints ephemeral credentials (JWTs, temp cloud creds, decrypted API keys) and emits audit extrinsics at mint time. The daemon then calls remote services (SES, S3, GitHub, Notion, LLM APIs, …) **directly** using those credentials — we never proxy per-operation reads/writes. Compute cost on our side scales with user count, not with operation frequency. Per-user isolation on shared cloud resources is enforced by the cloud itself via PrincipalTag / session-tag conditions derived from JWT claims (see `.omc/wiki/tag-based-access.md`). This rule is why the email, knowledge-base, and OIDC-federation designs never build proxies, SaaS feature surfaces, or per-operation compute on our side.
-Every flow in the system (credential store, credential read, pairing, revocation, audit query) is an instance of: +Every flow in the system (credential store, credential read, pairing, revocation, audit query, email read/send, knowledge-base ops) is an instance of: ``` client sends request + JWT to TEE From 4dccdddd1b84be782f5327cfce28df249e7746b3 Mon Sep 17 00:00:00 2001 From: wildmeta-agent Date: Sun, 19 Apr 2026 22:55:02 +0800 Subject: [PATCH 2/5] docs(wiki): flip key model to HDKD + add Heima-gap spec MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Wiki now consistently describes the desired architecture: - blockchain-tee-architecture.md §1 TEE-internal-keys table lists a sealed master seed at the top and marks every long-lived subkey (shielding, issuer JWT, per-user wallet, per-domain DKIM) as SLIP-0010 HDKD-derived. - Rule #2 in §6 re-anchored on the same model. - session-token.md JWT-signing-key section follows the same story and adds the ES256 path needed for AWS OIDC in Stage 7. New docs/spec/heima-gaps-vs-desired-architecture.md captures the six deltas between upstream litentry/heima today and what the wiki describes: key-derivation model, OIDC provider, BYODKIM, email pallets, session-tag propagation, attested pubkey publication. Each gap has current/desired/impact/migration-path sections and flags whether it blocks Stage 6, Stage 7, or neither. --- .../heima-gaps-vs-desired-architecture.md | 200 ++++++++++++++++++ wiki/Home.md | 1 + wiki/blockchain-tee-architecture.md | 22 +- wiki/session-token.md | 9 +- 4 files changed, 217 insertions(+), 15 deletions(-) create mode 100644 docs/spec/heima-gaps-vs-desired-architecture.md diff --git a/docs/spec/heima-gaps-vs-desired-architecture.md b/docs/spec/heima-gaps-vs-desired-architecture.md new file mode 100644 index 0000000..5347d19 --- /dev/null +++ b/docs/spec/heima-gaps-vs-desired-architecture.md @@ -0,0 +1,200 @@ +# Heima Gaps vs. 
AgentKeys Desired Architecture + +**Status:** living document (gap-tracking). +**Owner:** blockchain team. +**Last updated:** 2026-04-19. + +## 1. Why this doc exists + +The [wiki](../../wiki/) always describes the **desired** architecture — the shape AgentKeys v0.1 is targeting, not the shape the upstream `litentry/heima` chain ships today. That's the right default for a design wiki: specs should describe where we're going, not where we happened to be when they were written. + +This document is the other half. Every delta between: + +- **desired**: what the AgentKeys wiki + spec docs describe, and +- **current**: what the upstream `litentry/heima` repo actually implements today, + +gets one section below. Each section has a **Current**, **Desired**, **Impact**, and **Migration path**. Gaps are closed by (a) patches landing upstream, (b) AgentKeys shipping a fork with the delta, or (c) the desired spec being revised downward — we mark which resolution a gap is taking as it lands. + +Related docs: + +- [`wiki/blockchain-tee-architecture.md`](../../wiki/blockchain-tee-architecture.md) — canonical desired architecture (four rules). +- [`wiki/key-security.md`](../../wiki/key-security.md) — TEE key security model. +- [`docs/spec/plans/development-stages.md`](./plans/development-stages.md) — stage roadmap; this gap list is the critical path for Stage 6 and Stage 7. +- [`docs/spec/ses-email-architecture.md`](./ses-email-architecture.md) — Stage 6 email spec; depends on gaps §2, §3, §5. + +--- + +## 2. Gap: key derivation model — independent generation vs. SLIP-0010 HDKD from a sealed master seed + +### Current (upstream `litentry/heima`) + +Every long-lived TEE key is generated independently: + +- **Shielding keypair** — generated at enclave startup from a hardware RNG, sealed in its own slot. +- **RSA JWT signing key** — generated via `RsaPrivateKey::new(&mut rng, 2048)` and persisted as a PKCS#1 DER file, on its own. 
+- **Per-user custodial wallet keys** — generated per-account via `pallet-bitacross` at account-creation time, each stored in its own sealed record keyed by `(chain, omni_account)`. + +There is **no master seed**. OmniAccount *addresses* are deterministically derived via `OmniAccountConverter::convert(&identity, &client_id)`, but the underlying *private keys* are not. + +### Desired (AgentKeys wiki + specs) + +A single 256-bit master seed is generated once, at first enclave provisioning, from the hardware RNG and sealed. Every other long-lived key is deterministically derived from that seed via SLIP-0010 HDKD (BIP-32-style): + +| Subkey | Derivation path | Alg | Consumer | +| ------------------------------- | --------------------------------------------------- | ----------- | -------------------------------------- | +| Shielding keypair | `shielding/v1` | Curve25519 | Credential-blob encrypt/decrypt | +| Issuer JWT signing key | `issuer/jwt/v1` | RSA-2048 *or* ES256 | Session-token minting + OIDC issuer (Stage 7) | +| Per-user wallet key | `wallet///v1` | secp256k1 / ed25519 (per chain) | Custodial wallet signing | +| Per-domain DKIM key | `dkim//v1` | Ed25519 | Outbound mail signing (Stage 6) | + +### Impact + +- **New services multiply storage today.** Each new key surface (DKIM per domain, OIDC per audience, any future K8s / cloud-IdP signers) would have to add a new sealed-storage slot and its own key-lifecycle code. With HDKD, new surfaces are new derivation paths — no new storage, no new lifecycle. +- **Disaster recovery is painful.** If a sealed slot is lost or corrupted today, the affected key is gone and every downstream record has to be re-issued. With HDKD, a reprovisioned enclave that has the sealed master seed reconstructs every subkey deterministically. +- **Auditability is weaker.** With independent keys, the relationship between the root trust anchor and each operational key has to be tracked out of band. 
With HDKD, the root attestation + the derivation path is the proof. + +### Migration path + +**Option A — upstream patch (preferred).** Introduce a `TeeMasterSeed` sealed record; add a `derive_subkey(path)` helper in the TEE worker; port the shielding keypair, JWT signing key, and `pallet-bitacross` wallet key derivation to call through it. Existing independently-generated keys are grandfathered: the master seed is only consulted for newly-derived paths (DKIM, OIDC), so the migration is additive. + +**Option B — AgentKeys fork.** If upstream is slow, keep the master-seed addition in our fork. This is the default for Stage 6 and Stage 7 if §2 + §5 haven't landed upstream by then. + +**Option C — downgrade the spec.** We could drop HDKD from the desired architecture and live with independent keys forever. We're explicitly **not** choosing this — broker-not-proxy amplifies the key-surface problem (every new federated target is another key) and HDKD is the cheapest answer. + +--- + +## 3. Gap: TEE does not expose an OIDC provider + +### Current + +The TEE issues JWTs for internal AgentKeys authentication, but: + +- No `/.well-known/openid-configuration` discovery document is published. +- No JWKS endpoint is published. +- The `iss` claim on existing JWTs is not a resolvable HTTPS URL. +- The signing alg is RSA-2048 only; there is no ES256 path. + +### Desired (Stage 7 — Generalized OIDC Provider) + +The TEE's issuer signing key (derivation path `issuer/jwt/v1`, alg **ES256**) doubles as a conforming OpenID Connect issuer: + +- `iss = https://oidc.agentkeys.io` (or per-tenant subdomain). +- `/.well-known/openid-configuration` served from a plain HTTPS endpoint (static file, no compute; just publishes the issuer URL, JWKS URL, supported algs). +- `/.well-known/jwks.json` serves the ES256 public key as a JWK. 
+- JWT claims include the user's OmniAccount wallet as a custom claim (`agentkeys_user_wallet`) so relying parties can gate access via `sts:TagSession` / `aws:PrincipalTag` conditions (see [`.omc/wiki/tag-based-access.md`](../../.omc/wiki/tag-based-access.md)). + +### Impact + +Without this, AWS / GCP / Azure / Ali Cloud / K8s cannot federate identity to the TEE. This is the single gating change for **every** broker-not-proxy integration the wiki describes: S3 knowledge base, SES inbound S3Action, cross-account AWS calls, everything. Stage 7 cannot ship without it. + +### Migration path + +- **Issuer key alg:** add ES256 derivation alongside RSA-2048 (AWS IAM OIDC accepts RS256 and ES256, but not Ed25519 — this was verified directly from the AWS docs). +- **Discovery document + JWKS:** static S3/CloudFront-served JSON; no TEE compute required for serving (compute is key-derivation-on-demand for JWKS rotation, which is rare). +- **Publish pipeline:** the TEE computes its own JWK from the derived public key; we mirror it to the discovery URL on rotation. + +Depends on §2 (HDKD) landing first, because the ES256 key is a subkey of the master seed. + +--- + +## 4. Gap: no BYODKIM (TEE-held per-domain DKIM keys) + +### Current + +Outbound mail is not a Heima concern today — there is no DKIM signing anywhere in the TEE. + +### Desired (Stage 6 — Federated Own Email) + +- Per-domain Ed25519 DKIM signing keys (RFC 8463) derived at path `dkim//v1`. +- Public key published as a DNS TXT record at `._domainkey.` (our hosted `agentkeys-email.io` zone for default users; the user's DNS for BYO domains). +- Outbound mail is DKIM-signed inside the TEE on the send path, then the DKIM-signed raw MIME is sent via AWS SES `SendRawEmail` (broker-not-proxy: SES is the delivery channel; signing is ours). 
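The DNS publication step above can be sketched as follows. This is a minimal illustration, assuming the TEE has already exported the raw 32-byte Ed25519 public key; the selector `ak1` and the helper name `dkim_txt_record` are hypothetical and not part of the spec. RFC 8463 specifies `k=ed25519` with `p=` carrying the base64 of the raw public key.

```python
import base64
from typing import Tuple


def dkim_txt_record(selector: str, domain: str, public_key: bytes) -> Tuple[str, str]:
    """Return (record name, TXT value) for an Ed25519 DKIM key (RFC 8463).

    `selector` is a hypothetical label; the spec does not fix one.
    """
    if len(public_key) != 32:
        raise ValueError("Ed25519 public keys are exactly 32 bytes")
    name = f"{selector}._domainkey.{domain}"
    # RFC 8463: p= holds the base64 of the raw 32-byte key, not a DER wrapper.
    value = "v=DKIM1; k=ed25519; p=" + base64.b64encode(public_key).decode("ascii")
    return name, value


if __name__ == "__main__":
    # Placeholder key bytes for illustration only; a real key comes from the TEE.
    record_name, record_value = dkim_txt_record("ak1", "agentkeys-email.io", bytes(range(32)))
    print(record_name, "TXT", record_value)
```

Only the public half ever appears here; under Rule #2 the private key stays inside the enclave and is used solely on the send path.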
+ +### Impact + +Without BYODKIM in the TEE, the send path either (a) delegates signing to SES (AWS sees the plaintext content and controls the signing key — violates Rule #2) or (b) drops DKIM entirely (deliverability tanks and domain reputation is unclaimed). + +### Migration path + +Trivial once §2 lands: add the `dkim//v1` derivation path, publish the pubkey via DNS at domain-provisioning time, wire a `sign_outbound_mail(mime, domain)` TEE entrypoint. + +Depends on §2 (HDKD). + +--- + +## 5. Gap: on-chain email pallets (`pallet-email-grants`, `pallet-email-audit`) do not exist + +### Current + +`pallet-bitacross` exists (for custodial wallets) and `pallet-secrets-vault` exists (for encrypted credential blobs), but there is no pallet for: + +- **Email grants** — who is allowed to send from `` or read from `` (the email equivalent of a credential scope). +- **Email audit** — per-operation append-only log of `send`, `read`, `attach-S3-key`, etc., keyed by `omni_account`. + +### Desired + +- `pallet-email-grants` — on-chain store of `(domain, omni_account, capability)` tuples; TEE consults it on every send/read before touching SES. Revocation is an extrinsic; enforcement is ≤ 1 block (same as credential revocation). +- `pallet-email-audit` — append-only audit log, identical shape to the credential audit log we already have. Every TEE-brokered email operation emits one extrinsic. + +### Impact + +Without these pallets, there is no on-chain source of truth for email authorization — violates Rule #1. We'd be running email with an in-TEE table, which (a) breaks the "chain is truth" invariant and (b) loses the public-verifiability property the credential pipeline depends on. + +### Migration path + +The shape of both pallets is mechanically similar to the existing credential-grants + credential-audit pallets, so this is mostly forking their code and renaming. Stage 6 blocker. No dependency on §2 or §3 — can land in parallel. + +--- + +## 6. 
Gap: no session-tag propagation on TEE-minted JWTs into STS + +### Current + +The TEE mints JWTs with standard claims (`sub`, `typ`, `exp`, `aud`). There is no infrastructure for: + +- Adding a custom `agentkeys_user_wallet` claim to the JWT (trivial — just claim encoding). +- Exposing that claim to AWS STS via `sts:TagSession` so AWS IAM can evaluate `aws:PrincipalTag/agentkeys_user_wallet` in bucket policies / KMS policies. + +### Desired (Stage 6 + Stage 7) + +The JWT the TEE mints carries `agentkeys_user_wallet = ` as a claim. When a client does `sts:AssumeRoleWithWebIdentity` with that JWT, STS extracts the claim and attaches it as a session tag. Downstream bucket policies and KMS policies pattern-match on `aws:PrincipalTag/agentkeys_user_wallet = ${aws:SourceIdentity}` or similar, giving us per-user isolation on shared cloud resources **without** per-user IAM roles. + +See [`.omc/wiki/tag-based-access.md`](../../.omc/wiki/tag-based-access.md) for the full pattern. + +### Impact + +This is the mechanism that makes broker-not-proxy work on shared AWS resources. Without it, either (a) we provision one IAM role per user (doesn't scale — IAM role quotas + management overhead) or (b) we proxy every call through our infra (violates Rule #4). + +### Migration path + +- **TEE side:** extend the JWT claim set. Minor change. +- **AWS side:** role trust policies declare `sts:TagSession` is allowed; bucket/KMS policies reference `aws:PrincipalTag/agentkeys_user_wallet`. This is all AWS configuration, not Heima source. +- **OIDC side:** depends on §3 (OIDC provider) landing first so AWS IAM can trust the issuer at all. + +--- + +## 7. Gap: no enclave-attested publication of issuer + shielding pubkeys to a public trust document + +### Current + +`register_enclave()` publishes pubkeys on chain, which is good for clients who can query the chain. 
It does **not** publish them in a form that arbitrary web-service trust stores (AWS IAM OIDC thumbprint list, GCP Workload Identity Federation issuer config, K8s OIDC discovery) can consume. + +### Desired + +- Per issuer pubkey, a JWK is published at a stable HTTPS URL. +- The JWK's signing attestation (DCAP quote) is either inline in the discovery document or served from a neighbouring URL, so a relying party can verify "this pubkey was generated inside an attested AgentKeys TEE" before adding it to their trust store. + +### Impact + +Without the discovery-side story, Stage 7 OIDC federation works but the security argument is weaker: AWS trusts our issuer URL, but a pivoted / compromised TEE could publish whatever pubkey it wants. With attested publication, compromise-to-impersonation requires also compromising the attestation pipeline. + +### Migration path + +Defer past Stage 7 — this is a hardening follow-up, not a Stage 6/7 blocker. Tracked here so it's not forgotten. + +--- + +## 8. Tracking + +- Each gap is owned as a separate issue in the `litentry/agentKeys` repo (TBD — file when this doc merges). +- When a gap closes, mark the section **RESOLVED** with the merge commit(s) and the resolution path (A/B/C from §2). +- When a new delta is discovered, append a new section here before revising the wiki, so the wiki stays "desired" and this doc stays "gap". diff --git a/wiki/Home.md b/wiki/Home.md index 26559db..3e9240a 100644 --- a/wiki/Home.md +++ b/wiki/Home.md @@ -64,6 +64,7 @@ Canonical design records live in `docs/spec/`: - **`docs/spec/email-signing-backends.md`** — generalized backend comparison (SES / DWD / SaaS). - **`docs/spec/credential-backend-interface.md`** — the `CredentialBackend` trait. - **`docs/spec/architecture.md`** — 13-component system architecture. 
+- **`docs/spec/heima-gaps-vs-desired-architecture.md`** — living gap list: where current upstream `litentry/heima` differs from what the wiki describes (HDKD master seed, OIDC provider, BYODKIM, email pallets, session-tag propagation). Demo / operator docs: diff --git a/wiki/blockchain-tee-architecture.md b/wiki/blockchain-tee-architecture.md index f1965e3..38e7929 100644 --- a/wiki/blockchain-tee-architecture.md +++ b/wiki/blockchain-tee-architecture.md @@ -51,16 +51,18 @@ The TEE is a **stateless computation oracle**. It reads chain state, performs cr **What it holds (TEE-internal, sealed/persistent):** -| Data | Lifetime | How generated | Purpose | -| -------------------------------------------- | ------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------- | -| Shielding keypair | Permanent (sealed storage, pubkey registered on chain via `register_enclave()`) | Generated at enclave startup | Encrypt/decrypt credential blobs | -| RSA JWT signing key | Permanent (stored as PKCS#1 DER file) | `RsaPrivateKey::new(&mut rng, 2048)` — randomly generated, NOT derived from a master seed | Sign session tokens (JWT format) issued to clients | -| Per-user custodial wallet keys (BTC/ETH/TON) | Permanent (sealed, per `pallet-bitacross` pattern) | Generated per account creation, independently per user | Sign on-chain extrinsics on behalf of user wallets. Private key never leaves the enclave. | -| AES response keys | Ephemeral (per-request) | From `RequestAesKey` parameter | Encrypt sensitive responses to specific clients | -| Chain state cache (optional) | ≤ 1 block (~6s) | Read from chain | Performance optimization. Not authoritative — chain is truth. 
| +| Data | Lifetime | How generated | Purpose | +| ------------------------------------------------- | --------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- | +| **TEE master seed** | Permanent (sealed storage, never leaves enclave, never exposed) | Generated once at first enclave provisioning from a hardware RNG (256-bit) | Root of all HD derivation. Every other key below derives from this seed. | +| Shielding keypair | Permanent (sealed storage, pubkey registered on chain via `register_enclave()`) | Derived from master seed at path `shielding/v1` (SLIP-0010 / BIP-32-style HDKD) | Encrypt/decrypt credential blobs | +| Issuer signing key (RSA-2048 or ES256) | Permanent (sealed storage, pubkey published via OIDC discovery + `register_enclave()`) | Derived from master seed at path `issuer/jwt/v1` (SLIP-0010) | Sign session tokens (JWT format) issued to clients; also the OIDC issuer key (Stage 7) | +| Per-user custodial wallet keys (BTC/ETH/TON) | Permanent (derived on demand, cacheable; deterministic re-derivation after restart) | Derived from master seed at path `wallet///v1` (SLIP-0010) | Sign on-chain extrinsics on behalf of user wallets. Private key never leaves the enclave. | +| Per-domain DKIM signing key (Stage 6) | Permanent (derived on demand, public key published as DNS TXT record) | Derived from master seed at path `dkim//v1` (Ed25519, RFC 8463) | Sign outbound mail for `@agentkeys-email.io` and user-owned domains | +| AES response keys | Ephemeral (per-request) | From `RequestAesKey` parameter | Encrypt sensitive responses to specific clients | +| Chain state cache (optional) | ≤ 1 block (~6s) | Read from chain | Performance optimization. Not authoritative — chain is truth. 
| -> **Correction (verified against Heima source 2026-04-12):** The TEE holds **multiple independent keys**, not a single master seed with HD derivation. The RSA JWT key, shielding key, and per-user wallet keys are each generated independently and stored separately. OmniAccount *addresses* are deterministically derived (`OmniAccountConverter::convert(&identity, &client_id)`), but the underlying *private keys* are not HD-derived. +> **Desired architecture (this spec):** All long-lived TEE keys are deterministically derived from a single sealed master seed via SLIP-0010 HDKD. This makes the TEE's key surface infinitely extensible (new services add new derivation paths, no new randomness or new storage slots), supports clean disaster recovery (a reprovisioned enclave with the same sealed seed reconstructs every subkey), and matches how we already treat OmniAccount addresses. Current Heima source generates keys independently instead — the gap, its impact, and the migration path are tracked in [`docs/spec/heima-gaps-vs-desired-architecture.md`](../docs/spec/heima-gaps-vs-desired-architecture.md). **What it does:** @@ -517,13 +519,13 @@ This gets the per-read latency down to pure-TEE-backend levels for hot-path read ## 6. Summary: the four rules -> **Updated 2026-04-19** to add rule #4 (credential broker, not operation proxy) after the email, knowledge-base, and OIDC-federation design rounds. +> **Updated 2026-04-19** to (a) add rule #4 (credential broker, not operation proxy) after the email, knowledge-base, and OIDC-federation design rounds, and (b) re-anchor rule #2 on the DESIRED architecture: a single TEE master seed with SLIP-0010 HDKD for every long-lived subkey (shielding, issuer JWT, per-user wallet, per-domain DKIM). Current Heima source generates these independently — the gap list lives in [`docs/spec/heima-gaps-vs-desired-architecture.md`](../docs/spec/heima-gaps-vs-desired-architecture.md). 
> **Corrected 2026-04-12** after verifying against the actual Heima source code (`litentry/heima` on GitHub). The previous version of rule #3 stated "clients hold only their own private keys" — this was wrong. Clients hold JWTs (bearer tokens), not private keys. All private keys live inside the TEE. The entire AgentKeys v0.1 architecture follows four rules: 1. **Chain stores everything persistent.** Account records, credential blobs (encrypted), pair requests, approvals, audit events, wallet balances, revocation lists. The chain is the single source of truth. If the TEE restarts, if the daemon crashes, if the user switches devices — chain state is always there. -2. **TEE holds all private keys and does all computation.** The TEE holds the shielding key, the RSA JWT signing key, and per-user custodial wallet keys (per `pallet-bitacross` pattern). These are generated independently (not derived from a single master seed) and sealed inside the enclave. The TEE decrypts credential blobs, issues and verifies JWTs, signs on-chain extrinsics using the user's wallet key, and enforces scope + rate limits. No private key ever leaves the TEE. +2. **TEE holds all private keys and does all computation.** The TEE holds a single sealed master seed and deterministically derives every other long-lived key from it via SLIP-0010 HDKD: the shielding key (`shielding/v1`), the issuer signing key for session JWTs and the OIDC provider (`issuer/jwt/v1`), per-user custodial wallet keys (`wallet///v1`, per `pallet-bitacross` pattern), and per-domain DKIM signing keys (`dkim//v1`, Ed25519, Stage 6). The TEE decrypts credential blobs, issues and verifies JWTs, signs on-chain extrinsics using the user's wallet key, signs outbound mail, and enforces scope + rate limits. No private key ever leaves the TEE. 
(Current Heima source generates these keys independently rather than deriving them from a master seed; see [`docs/spec/heima-gaps-vs-desired-architecture.md`](../docs/spec/heima-gaps-vs-desired-architecture.md) for the migration gap.)
 3. **Clients hold only a JWT (bearer token), not private keys.** The master CLI and agent daemon each hold a JWT string issued by the TEE upon authentication. The JWT is a signed bearer token (`AuthTokenClaims { sub, typ, exp, aud }`), not a private key. However, it IS still a bearer credential — anyone with the string can impersonate the user until it expires. **OS keychain is the recommended default** for the master CLI (provides app-level ACL against malware-as-same-user). Plain file (mode 0600) is an acceptable fallback for daemon/sandbox/CI where keychain isn't available. If the JWT leaks, the blast radius is bounded by its expiration time (~24h) and the on-chain revocation list (~6s). If the JWT expires, the client re-authenticates and gets a new one.
 4. **AgentKeys brokers credentials, not operations.** Our infrastructure mints ephemeral credentials (JWTs, temp cloud creds, decrypted API keys) and emits audit extrinsics at mint time. The daemon then calls remote services (SES, S3, GitHub, Notion, LLM APIs, …) **directly** using those credentials — we never proxy per-operation reads/writes. Compute cost on our side scales with user count, not with operation frequency. Per-user isolation on shared cloud resources is enforced by the cloud itself via PrincipalTag / session-tag conditions derived from JWT claims (see `.omc/wiki/tag-based-access.md`). This rule is why the email, knowledge-base, and OIDC-federation designs never build proxies, SaaS feature surfaces, or per-operation compute on our side.
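The derivation model in rule #2 can be sketched as below. This is a rough illustration under stated assumptions, not the TEE worker's implementation: it covers only the SLIP-0010 Ed25519 curve (hardened children only), and the mapping from a path label such as `shielding` to a 31-bit index (`label_to_index`) is a hypothetical choice, since the spec gives string paths without defining their numeric encoding.

```python
import hashlib
import hmac
import struct
from typing import Tuple

HARDENED = 0x80000000  # SLIP-0010 Ed25519 supports hardened children only


def master_node(seed: bytes) -> Tuple[bytes, bytes]:
    """SLIP-0010 master node for Ed25519: returns (private key, chain code)."""
    digest = hmac.new(b"ed25519 seed", seed, hashlib.sha512).digest()
    return digest[:32], digest[32:]


def hardened_child(key: bytes, chain_code: bytes, index: int) -> Tuple[bytes, bytes]:
    """Derive one hardened child from a parent (key, chain code) pair."""
    data = b"\x00" + key + struct.pack(">I", index | HARDENED)
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    return digest[:32], digest[32:]


def label_to_index(label: str) -> int:
    """HYPOTHETICAL: map a path label to a 31-bit index via SHA-256."""
    return int.from_bytes(hashlib.sha256(label.encode("utf-8")).digest()[:4], "big") & 0x7FFFFFFF


def derive_subkey(seed: bytes, path: str) -> bytes:
    """Walk a slash-separated path such as 'shielding/v1' from the master seed."""
    key, chain_code = master_node(seed)
    for label in path.split("/"):
        key, chain_code = hardened_child(key, chain_code, label_to_index(label))
    return key


if __name__ == "__main__":
    demo_seed = bytes(32)  # placeholder; a real seed comes from the hardware RNG
    print(derive_subkey(demo_seed, "shielding/v1").hex())
```

The property that matters for disaster recovery is determinism: the same sealed seed and the same path always yield the same subkey, so a reprovisioned enclave needs no per-key storage slots.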
diff --git a/wiki/session-token.md b/wiki/session-token.md index 5db4f31..283b565 100644 --- a/wiki/session-token.md +++ b/wiki/session-token.md @@ -64,12 +64,11 @@ TEE signs a session token with its RSA private key: TEE returns the token string to the client ``` -The RSA signing key: +The issuer signing key: -- Lives inside the TEE (sealed storage) -- Is a 2048-bit RSA key generated randomly (`RsaPrivateKey::new(&mut rng, 2048)`) -- Is NOT derived from a master seed — it's an independent key per TEE worker instance -- Public key is derivable from the private key for verification +- Lives inside the TEE (sealed storage), derived from the sealed TEE master seed at path `issuer/jwt/v1` via SLIP-0010 HDKD — the same seed that roots the shielding key, per-user wallet keys, and per-domain DKIM keys (see [Blockchain TEE Architecture §1](blockchain-tee-architecture#tee-trusted-execution-environment-worker) and [`docs/spec/heima-gaps-vs-desired-architecture.md`](../docs/spec/heima-gaps-vs-desired-architecture.md) for the current-vs-desired gap) +- Default alg is RSA-2048 (SHA-256) for backward compatibility with existing JWT verifiers; ES256 (ECDSA P-256) is the preferred alg once Stage 7 ships, since AWS IAM OIDC accepts ES256 but not Ed25519 +- Public key published on chain via `register_enclave()` AND (Stage 7) via the OIDC discovery document at `/.well-known/openid-configuration` + JWKS endpoint --- From 535ac9642eb338d8c694bc1ff29303f1f98c8355 Mon Sep 17 00:00:00 2001 From: wildmeta-agent Date: Sun, 19 Apr 2026 23:54:17 +0800 Subject: [PATCH 3/5] docs(wiki): promote 6 service-architecture pages to published wiki MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The dev-stages doc (and others) referenced Obsidian-style [[hosted-first]] wiki-links pointing into .omc/wiki/, which is git-ignored and lives only on the author's machine. Readers on GitHub saw the reference with nothing behind it. 
Move the six content pages from .omc/wiki/ to wiki/ so the mirrored GitHub wiki publishes them: - wiki/hosted-first.md - wiki/oidc-federation.md - wiki/tag-based-access.md - wiki/knowledge-storage.md - wiki/email-system.md - wiki/overview.md Inside each page, convert [[foo]] wiki-links to [foo](foo) Markdown links (GitHub's Markdown doesn't resolve double-bracket syntax). Update every external reference: - wiki/Home.md: drop the ".omc/wiki/ — not published" section, relink the reading-order matrix to published paths. - wiki/blockchain-tee-architecture.md Rule #4: tag-based-access wiki link. - docs/spec/plans/development-stages.md: all [[foo]] (wiki) refs become ../../../wiki/foo.md Markdown links. - docs/spec/ses-email-architecture.md: same, relative ../../wiki/. - docs/spec/heima-gaps-vs-desired-architecture.md: same. - docs/manual-test-stage5.md: ../.omc/wiki/ -> ../wiki/. - docs/stage5-workspace-email-setup.md: same. .omc/wiki/{index,log}.md stay — they're LLM-wiki tooling artifacts, not content. 
--- docs/manual-test-stage5.md | 2 +- .../heima-gaps-vs-desired-architecture.md | 4 +- docs/spec/plans/development-stages.md | 10 +- docs/spec/ses-email-architecture.md | 8 +- docs/stage5-workspace-email-setup.md | 2 +- wiki/Home.md | 24 +- wiki/blockchain-tee-architecture.md | 2 +- wiki/email-system.md | 274 +++++++++++++++ wiki/hosted-first.md | 239 +++++++++++++ wiki/knowledge-storage.md | 195 +++++++++++ wiki/oidc-federation.md | 326 ++++++++++++++++++ wiki/overview.md | 125 +++++++ wiki/tag-based-access.md | 266 ++++++++++++++ 13 files changed, 1451 insertions(+), 26 deletions(-) create mode 100644 wiki/email-system.md create mode 100644 wiki/hosted-first.md create mode 100644 wiki/knowledge-storage.md create mode 100644 wiki/oidc-federation.md create mode 100644 wiki/overview.md create mode 100644 wiki/tag-based-access.md diff --git a/docs/manual-test-stage5.md b/docs/manual-test-stage5.md index 42916da..706b83a 100644 --- a/docs/manual-test-stage5.md +++ b/docs/manual-test-stage5.md @@ -28,7 +28,7 @@ agentkeys provision openrouter For the demo-only purpose of Stage 5, the goal is the **shortest path to a running provisioner** with an inbox the agent fully controls. Use a dedicated personal Gmail below — reuses our existing IMAP code path, ~10 minutes total setup, no Workspace subscription required. -> **This is a temporary demo solution.** For production (v0.1), the agent mailbox moves to SES-hosted `*@bots.wildmeta.ai` under the three-layer `TokenAuthority` abstraction. See the [email-system wiki page](../.omc/wiki/email-system.md) for the full architecture and why we're running demo-and-production on different backends deliberately. +> **This is a temporary demo solution.** For production (v0.1), the agent mailbox moves to SES-hosted `*@bots.wildmeta.ai` under the three-layer `TokenAuthority` abstraction. 
See the [email-system wiki page](../wiki/email-system.md) for the full architecture and why we're running demo-and-production on different backends deliberately. #### 🚀 Demo path: dedicated personal Gmail + TOTP + app password diff --git a/docs/spec/heima-gaps-vs-desired-architecture.md b/docs/spec/heima-gaps-vs-desired-architecture.md index 5347d19..d3b11be 100644 --- a/docs/spec/heima-gaps-vs-desired-architecture.md +++ b/docs/spec/heima-gaps-vs-desired-architecture.md @@ -81,7 +81,7 @@ The TEE's issuer signing key (derivation path `issuer/jwt/v1`, alg **ES256**) do - `iss = https://oidc.agentkeys.io` (or per-tenant subdomain). - `/.well-known/openid-configuration` served from a plain HTTPS endpoint (static file, no compute; just publishes the issuer URL, JWKS URL, supported algs). - `/.well-known/jwks.json` serves the ES256 public key as a JWK. -- JWT claims include the user's OmniAccount wallet as a custom claim (`agentkeys_user_wallet`) so relying parties can gate access via `sts:TagSession` / `aws:PrincipalTag` conditions (see [`.omc/wiki/tag-based-access.md`](../../.omc/wiki/tag-based-access.md)). +- JWT claims include the user's OmniAccount wallet as a custom claim (`agentkeys_user_wallet`) so relying parties can gate access via `sts:TagSession` / `aws:PrincipalTag` conditions (see [`wiki/tag-based-access.md`](../../wiki/tag-based-access.md)). ### Impact @@ -158,7 +158,7 @@ The TEE mints JWTs with standard claims (`sub`, `typ`, `exp`, `aud`). There is n The JWT the TEE mints carries `agentkeys_user_wallet = ` as a claim. When a client does `sts:AssumeRoleWithWebIdentity` with that JWT, STS extracts the claim and attaches it as a session tag. Downstream bucket policies and KMS policies pattern-match on `aws:PrincipalTag/agentkeys_user_wallet = ${aws:SourceIdentity}` or similar, giving us per-user isolation on shared cloud resources **without** per-user IAM roles. 
-See [`.omc/wiki/tag-based-access.md`](../../.omc/wiki/tag-based-access.md) for the full pattern. +See [`wiki/tag-based-access.md`](../../wiki/tag-based-access.md) for the full pattern. ### Impact diff --git a/docs/spec/plans/development-stages.md b/docs/spec/plans/development-stages.md index 2d94ac6..86942e6 100644 --- a/docs/spec/plans/development-stages.md +++ b/docs/spec/plans/development-stages.md @@ -117,10 +117,10 @@ After the Stage 5a demo path landed and the email-system architecture + TEE-as-O The three architectural wiki pages on our email/OIDC design surfaced a coherent v0.1 milestone that does more for product-and-user value than packaging or late-stage hardening: -1. **Hosted-first default** — non-developer users get `xxxxx@agentkeys-email.io` with zero configuration, parallel to how AgentMail mints default-domain inboxes. See [[hosted-first]] (wiki). -2. **TEE holds all signing keys natively** — the Ed25519 DKIM key and ES256 OIDC-issuer key join the existing shielding/JWT/wallet derivation paths, all under `blockchain-tee-architecture.md` rule #2. See [[oidc-federation]] (wiki). -3. **Per-user isolation without per-user IAM** — JWT claim `agentkeys_user_wallet` → AWS session tag → `aws:PrincipalTag` in bucket/role policy = one bucket, N users, cryptographic separation. See [[tag-based-access]] (wiki). -4. **Knowledge-base decision deferred** — Stage 6/7 deliver the mechanism; which backend (GitHub / AWS S3 / Google Drive / Ali Cloud OSS) we ship as default is decided later per user segment. See [[knowledge-storage]] (wiki). +1. **Hosted-first default** — non-developer users get `xxxxx@agentkeys-email.io` with zero configuration, parallel to how AgentMail mints default-domain inboxes. See [`wiki/hosted-first.md`](../../../wiki/hosted-first.md). +2. **TEE holds all signing keys natively** — the Ed25519 DKIM key and ES256 OIDC-issuer key join the existing shielding/JWT/wallet derivation paths, all under `blockchain-tee-architecture.md` rule #2. 
See [`wiki/oidc-federation.md`](../../../wiki/oidc-federation.md). +3. **Per-user isolation without per-user IAM** — JWT claim `agentkeys_user_wallet` → AWS session tag → `aws:PrincipalTag` in bucket/role policy = one bucket, N users, cryptographic separation. See [`wiki/tag-based-access.md`](../../../wiki/tag-based-access.md). +4. **Knowledge-base decision deferred** — Stage 6/7 deliver the mechanism; which backend (GitHub / AWS S3 / Google Drive / Ali Cloud OSS) we ship as default is decided later per user segment. See [`wiki/knowledge-storage.md`](../../../wiki/knowledge-storage.md). **Broker-not-proxy principle.** Stages 6 and 7 both adhere to the principle that AgentKeys infrastructure mints ephemeral credentials and the daemon talks to remote services directly via MCP. Our backend never proxies per-user reads/writes. This keeps compute cost flat with user count (scales with sign-up rate, not operation frequency) and aligns with `blockchain-tee-architecture.md` rules #2–#3. @@ -960,7 +960,7 @@ agentkeys usage my-agent --filter email ### Architecture summary -See [[oidc-federation]] (wiki) for the full design. High-level: +See [`wiki/oidc-federation.md`](../../../wiki/oidc-federation.md) for the full design. High-level: 1. **OIDC issuer endpoint** — stable HTTPS URL `https://oidc.agentkeys.dev` with Let's Encrypt cert, static `/.well-known/openid-configuration` and `/.well-known/jwks.json` served by a thin proxy. 2. **One signing key** — ES256 at `derive("oidc/issuer/v1")`, reused from Stage 6. No new key material. diff --git a/docs/spec/ses-email-architecture.md b/docs/spec/ses-email-architecture.md index f4d917e..e072601 100644 --- a/docs/spec/ses-email-architecture.md +++ b/docs/spec/ses-email-architecture.md @@ -22,7 +22,7 @@ Email is the **dominant human-in-the-loop channel** every external API signup, O 4. **Cheap to scale** — thousands of throwaway inboxes per month without a seat-license model. 5. 
**No foreign admin-console step per inbox** — one-time domain onboarding only. 6. **Zero user setup in the default path** — Stage 6 target is "inbox exists the moment the agent is created; no DNS, no admin console, no Workspace subscription on the user side." -7. **Broker-not-proxy** — our backend mints credentials; the daemon calls SES and S3 directly via MCP. Per-operation compute on our side is zero. See [[hosted-first]] for the user-segmentation framework and [[knowledge-storage]] for the parallel deferred decision on knowledge storage. +7. **Broker-not-proxy** — our backend mints credentials; the daemon calls SES and S3 directly via MCP. Per-operation compute on our side is zero. See [`wiki/hosted-first.md`](../../wiki/hosted-first.md) for the user-segmentation framework and [`wiki/knowledge-storage.md`](../../wiki/knowledge-storage.md) for the parallel deferred decision on knowledge storage. Gmail Workspace with DWD satisfies 1 but fails 2–7. AgentMail (SaaS) satisfies 1, 3, 4, 6 but fails 2 and adds vendor lock. **AWS SES with our own thin inbox-abstraction layer satisfies all seven.** This spec defines that layer. @@ -187,7 +187,7 @@ We **deliver these records as a BIND zone file download** (same UX as AgentMail) ## 10.4. Per-user isolation on the shared `agentkeys-mail` bucket — PrincipalTag pattern -Stage 6 hosts every user's inbox in one AWS account, one S3 bucket, one IAM role. Per-user isolation is cryptographically enforced by AWS using the **PrincipalTag-from-JWT-claim** pattern. See [[tag-based-access]] for the full mechanics. +Stage 6 hosts every user's inbox in one AWS account, one S3 bucket, one IAM role. Per-user isolation is cryptographically enforced by AWS using the **PrincipalTag-from-JWT-claim** pattern. See [`wiki/tag-based-access.md`](../../wiki/tag-based-access.md) for the full mechanics. ### Summary of the mechanism @@ -268,7 +268,7 @@ AWS SES API calls require IAM authentication. 
Rather than seal a long-lived IAM Net: **no static AWS credentials at rest anywhere in AgentKeys.** TEE compromise = all federated creds compromised (same as before). Anything short of TEE compromise = zero blast radius. -The same OIDC provider federates into GCP Workload Identity, Azure AD, Snowflake, Kubernetes, and any other external-OIDC consumer. One issuer, N clouds. See `.omc/wiki/oidc-federation.md` for the generalization. +The same OIDC provider federates into GCP Workload Identity, Azure AD, Snowflake, Kubernetes, and any other external-OIDC consumer. One issuer, N clouds. See [`wiki/oidc-federation.md`](../../wiki/oidc-federation.md) for the generalization. ## 11. How this plugs into the three-layer abstraction @@ -356,7 +356,7 @@ Total: ~2 weeks. No Lambda, no DynamoDB, no server-side MIME parsing — the bro ## 16. Cross-references -- **`.omc/wiki/oidc-federation.md`** — the generalized OIDC-provider design that §10.5 references; explains how the same ES256 key federates into AWS, GCP, Azure, Snowflake, K8s +- **[`wiki/oidc-federation.md`](../../wiki/oidc-federation.md)** — the generalized OIDC-provider design that §10.5 references; explains how the same ES256 key federates into AWS, GCP, Azure, Snowflake, K8s - `docs/spec/email-signing-backends.md` — the generalized trait (needs an SES section added; this spec supplies the content) - `docs/spec/credential-backend-interface.md` — the parent trait this extends - `docs/stage5-workspace-email-setup.md` — alternative: Google DWD operator runbook (preserved for enterprise deployments) diff --git a/docs/stage5-workspace-email-setup.md b/docs/stage5-workspace-email-setup.md index 7d3851e..9dd5d5b 100644 --- a/docs/stage5-workspace-email-setup.md +++ b/docs/stage5-workspace-email-setup.md @@ -1,6 +1,6 @@ # Google Workspace email path — ADVANCED / BYO (deferred past Stage 7) -> **⚠️ Deferred (2026-04-19).** This runbook is now an **advanced bring-your-own path**, not the default for Stage 5 or Stage 6. 
The Stage 6 default is hosted `xxxxx@agentkeys-email.io` on AgentKeys infrastructure — zero setup for non-developers. See `docs/spec/ses-email-architecture.md` and [[hosted-first]] (wiki) for the hosted default, and `docs/spec/plans/development-stages.md` for the revised stage roadmap. +> **⚠️ Deferred (2026-04-19).** This runbook is now an **advanced bring-your-own path**, not the default for Stage 5 or Stage 6. The Stage 6 default is hosted `xxxxx@agentkeys-email.io` on AgentKeys infrastructure — zero setup for non-developers. See `docs/spec/ses-email-architecture.md` and [`wiki/hosted-first.md`](../wiki/hosted-first.md) for the hosted default, and `docs/spec/plans/development-stages.md` for the revised stage roadmap. > > **Preserved here** for operators who specifically want to run AgentKeys email inside their existing Google Workspace (enterprise / regulated / data-residency reasons). The architecture is parallel to the hosted SES path — same three-layer abstraction, same Touch-ID gate, same chain audit — just a different cloud. diff --git a/wiki/Home.md b/wiki/Home.md index 3e9240a..c854e1e 100644 --- a/wiki/Home.md +++ b/wiki/Home.md @@ -31,16 +31,16 @@ Every spec and every service on top of AgentKeys preserves these four invariants - **[Credential Usage](credential-usage)** — store → run/read → revoke - **[Serve and Audit](serve-and-audit)** — Pattern-4 per-read audit flow -### Service architectures (project-local scratchpad, `.omc/wiki/` — not published) +### Service architectures (published wiki, Stage 6/7) -Design docs for specific services built on top of the foundations. 
Short high-level names, kept project-local because they evolve fast: +Design docs for specific services built on top of the foundations: -- **overview** — tree + reading order for the service-architecture pages -- **hosted-first** — Stage 6 default (`xyz@agentkeys-email.io` on our infra) vs bring-your-own (advanced) -- **tag-based-access** — `agentkeys_user_wallet` JWT claim → AWS PrincipalTag → per-user isolation on shared buckets -- **oidc-federation** — TEE as a conforming OIDC issuer; one ES256 key federates into AWS / GCP / Azure / Ali / K8s -- **email-system** — Stage 6 email architecture on AWS SES; broker-not-proxy; zero per-operation compute -- **knowledge-storage** — deferred backend decision between GitHub / AWS S3 / Google Drive / Ali Cloud OSS, by user segment +- **[Overview](overview)** — tree + reading order for the service-architecture pages +- **[Hosted-First](hosted-first)** — Stage 6 default (`xyz@agentkeys-email.io` on our infra) vs bring-your-own (advanced) +- **[Tag-Based Access](tag-based-access)** — `agentkeys_user_wallet` JWT claim → AWS PrincipalTag → per-user isolation on shared buckets +- **[OIDC Federation](oidc-federation)** — TEE as a conforming OIDC issuer; one ES256 key federates into AWS / GCP / Azure / Ali / K8s +- **[Email System](email-system)** — Stage 6 email architecture on AWS SES; broker-not-proxy; zero per-operation compute +- **[Knowledge Storage](knowledge-storage)** — deferred backend decision between GitHub / AWS S3 / Google Drive / Ali Cloud OSS, by user segment --- @@ -48,10 +48,10 @@ Design docs for specific services built on top of the foundations. 
Short high-le | Role | Start here | Then | Then | |---|---|---|---| -| New engineer | [Blockchain TEE Architecture](blockchain-tee-architecture) | [Session Token](session-token) | `.omc/wiki/email-system.md` | -| Product / roadmap | This page, §Wiki tree | `docs/spec/plans/development-stages.md` | `.omc/wiki/hosted-first.md` | -| Operator / infra | [Key Security](key-security), [Serve and Audit](serve-and-audit) | `docs/spec/ses-email-architecture.md` | `.omc/wiki/oidc-federation.md` §Consumer-registration recipes | -| Security reviewer | [Blockchain TEE Architecture](blockchain-tee-architecture) §6 (four rules) | [Data Classification](data-classification) | `.omc/wiki/tag-based-access.md` §Security and attacker surface | +| New engineer | [Blockchain TEE Architecture](blockchain-tee-architecture) | [Session Token](session-token) | [Email System](email-system) | +| Product / roadmap | This page, §Wiki tree | `docs/spec/plans/development-stages.md` | [Hosted-First](hosted-first) | +| Operator / infra | [Key Security](key-security), [Serve and Audit](serve-and-audit) | `docs/spec/ses-email-architecture.md` | [OIDC Federation](oidc-federation) §Consumer-registration recipes | +| Security reviewer | [Blockchain TEE Architecture](blockchain-tee-architecture) §6 (four rules) | [Data Classification](data-classification) | [Tag-Based Access](tag-based-access) §Security and attacker surface | --- diff --git a/wiki/blockchain-tee-architecture.md b/wiki/blockchain-tee-architecture.md index 38e7929..f09c909 100644 --- a/wiki/blockchain-tee-architecture.md +++ b/wiki/blockchain-tee-architecture.md @@ -527,7 +527,7 @@ The entire AgentKeys v0.1 architecture follows four rules: 1. **Chain stores everything persistent.** Account records, credential blobs (encrypted), pair requests, approvals, audit events, wallet balances, revocation lists. The chain is the single source of truth. 
If the TEE restarts, if the daemon crashes, if the user switches devices — chain state is always there. 2. **TEE holds all private keys and does all computation.** The TEE holds a single sealed master seed and deterministically derives every other long-lived key from it via SLIP-0010 HDKD: the shielding key (`shielding/v1`), the issuer signing key for session JWTs and the OIDC provider (`issuer/jwt/v1`), per-user custodial wallet keys (`wallet///v1`, per `pallet-bitacross` pattern), and per-domain DKIM signing keys (`dkim//v1`, Ed25519, Stage 6). The TEE decrypts credential blobs, issues and verifies JWTs, signs on-chain extrinsics using the user's wallet key, signs outbound mail, and enforces scope + rate limits. No private key ever leaves the TEE. (Current Heima source generates these keys independently rather than HD-derived — see [`docs/spec/heima-gaps-vs-desired-architecture.md`](../docs/spec/heima-gaps-vs-desired-architecture.md) for the migration gap.) 3. **Clients hold only a JWT (bearer token), not private keys.** The master CLI and agent daemon each hold a JWT string issued by the TEE upon authentication. The JWT is a signed bearer token (`AuthTokenClaims { sub, typ, exp, aud }`), not a private key. However, it IS still a bearer credential — anyone with the string can impersonate the user until it expires. **OS keychain is the recommended default** for the master CLI (provides app-level ACL against malware-as-same-user). Plain file (mode 0600) is an acceptable fallback for daemon/sandbox/CI where keychain isn't available. If the JWT leaks, the blast radius is bounded by its expiration time (~~24h) and the on-chain revocation list (~~6s). If the JWT expires, the client re-authenticates and gets a new one. -4. **AgentKeys brokers credentials, not operations.** Our infrastructure mints ephemeral credentials (JWTs, temp cloud creds, decrypted API keys) and emits audit extrinsics at mint time. 
The daemon then calls remote services (SES, S3, GitHub, Notion, LLM APIs, …) **directly** using those credentials — we never proxy per-operation reads/writes. Compute cost on our side scales with user count, not with operation frequency. Per-user isolation on shared cloud resources is enforced by the cloud itself via PrincipalTag / session-tag conditions derived from JWT claims (see `.omc/wiki/tag-based-access.md`). This rule is why the email, knowledge-base, and OIDC-federation designs never build proxies, SaaS feature surfaces, or per-operation compute on our side. +4. **AgentKeys brokers credentials, not operations.** Our infrastructure mints ephemeral credentials (JWTs, temp cloud creds, decrypted API keys) and emits audit extrinsics at mint time. The daemon then calls remote services (SES, S3, GitHub, Notion, LLM APIs, …) **directly** using those credentials — we never proxy per-operation reads/writes. Compute cost on our side scales with user count, not with operation frequency. Per-user isolation on shared cloud resources is enforced by the cloud itself via PrincipalTag / session-tag conditions derived from JWT claims (see [Tag-Based Access](tag-based-access)). This rule is why the email, knowledge-base, and OIDC-federation designs never build proxies, SaaS feature surfaces, or per-operation compute on our side. 
Every flow in the system (credential store, credential read, pairing, revocation, audit query, email read/send, knowledge-base ops) is an instance of: diff --git a/wiki/email-system.md b/wiki/email-system.md new file mode 100644 index 0000000..93e0141 --- /dev/null +++ b/wiki/email-system.md @@ -0,0 +1,274 @@ +--- +title: "Email System — Architecture, Backends, and Usage Isolation" +tags: ["email", "architecture", "iam", "token-authority", "grant-store", "stage5", "isolation"] +created: 2026-04-18T06:58:59.906Z +updated: 2026-04-18T06:58:59.906Z +sources: [] +links: ["session-token.md", "blockchain-tee-architecture.md", "key-security.md"] +category: architecture +confidence: medium +schemaVersion: 1 +--- + +# Email System — Architecture, Backends, and Usage Isolation + +**Status:** design consolidation (2026-04-18) +**Scope:** how AgentKeys handles email across Stage 5 (provisioning demo) and v0.1 (production) +**Companion specs:** +- `docs/spec/ses-email-architecture.md` — Stage 6 primary: SES data model, pipelines, DNS setup, key derivation, PrincipalTag isolation +- [oidc-federation](oidc-federation) — how the TEE federates as an OIDC identity provider into AWS/GCP/Azure/etc. 
(the "no static cloud credentials" story; SES IAM access rides on this) +- [tag-based-access](tag-based-access) — the JWT-claim → session-tag → bucket-policy mechanism enforcing per-user isolation on the shared `agentkeys-mail` bucket +- [hosted-first](hosted-first) — why `xxxxx@agentkeys-email.io` is the default; BYO custom domain is deferred to Stage 7+ +- [knowledge-storage](knowledge-storage) — the parallel deferred decision for knowledge-base storage (GitHub / S3 / Drive / Ali Cloud) +- `docs/spec/email-signing-backends.md` — generalized three-layer backend comparison (DWD / TEE / SES) +- `docs/stage5-workspace-email-setup.md` — **advanced / deferred:** Google Workspace DWD operator runbook for enterprise BYO + +--- + +## TL;DR + +AgentKeys treats email as a credential-managed resource under the same `Authority` / `Broker` / `GrantStore` abstraction used for session tokens and API keys. Three email channels exist in the product, strictly separated by role: + +1. **Agent mailbox** — hosted on AgentKeys infrastructure at `xxxxx@agentkeys-email.io` (Stage 6 default, zero setup). Service providers (OpenRouter, Anthropic, Brave, …) send OTPs here. Agent reads via minted creds directly from S3; we never proxy. Per-user isolation enforced by AWS via `aws:PrincipalTag/agentkeys_user_wallet` from the OIDC JWT claim. +2. **User identity / notification** — user's own Gmail (or whatever). We send TO this address; we NEVER read from it. +3. **User approval (optional 2FA)** — same address as #2. We send a magic-link or 6-digit code for high-value operations; user clicks/types; backend creates the grant. Alternative / supplement to Touch ID (#11) when the master Mac is unavailable. + +**Three principles anchor the design:** +- **Hosted-first**: every user gets a working inbox at zero setup cost. See [hosted-first](hosted-first). BYO custom domain is deferred past Stage 7. 
+- **Send to the user's Gmail, never read from it**: agent-side mail lives on our SES; user-side mail stays fully under user control. +- **Broker, not proxy**: our backend mints ephemeral SES/S3 credentials; the daemon calls SES and S3 directly via MCP. Per-operation compute cost on our side is zero. + +--- + +## Usage isolation — three email channels, three roles + +``` +┌──────────────────────────────────────────────────────────────┐ +│ 1. AGENT MAILBOX │ +│ @bots.wildmeta.ai (our SES, v0.1) │ +│ Hosts: our infra. Agent reads via backend only. │ +│ Purpose: receive service-provider OTPs + confirmations. │ +│ User does NOT share this inbox. │ +│ │ +│ 2. USER IDENTITY / NOTIFICATION │ +│ jane@gmail.com (her own Gmail) │ +│ Hosts: Google. We SEND TO it via our SES. Never read. │ +│ Purpose: login identity + status notifications. │ +│ │ +│ 3. USER APPROVAL (optional 2FA) │ +│ Same as #2 (jane@gmail.com). │ +│ Purpose: magic-link or 6-digit code for approvals when │ +│ Touch ID on master Mac isn't reachable. Send-only. │ +│ │ +└──────────────────────────────────────────────────────────────┘ +``` + +### Why the separation matters + +- **Audit attribution.** Agent mail is under our full control → every read logs per-child. The user's real Gmail never pollutes our audit trail. +- **Credential isolation.** Each agent has its own mailbox (native in SES; per-throwaway-user in DWD). No shared root credential reads multiple users' mail. +- **No per-user OAuth integration.** We never store refresh tokens into users' Gmail. Onboarding is zero-friction for users. +- **User stays in control.** Their personal inbox is untouched by their agents' work. + +### What this rules out + +- Reading the user's real Gmail for OTPs. Even if we could get OAuth consent, it collapses channels #1 and #2 into one inbox — bad for audit, bad for user experience, fragile against Google's policy changes. +- Using a service account or DWD to impersonate the user on their personal Gmail. 
Personal Gmail can't have DWD pointed at it; would require an OAuth app with install-wide consent, which Google restricts. + +--- + +## Three-layer abstraction + +The centralized-root-key + signed-short-lived-token-after-policy-check pattern (common to AWS STS, Google DWD, Kubernetes TokenRequest, OAuth2, Vault) decomposes into three concerns: + +| Layer | Responsibility | Max TTL enforced here | +|---|---|---| +| `TokenAuthority` | Holds the signing key, produces tokens | Platform limit (e.g. Google 1h) | +| `GrantStore` | Durable long-lived authorization policy; Touch-ID-gated on create | AgentKeys policy (30d) | +| `TokenBroker` | Verifies session, checks grant, clamps TTL to `min(spec, grant, authority)`, calls authority, emits audit | — | + +**Critical invariant: Broker is colocated with Authority** (same trust boundary). The daemon is a thin client calling broker over RPC; policy enforcement never lives behind the daemon (that would allow policy-skip attacks). + +### Services × Versions matrix + +Each cell: *Authority / Broker / GrantStore · max TTL* + +| Service | v0 (mock) | v0.1 (Heima TEE) | +|---|---|---| +| Session tokens | MockBackend / MockBackend / SQLite · 30d | TEE / TEE / chain · 30d | +| Credential access | MockBackend (AES) / MockBackend / SQLite · — | TEE (shielding key) / TEE / chain · — | +| Email access | MockBackend (wraps SES or DWD) / MockBackend / SQLite · 30d grant | TEE (wraps SES or DWD) / TEE / chain · 30d grant | +| Pairing / AuthRequest | MockBackend / MockBackend / SQLite · 60s → 30d | TEE / TEE / chain · 60s → 30d | +| Audit events | MockBackend (appends rows) / MockBackend / SQLite · — | TEE (signs extrinsics) / TEE / chain · — | + +### One-line rule per version + +- **v0:** *"Mock backend is the authority AND broker. SQLite is the grant store."* +- **v0.1:** *"TEE is the authority AND broker. Chain is the grant store."* + +Five services, one architectural shape. 
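
The `TokenBroker` clamp rule from the three-layer table, `min(spec, grant, authority)`, can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not the project's actual code: the trait shape, the `DwdAuthority` / `SesAuthority` structs, and the `clamp_ttl` helper are hypothetical names. Only `max_token_ttl()` and the 1h / 30d figures come from this page.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the `TokenAuthority` layer.
/// Only the platform TTL cap is modeled; signing is out of scope here.
trait TokenAuthority {
    fn max_token_ttl(&self) -> Duration;
}

/// Assumed authority wrapping Google Workspace DWD (1h platform cap).
struct DwdAuthority;

/// Assumed authority wrapping SES (30d, the AgentKeys policy ceiling).
struct SesAuthority;

impl TokenAuthority for DwdAuthority {
    fn max_token_ttl(&self) -> Duration {
        Duration::from_secs(60 * 60)
    }
}

impl TokenAuthority for SesAuthority {
    fn max_token_ttl(&self) -> Duration {
        Duration::from_secs(30 * 24 * 60 * 60)
    }
}

/// Broker-side clamp: the issued TTL is the smallest of the requested TTL,
/// the time left on the grant, and the authority's platform cap.
fn clamp_ttl(
    requested: Duration,
    grant_remaining: Duration,
    authority: &dyn TokenAuthority,
) -> Duration {
    requested
        .min(grant_remaining)
        .min(authority.max_token_ttl())
}
```

Because the clamp runs in the broker, which is colocated with the authority, a daemon that asks for a longer TTL simply receives a shorter one; there is no client-side policy to skip.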
+ +--- + +## Backend options for the agent mailbox + +Three credible implementations. Stage 5 picks one, v0.1 picks one, the abstraction allows switching. + +| | Google Workspace DWD | AWS SES (self-hosted) | AgentMail (SaaS) | +|---|---|---|---| +| Mailbox infra | Google Workspace | Our AWS SES (own domain + MX + DKIM + Lambda) | AgentMail (also built on AWS SES under the hood — verified via DNS: `agentmail.to` MX → `inbound-smtp.us-east-1.amazonaws.com`, NS → Route 53) | +| Setup cost | ~20 min admin console (custom role + OU + SA + DWD) | 1-2 weeks code + 30 min DNS | ~5 min signup | +| Per-inbox credential | DWD JWT signing → 1h access token (exchange via `oauth2/token`) | Long-lived scoped bearer (we mint, we verify) | Scoped API key | +| Max token TTL | **1h (Google platform cap)** | **30d (our policy)** | Until revoked | +| Provisioning speed | ~60s (Workspace user replication across services) | <1ms (DB insert) | ~1s (HTTP) | +| Audit model | GCP Cloud Audit Logs (operator-controlled by Google) | Our pipeline (chain-immutable in v0.1) | AgentMail dashboard (operator-controlled by vendor) | +| Cost at 300 inboxes/mo | ~$1800 (seats) | ~$1 | pricing opaque | +| Vendor risk | Google (stable, but deprecation-prone historically) | AWS (very stable, 20yr service) | Startup (pivot / shutdown risk) | +| TEE fit in v0.1 | Open question: does Google accept TEE-attested pubkey as DWD client ID? | TEE holds AWS IAM creds (standard sealed-credential pattern) | TEE holds API key (simple) | +| Chain-immutable audit alignment | ❌ | ✅ | ❌ | + +### Decision + +**SES as primary for v0.1.** Matches chain-immutable audit model; cost scales; no foreign admin-console step after one-time DNS. + +**DWD as a documented operator variant.** For environments with an existing Workspace subscription that don't want to build/maintain SES infra. Smaller operational footprint, at the cost of 1h access-token dance + operator-controlled audit. 
+ +**AgentMail is not shipped as a first-party backend.** The three-layer abstraction lets a customer plug in `AgentMailAuthority` if they want, but we don't own it because (a) fights our audit model, (b) vendor lock to a startup, (c) duplicates SES without advantage for our scale. + +--- + +## SES architecture, one page (full spec: `docs/spec/ses-email-architecture.md`) + +Minimal broker-not-proxy shape. Our infrastructure handles only credential minting, ingress routing, and audit. Every per-operation concern — MIME parsing, threading, labels, drafts, webhooks — lives in the daemon, not on our side. + +| Piece | What it is | +|---|---| +| **Inbox** | On-chain row `(user_wallet, agent_wallet, inbox_address)`. `inbox_address` is the email itself — e.g. `abc123@agentkeys-email.io`. No server-side message/thread/label store. | +| **Receive** | SES receipt rule drops raw MIME to `s3://agentkeys-mail///.eml`. No Lambda. No parsing. No DB writes. Per-email compute on our side: zero. | +| **Send** | Daemon mints temp SES creds (via OIDC federation), assembles MIME, calls `ses:SendRawEmail` directly. IAM role condition on `ses:FromAddress` pins the daemon to its own inbox address. AWS_SES-managed DKIM for the hosted domain. Per-send compute on our side: zero. | +| **Read** | Daemon mints temp S3 creds with `PrincipalTag/agentkeys_user_wallet` from the JWT claim. Bucket policy conditions limit the daemon to its own user's prefix. Daemon lists/gets S3 objects and parses MIME client-side. Per-read compute on our side: zero. | +| **Domain (hosted default)** | We operate `agentkeys-email.io`; MX + SPF + DMARC + AWS_SES DKIM set once on our side. User-side DNS: none. | +| **OIDC federation** | TEE-derived ES256 issuer key at `derive("oidc/issuer/v1")`. JWT claims include `agentkeys_user_wallet` for PrincipalTag isolation. See [oidc-federation](oidc-federation). 
| +| **Per-user isolation** | `aws:PrincipalTag/agentkeys_user_wallet` conditions on both the S3 bucket policy (read) and IAM role / SES identity policy (send). One bucket + one role, cryptographic per-user separation. See [tag-based-access](tag-based-access). | +| **Audit** | Every credential mint emits an on-chain extrinsic: `(child, operation, inbox_address, ts)`. No per-operation audit on our side — CloudTrail is the AWS-side record; our chain log is the attribution-per-agent record. | + +**Why SES wins over Gmail DWD for Stage 6:** +- Per-user isolation via PrincipalTag; one bucket/role scales to every user without per-user IAM state. +- 30-day AgentKeys session-token model matches policy; no 1-hour Google-access-token dance. +- Cost ~10–100× cheaper at agent scale. +- **No static cloud credentials at rest.** SES/S3 access comes from OIDC federation; the TEE's ES256 key generalizes to GCP/Azure/Ali Cloud/K8s via the same [oidc-federation](oidc-federation). + +--- + +## How we differ from AgentMail + +AgentMail is a SaaS running on AWS SES. They proxy per-operation: agents call their API, their servers parse MIME, compute threads, manage drafts/labels/webhooks on the client's behalf. Their compute cost scales with user operation frequency. + +We use the same underlying SES primitives (inbound to S3, SendRawEmail for outbound, domain DKIM/MX/SPF) but **do not adopt their SaaS feature surface**. Threading, labels, drafts, allow-block lists, webhook fan-out, and per-operation events are either (a) handled daemon-side via MCP, (b) deferred to the user's own tooling, or (c) absent until a real use-case forces them in. Our backend stays a credential broker and audit layer — no per-operation compute. + +Single architectural learning we kept from AgentMail: `inbox_id` is the email address string itself (saves an ID↔address lookup). 
Everything else — drafts, labels, webhooks, the event taxonomy, server-side threading — lives outside our backend per the broker-not-proxy principle. + +--- + +## TTL nesting + +``` +Child bearer token = 30 days (AgentKeys session-key policy, wiki/session-token.md §1) + └── EmailImpersonate grant = 30 days (our policy ceiling on email scope) + └── Email access token = 1 hour (only if DWD; SES has no short-lived token) +``` + +- **Access token** expires first (DWD: 1h; SES: N/A). Child re-requests silently from backend; no human involvement. +- **Grant** expires at 30d. Child's email ops start failing with `GRANT_EXPIRED`; master must re-approve (with Touch ID) to extend. No auto-renewal. +- **Bearer token** expires at 30d. Child re-authenticates via the standard AgentKeys re-auth path. Independent of the grant. + +The 1-hour access-token dance is a Google/DWD constraint only. With SES, our backend authorizes operations live per-call; no short-lived token minting needed. The abstraction captures this as `authority.max_token_ttl()` — DWD returns 1h, SES returns 30d, callers clamp automatically. + +--- + +## Touch ID gate (issue #11) — backend-agnostic + +The gate fires at **`GrantStore::create`** (master-side, via `approve_auth_request`). Silent everywhere else: + +| Action | Side | Gate | +|---|---|---| +| Create `EmailImpersonate` grant for a child | Master CLI | 🔒 Touch ID required | +| Change scope on an existing grant | Master CLI | 🔒 Touch ID required | +| Revoke a grant | Master CLI | 🔒 Touch ID required | +| Mint email access token (DWD path) | Child/daemon | 🔇 silent (grant-authorized) | +| Execute an email operation (list / get / send / trash) | Child/daemon | 🔇 silent | + +Same rule as `agentkeys approve` (gated) vs `agentkeys read openrouter` (silent). The biometric prompt fires in the master CLI *before* it ever talks to the backend, so the gate inherits for every `TokenAuthority` implementation. 
DWD, SES, AgentMail — all share the same approval ceremony. + +--- + +## Stage roadmap (2026-04-19) + +| Stage | Email backend | User setup | Status | +|---|---|---|---| +| **5 (current)** | Dedicated personal Gmail + TOTP + app password | ~10 min one-time for the developer running the demo | Ships the live OpenRouter-provision demo | +| **6 (next)** | **Hosted `xxxxx@agentkeys-email.io`** — SES + TEE-held Ed25519 DKIM + ES256 OIDC + PrincipalTag isolation | **Zero** — inbox exists the moment the agent is created | Next stage after 5a/5b | +| **7** | Adds **bring-your-own custom domain** as an advanced opt-in (same architecture, different domain in the DKIM derivation path) | One-time DNS + BIND zone import on the user's domain | After 6 | +| Advanced (post-7) | Bring-your-own Workspace DWD (existing `docs/stage5-workspace-email-setup.md` runbook) | Workspace admin console setup | Enterprise deployment path | + +### Why Stage 6 is hosted, not BYO + +The BYO path we spec'd for Stage 5 Workspace DWD takes ~1 hour of admin-console clicking, requires a Workspace subscription ($6/user/month), and gates on "is the user a Google Workspace admin". That's enterprise friction; most users aren't enterprises. + +The hosted default is zero-setup: AgentKeys operates the domain, the TEE holds the DKIM key, AWS enforces per-user isolation via PrincipalTag, and the user's agent gets a working inbox automatically. Cost on our side: pennies per user per month. See [hosted-first](hosted-first) for the full segmentation + parity argument (hosted and BYO share one architecture — migration is a cloud-account swap, not a code change). 
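
To make the PrincipalTag isolation mentioned above concrete, here is a toy Python model of the check. It only illustrates how a policy variable such as `${aws:PrincipalTag/agentkeys_user_wallet}` scopes a session to one prefix; it is not AWS's policy evaluator. The function name `may_access`, the key layout (wallet as the first path segment, then a hypothetical `inbox/` folder), and the sample wallets are all assumptions.

```python
BUCKET = "agentkeys-mail"
TAG_KEY = "agentkeys_user_wallet"


def may_access(session_tags: dict[str, str], bucket: str, key: str) -> bool:
    """Toy stand-in for a bucket-policy condition on a per-user prefix.

    Assumes object keys start with the owning user's wallet ("<wallet>/...").
    Real enforcement happens inside AWS, not in AgentKeys code.
    """
    wallet = session_tags.get(TAG_KEY, "")
    if bucket != BUCKET or not wallet:
        return False  # missing tag or wrong bucket: no access at all
    # Same idea as a Resource ending in "${aws:PrincipalTag/agentkeys_user_wallet}/*".
    prefix = f"{wallet}/"
    return key.startswith(prefix) and len(key) > len(prefix)


# Session tags as STS would copy them from the JWT claim (hypothetical wallet).
alice = {TAG_KEY: "0xABC"}
print(may_access(alice, BUCKET, "0xABC/inbox/msg-1.eml"))  # own prefix
print(may_access(alice, BUCKET, "0xDEF/inbox/msg-1.eml"))  # another user's prefix
print(may_access({}, BUCKET, "0xABC/inbox/msg-1.eml"))     # untagged session
```

Because the prefix comes from the session tag rather than from anything the daemon sends, one bucket and one role can serve every user without per-user IAM state.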
+ +--- + +## 2FA considerations for the dedicated demo Gmail + +Gmail IMAP access chain: + +> **App password** → requires **2FA enabled** → requires **second factor enrolled** + +### Options for the second factor on a bot account + +| Option | Ease | Suitable for | +|---|---|---| +| **TOTP via authenticator app** (recommended) | Enroll once into Google Authenticator / Authy / 1Password / Bitwarden | Long-term, CI-compatible | +| **Personal phone SMS** | Simplest, but ties to a person | Demo-only, not CI | +| **FIDO2 hardware key** | Overkill for demo | Very high-security accounts | + +### Key insight + +Once the app password is generated at [myaccount.google.com/apppasswords](https://myaccount.google.com/apppasswords), **the demo sees zero 2FA prompts**. App passwords bypass 2FA by design: they are Google's non-interactive credential, used here for IMAP only, and revocable anytime. An app password is not itself restricted to one protocol, so store it like any other account secret. The 2FA dance is one-time setup only. + +--- + +## Open items / follow-ups + +- ~~**Add SES as first-class backend spec**~~ — done: `docs/spec/ses-email-architecture.md` covers the data model, receive/send pipelines, DNS setup, key derivation (§10.5), and v0/v0.1 split. `docs/spec/email-signing-backends.md` still needs its SES row folded in. +- ~~**TEE as OIDC provider**~~ — done: [oidc-federation](oidc-federation) spec covers the ES256 key derivation, AWS-docs-verified algorithm constraints, JWT shape, consumer-registration recipes, and build cost. +- **Write `docs/spec/token-authority-model.md`** — the general three-layer spec, with this email instance as one of several (alongside session tokens, credentials, pairing). +- **BYODKIM-with-TEE detailed flow** — how the TEE registers its Ed25519 DKIM pubkey with DNS (Route 53 API? attestation step?), key rotation cadence, and what happens when DNS propagation lags enclave restart. +- **OIDC-issuer hostname decision** — `oidc.agentkeys.dev` proposed. Needs DNS + Let's Encrypt cert setup once we commit. 
+- **OIDC `sub` claim format decision** — proposed `enclave:::agent:`; finalize once Heima TEE attestation format is confirmed. +- **v0.1 open question:** will Google accept a TEE-attested public key as a DWD client ID? If no, the TEE-native DWD path degrades to "TEE holds GCP OAuth creds and calls `iamcredentials.signJwt`" — still better than raw Backend A because the policy check is inside the enclave, but weakens the "no key in Google's hands" property. Only relevant if DWD stays as the enterprise-alternative path. (Note: with OIDC federation into GCP Workload Identity, we can skip DWD entirely for any cloud consumer that accepts OIDC — making DWD a purely Google-specific workaround if we want Gmail integration specifically.) +- **Email-2FA approval flow spec** — the #11 biometric gate doc should grow a "mobile fallback via email" section with: message templates, magic-link vs 6-digit-code trade-off, TTL (≤ 10 min), replay protection via single-use nonce, CSRF on the magic-link endpoint. +- **Decide on AgentMail**: third-class backend (plugin-only) or drop entirely. Leaning drop, given their infra is SES and our SES impl gives us the things their SaaS does not (chain audit, per-child isolation via grants, no static cloud creds). + +--- + +## Cross-references + +- **`docs/spec/ses-email-architecture.md`** — **v0.1 primary**: full SES data model, receive + send pipelines, DNS setup, key derivation (§10.5 — Ed25519 DKIM + ES256 OIDC federation), v0/v0.1 split, build plan. The "how" of the SES path. 
+- **[oidc-federation](oidc-federation)** — how the TEE federates into AWS/GCP/Azure/Snowflake/K8s as an OIDC identity provider; the "no static cloud creds" story that SES inherits +- `docs/spec/email-signing-backends.md` — generalized three-layer backend comparison (DWD / TEE; needs SES row folded in) +- `docs/spec/credential-backend-interface.md` — existing `CredentialBackend` trait we're extending +- `docs/stage5-workspace-email-setup.md` — Workspace DWD operator runbook (alternative enterprise path) +- `docs/manual-test-stage5.md` §1 — Stage 5 demo entry point (uses the dedicated-Gmail path; will migrate to SES once built) +- [session-token](session-token) — 30-day session-key policy inherited for email grants +- [blockchain-tee-architecture](blockchain-tee-architecture) — stateless-TEE-plus-chain rationale (this spec inherits) +- [key-security](key-security) — two-tier storage model +- **AgentMail primary sources (for reference)**: + - `github.com/agentmail-to/agentmail-schemas` — their public Zod schemas (where we verified `AWS_SES | BYODKIM` and the event taxonomy) + - `github.com/agentmail-to/agentmail-mcp` — MCP tool surface (`get_message`, `send_message`, `reply_to_message`, …) + - `docs.agentmail.to/custom-domains` — custom-domain DNS flow (BIND zone download pattern) + - DNS evidence: `dig +short MX agentmail.to` → `10 inbound-smtp.us-east-1.amazonaws.com` (confirms AWS SES infra) +- Issue [#11](https://github.com/litentry/agentKeys/issues/11) — biometric gate for master CLI high-security actions +- Issue [#10](https://github.com/litentry/agentKeys/issues/10) — bearer-token terminology + diff --git a/wiki/hosted-first.md b/wiki/hosted-first.md new file mode 100644 index 0000000..36421b7 --- /dev/null +++ b/wiki/hosted-first.md @@ -0,0 +1,239 @@ +--- +title: "Hosted-First vs Bring-Your-Own — User Segmentation" +tags: ["hosted", "byo", "bring-your-own", "user-segmentation", "non-developer", "onboarding", "agentkeys-email.io", "default"] +created: 
2026-04-19T10:07:18.478Z +updated: 2026-04-19T10:07:18.478Z +sources: [] +links: ["email-system.md", "tag-based-access.md", "knowledge-storage.md", "oidc-federation.md"] +category: decision +confidence: medium +schemaVersion: 1 +--- + +# Hosted-First vs Bring-Your-Own — User Segmentation + +# Hosted-First vs Bring-Your-Own — User Segmentation + +**Status:** decision (2026-04-19) +**Scope:** how AgentKeys onboards non-developer users vs enterprise / advanced users across email, knowledge base, and OIDC identity. + +--- + +## TL;DR + +> **Default path: AgentKeys-hosted.** Non-developer users get throwaway identities on our domains (e.g. `xyz123@agentkeys-email.io`) with zero setup. No DNS, no admin console, no Workspace subscription, no custom domain. +> +> **Advanced path: Bring-Your-Own.** Enterprises or power users who already run a Workspace domain / custom GitHub org / corporate AWS account plug those in through the same architecture, same trait interfaces, same ephemeral-credential minting. **Deferred to post-Stage 7.** + +The split exists because onboarding friction is the dominant user-acquisition cost for non-developers, and DNS / admin-console steps are exactly the kind of friction most users bounce off. 
+ +--- + +## User segments and the default that fits each + +| Segment | Default email | Default knowledge base | OIDC provider | +|---|---|---|---| +| **Non-developer, first-time user** (our primary v0.1 target) | `xyz123@agentkeys-email.io` — we host, we own the domain, agent fully controls the inbox | AgentKeys-hosted on AWS S3 (non-China) or Ali Cloud OSS (China), both bucketed under our accounts | AgentKeys' OIDC issuer (our domain, our TEE); user doesn't configure anything | +| **Developer, technical user** | Same hosted default available; may upgrade to GitHub-repo-as-knowledge-base when useful | **GitHub App installation** into their own repos (preferred) | Same hosted OIDC (except for cases where they want to integrate with their own CI) | +| **Enterprise buyer** (Stage 7+) | Bring own Workspace domain; federated via the existing DWD or custom-OIDC path | Bring own AWS/GCP org; we federate into their identity pool | Possibly operate their own OIDC issuer trusted by their cloud | +| **Chinese non-developer** | Same hosted email (or a `.cn` variant we operate if data-residency requires) | Hosted Ali Cloud OSS | Hosted OIDC (potentially with Chinese-region issuer) | + +Every segment uses the same architecture; the only variable is **whose resources host the primitives**. AgentKeys's by default; the user's by upgrade path. + +--- + +## Why hosted-first + +### The non-developer's experience + +Compare the two onboarding flows we could offer today: + +**BYO (what Stage 5's Workspace DWD doc described):** +1. Sign up for Google Workspace ($6/user/month, credit card) +2. Verify domain ownership via DNS TXT record (~30 min for DNS propagation) +3. Super-admin logs into admin console +4. Create custom admin role with 4 specific privileges +5. Create `/Automation` OU +6. Assign custom role to `agent@yourdomain` scoped to OU +7. Create GCP project, enable APIs +8. Create service account +9. 
Authorize domain-wide delegation in admin console (super-admin action again) +10. Generate service account key JSON +11. Store the JSON in secret manager +12. Export env vars and run agent + +~1 hour of admin-console clicking for someone who's never seen Workspace's admin UI. It also hard-blocks on "is the user a Google Workspace admin?" + +**Hosted default:** +1. Sign in to AgentKeys +2. Agent is created; it has inbox `xyz123@agentkeys-email.io` +3. Go. + +**About an hour vs about 10 seconds.** The hosted-first posture takes the second path as the default for the 95% of users who aren't enterprise buyers. + +### What hosting costs us (and why it's fine) + +On the AgentKeys side, operating `agentkeys-email.io` means: + +- One domain registration (~$15/year) +- SES domain verification (one-time) +- MX record to SES inbound (one record) +- DKIM / SPF / DMARC (three records; Ed25519 DKIM key derived from TEE master seed per [email-system](email-system)) +- SES receipt rule writing every inbound to S3 `agentkeys-mail///` +- S3 bucket policy conditions on `aws:PrincipalTag/agentkeys_user_wallet` per [tag-based-access](tag-based-access) + +Cost scales with mail volume, not user count: + +| User count | Monthly inbound emails | Monthly cost (approx) | +|---|---|---| +| 1,000 | 10,000 | < $5 | +| 10,000 | 100,000 | < $50 | +| 100,000 | 1,000,000 | < $500 | + +Per-user cost: **fractions of a cent per month** at realistic scale. No per-seat cost, no per-inbox fee. Dramatically cheaper than asking each user to buy a Workspace seat. + +The cost profile is exactly why we chose SES over Gmail DWD as the default — we control the infrastructure, our costs are AWS-native, and per-user cost stays flat as volume grows. + +--- + +## What hosted-first covers (Stage 6) + +### Email — `xyz123@agentkeys-email.io` + +- Inbox ID is an address under our domain, allocated deterministically at agent-create time (e.g. 
derived from `agent_wallet`) +- Inbound mail via SES MX → S3 drop to user-prefixed path +- Outbound via SES `SendRawEmail`, signed with TEE-held Ed25519 DKIM key for `agentkeys-email.io` +- Per-user isolation enforced by AWS via PrincipalTag from JWT claim `agentkeys_user_wallet` +- No DNS on user's side. No admin console. Nothing. + +### Knowledge base — deferred per [knowledge-storage](knowledge-storage) + +When we commit, the hosted-default will be: + +- **Non-Chinese non-dev users**: our S3 bucket, per-user prefix, PrincipalTag isolation +- **Chinese non-dev users**: our Ali Cloud OSS bucket, equivalent isolation +- **Developer users**: option to plug their own GitHub (via GitHub App install), falling back to hosted S3 if they skip + +Same broker-not-proxy architecture. Same TEE-derived keys. Same chain audit. Hosted vs BYO is only about *which cloud account the storage bucket sits in*. + +### OIDC identity — our issuer + +- `https://oidc.agentkeys.dev` with a TEE-derived ES256 key +- Every cloud consumer trusts this once (per [oidc-federation](oidc-federation)) +- User never registers an OIDC provider; they inherit ours +- Every user's JWT mints temp creds scoped to their wallet via PrincipalTag + +--- + +## What's deferred (Stage 7+) + +### Bring-your-own email domain + +- User's own domain (`bots.theircompany.com`) verified in our SES +- Still TEE-held DKIM key, different derivation path per custom domain (`derive("dkim/theircompany.com/v1")`) +- User configures DNS once (MX, DKIM CNAMEs, DMARC) +- Same per-user isolation + chain audit +- Deferred because: + - Most v0.1 users don't own a domain they want agents using + - The operator-side complexity (managing an unbounded domain list) is non-trivial + - The current Workspace DWD runbook at `docs/stage5-workspace-email-setup.md` is a partial blueprint for the enterprise variant + +### Bring-your-own Workspace / GCP + +- User operates their own Google Workspace; we federate via DWD or OIDC +- See existing docs 
in `docs/stage5-workspace-email-setup.md` — still valid for this advanced path, just not the default anymore +- Deferred to Stage 7+ once the hosted default has paying users + +### Bring-your-own GitHub organization + +- User's own GitHub org where the agent has its own repo for memory +- Our GitHub App installation into their org (we author the app; they install) +- Per-installation token scoping +- Deferred to Stage 7+ as a developer-only advanced backend + +### Bring-your-own AWS / Ali Cloud account + +- User's AWS/Ali account hosts the S3/OSS bucket for their agent's knowledge base +- Our OIDC provider federates into their IAM role +- User configures one role trust policy in their account +- Deferred to Stage 7+ for enterprises with data-residency requirements + +--- + +## Parity guarantee: hosted and BYO share one architecture + +Every element listed in the "advanced" column above can be reached from the "hosted default" column by **swapping the cloud account, not rewriting the code**. The handler categories (OIDC federation, app-level signing, static-key unwrap) are identical; only the trust configuration on the remote side differs. 
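
As a rough illustration of this parity claim, the Python sketch below models a deployment as two fields and shows that moving from hosted to BYO changes data, not code. `Deployment`, `dkim_derivation_path`, `oidc_issuer`, and the twelve-digit account IDs are hypothetical names and placeholders; the `dkim/<domain>/v1` path shape and the issuer URL are taken from the examples on this page.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record of whose resources host one agent's email."""
    domain: str
    aws_account_id: str  # placeholder IDs below, not real accounts


def dkim_derivation_path(deployment: Deployment) -> str:
    # Path shape assumed from this page's examples: "dkim/<domain>/v1".
    return f"dkim/{deployment.domain}/v1"


def oidc_issuer(deployment: Deployment) -> str:
    # The issuer is ours in both modes, so the deployment is deliberately ignored.
    return "https://oidc.agentkeys.dev"


hosted = Deployment(domain="agentkeys-email.io", aws_account_id="111111111111")
# Migration to BYO swaps the two fields; none of the functions above change.
byo = replace(hosted, domain="theircompany.com", aws_account_id="222222222222")

for d in (hosted, byo):
    print(dkim_derivation_path(d), oidc_issuer(d), d.aws_account_id)
```

The only per-mode differences are the domain segment in the derivation path and the account that holds the bucket and role.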
+ +Concretely: + +| Aspect | Hosted default | BYO advanced | What changes | +|---|---|---|---| +| DKIM key source | TEE master seed `derive("dkim/agentkeys-email.io/v1")` | TEE master seed `derive("dkim//v1")` | Just the derivation path | +| OIDC issuer | `https://oidc.agentkeys.dev` (ours) | Same (we run it; our cert) | Nothing | +| Federation target (AWS S3) | Our AWS account, role in our account | User's AWS account, role in their account, our OIDC trusted there | Trust policy in user's account | +| Isolation mechanism | `aws:PrincipalTag/agentkeys_user_wallet` on our bucket | Same in their bucket | Policy author | +| Chain audit | Same | Same | Nothing | + +**A user migrating from hosted to BYO doesn't change their agent code, their grants, their credentials, or their audit trail.** They change the cloud account that hosts their bucket, period. Clean migration story. + +--- + +## Alignment with the three architectural rules + +From `wiki/blockchain-tee-architecture.md` (repo wiki), cross-referenced here to make sure hosted-first doesn't violate anything: + +| Rule | Hosted-first posture | Preserved? | +|---|---|---| +| **#1 Chain stores everything persistent** | Every grant, inbox-create, credential-mint is on chain. Hosted vs BYO doesn't affect this. | ✓ | +| **#2 TEE holds all private keys** | DKIM key, OIDC issuer key, GitHub App key — all derived from TEE master seed. Hosted-first makes this LITERALLY true end-to-end (we don't hold third-party Workspace/SA keys at rest). | ✓ *strengthened* | +| **#3 Clients hold only bearer tokens** | Daemon holds 30-day AgentKeys bearer + short-lived minted creds from our broker. Same in hosted or BYO mode. | ✓ | +| **#4 (proposed) Credential broker, not operation proxy** | Daemon talks to SES / GitHub / S3 directly via MCP using minted creds. Our backend mints; does not proxy. Hosted mode doesn't change this. | ✓ | +| **Per-user isolation** | Enforced by AWS PrincipalTag from JWT claim. 
Single shared bucket, hard-walled per-user prefix. See [tag-based-access](tag-based-access). | ✓ | + +Hosted-first strictly strengthens rule #2: by operating our own infrastructure, we never need to hold a user's own Google SA key or Workspace admin credential. The TEE's inventory stays clean. BYO re-introduces (optional) trust on user-side admin steps but doesn't compromise anything on our side. + +--- + +## Attacker surface + +Relative to the BYO path: + +| Attack | Hosted default (our infra) | BYO (user infra) | Net change | +|---|---|---|---| +| Attacker compromises TEE | All users' creds compromised (same in both modes) | Same | None | +| Attacker compromises our AWS account | All users' S3 prefixes accessible to attacker | Only AgentKeys' domain/infra, user's own AWS untouched | **Hosted is worse**: PrincipalTag isolation is policy-enforced, so it still binds federated sessions but not an attacker holding root-account keys, who can rewrite the policy | +| Attacker compromises one user's bearer token | Can impersonate that user for ≤30 days (until the grant is revoked or the bearer expires) | Same | None | +| Attacker spoofs email from another user's inbox | Blocked by SES receipt rule + DKIM + our bucket-policy PrincipalTag | Same (on user's infra) | None | +| Regulator demands access to mail | Our legal team responds; user is subject to our jurisdiction | User's legal team responds; they control disclosure | **Hosted is worse** for users with strict regulatory or adversarial environments | + +Mitigations for the "hosted is worse" rows: + +- **AWS account compromise** — tight IAM boundary around the SES/S3 stack (isolated AWS account, limited IAM users, SCPs restricting dangerous actions, CloudTrail → our chain audit). Rule #2 still applies to the sensitive keys (they live in TEE), so account compromise gives the attacker *AWS-level* access to user prefixes but no long-lived AgentKeys identity. +- **Regulatory / jurisdictional** — users with this concern are exactly the "advanced / enterprise" segment who should go BYO anyway. 
We offer BYO as the opt-out. + +The tradeoff is correct for the user segment: non-dev users accept "trust AgentKeys" as the onboarding floor; power users who don't can opt out via BYO. + +--- + +## Stage mapping + +- **Stage 5 (current)** — Quick email demo via dedicated personal Gmail. Proves the provisioner end-to-end before we build real infra. +- **Stage 6 (next)** — Federated Own Email: hosted `agentkeys-email.io`, SES + PrincipalTag + TEE-held DKIM, available to all users without setup. +- **Stage 7** — Generalized OIDC Provider: the federation pattern exposed publicly; accepts external consumers; enables the BYO AWS / GCP / GitHub / etc. advanced paths. +- **Later** — BYO custom-domain email, BYO Workspace DWD (existing `docs/stage5-workspace-email-setup.md` becomes the runbook), BYO GitHub org, enterprise SSO integration. + +See `docs/spec/plans/development-stages.md` for the authoritative stage list. + +--- + +## Cross-references + +- [email-system](email-system) — how SES + hosted `agentkeys-email.io` works end-to-end +- [oidc-federation](oidc-federation) — the federation pattern the hosted OIDC provider exposes +- [tag-based-access](tag-based-access) — how one bucket safely holds all users' memories +- [knowledge-storage](knowledge-storage) — the per-segment storage backend options +- `docs/spec/ses-email-architecture.md` — spec for the SES-backed hosted path +- `docs/stage5-workspace-email-setup.md` — the BYO Workspace runbook (advanced, deferred) +- `wiki/blockchain-tee-architecture.md` (repo) — the three rules hosted-first preserves +- `docs/spec/plans/development-stages.md` §Stage 6 — federated own email stage plan + diff --git a/wiki/knowledge-storage.md b/wiki/knowledge-storage.md new file mode 100644 index 0000000..373964a --- /dev/null +++ b/wiki/knowledge-storage.md @@ -0,0 +1,195 @@ +--- +title: "Knowledge Base Storage Options — Deferred Decision Matrix" +tags: ["knowledge-base", "storage", "deferred", "github", "s3", "google-drive", 
"alicloud", "jurisdictional", "user-segmentation"] +created: 2026-04-19T10:06:02.153Z +updated: 2026-04-19T10:06:02.153Z +sources: [] +links: ["email-system.md", "tag-based-access.md", "blockchain-tee-architecture.md", "oidc-federation.md", "hosted-first.md"] +category: decision +confidence: medium +schemaVersion: 1 +--- + +# Knowledge Base Storage Options — Deferred Decision Matrix + +# Knowledge Base Storage — Deferred Decision Matrix + +**Status:** deferred (2026-04-19) +**Scope:** which backend(s) AgentKeys will support as the agent's "memory" / knowledge-base storage layer once we commit. +**Why deferred:** no single choice fits all users; the best backend depends on developer/non-developer split and jurisdictional constraints (China vs non-China). Decision is held open so we don't prematurely commit infrastructure before user segments are clearer. + +> **Important architectural point.** Whichever backend(s) we eventually pick, the shape is identical under AgentKeys' credential-broker thesis: **storage is just another credential type in the vault, operations happen client-side via MCP, our backend never proxies reads/writes.** See [email-system](email-system) for the analogous email path that informed this decision and the broker-not-proxy principle. 
+ +--- + +## The four candidates + +| Backend | Primary user fit | Auth pattern | Handler category | Per-user isolation primitive | +|---|---|---|---|---| +| **GitHub** (repos as knowledge store) | **Developers** — docs-as-code, markdown, PR review | GitHub App (installation tokens, 1h auto-rotating) | App-level signing (our derived ECDSA key signs app JWTs) | GitHub App installed per-repo; installation_id scopes access | +| **AWS S3** | **Non-Chinese non-technical users** | OIDC federation via our TEE (no per-user cred storage) + bucket policy + [tag-based-access](tag-based-access) | OIDC federation | `aws:PrincipalTag/agentkeys_user_wallet` in bucket policy | +| **Google Drive** (shared drives) | Non-Chinese Workspace users who already live in Google | Workload Identity Federation + shared drive membership | OIDC federation | Per-user shared drive; user adds our SA as member | +| **Ali-Cloud OSS** (Alibaba Object Storage) | **Chinese non-technical users** (regulatory + latency) | OIDC federation via Alibaba RAM OIDC provider | OIDC federation | RAM role condition on `oidc:sub` matching wallet pattern | + +All four fit the same architectural shape — **just four different handler routes inside the Authority**, dispatching the ephemeral credential to a different remote service. The daemon uses the resulting credentials directly via MCP, talking to the vendor's API. Our infrastructure does zero per-operation compute. + +--- + +## Why the segmentation matters + +### Developer vs non-developer + +- **Developers** (our early adopters) live in Git already. Agents storing their memory as commits, branches, and PRs matches how developers think about state. A markdown file in a GitHub repo is the most familiar knowledge primitive in that world. **→ GitHub preferred.** + +- **Non-developers** don't use Git and wouldn't recognize a repo as a knowledge base. They expect files and folders. Blob storage (S3, OSS) presented as files through a simple MCP feels native. 
**→ Object storage preferred.** + +### Non-Chinese vs Chinese users + +China's regulatory environment has repeatedly restricted cross-border data transfer and penalized unregistered international cloud use. Storing Chinese users' memory in AWS S3 creates compliance risk for the user and operational risk for us (potential blocked traffic, forced migrations later). + +- **Non-Chinese users**: AWS S3 in `us-east-1` or `eu-west-1`. Low latency, mature tooling, our OIDC federation already lands there. +- **Chinese users**: Alibaba Cloud OSS in `cn-hangzhou` or `cn-beijing`. Ali Cloud RAM supports OIDC providers in the same way AWS IAM does (we register our OIDC issuer URL; they accept JWTs). Latency inside China is acceptable; compliance handled. + +The two buckets share every architectural element except the cloud provider's URL. Our TEE's ES256 OIDC-issuer key federates into both. + +--- + +## One-line user pick + +A simple selection rule the onboarding flow can use: + +| If user is... | Default backend | Why | +|---|---|---| +| Developer (any region) | GitHub | Matches their mental model; per-repo scoping via GitHub App | +| Non-developer, not in China | AWS S3 | OIDC federation, PrincipalTag isolation, mature | +| Non-developer, in China | Alibaba Cloud OSS | Same architecture as AWS, compliant jurisdiction | +| Has custom preference / enterprise deal | (advanced) — their choice; plug-in as another handler | Supported via any of the four, or a custom fifth | + +No forced choice. Each user flows into the path that matches their existing operational reality. + +--- + +## Per-option architectural detail + +### GitHub (developers) + +``` +TEE master seed + └── derive("github-app/v1") → ECDSA P-256 (GitHub App signing key) + +Flow: + 1. We register a "AgentKeys Memory" GitHub App one-time + 2. User installs app into specific repo(s) they want agents to use + 3. 
Agent: daemon asks TEE for GitHub installation token + → TEE signs app-level JWT with derived ECDSA key + → Calls POST /app/installations//access_tokens + → Returns installation token (1h) to daemon + 4. Daemon uses token with GitHub MCP server (existing in MCP ecosystem) + 5. Chain audit extrinsic at mint time +``` + +Per-user isolation: each user's repo installs are separate `installation_id`s. One installation = one user's repo scope. GitHub enforces at installation level. + +### AWS S3 (non-Chinese non-devs) + +``` +TEE master seed + └── derive("oidc/issuer/v1") → ES256 (reused from email path) + +Flow: + 1. We provision S3 bucket agentkeys-memory (region per operator choice) + 2. Bucket policy: + - Allow s3:* on arn:aws:s3:::agentkeys-memory/${aws:PrincipalTag/agentkeys_user_wallet}/* + - Deny everything else + 3. Agent: daemon asks TEE for S3 temp creds + → TEE signs OIDC JWT with claims {agentkeys_user_wallet: 0xABC} + → sts:AssumeRoleWithWebIdentity → session tags from JWT claim + → Returns temp AWS creds (1h) to daemon + 4. Daemon uses creds to call S3 directly (aws-sdk or MCP) + 5. Chain audit extrinsic at mint time +``` + +Per-user isolation: `${aws:PrincipalTag/agentkeys_user_wallet}` expands to the JWT-claim value AWS STS mapped to a session tag. User A cannot read user B's prefix because the condition fails. Single role, N users, hard-walled. See [tag-based-access](tag-based-access) for the full mechanics. + +### Google Drive (alternative for Workspace users) + +``` +Same ES256 OIDC key; different cloud consumer. + +Per-user isolation: per-user shared drive. User creates their agent's +shared drive; adds our SA as a member; drive ID becomes the scope boundary. +``` + +### Alibaba Cloud OSS (Chinese non-devs) + +``` +TEE master seed + └── derive("oidc/issuer/v1") → ES256 (same key, federating into Ali RAM) + +Ali RAM supports external OIDC identity providers just like AWS IAM does: + 1. Register our OIDC issuer URL in RAM (one-time, per region) + 2. 
Create a RAM role with condition: oidc:sub matches "enclave:*:agent:*" + and oidc:aud = sts.aliyuncs.com + 3. Role policy: s3-equivalent OSS permissions on per-user prefix + 4. Agent flow: TEE signs JWT → AssumeRoleWithOIDC → temp OSS creds + +Per-user isolation: OSS bucket policies condition on oidc:sub prefix, +same mechanism as AWS PrincipalTag, different attribute mapping name. +``` + +--- + +## What this means for our codebase + +The credential vault picks up one more handler category for each backend we eventually ship: + +| Handler | Long-lived TEE material | Ephemeral output | Remote service | +|---|---|---|---| +| App-level signing (GitHub) | Derived ECDSA app key at `derive("github-app/v1")` | GitHub installation token (1h) | GitHub API | +| OIDC federation (AWS S3) | Derived ES256 issuer key at `derive("oidc/issuer/v1")` | AWS temp creds (≤1h) with session tag | S3 API | +| OIDC federation (GCP Drive) | Same ES256 key | GCP SA token (~1h) | Drive API | +| OIDC federation (Ali Cloud OSS) | Same ES256 key | Ali STS token (~1h) | OSS API | + +All four handlers drop into the existing `TokenAuthority::execute(op)` dispatcher — no architectural branching. When we eventually commit, adding a backend is ~1 week of work to: + +1. Write the handler +2. Register the remote cloud's trust for our OIDC provider (or register a GitHub App) +3. Add operator runbook for the backend's DNS / policy setup +4. Validate per-user isolation via tag-based conditions + +--- + +## Triggers that would move us off "deferred" + +Commit to a backend when any of these fire: + +- First paying customer in one of the user segments → build their preferred backend +- Regulatory need forces early commitment (e.g. enterprise buyer requires S3 in a specific region) +- An agent use case emerges where storage shape materially matters (e.g. 
"agent needs rich metadata / search across documents" → S3 or OSS favored; "agent needs PR-based review workflow" → GitHub favored) +- We see demand for cross-backend migration tooling + +Until one of those fires, the deferred state is fine. The credential-broker architecture already accommodates every candidate — we're only deferring *which handler(s) to build first*, not any architectural decision. + +--- + +## Alignment with [blockchain-tee-architecture](blockchain-tee-architecture) + +This deferred decision does not violate any of the three rules: + +- **Rule #1** (chain is source of truth): every knowledge-base grant is an on-chain extrinsic regardless of which backend; every credential mint is an on-chain audit event. +- **Rule #2** (TEE holds all private keys): whether it's the GitHub App ECDSA key, the ES256 OIDC-issuer key, or anything else, every long-lived key is derived from the TEE master seed and never extracted. +- **Rule #3** (clients hold only bearer tokens): the daemon holds a short-lived GitHub installation token / AWS temp creds / OSS STS token — all ≤1h, no long-lived secrets, same posture as existing credentials. + +The **broker-not-proxy** principle (Rule #4 candidate) applies end-to-end: our backend mints credentials, the daemon uses them to talk to the vendor directly via MCP, we never run `read_document` / `write_document` on the user's behalf. 
+ +--- + +## Cross-references + +- [email-system](email-system) — the same deferred-decision pattern landed for email with SES as the v0.1 default +- [oidc-federation](oidc-federation) — how our OIDC provider federates into AWS, GCP, Ali Cloud, any compliant cloud +- [tag-based-access](tag-based-access) — how AWS (and equivalent-mechanism clouds) enforce per-user isolation on shared buckets via JWT-claim-derived session tags +- [hosted-first](hosted-first) — the broader user-segmentation principle this decision follows +- `wiki/blockchain-tee-architecture.md` (repo) — the three architectural rules all candidates preserve +- Issue [#11](https://github.com/litentry/agentKeys/issues/11) — biometric gate (applies to each backend's grant creation) + diff --git a/wiki/oidc-federation.md b/wiki/oidc-federation.md new file mode 100644 index 0000000..c12733e --- /dev/null +++ b/wiki/oidc-federation.md @@ -0,0 +1,326 @@ +--- +title: "TEE as OIDC Identity Provider — Universal Federation Pattern" +tags: ["oidc", "iam", "tee", "federation", "authentication", "aws", "gcp", "architecture", "no-static-secrets"] +created: 2026-04-19T05:55:11.690Z +updated: 2026-04-19T05:55:11.690Z +sources: [] +links: ["email-system.md", "blockchain-tee-architecture.md", "session-token.md", "key-security.md"] +category: architecture +confidence: medium +schemaVersion: 1 +--- + +# TEE as OIDC Identity Provider — Universal Federation Pattern + +**Status:** design (2026-04-19) +**Scope:** how AgentKeys' TEE becomes a conforming OpenID Connect identity provider, letting the TEE's sealed signing key federate into any cloud that accepts external OIDC (AWS, GCP, Azure, Snowflake, Kubernetes, …) with no static secrets stored anywhere in AgentKeys. 
+**Companion specs:** `docs/spec/ses-email-architecture.md`, `docs/spec/email-signing-backends.md` +**Related wiki:** [email-system](email-system), [tag-based-access](tag-based-access), [hosted-first](hosted-first), [knowledge-storage](knowledge-storage), [blockchain-tee-architecture](blockchain-tee-architecture) + +--- + +## TL;DR + +**Architectural property delivered:** *"No static cloud credentials anywhere in AgentKeys infrastructure."* + +We expose the TEE as a conforming OIDC identity provider at `https://oidc.agentkeys.dev` (or similar). The TEE holds one ES256 (ECDSA P-256) signing key, derived deterministically from the TEE master seed. Every cloud we integrate with — AWS, GCP, Azure, Snowflake, Kubernetes, and anything else that speaks OIDC federation — trusts this provider once; from then on, every credential for that cloud is minted on-demand, bound to a fresh attestation, and typically lives ≤1 hour. + +One JWKS, one issuer URL, N consumers. We never hold AWS access keys, GCP service-account JSON, Azure client secrets, or anything similar at rest. Ever. + +--- + +## Why this is the right generalization + +Every major cloud accepts external OIDC tokens for workload federation. 
They standardized on the same model independently: + +| Consumer | Federation primitive | Max session | +|---|---|---| +| **AWS IAM** | `CreateOpenIDConnectProvider` + `sts:AssumeRoleWithWebIdentity` | 12 h | +| **GCP Workload Identity Federation** | `iam.googleapis.com/.../workloadIdentityPools/.../providers/...` | 1 h (default) | +| **Azure AD Workload Identity Federation** | Managed identity with federated credential trust | 1 h | +| **Snowflake External OAuth** | `OAUTH_ISSUER` + `OAUTH_JWS_KEYS_URL` | configurable | +| **Kubernetes ServiceAccount projection** | `--oidc-issuer-url` kube-apiserver flag | 1 h | +| **GitLab/Terraform/Jenkins/CircleCI external identity** | Generic OIDC federation APIs | minutes to hours | + +They all accept the same input: a JWT signed by a key listed in our JWKS, with standard claims (`iss`, `sub`, `aud`, `exp`, `iat`). Build once, integrate many. + +--- + +## Key requirements — verbatim from AWS docs + +AWS is the strictest of the consumers, so meeting AWS's requirements makes us compatible with the rest. 
+ +From [`IAM/id_roles_providers_create_oidc.html`](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_oidc.html) and [`STS/API_AssumeRoleWithWebIdentity.html`](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRoleWithWebIdentity.html): + +| Requirement | Constraint | Verbatim source | +|---|---|---| +| Issuer URL | Must begin with `https://`; path components allowed; **no query parameters**; **no port number**; case-sensitive | "URL must begin with https://" + "path components are allowed but query parameters are not" | +| Discovery endpoint | Must serve `/.well-known/openid-configuration` JSON per OIDC standard | "Add /.well-known/openid-configuration to the end of your OIDC identity provider's URL" | +| JWKS | Referenced via `jwks_uri` in discovery doc; max **100 RSA + 100 EC keys** per provider | "must contain at least one key and can have a maximum of 100 RSA keys and 100 EC keys" | +| **Signing algorithms** | **`RS256, RS384, RS512, ES256, ES384, ES512`** only. 
**Ed25519 / EdDSA is NOT accepted.** | "Tokens must be signed using either RSA keys (RS256, RS384, or RS512) or ECDSA keys (ES256, ES384, or ES512)" | +| TLS certificate | AWS uses its trusted-root-CA library first; falls back to thumbprint only when cert isn't public-CA-signed or TLS 1.3 is required | "AWS secures communication with OIDC identity providers (IdPs) using our library of trusted root certificate authorities (CAs) to verify the JSON Web Key Set (JWKS) endpoint's TLS certificate" | +| Required claims | `iss` (must match provider URL), `aud`, `iat`, `sub` | "Claims must include a value for iat that represents the time that the ID token is issued" | +| Token max size | 20,000 characters | "Maximum length of 20000" | +| Temp-cred duration | 900s (15m) – 43200s (12h); default 3600s (1h) | "The value can range from 900 seconds (15 minutes) up to the maximum session duration setting" | + +### Critical finding: key algorithm + +**AWS does not accept Ed25519 for OIDC federation.** The docs explicitly list RSA (RS256/384/512) and ECDSA (ES256/384/512) only. This forces the OIDC-issuer key to be **ECDSA P-256 (ES256)**, regardless of what we use for DKIM or other purposes. + +ES256 is deterministically derivable from the TEE master seed via SLIP-0010 (the BIP-32 extension that covers secp256r1 / P-256 / NIST P-256). The same derivation mechanism the TEE already uses for custodial wallet keys. Performance: ~50 µs sign, ~150 µs verify. Sub-millisecond. + +--- + +## Design: two derived key families, two purposes + +``` +TEE master seed (sealed; one per enclave; disaster-recovery root) + ├── derive("dkim//") → Ed25519 (DKIM signing, per custom domain) + │ RFC 8463; we control the recipient side + │ Ed25519 is our choice (fast, clean) + │ + └── derive("oidc/issuer/") → ES256 (OIDC-issuer JWT signing, singleton) + Forced by AWS/GCP/Azure OIDC specs + One key serves ALL cloud consumers +``` + +Both algorithms deterministically derivable. Both sealed inside the TEE. 
Both rotated via path-version bump. Different algorithms because different protocols demand them; the derivation pipeline is uniform. + +--- + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ TEE ENCLAVE │ +│ │ +│ Master seed (sealed) │ +│ └── derive("oidc/issuer/v1") → ES256 private key (never leaves) │ +│ │ +│ mint_oidc_jwt(claims) → │ +│ 1. verify caller session │ +│ 2. check grant authorizes this audience │ +│ 3. build JWT { iss, sub, aud, exp, iat, agentkeys_* claims } │ +│ 4. sign with ES256 key │ +│ 5. emit on-chain audit extrinsic │ +│ 6. return JWT │ +└─────────────────────────────────────────────────────────────────────┘ + │ + │ ES256 JWT (5 min TTL typical) + ▼ +┌─────────────────────────────────────────────────────────────────────┐ +│ HTTPS proxy (thin, stateless, static content) │ +│ │ +│ Serves three static endpoints with Let's Encrypt cert: │ +│ │ +│ https://oidc.agentkeys.dev/ │ +│ /.well-known/openid-configuration → static JSON │ +│ /.well-known/jwks.json → static JWKS (ES256 pubkey)│ +│ │ +│ Does NOT hold any private key. Compromise = attackers see public │ +│ keys only (useless). TEE remains the only signer. │ +└─────────────────────────────────────────────────────────────────────┘ + │ + │ trusted once per consumer + ▼ +┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ +│ AWS IAM │ │ GCP Workload │ │ Azure / Snow / │ +│ OIDC Provider │ │ Identity Fed │ │ K8s / etc. │ +│ │ │ │ │ │ +│ sts:AssumeRole- │ │ federated │ │ (each cloud's │ +│ WithWebIdentity │ │ credentials │ │ OIDC endpoint) │ +└──────────────────┘ └──────────────────┘ └──────────────────┘ + │ │ │ + ▼ ▼ ▼ + temp creds ≤12h temp creds ≤1h temp creds ≤varies + │ │ │ + └────────────────────┴─────────────────────┘ + │ + ▼ + agent / daemon uses temp creds + for actual service API calls (SES, etc.) +``` + +The TEE mints one JWT; the consumer mints temp creds for its own service. 
Credentials live minutes to hours, always tied to a fresh TEE-minted JWT. + +--- + +## JWT shape + +```json5 +{ + "iss": "https://oidc.agentkeys.dev", + "sub": "enclave:::agent:", + "aud": "sts.amazonaws.com", // or "//iam.googleapis.com/..." for GCP, etc. + "iat": 1745000000, + "exp": 1745000300, // 5 minutes + "nbf": 1745000000, + + // AgentKeys-specific claims — consumer's trust policy can condition on these + "agentkeys_attested_at": "2026-04-19T00:00:00Z", + "agentkeys_enclave_tier": "production", + "agentkeys_child_wallet": "0x...", // which child the credential is for + "agentkeys_grant_id": "grant_...", // which grant authorized this mint + "agentkeys_operation": "ses.send", // what op the credential will be used for +} +``` + +`sub` is formatted to uniquely identify *which enclave* + *which agent inside it* requested the token. Consumer's trust policy can condition on `sub` patterns (*"only enclaves matching mrenclave=X can assume this role"*) and on `agentkeys_*` claims (*"only when operation=ses.send"*). + +Claims are auditable on-chain via the audit extrinsic emitted at mint time — every OIDC JWT minted has a corresponding on-chain event. + +### The `agentkeys_user_wallet` claim is load-bearing for per-user isolation + +The `agentkeys_user_wallet` claim is **the key that unlocks per-user isolation on shared cloud resources**. Through AWS `sts:TagSession` (or GCP attribute mapping, or Ali RAM OIDC conditions), this claim surfaces in the assumed role's session as a PrincipalTag. Resource policies (S3 bucket policies, IAM role conditions, GCS IAM conditions) then condition on the tag to enforce "this session can only touch this user's prefix." + +**One bucket, N users, cryptographic separation via the tag. One role, N users, one trust policy.** See [tag-based-access](tag-based-access) for the full mechanics — it's the companion pattern that makes this OIDC provider safe to point at a shared resource. 
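The JWT shape and the wallet claim described above can be sketched as follows. This is an illustration under stated assumptions, not the TEE's mint function: `build_claims` and `signing_input` are hypothetical helper names, the `sub` value is a made-up example because the real format is still an open item, and the ES256 signature itself is left to the enclave. The 5-minute lifetime, the issuer URL, the claim names, and `kid="v1"` are taken from the document.

```python
import base64
import json

ISSUER = "https://oidc.agentkeys.dev"
JWT_TTL_SECONDS = 300  # "5 minutes" in the JWT shape above


def b64url(data: bytes) -> str:
    """Base64url without padding, as JWS requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_claims(sub, audience, user_wallet, operation, grant_id, now):
    """Assemble the claim set; `now` is a unix timestamp in seconds."""
    return {
        "iss": ISSUER,
        "sub": sub,
        "aud": audience,
        "iat": now,
        "nbf": now,
        "exp": now + JWT_TTL_SECONDS,
        "agentkeys_user_wallet": user_wallet,  # the isolation claim
        "agentkeys_grant_id": grant_id,
        "agentkeys_operation": operation,
    }


def signing_input(claims, kid="v1"):
    """Return base64url(header) + "." + base64url(payload), the bytes ES256 signs."""
    header = {"alg": "ES256", "typ": "JWT", "kid": kid}

    def encode(obj):
        return b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))

    return f"{encode(header)}.{encode(claims)}"


claims = build_claims(
    sub="enclave:example:agent:0xABC",  # illustrative value only
    audience="sts.amazonaws.com",
    user_wallet="0xABC",
    operation="s3.read",
    grant_id="grant_example",           # illustrative value only
    now=1745000000,
)
unsigned = signing_input(claims)
```

The final token would be `unsigned` plus `"."` plus the base64url-encoded ES256 signature produced inside the enclave.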
+ +Without the tag claim, we'd either need one IAM role per user (doesn't scale past a few thousand) or one bucket per user (expensive, quota-limited) or our backend proxying every operation (violates Rule #4 broker-not-proxy). With the tag claim, none of those compromises are needed. + +--- + +## Consumer-registration recipes + +### AWS IAM + +```bash +aws iam create-open-id-connect-provider \ + --url https://oidc.agentkeys.dev \ + --client-id-list sts.amazonaws.com \ + --thumbprint-list '' # omit if using public-CA cert (Let's Encrypt etc.) + +# Trust policy on the role to assume: +{ + "Version": "2012-10-17", + "Statement": [{ + "Effect": "Allow", + "Principal": { "Federated": "arn:aws:iam:::oidc-provider/oidc.agentkeys.dev" }, + "Action": "sts:AssumeRoleWithWebIdentity", + "Condition": { + "StringEquals": { "oidc.agentkeys.dev:aud": "sts.amazonaws.com" }, + "StringLike": { "oidc.agentkeys.dev:sub": "enclave::*" } + } + }] +} +``` + +### GCP Workload Identity Federation + +```bash +gcloud iam workload-identity-pools providers create-oidc agentkeys-oidc \ + --workload-identity-pool=agentkeys-pool \ + --issuer-uri=https://oidc.agentkeys.dev \ + --attribute-mapping='google.subject=assertion.sub,attribute.enclave=assertion.agentkeys_enclave_tier' +``` + +### Azure / Snowflake / K8s + +Same pattern — register `https://oidc.agentkeys.dev` as the issuer URL, trust the JWKS, condition on the claims our JWTs emit. + +**One provider, N registrations.** No code changes per consumer. + +--- + +## Rotation + +``` +ES256 key rotation: + 1. TEE starts deriving new key at oidc/issuer/v2 (old at v1 still usable) + 2. Update static JWKS to contain BOTH public keys (kid="v1" and kid="v2") + 3. All new JWTs signed with v2; sealed v1 key kept for grace window + 4. Consumers (AWS etc.) refresh JWKS on their cache cycle — both keys accepted + 5. After grace window (~24h), delete sealed v1; update JWKS to drop v1 +``` + +Zero-downtime rotation. 
Same recipe as any OIDC provider's key rotation. + +DKIM rotation works identically, independently, per-domain. + +--- + +## Alignment with `blockchain-tee-architecture.md` rules + +Verified end-to-end against the three architectural rules in the repo's `wiki/blockchain-tee-architecture.md`: + +| Rule | How this pattern preserves it | +|---|---| +| **#1 Chain stores everything persistent** | Every OIDC JWT mint emits an on-chain audit extrinsic with `(child, audience, operation, timestamp)`. Grants that authorize mints are on-chain. Revocations are on-chain. No persistent state lives only on our infrastructure. | +| **#2 TEE holds all private keys** | ES256 issuer key is sealed, derived from master seed at `oidc/issuer/v1`, never extractable. The thin proxy that serves the JWKS holds public keys only. | +| **#3 Clients hold only bearer tokens** | JWTs are 5-minute bearers; the daemon uses them once to mint temp cloud creds (≤1h) via the remote STS and discards them. No long-lived material on the daemon. | +| **#4 (proposed) Credential broker, not operation proxy** | The daemon calls AWS/GCP/Ali Cloud APIs directly using the temp creds. Our backend mints tokens; it does not proxy cloud operations. Per-operation compute on our side is zero. | +| **Per-user isolation** | Enforced via the `agentkeys_user_wallet` claim → session tag → resource-policy condition. Cryptographically hard-walled at the cloud level. See [tag-based-access](tag-based-access). 
| + +## Threat model delta vs sealed long-lived credentials + +| Threat | Sealed AWS IAM access keys inside TEE | TEE-backed OIDC federation | +|---|---|---| +| TEE fully compromised (hardware attack) | All cloud creds exposed; permanent blast radius | All JWT-signing capability exposed; attacker can mint arbitrary JWTs and exchange each for temp creds lasting up to 12h; still bad, but bounded by temp-cred TTLs | +| TEE restart / redeploy | Keys restored from sealed storage; no disruption | Keys restored from master seed derivation; issuer URL + JWKS unchanged | +| JWKS proxy compromised | N/A | Attacker sees public key only (useless); TEE still controls signing | +| Single cloud's temp cred leaked | Full permanent AWS access | ≤12h window for that one role; all other clouds unaffected | +| Attacker learns an old OIDC JWT | N/A (no JWTs exist in this model) | Can use it until its 5-min expiry, and only on the one audience it was minted for | +| Need to rotate credentials | Mint new IAM key per cloud, redeploy, migrate | Bump path version, update JWKS, done | + +Net: **the "blast radius on TEE compromise" property is unchanged from the existing architecture**, but the "blast radius on anything short of TEE compromise" drops to near-zero for all cloud credentials. + +--- + +## Build cost + +Minimal; most of the primitives already exist: + +| Component | Cost | +|---|---| +| ES256 key derivation inside TEE | Trivial — reuse the SLIP-0010 primitive the custodial wallet keys already use. 1 day. | +| JWT mint function inside TEE | Well-understood; libraries exist. Need to audit for constant-time. 2 days. | +| Thin HTTPS proxy (nginx + static files) + Let's Encrypt cert | 1 day including DNS setup. | +| `/.well-known/openid-configuration` + `/.well-known/jwks.json` static generation | 0.5 day. | +| First consumer registration (AWS) + end-to-end test | 1 day. | +| Second consumer registration (GCP) to validate the generalization | 0.5 day. 
| + +**Total: ~1 week** for a fully generalized OIDC provider that replaces every static cloud credential we'd otherwise hold. + +--- + +## What this enables beyond email + +Every future cloud-service integration we build inherits this property for free: + +- **Google Drive / Docs / Calendar** (v0.2+) — federate via GCP Workload Identity into a scoped Google service account +- **Snowflake / BigQuery** analytics — federate for per-agent data access +- **Third-party agent APIs** that accept OIDC — direct federation, no secret to store +- **On-chain payment rails** (x402 on Base) — the ES256 JWTs can also authenticate HTTP payments if we extend the pattern +- **Enterprise SSO integration** — customers can configure their IdP to trust our OIDC issuer for specific roles + +The pattern is the final piece of the "no long-lived secrets" architecture story. Every service AgentKeys touches gets a credential minted on demand, attested at the TEE boundary, auditable on-chain, expiring within an hour. + +--- + +## Open items + +- **Issuer hostname decision.** `oidc.agentkeys.dev`? `tee.agentkeys.io/oidc/`? Needs to be a stable public-CA-certed HTTPS endpoint we control. Suggest: `oidc.agentkeys.dev` as a subdomain we never repurpose. +- **First-cut enclave identity format in `sub`.** Propose `enclave:::agent:` as the concrete form. Consumer trust policies condition on `enclave:::*` to pin a specific enclave build. +- **Multi-tenant enterprise deployments.** Do enterprises want their own OIDC-issuer key? Probably yes. Extension: `derive("oidc/tenant//v1")` gives each tenant their own issuer URL (`https://oidc.agentkeys.dev/tenant//`) with its own JWKS. Same mechanism, bounded blast radius per tenant. +- **Kubernetes-native audience** — our JWTs might also directly satisfy K8s ServiceAccount projection, enabling pods to inherit our enclave identity without any wrapper. Worth exploring in v0.2. 
+- **On-chain record of active OIDC-issuer keys.** Should the current JWKS fingerprint be recorded on-chain so external verifiers can validate "this JWT was signed by the correct TEE-era key"? Adds an extra audit anchor. Tracked as a future enhancement. + +--- + +## Cross-references + +- `docs/spec/ses-email-architecture.md` §9 (AWS SES primitives) — inherits this federation pattern for SES access +- `docs/spec/ses-email-architecture.md` §11 (three-layer abstraction) — the OIDC broker is a TokenAuthority operation +- `docs/spec/email-signing-backends.md` — SES backend's authority layer uses this federation +- [email-system](email-system) — high-level email system architecture +- [blockchain-tee-architecture](blockchain-tee-architecture) §1 rule #2 — "TEE holds all private keys"; this spec makes it literally true for cloud credentials too +- [session-token](session-token) — the 30-day bearer token at the AgentKeys layer; OIDC JWTs are ~5-minute tokens at the federation layer +- [key-security](key-security) — two-tier storage model; OIDC-issuer key is another TEE-sealed key alongside shielding + JWT + DKIM + custodial wallets + +### AWS primary sources + +- [`IAM/id_roles_providers_create_oidc.html`](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_create_oidc.html) — OIDC provider creation, discovery-doc requirements, JWKS constraints +- [`STS/API_AssumeRoleWithWebIdentity.html`](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRoleWithWebIdentity.html) — federation API, accepted signing algorithms, session duration +- RFC 8037 — CFRG elliptic curves for JOSE (defines EdDSA/Ed25519 JWS; AWS chose not to support) +- RFC 8463 — Ed25519 DKIM (what we do use for outbound mail) +- SLIP-0010 — deterministic key derivation for ECDSA P-256 and Ed25519 from a master seed + diff --git a/wiki/overview.md b/wiki/overview.md new file mode 100644 index 0000000..14e3a4b --- /dev/null +++ b/wiki/overview.md @@ -0,0 +1,125 @@ +--- +title: "AgentKeys 
Wiki — Index and Reading Order" +tags: ["index", "overview", "reading-order", "tree", "navigation"] +created: 2026-04-19T11:39:37.415Z +updated: 2026-04-19T11:39:37.415Z +sources: [] +links: ["hosted-first.md", "tag-based-access.md", "oidc-federation.md", "email-system.md", "knowledge-storage.md"] +category: reference +confidence: medium +schemaVersion: 1 +--- + +# AgentKeys Wiki — Index and Reading Order + +The tree. Every wiki page in this repo, grouped by concern, with a one-line description so you can pick where to start. + +--- + +## Tree + +``` +AgentKeys wiki +├── 0 Overview +│ └── 0.1 This index (you are here) +│ +├── 1 Architectural principles +│ ├── 1.1 Hosted-first vs bring-your-own user segmentation +│ ├── 1.2 Tag-based access control (PrincipalTag → per-user isolation) +│ └── 1.3 Broker-not-proxy — our backend mints credentials; daemons do the ops +│ (principle inline across every page; no dedicated wiki yet) +│ +├── 2 Identity and federation +│ └── 2.1 TEE as OIDC identity provider (universal federation pattern) +│ +├── 3 Services +│ └── 3.1 Email system — architecture, backends, usage isolation +│ +└── 4 Deferred decisions + └── 4.1 Knowledge-base storage options +``` + +--- + +## By category + +### Architectural principles + +| Page | One-line summary | +|---|---| +| [hosted-first](hosted-first) | Default path: `xxxxx@agentkeys-email.io` on our infra. BYO custom domain is opt-in, deferred past Stage 7. | +| [tag-based-access](tag-based-access) | JWT claim `agentkeys_user_wallet` → AWS session tag → bucket-policy condition. One bucket, N users, cryptographic separation. | + +### Identity and federation + +| Page | One-line summary | +|---|---| +| [oidc-federation](oidc-federation) | TEE exposes `https://oidc.agentkeys.dev` as a conforming OIDC issuer. Federates into AWS, GCP, Ali Cloud, Azure, Snowflake, K8s. ES256 signing key sealed in enclave. 
| + +### Services + +| Page | One-line summary | +|---|---| +| [email-system](email-system) | Three email channels (agent / user / approval). Hosted default on SES. Broker-not-proxy: daemon calls SES directly with minted creds. | + +### Deferred decisions + +| Page | One-line summary | +|---|---| +| [knowledge-storage](knowledge-storage) | Four candidates (GitHub / S3 / Drive / Ali Cloud) mapped to user segments. Commit when first real user forces the choice. | + +--- + +## Reading order by role + +| If you're a… | Start with | Then | Then | +|---|---|---|---| +| New engineer on the team | `wiki/blockchain-tee-architecture.md` (repo) | [hosted-first](hosted-first) | [email-system](email-system) | +| Product / roadmap reviewer | [hosted-first](hosted-first) | `docs/spec/plans/development-stages.md` §Stage 5–7 roadmap update | [knowledge-storage](knowledge-storage) | +| Operator / infra setup | `docs/spec/ses-email-architecture.md` | [oidc-federation](oidc-federation) §Consumer-registration recipes | [tag-based-access](tag-based-access) §Concrete AWS configuration | +| Security reviewer | `wiki/blockchain-tee-architecture.md` (repo) | [tag-based-access](tag-based-access) §Security properties and attacker surface | [oidc-federation](oidc-federation) §Threat model | + +--- + +## Architectural principles — the short version + +Every page below assumes these four principles, from `wiki/blockchain-tee-architecture.md` (repo): + +1. **Chain is the source of truth.** Every grant, credential mint, audit event is on-chain. +2. **TEE holds all private keys.** Derived from master seed; never extractable. +3. **Clients hold only bearer tokens.** Short-lived; no long-lived secrets. +4. **Broker, not proxy.** We mint ephemeral credentials; daemons use them to call remote services directly via MCP. We never proxy per-operation reads/writes. 
+ +All five wiki pages and all the `docs/spec/*` specs are derivations of these four rules into concrete services (email, knowledge base, identity federation). + +--- + +## Specs (outside the wiki) + +Living in `docs/spec/`: + +| Spec | Covers | +|---|---| +| `ses-email-architecture.md` | The SES-backed email backend (v0.1 default on `agentkeys-email.io`) | +| `email-signing-backends.md` | Generalized backend comparison (SES / DWD / AgentMail-style SaaS) | +| `credential-backend-interface.md` | The `CredentialBackend` trait that every backend implements | +| `plans/development-stages.md` | The stage roadmap (Stages 0–5 shipped; Stage 6 = hosted email; Stage 7 = OIDC provider) | + +And operator-facing docs: + +| Doc | Covers | +|---|---| +| `manual-test-stage5.md` | Stage 5 demo recipe (dedicated personal Gmail for the quick path) | +| `stage5-workspace-email-setup.md` | **Advanced / deferred:** BYO Workspace DWD for enterprise users | + +--- + +## Conventions + +- `[slug](slug)` links to another wiki page. +- `path/to/file.md` (no wiki brackets) links to a spec or doc outside the wiki. +- Every page starts with a "Status" line naming what stage it targets and whether it's current or deferred. +- We try to cut rather than add — if a page grows past ~300 lines of prose, it probably has SaaS-shape rot that should be removed. 
+ diff --git a/wiki/tag-based-access.md b/wiki/tag-based-access.md new file mode 100644 index 0000000..60f030a --- /dev/null +++ b/wiki/tag-based-access.md @@ -0,0 +1,266 @@ +--- +title: "Tag-Based Access Control — PrincipalTags from JWT Claims for Per-User Isolation" +tags: ["tag-based-access-control", "principal-tag", "session-tag", "oidc", "aws", "iam", "gcp", "per-user-isolation", "security", "attack-surface"] +created: 2026-04-19T10:08:52.043Z +updated: 2026-04-19T10:08:52.043Z +sources: [] +links: ["hosted-first.md", "oidc-federation.md", "email-system.md", "knowledge-storage.md"] +category: pattern +confidence: medium +schemaVersion: 1 +--- + +# Tag-Based Access Control — PrincipalTags from JWT Claims for Per-User Isolation + +**Status:** pattern (2026-04-19) +**Scope:** how AgentKeys enforces per-user isolation on shared cloud resources (one S3 bucket, one GCS shared drive, one OSS bucket) when many users' data coexists — **without** needing per-user IAM roles or per-user buckets. + +--- + +## TL;DR + +> **JWT carries the user's wallet as a claim → AWS STS maps it to a session tag → bucket policy conditions on the tag.** One bucket holds every user's data, but each user can only access their own prefix. The cloud enforces it, not our code. +> +> Same mechanism exists in GCP (Workload Identity Federation attribute mapping), Ali Cloud (RAM OIDC condition), Azure AD (federated credential + RBAC conditions). Our OIDC provider emits the claim once; each cloud consumer enforces via its native primitive. + +This is the mechanism that makes [hosted-first](hosted-first) secure at scale. Without it, either (a) every user needs their own IAM role (doesn't scale past a few thousand users), (b) every user needs their own bucket (expensive at scale), or (c) our backend proxies every op (violates the broker-not-proxy principle). 
+ +--- + +## The shape of the pattern + +``` +TEE Authority (mint step): + { + iss: "https://oidc.agentkeys.dev", + sub: "enclave::agent:0xABC", // child wallet + aud: "sts.amazonaws.com", + agentkeys_user_wallet: "0xABC", // <<<< tag-claim + agentkeys_inbox: "xyz123@agentkeys-email.io", + agentkeys_operation: "s3.read" + } + → signed ES256 JWT + +AWS STS (exchange step): + POST sts:AssumeRoleWithWebIdentity + WebIdentityToken = + RoleArn = arn:aws:iam:::role/agentkeys-agent + → validates JWT via our JWKS + → maps JWT claim agentkeys_user_wallet → session tag (PrincipalTag) + → returns temp creds (AccessKey, SecretKey, SessionToken) + +Daemon (use step): + s3.getObject(Bucket="agentkeys-mail", Key="0xABC/inbox/msg-1.eml") + → SigV4-signed with temp creds + +S3 (enforce step): + looks up the session's tags → PrincipalTag/agentkeys_user_wallet = "0xABC" + evaluates bucket policy (object actions match on Resource; s3:prefix applies to ListBucket only): + Resource: arn:aws:s3:::agentkeys-mail/${aws:PrincipalTag/agentkeys_user_wallet}/* + → requested key is under "0xABC/" → matches → allowed + → if user attempted a key under "0xB.../...", no match → denied +``` + +At no point does our backend check "does user A own this prefix?" — AWS does it by comparing the session tag, which was set from a TEE-signed claim, against the requested resource. One bucket, N users, hard-walled. + +--- + +## Concrete AWS configuration + +### 1. OIDC provider with tag-supporting claim + +Our discovery doc lists the custom claims: + +```json +{ + "issuer": "https://oidc.agentkeys.dev", + "jwks_uri": "https://oidc.agentkeys.dev/.well-known/jwks.json", + "claims_supported": [ + "aud", "iat", "iss", "sub", "exp", + "agentkeys_user_wallet", + "agentkeys_inbox", + "agentkeys_operation", + "agentkeys_grant_id" + ], + "id_token_signing_alg_values_supported": ["ES256"] +} +``` + +### 2. 
IAM role trust policy — allow JWT if claim is present + +```json +{ + "Version": "2012-10-17", + "Statement": [{ + "Effect": "Allow", + "Principal": { + "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.agentkeys.dev" + }, + "Action": [ + "sts:AssumeRoleWithWebIdentity", + "sts:TagSession" + ], + "Condition": { + "StringEquals": { + "oidc.agentkeys.dev:aud": "sts.amazonaws.com" + }, + "StringLike": { + "oidc.agentkeys.dev:sub": "enclave::*" + }, + "Null": { + "aws:RequestTag/agentkeys_user_wallet": "false" + } + } + }] +} +``` + +Key points: + +- `sts:TagSession` is required to grant the role permission to receive tags from the JWT claim. +- `Null aws:RequestTag/agentkeys_user_wallet = false`: reject JWTs that don't carry the isolation tag. A `StringNotEquals ""` test would not do this, because negated operators evaluate to true when the key is absent (belt-and-suspenders; our TEE always sets it, but this catches misconfigurations). +- `sub` pattern-match pins the JWT to a specific enclave build (`mrenclave` identifier). Attackers with a different enclave can't assume the role. + +### 3. Role's attribute-mapping for session tags + +During `AssumeRoleWithWebIdentity`, AWS STS takes session tags from the token's `https://aws.amazon.com/tags` claim (its `principal_tags` object), provided the role's trust policy permits `sts:TagSession`. It does not promote arbitrary top-level claims to tags, and listing a claim in `claims_supported` does not drive the mapping, so the TEE must also emit `agentkeys_user_wallet` under `principal_tags` for the tag to reach the session. + +### 4. 
Bucket policy on the shared `agentkeys-mail` bucket + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Sid": "AllowListOwnPrefix", + "Effect": "Allow", + "Principal": { "AWS": "arn:aws:iam::123456789012:role/agentkeys-agent" }, + "Action": "s3:ListBucket", + "Resource": "arn:aws:s3:::agentkeys-mail", + "Condition": { + "StringLike": { + "s3:prefix": [ + "${aws:PrincipalTag/agentkeys_user_wallet}/*" + ] + } + } + }, + { + "Sid": "AllowCrudOwnPrefix", + "Effect": "Allow", + "Principal": { "AWS": "arn:aws:iam::123456789012:role/agentkeys-agent" }, + "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"], + "Resource": "arn:aws:s3:::agentkeys-mail/${aws:PrincipalTag/agentkeys_user_wallet}/*" + }, + { + "Sid": "DenyEverythingElse", + "Effect": "Deny", + "Principal": { "AWS": "arn:aws:iam::123456789012:role/agentkeys-agent" }, + "NotAction": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"], + "Resource": ["arn:aws:s3:::agentkeys-mail", "arn:aws:s3:::agentkeys-mail/*"] + } + ] +} +``` + +Every user assumes the **same role** — `agentkeys-agent`. But each session carries a different PrincipalTag derived from their JWT claim, and the bucket policy expands `${aws:PrincipalTag/agentkeys_user_wallet}` per session. User A with tag `0xABC` sees only `agentkeys-mail/0xABC/*`. User B with tag `0xBEEF` sees only `agentkeys-mail/0xBEEF/*`. Cryptographic separation, zero code on our side. 
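The per-session expansion of `${aws:PrincipalTag/agentkeys_user_wallet}` can be mimicked in a few lines. This is a toy model of the `AllowCrudOwnPrefix` statement only, not how IAM evaluates policies: the function names are hypothetical, and the matching is simplified to a prefix test because the resource template ends in a single trailing `*`.

```python
TAG_KEY = "agentkeys_user_wallet"
OBJECT_RESOURCE = (
    "arn:aws:s3:::agentkeys-mail/${aws:PrincipalTag/agentkeys_user_wallet}/*"
)


def expand_policy_variable(template, session_tags):
    """Substitute the session's wallet tag into the resource template."""
    wallet = session_tags.get(TAG_KEY)
    if not wallet:
        return None  # no tag on the session: nothing can match
    return template.replace("${aws:PrincipalTag/" + TAG_KEY + "}", wallet)


def object_access_allowed(session_tags, bucket, key):
    """Return True when the requested object falls under the caller's own prefix."""
    pattern = expand_policy_variable(OBJECT_RESOURCE, session_tags)
    if pattern is None:
        return False
    requested = f"arn:aws:s3:::{bucket}/{key}"
    # Simplification: the template ends in one trailing "*", so a prefix test suffices.
    return requested.startswith(pattern[:-1])


user_a = {TAG_KEY: "0xABC"}
own_object = object_access_allowed(user_a, "agentkeys-mail", "0xABC/inbox/msg-1.eml")
other_object = object_access_allowed(user_a, "agentkeys-mail", "0xBEEF/inbox/msg-1.eml")
```

Because the expanded pattern keeps the `/` after the wallet, a session tagged `0xABC` also fails to match a neighbouring prefix such as `0xABCD/`.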
+ +--- + +## Equivalent mechanisms across clouds + +| Cloud | Claim → tag mapping | Condition key | +|---|---|---| +| **AWS IAM + STS** | `sts:TagSession` action + OIDC discovery `claims_supported` lists the claim | `aws:PrincipalTag/` | +| **GCP Workload Identity Federation** | Attribute mapping in the provider config: `attribute.user_wallet = assertion.agentkeys_user_wallet` | Resource IAM condition: `request.auth.claims['agentkeys_user_wallet']` or `request.auth.claims.agentkeys_user_wallet` | +| **Ali Cloud RAM OIDC** | `oidc:token.iss` and `oidc:token.sub` exposed natively; custom claims via `oidc:iss/` | `oidc:` in condition | +| **Azure AD Federated Credential** | Claims from external OIDC tokens surface through `x-ms-edov` or equivalent; role assignments via RBAC conditions | RBAC condition: `@Resource[...].Principal.UserId` | + +One OIDC JWT with the `agentkeys_user_wallet` claim works across all four. Each cloud enforces per-user isolation with its native condition primitive on whatever resource (S3 bucket, GCS bucket, OSS bucket, Blob Storage container) we point it at. + +The abstraction we expose to agents is identical — *"read your inbox"* / *"write your memory"* — and the cloud they land on is a deployment-config choice, not an architectural branch. 
+ +--- + +## What this solves vs alternatives + +| Approach | Per-user isolation | Per-user state on our side | Ops burden | Scales to | +|---|---|---|---|---| +| **PrincipalTag via OIDC claims (this pattern)** | Enforced cryptographically by the cloud | Zero — one role, one bucket, N claims | Zero per-user | Millions | +| Per-user IAM role | Enforced by IAM | One role per user | O(users) role creation | Thousands (AWS role count limits) | +| Per-user bucket | Enforced by bucket ownership | One bucket per user | O(users) bucket creation + policy | Limited by AWS bucket-count quotas | +| Our backend proxies every op | App-layer check | All ops flow through our code | Compute cost grows with ops | Unbounded only if we throw money at Lambda | +| Per-user OAuth app | Each user has own credential | Encrypted refresh token per user | Per-user OAuth flow | Millions (but adds per-user consent step) | + +Tag-based access control is the only option that gives (a) zero ops burden per user, (b) zero per-user state on our side, (c) cryptographic enforcement by the cloud, and (d) scales to our target user count. + +--- + +## Security properties and attacker surface + +### What this gives us + +- **Per-user boundary is cryptographic.** User A with wallet `0xA` cannot access user B's prefix even if A compromises their own daemon. The session tag is locked to A's wallet by the JWT signature at mint time; A cannot forge a JWT claiming to be B (would require the TEE's ES256 key, which is sealed). +- **Least privilege by default.** Bucket policy uses an explicit `Deny` on `NotAction`; if a future operation is added to the role without a corresponding bucket-policy `Allow`, it fails closed. +- **Audit attribution is cryptographic.** CloudTrail records the session's PrincipalTag on every access. Forensic investigators trace every S3 read back to a specific user wallet without needing our chain audit (though our chain audit is the authoritative source). 
+- **Revocation propagates in ≤6 s.** Revoke the on-chain grant → TEE stops minting JWTs for that `(user, scope)` pair within ≤6 s → last minted JWT expires in ≤5 min → no further access. No bucket-policy change needed. + +### What can still go wrong + +| Attack | Mitigation | +|---|---| +| Attacker steals a valid short-lived JWT | JWT expires in ≤5 min (we set short `exp`); chain revocation invalidates the grant that minted it within ≤6 s; full blast radius is ≤5 min of access to one user's prefix | +| Attacker compromises our AWS root account | All users' data accessible; this is the catastrophic scenario hosted-first tolerates. Mitigation: isolated AWS account for SES/S3 stack, CloudTrail → chain audit for tamper-evidence, SCPs restricting destructive actions | +| Attacker compromises the TEE | Can mint arbitrary JWTs for any user → all users' data compromised. Same as any root-key compromise in the system. Mitigated by enclave attestation + out-of-band rotation of the ES256 key | +| Role trust policy misconfigured (missing `StringNotEquals ""` on claim) | JWT without `agentkeys_user_wallet` claim could assume role and access any/no prefix. Mitigation: policy-as-code CI check; integration test that fires a JWT without the claim and asserts denial | +| IAM bucket policy misses the `Deny` clause on `NotAction` | New S3 actions added to the role leak. Mitigation: explicit deny; IAM Access Analyzer scan | +| Attacker replays a valid JWT for another user's operation | Each JWT has `aud=sts.amazonaws.com` and short `exp`; exchanging it at STS yields temp creds scoped to the PrincipalTag from that JWT's own signed claim, so a replay cannot reach another user's prefix | + +The failure-mode surface is ~identical to AWS's own AssumeRoleWithWebIdentity pattern (used by GitHub Actions, etc.). We inherit AWS's hardening of the primitive. 
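The policy-as-code CI check named in the mitigation table could look roughly like the sketch below. It is an assumption about shape, not the project's actual check: `trust_policy_problems` and `EXAMPLE_TRUST_POLICY` are hypothetical names, the example policy simply mirrors the role trust policy shown earlier on this page, and only the two properties this page calls out are inspected (the `sts:TagSession` action and the empty-claim guard).

```python
# Condition key the trust policy must guard, taken from the policy on this page.
REQUIRED_TAG_KEY = "aws:RequestTag/agentkeys_user_wallet"

# Mirrors the role trust policy shown earlier (account ID is the doc's placeholder).
EXAMPLE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {
            "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.agentkeys.dev"
        },
        "Action": ["sts:AssumeRoleWithWebIdentity", "sts:TagSession"],
        "Condition": {
            "StringEquals": {"oidc.agentkeys.dev:aud": "sts.amazonaws.com"},
            "StringLike": {"oidc.agentkeys.dev:sub": "enclave::*"},
            "StringNotEquals": {REQUIRED_TAG_KEY: ""},
        },
    }],
}


def trust_policy_problems(policy: dict) -> list:
    """Return human-readable problems; an empty list means the check passes."""
    problems = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM allows a single action string
            actions = [actions]
        if "sts:AssumeRoleWithWebIdentity" not in actions:
            continue  # not a web-identity trust statement
        if "sts:TagSession" not in actions:
            problems.append(f"statement {index}: missing sts:TagSession")
        guard = statement.get("Condition", {}).get("StringNotEquals", {})
        if guard.get(REQUIRED_TAG_KEY) != "":
            problems.append(
                f"statement {index}: missing empty-claim guard on {REQUIRED_TAG_KEY}"
            )
    return problems


if __name__ == "__main__":
    for problem in trust_policy_problems(EXAMPLE_TRUST_POLICY):
        print(problem)
```

A static check like this only catches the misconfiguration before deploy; the integration test that fires a claim-less JWT at STS remains the end-to-end proof.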
+ +--- + +## Alignment with the architectural rules + +From `wiki/blockchain-tee-architecture.md`: + +- **Rule #1 (chain is truth):** The `agentkeys_user_wallet` claim in the JWT is the wallet of an on-chain account; the grant that authorized this JWT mint is an on-chain extrinsic; every mint emits an on-chain audit event. The tag's authority derives from chain state. +- **Rule #2 (TEE holds all keys):** The ES256 OIDC-issuer key that signs the JWT is TEE-sealed. Attackers cannot mint tags they don't own without compromising the TEE. +- **Rule #3 (clients hold only bearer tokens):** The daemon receives a short-lived AWS session token with the tag baked in; it never holds the signing key, never holds a long-lived AWS access key. +- **Rule #4 (broker, not proxy):** The daemon calls S3/GCS/OSS directly using the tagged session; our backend mints, doesn't proxy. + +Tag-based access control is the **technical mechanism that lets rule #4 (broker-not-proxy) coexist with per-user isolation**. Without it, we'd be forced back to either per-user buckets (expensive) or operation proxying (compute cost, rule-#4 violation). 
+ +--- + +## Implementation checklist for Stage 6 + +- [ ] Include `agentkeys_user_wallet` in the TEE's JWT claim-set (parallel with existing `sub`) +- [ ] Update OIDC discovery doc to list the claim in `claims_supported` +- [ ] Register the OIDC provider in each AWS account we operate +- [ ] Create the `agentkeys-agent` role with trust policy requiring the claim + pinned to enclave mrenclave +- [ ] Apply the shared-bucket policy using `${aws:PrincipalTag/agentkeys_user_wallet}` +- [ ] Integration test: mint two JWTs for two different wallets; verify each can access only its prefix; verify `agentkeys_user_wallet=""` is denied +- [ ] Chain-audit extrinsic at mint time includes the claim values (redacted appropriately) +- [ ] Repeat for GCP (when Drive backend ships) and Ali Cloud (when OSS backend ships) + +--- + +## Cross-references + +- [oidc-federation](oidc-federation) — the OIDC-provider design that mints JWTs carrying the tag claim +- [email-system](email-system) — the email system that uses PrincipalTag to isolate per-user inboxes on the shared `agentkeys-mail` bucket +- [knowledge-storage](knowledge-storage) — every candidate backend uses an equivalent tag-condition mechanism natively +- [hosted-first](hosted-first) — why per-user isolation on a shared bucket matters (hosted default can't afford per-user AWS resources) +- `wiki/blockchain-tee-architecture.md` (repo) — rules this pattern is designed to preserve + +### AWS primary sources + +- [Session tags in STS](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_session-tags.html) +- [AssumeRoleWithWebIdentity and passing session tags](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_session-tags.html#id_session-tags_adding-assume-role-idp) +- [S3 condition keys and principal tags](https://docs.aws.amazon.com/AmazonS3/latest/userguide/bucket-policies.html) + From d39b2f7391b7d1e9b1b3a9fa3ec542b290981cfd Mon Sep 17 00:00:00 2001 From: wildmeta-agent Date: Mon, 20 Apr 2026 00:28:39 +0800 Subject: [PATCH 
4/5] docs(wiki): resolve contradictions across 6 service-architecture pages MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Audited the 6 newly-promoted wiki pages (email-system, hosted-first, knowledge-storage, oidc-federation, overview, tag-based-access) against blockchain-tee-architecture.md and docs/spec/heima-gaps-vs-desired-architecture.md. Found 12 deltas. Resolutions applied: A1 oidc issuer URL typo `oidc.agentkeys.io` -> `.dev` in heima-gaps. A2 email-system isolation diagram mailbox domain switched from the old `bots.wildmeta.ai` to the canonical `agentkeys-email.io`. A3 Same domain sweep across ses-email-architecture.md and manual-test-stage5.md. A4 oidc-federation.md: "three architectural rules" -> "four" (rule #4 broker-not-proxy is now canonical). A5 knowledge-storage.md: same rule-count fix + folded the "Rule #4 candidate" note into the main alignment list. A6 Removed the duplicate H1 heading (wiki-ingest artifact) on hosted-first.md, tag-based-access.md, knowledge-storage.md. B1+B2 Session-JWT key and OIDC-issuer key are now TWO SEPARATE keys, both ES256. Session-JWT at `issuer/jwt/v1` (internal trust anchor, not on public JWKS). OIDC-issuer at `oidc/issuer/v1` (public JWKS at oidc.agentkeys.dev). Separation isolates AWS-cache-driven OIDC rotations from session-token invalidation. Updated: - wiki/blockchain-tee-architecture.md §1 table (one row -> two) - wiki/blockchain-tee-architecture.md §6 rule #2 - wiki/session-token.md alg + separation note - docs/spec/heima-gaps-vs-desired-architecture.md §2 table + §3 text B3 email-system.md "AWS_SES-managed DKIM" was a direct violation of rule #2. Flipped to TEE-held Ed25519 BYODKIM (`dkim//v1`) consistent with every other page. B4 heima-gaps §6 claim value: was `agentkeys_user_wallet = omni_account`; reconciled to child wallet with a note that the claim name is historical. Per-agent blast-radius bounding is the reason. 
B5 OIDC `sub` format now consistently includes mrsigner: `enclave:::agent:` across tag-based-access + knowledge-storage. Added §"Why three segments in `sub`" explaining the three pin-modes (strict mrenclave / loose mrsigner / explicit both) that this unlocks for relying-party trust policies. B6 blockchain-tee-architecture rule #3 TTL corrected from "~24h" to the canonical 30d, and added a brief three-TTL layering note (30d session bearer / 5-min OIDC JWT / 1h cloud temp creds). --- docs/manual-test-stage5.md | 2 +- .../heima-gaps-vs-desired-architecture.md | 28 ++++++++++--------- docs/spec/ses-email-architecture.md | 26 ++++++++--------- wiki/blockchain-tee-architecture.md | 7 +++-- wiki/email-system.md | 6 ++-- wiki/hosted-first.md | 1 - wiki/knowledge-storage.md | 12 ++++---- wiki/oidc-federation.md | 2 +- wiki/session-token.md | 7 +++-- wiki/tag-based-access.md | 24 +++++++++++++--- 10 files changed, 67 insertions(+), 48 deletions(-) diff --git a/docs/manual-test-stage5.md b/docs/manual-test-stage5.md index 706b83a..47d8aca 100644 --- a/docs/manual-test-stage5.md +++ b/docs/manual-test-stage5.md @@ -28,7 +28,7 @@ agentkeys provision openrouter For the demo-only purpose of Stage 5, the goal is the **shortest path to a running provisioner** with an inbox the agent fully controls. Use a dedicated personal Gmail below — reuses our existing IMAP code path, ~10 minutes total setup, no Workspace subscription required. -> **This is a temporary demo solution.** For production (v0.1), the agent mailbox moves to SES-hosted `*@bots.wildmeta.ai` under the three-layer `TokenAuthority` abstraction. See the [email-system wiki page](../wiki/email-system.md) for the full architecture and why we're running demo-and-production on different backends deliberately. +> **This is a temporary demo solution.** For production (v0.1), the agent mailbox moves to SES-hosted `*@agentkeys-email.io` under the three-layer `TokenAuthority` abstraction. 
See the [email-system wiki page](../wiki/email-system.md) for the full architecture and why we're running demo-and-production on different backends deliberately. #### 🚀 Demo path: dedicated personal Gmail + TOTP + app password diff --git a/docs/spec/heima-gaps-vs-desired-architecture.md b/docs/spec/heima-gaps-vs-desired-architecture.md index d3b11be..b305f9d 100644 --- a/docs/spec/heima-gaps-vs-desired-architecture.md +++ b/docs/spec/heima-gaps-vs-desired-architecture.md @@ -40,12 +40,13 @@ There is **no master seed**. OmniAccount *addresses* are deterministically deriv A single 256-bit master seed is generated once, at first enclave provisioning, from the hardware RNG and sealed. Every other long-lived key is deterministically derived from that seed via SLIP-0010 HDKD (BIP-32-style): -| Subkey | Derivation path | Alg | Consumer | -| ------------------------------- | --------------------------------------------------- | ----------- | -------------------------------------- | -| Shielding keypair | `shielding/v1` | Curve25519 | Credential-blob encrypt/decrypt | -| Issuer JWT signing key | `issuer/jwt/v1` | RSA-2048 *or* ES256 | Session-token minting + OIDC issuer (Stage 7) | -| Per-user wallet key | `wallet///v1` | secp256k1 / ed25519 (per chain) | Custodial wallet signing | -| Per-domain DKIM key | `dkim//v1` | Ed25519 | Outbound mail signing (Stage 6) | +| Subkey | Derivation path | Alg | Consumer | +| ------------------------------- | --------------------------------------------------- | ------------------------------------- | -------------------------------------- | +| Shielding keypair | `shielding/v1` | Curve25519 | Credential-blob encrypt/decrypt | +| Session-JWT signing key | `issuer/jwt/v1` | ES256 (ECDSA P-256) | Sign 30-day session tokens (internal trust anchor; not on public JWKS). | +| OIDC-issuer signing key | `oidc/issuer/v1` | ES256 (ECDSA P-256) | Sign ≤5-min OIDC JWTs exchanged at AWS STS / GCP WIF / Ali RAM for cloud temp creds. 
Separate key so the rotatable public trust anchor is isolated from the session-JWT anchor. | +| Per-user wallet key | `wallet///v1` | secp256k1 / ed25519 (per chain) | Custodial wallet signing | +| Per-domain DKIM key | `dkim//v1` | Ed25519 | Outbound mail signing (Stage 6) | ### Impact @@ -72,13 +73,13 @@ The TEE issues JWTs for internal AgentKeys authentication, but: - No `/.well-known/openid-configuration` discovery document is published. - No JWKS endpoint is published. - The `iss` claim on existing JWTs is not a resolvable HTTPS URL. -- The signing alg is RSA-2048 only; there is no ES256 path. +- The signing alg is RSA-2048 only; there is neither an ES256 path nor a separate OIDC-issuer key. ### Desired (Stage 7 — Generalized OIDC Provider) -The TEE's issuer signing key (derivation path `issuer/jwt/v1`, alg **ES256**) doubles as a conforming OpenID Connect issuer: +The TEE's **OIDC-issuer signing key** (derivation path `oidc/issuer/v1`, alg **ES256**, separate from the session-JWT key at `issuer/jwt/v1`) backs a conforming OpenID Connect issuer: -- `iss = https://oidc.agentkeys.io` (or per-tenant subdomain). +- `iss = https://oidc.agentkeys.dev` (or per-tenant subdomain). - `/.well-known/openid-configuration` served from a plain HTTPS endpoint (static file, no compute; just publishes the issuer URL, JWKS URL, supported algs). - `/.well-known/jwks.json` serves the ES256 public key as a JWK. - JWT claims include the user's OmniAccount wallet as a custom claim (`agentkeys_user_wallet`) so relying parties can gate access via `sts:TagSession` / `aws:PrincipalTag` conditions (see [`wiki/tag-based-access.md`](../../wiki/tag-based-access.md)). @@ -89,11 +90,12 @@ Without this, AWS / GCP / Azure / Ali Cloud / K8s cannot federate identity to th ### Migration path -- **Issuer key alg:** add ES256 derivation alongside RSA-2048 (AWS IAM OIDC accepts RS256 and ES256, but not Ed25519 — this was verified directly from the AWS docs). 
+- **Issuer key alg + key split:** migrate the session-JWT signing key from RSA-2048 to ES256 at `issuer/jwt/v1`, AND add a **separate** ES256 key at `oidc/issuer/v1` for OIDC federation. Two keys, same alg, different purposes: the session-JWT key is an internal TEE-only trust anchor (verified by TEE workers, not on a public JWKS); the OIDC-issuer key is on a public JWKS at `https://oidc.agentkeys.dev/.well-known/jwks.json`. Separation isolates OIDC-rotation cycles (driven by AWS cache windows) from session-token invalidation. (AWS IAM OIDC accepts RS256 and ES256, but not Ed25519 — verified directly from the AWS docs.) - **Discovery document + JWKS:** static S3/CloudFront-served JSON; no TEE compute required for serving (compute is key-derivation-on-demand for JWKS rotation, which is rare). -- **Publish pipeline:** the TEE computes its own JWK from the derived public key; we mirror it to the discovery URL on rotation. +- **Publish pipeline:** the TEE computes its own JWK from the derived OIDC public key; we mirror it to the discovery URL on rotation. +- **Session-JWT migration path:** existing RSA-2048 session tokens remain valid until their 30-day window expires; new tokens after the cut-over are ES256. Clients verify whichever matches the `alg` header. Two-week flag flip. -Depends on §2 (HDKD) landing first, because the ES256 key is a subkey of the master seed. +Depends on §2 (HDKD) landing first, because both the session-JWT key and the OIDC-issuer key are subkeys of the master seed. --- @@ -156,7 +158,7 @@ The TEE mints JWTs with standard claims (`sub`, `typ`, `exp`, `aud`). There is n ### Desired (Stage 6 + Stage 7) -The JWT the TEE mints carries `agentkeys_user_wallet = ` as a claim. When a client does `sts:AssumeRoleWithWebIdentity` with that JWT, STS extracts the claim and attaches it as a session tag. 
Downstream bucket policies and KMS policies pattern-match on `aws:PrincipalTag/agentkeys_user_wallet = ${aws:SourceIdentity}` or similar, giving us per-user isolation on shared cloud resources **without** per-user IAM roles. +The JWT the TEE mints carries `agentkeys_user_wallet = ` as a claim. The claim name is historical (from early design when the only identity was the user's OmniAccount); the value is the **child/agent wallet** so that per-agent compromise bounds the blast radius to that one agent's prefix rather than the whole user. When a client does `sts:AssumeRoleWithWebIdentity` with that JWT, STS extracts the claim and attaches it as a session tag. Downstream bucket policies and KMS policies pattern-match on `aws:PrincipalTag/agentkeys_user_wallet = ${aws:SourceIdentity}` or similar, giving us per-user (per-agent) isolation on shared cloud resources **without** per-user IAM roles. See [`wiki/tag-based-access.md`](../../wiki/tag-based-access.md) for the full pattern. diff --git a/docs/spec/ses-email-architecture.md b/docs/spec/ses-email-architecture.md index e072601..2bd14cd 100644 --- a/docs/spec/ses-email-architecture.md +++ b/docs/spec/ses-email-architecture.md @@ -134,9 +134,9 @@ Child bearer token 30 days (AgentKeys policy) The `grant.allowed_subjects` for an SES inbox grant is typically a pattern: -- `Exact("bot-42@bots.wildmeta.ai")` — one specific inbox -- `Prefix("bot-*@bots.wildmeta.ai")` — all inboxes with that prefix (owned by this child) -- `DomainWildcard("bots.wildmeta.ai")` — any inbox on this domain (master-only pattern) +- `Exact("bot-42@agentkeys-email.io")` — one specific inbox +- `Prefix("bot-*@agentkeys-email.io")` — all inboxes with that prefix (owned by this child) +- `DomainWildcard("agentkeys-email.io")` — any inbox on this domain (master-only pattern) `grant.allowed_scopes` is a small enum (more scopes are daemon-side concerns, not ours): @@ -152,7 +152,7 @@ Higher-level concerns like drafts-with-human-approval, per-message 
reply/forward | Primitive | Use | |---|---| -| **Verified identity (domain)** | `bots.wildmeta.ai` verified once; all inboxes live under it. | +| **Verified identity (domain)** | `agentkeys-email.io` verified once; all inboxes live under it. | | **DKIM records** | 3 CNAMEs if `AwsSes` type; 1 CNAME to our key fingerprint if `ByoDkim` (v0.1 TEE-held; **Ed25519 per RFC 8463**, derived from TEE master seed — see §10.5). | | **MX** | `10 inbound-smtp.us-east-1.amazonaws.com` — single record, catches all inbound to the domain. | | **Receipt rule** | One rule matching `*@agentkeys-email.io` → `S3Action` writes raw MIME directly to the bucket. No Lambda. | @@ -165,21 +165,21 @@ Higher-level concerns like drafts-with-human-approval, per-message reply/forward > **Stage 6 default — `agentkeys-email.io`.** We (AgentKeys) operate this domain; users do not configure DNS. The records below are the one-time setup on AgentKeys' side and are preserved here so the pattern is reproducible for Stage 7+ bring-your-own-domain users. -DNS records needed for a fresh domain (AgentKeys-hosted default: `agentkeys-email.io`; user-owned BYO example shown as `bots.wildmeta.ai`): +DNS records needed for a fresh domain (shown for the AgentKeys-hosted default `agentkeys-email.io`; a user-owned BYO domain substitutes its own name in each record): | Record | Value | Purpose | |---|---|---| -| `bots.wildmeta.ai. MX` | `10 inbound-smtp.us-east-1.amazonaws.com.` | Inbound | -| `_amazonses.bots.wildmeta.ai. TXT` | `` | SES domain verification | -| `._domainkey.bots.wildmeta.ai. CNAME` | `.dkim.amazonses.com.` | DKIM selector 1 (AWS_SES mode) | -| `._domainkey.bots.wildmeta.ai. CNAME` | `.dkim.amazonses.com.` | DKIM selector 2 | -| `._domainkey.bots.wildmeta.ai. CNAME` | `.dkim.amazonses.com.` | DKIM selector 3 | -| `bots.wildmeta.ai. TXT` (SPF) | `v=spf1 include:amazonses.com ~all` | Outbound authorization | -| `_dmarc.bots.wildmeta.ai. 
TXT` | `v=DMARC1; p=quarantine; rua=mailto:dmarc@wildmeta.ai` | DMARC | +| `agentkeys-email.io. MX` | `10 inbound-smtp.us-east-1.amazonaws.com.` | Inbound | +| `_amazonses.agentkeys-email.io. TXT` | `` | SES domain verification | +| `._domainkey.agentkeys-email.io. CNAME` | `.dkim.amazonses.com.` | DKIM selector 1 (AWS_SES mode) | +| `._domainkey.agentkeys-email.io. CNAME` | `.dkim.amazonses.com.` | DKIM selector 2 | +| `._domainkey.agentkeys-email.io. CNAME` | `.dkim.amazonses.com.` | DKIM selector 3 | +| `agentkeys-email.io. TXT` (SPF) | `v=spf1 include:amazonses.com ~all` | Outbound authorization | +| `_dmarc.agentkeys-email.io. TXT` | `v=DMARC1; p=quarantine; rua=mailto:dmarc@wildmeta.ai` | DMARC | For `BYODKIM` (v0.1, TEE-held key): one CNAME pointing to the TEE's registered DKIM pubkey fingerprint, replacing the three AWS-provided CNAMEs. DKIM-signing moves into the enclave. -Optionally, a MAIL FROM subdomain `bounce.bots.wildmeta.ai` with its own MX + SPF for bounce handling (adds 2 records). +Optionally, a MAIL FROM subdomain `bounce.agentkeys-email.io` with its own MX + SPF for bounce handling (adds 2 records). State machine per domain: `NotStarted → Pending → Verifying → Verified` (or `Invalid` / `Failed`). Our backend polls SES's `GetIdentityVerificationAttributes` every ~60 s during `Verifying` and transitions the state. diff --git a/wiki/blockchain-tee-architecture.md b/wiki/blockchain-tee-architecture.md index f09c909..805ae3f 100644 --- a/wiki/blockchain-tee-architecture.md +++ b/wiki/blockchain-tee-architecture.md @@ -55,7 +55,8 @@ The TEE is a **stateless computation oracle**. 
It reads chain state, performs cr | ------------------------------------------------- | --------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- | | **TEE master seed** | Permanent (sealed storage, never leaves enclave, never exposed) | Generated once at first enclave provisioning from a hardware RNG (256-bit) | Root of all HD derivation. Every other key below derives from this seed. | | Shielding keypair | Permanent (sealed storage, pubkey registered on chain via `register_enclave()`) | Derived from master seed at path `shielding/v1` (SLIP-0010 / BIP-32-style HDKD) | Encrypt/decrypt credential blobs | -| Issuer signing key (RSA-2048 or ES256) | Permanent (sealed storage, pubkey published via OIDC discovery + `register_enclave()`) | Derived from master seed at path `issuer/jwt/v1` (SLIP-0010) | Sign session tokens (JWT format) issued to clients; also the OIDC issuer key (Stage 7) | +| Session-JWT signing key (ES256) | Permanent (sealed storage, pubkey registered on chain via `register_enclave()`) | Derived from master seed at path `issuer/jwt/v1` (SLIP-0010, secp256r1 / NIST P-256) | Sign 30-day session tokens (JWT format) issued to clients. Verified by TEE only — not exposed via public JWKS. | +| OIDC-issuer signing key (ES256) | Permanent (sealed storage, pubkey published at `https://oidc.agentkeys.dev/.well-known/jwks.json`) | Derived from master seed at path `oidc/issuer/v1` (SLIP-0010, secp256r1 / NIST P-256) | Sign short-lived (≤5 min) OIDC JWTs exchanged by daemons for AWS/GCP/Azure/Ali temp creds (Stage 7). Separate key so the publicly-rotatable OIDC trust anchor is isolated from the session-JWT trust anchor. 
| | Per-user custodial wallet keys (BTC/ETH/TON) | Permanent (derived on demand, cacheable; deterministic re-derivation after restart) | Derived from master seed at path `wallet///v1` (SLIP-0010) | Sign on-chain extrinsics on behalf of user wallets. Private key never leaves the enclave. | | Per-domain DKIM signing key (Stage 6) | Permanent (derived on demand, public key published as DNS TXT record) | Derived from master seed at path `dkim//v1` (Ed25519, RFC 8463) | Sign outbound mail for `@agentkeys-email.io` and user-owned domains | | AES response keys | Ephemeral (per-request) | From `RequestAesKey` parameter | Encrypt sensitive responses to specific clients | @@ -525,8 +526,8 @@ This gets the per-read latency down to pure-TEE-backend levels for hot-path read The entire AgentKeys v0.1 architecture follows four rules: 1. **Chain stores everything persistent.** Account records, credential blobs (encrypted), pair requests, approvals, audit events, wallet balances, revocation lists. The chain is the single source of truth. If the TEE restarts, if the daemon crashes, if the user switches devices — chain state is always there. -2. **TEE holds all private keys and does all computation.** The TEE holds a single sealed master seed and deterministically derives every other long-lived key from it via SLIP-0010 HDKD: the shielding key (`shielding/v1`), the issuer signing key for session JWTs and the OIDC provider (`issuer/jwt/v1`), per-user custodial wallet keys (`wallet///v1`, per `pallet-bitacross` pattern), and per-domain DKIM signing keys (`dkim//v1`, Ed25519, Stage 6). The TEE decrypts credential blobs, issues and verifies JWTs, signs on-chain extrinsics using the user's wallet key, signs outbound mail, and enforces scope + rate limits. No private key ever leaves the TEE. 
(Current Heima source generates these keys independently rather than HD-derived — see [`docs/spec/heima-gaps-vs-desired-architecture.md`](../docs/spec/heima-gaps-vs-desired-architecture.md) for the migration gap.) -3. **Clients hold only a JWT (bearer token), not private keys.** The master CLI and agent daemon each hold a JWT string issued by the TEE upon authentication. The JWT is a signed bearer token (`AuthTokenClaims { sub, typ, exp, aud }`), not a private key. However, it IS still a bearer credential — anyone with the string can impersonate the user until it expires. **OS keychain is the recommended default** for the master CLI (provides app-level ACL against malware-as-same-user). Plain file (mode 0600) is an acceptable fallback for daemon/sandbox/CI where keychain isn't available. If the JWT leaks, the blast radius is bounded by its expiration time (~~24h) and the on-chain revocation list (~~6s). If the JWT expires, the client re-authenticates and gets a new one. +2. **TEE holds all private keys and does all computation.** The TEE holds a single sealed master seed and deterministically derives every other long-lived key from it via SLIP-0010 HDKD: the shielding key (`shielding/v1`, Curve25519), the session-JWT signing key (`issuer/jwt/v1`, ES256), the OIDC-issuer key (`oidc/issuer/v1`, ES256, separate from the session-JWT key so the publicly-rotatable OIDC trust anchor is isolated from the internal session-JWT trust anchor), per-user custodial wallet keys (`wallet///v1`, per `pallet-bitacross` pattern), and per-domain DKIM signing keys (`dkim//v1`, Ed25519, Stage 6). The TEE decrypts credential blobs, issues and verifies JWTs, signs on-chain extrinsics using the user's wallet key, signs outbound mail (BYODKIM — the DKIM key lives in the enclave, not at AWS SES), and enforces scope + rate limits. No private key ever leaves the TEE. 
(Current Heima source generates these keys independently rather than HD-derived — see [`docs/spec/heima-gaps-vs-desired-architecture.md`](../docs/spec/heima-gaps-vs-desired-architecture.md) for the migration gap.) +3. **Clients hold only a JWT (bearer token), not private keys.** The master CLI and agent daemon each hold a JWT string issued by the TEE upon authentication. The JWT is a signed bearer token (`AuthTokenClaims { sub, typ, exp, aud }`), not a private key. However, it IS still a bearer credential — anyone with the string can impersonate the user until it expires. **OS keychain is the recommended default** for the master CLI (provides app-level ACL against malware-as-same-user). Plain file (mode 0600) is an acceptable fallback for daemon/sandbox/CI where keychain isn't available. If the JWT leaks, the blast radius is bounded by its expiration time (**30 days**, per [Session Token](session-token)) and the on-chain revocation list (~6s). If the JWT expires, the client re-authenticates and gets a new one. There are three TTLs to keep straight: **30-day session bearer** (this rule), **≤5-min OIDC-federation JWT** (what the daemon exchanges at AWS STS / GCP WIF / Ali RAM for cloud temp creds, per [OIDC Federation](oidc-federation)), and **≤1-hour cloud temp creds** (AWS default). Nested: shortest TTL always wins; revocation still propagates in ≤6s via the chain. 4. **AgentKeys brokers credentials, not operations.** Our infrastructure mints ephemeral credentials (JWTs, temp cloud creds, decrypted API keys) and emits audit extrinsics at mint time. The daemon then calls remote services (SES, S3, GitHub, Notion, LLM APIs, …) **directly** using those credentials — we never proxy per-operation reads/writes. Compute cost on our side scales with user count, not with operation frequency. Per-user isolation on shared cloud resources is enforced by the cloud itself via PrincipalTag / session-tag conditions derived from JWT claims (see [Tag-Based Access](tag-based-access)). 
This rule is why the email, knowledge-base, and OIDC-federation designs never build proxies, SaaS feature surfaces, or per-operation compute on our side. Every flow in the system (credential store, credential read, pairing, revocation, audit query, email read/send, knowledge-base ops) is an instance of: diff --git a/wiki/email-system.md b/wiki/email-system.md index 93e0141..9441976 100644 --- a/wiki/email-system.md +++ b/wiki/email-system.md @@ -45,8 +45,8 @@ AgentKeys treats email as a credential-managed resource under the same `Authorit ``` ┌──────────────────────────────────────────────────────────────┐ │ 1. AGENT MAILBOX │ -│ @bots.wildmeta.ai (our SES, v0.1) │ -│ Hosts: our infra. Agent reads via backend only. │ +│ @agentkeys-email.io (our SES, v0.1) │ +│ Hosts: our infra. Agent reads via minted S3 creds. │ │ Purpose: receive service-provider OTPs + confirmations. │ │ User does NOT share this inbox. │ │ │ @@ -145,7 +145,7 @@ Minimal broker-not-proxy shape. Our infrastructure handles only credential minti |---|---| | **Inbox** | On-chain row `(user_wallet, agent_wallet, inbox_address)`. `inbox_address` is the email itself — e.g. `abc123@agentkeys-email.io`. No server-side message/thread/label store. | | **Receive** | SES receipt rule drops raw MIME to `s3://agentkeys-mail///.eml`. No Lambda. No parsing. No DB writes. Per-email compute on our side: zero. | -| **Send** | Daemon mints temp SES creds (via OIDC federation), assembles MIME, calls `ses:SendRawEmail` directly. IAM role condition on `ses:FromAddress` pins the daemon to its own inbox address. AWS_SES-managed DKIM for the hosted domain. Per-send compute on our side: zero. | +| **Send** | Daemon assembles MIME, calls the TEE to DKIM-sign the message (Ed25519 BYODKIM, key derived at `dkim/agentkeys-email.io/v1` — Rule #2 keeps the signing key inside the enclave), then mints temp SES creds via OIDC federation and calls `ses:SendRawEmail` directly. 
IAM role condition on `ses:FromAddress` pins the daemon to its own inbox address. Per-send compute on our side: one signature inside the TEE, zero app-layer compute. | | **Read** | Daemon mints temp S3 creds with `PrincipalTag/agentkeys_user_wallet` from the JWT claim. Bucket policy conditions limit the daemon to its own user's prefix. Daemon lists/gets S3 objects and parses MIME client-side. Per-read compute on our side: zero. | | **Domain (hosted default)** | We operate `agentkeys-email.io`; MX + SPF + DMARC + AWS_SES DKIM set once on our side. User-side DNS: none. | | **OIDC federation** | TEE-derived ES256 issuer key at `derive("oidc/issuer/v1")`. JWT claims include `agentkeys_user_wallet` for PrincipalTag isolation. See [oidc-federation](oidc-federation). | diff --git a/wiki/hosted-first.md b/wiki/hosted-first.md index 36421b7..043ae11 100644 --- a/wiki/hosted-first.md +++ b/wiki/hosted-first.md @@ -12,7 +12,6 @@ schemaVersion: 1 # Hosted-First vs Bring-Your-Own — User Segmentation -# Hosted-First vs Bring-Your-Own — User Segmentation **Status:** decision (2026-04-19) **Scope:** how AgentKeys onboards non-developer users vs enterprise / advanced users across email, knowledge base, and OIDC identity. diff --git a/wiki/knowledge-storage.md b/wiki/knowledge-storage.md index 373964a..3475836 100644 --- a/wiki/knowledge-storage.md +++ b/wiki/knowledge-storage.md @@ -12,7 +12,6 @@ schemaVersion: 1 # Knowledge Base Storage Options — Deferred Decision Matrix -# Knowledge Base Storage — Deferred Decision Matrix **Status:** deferred (2026-04-19) **Scope:** which backend(s) AgentKeys will support as the agent's "memory" / knowledge-base storage layer once we commit. @@ -128,8 +127,10 @@ TEE master seed Ali RAM supports external OIDC identity providers just like AWS IAM does: 1. Register our OIDC issuer URL in RAM (one-time, per region) - 2. Create a RAM role with condition: oidc:sub matches "enclave:*:agent:*" + 2. 
Create a RAM role with condition: oidc:sub matches "enclave:::agent:*" and oidc:aud = sts.aliyuncs.com + (three-segment enclave identity — see tag-based-access §"Why three segments" + for the mrenclave/mrsigner pin-mode options) 3. Role policy: s3-equivalent OSS permissions on per-user prefix 4. Agent flow: TEE signs JWT → AssumeRoleWithOIDC → temp OSS creds @@ -174,13 +175,12 @@ Until one of those fires, the deferred state is fine. The credential-broker arch ## Alignment with [blockchain-tee-architecture](blockchain-tee-architecture) -This deferred decision does not violate any of the three rules: +This deferred decision preserves all four rules: - **Rule #1** (chain is source of truth): every knowledge-base grant is an on-chain extrinsic regardless of which backend; every credential mint is an on-chain audit event. - **Rule #2** (TEE holds all private keys): whether it's the GitHub App ECDSA key, the ES256 OIDC-issuer key, or anything else, every long-lived key is derived from the TEE master seed and never extracted. - **Rule #3** (clients hold only bearer tokens): the daemon holds a short-lived GitHub installation token / AWS temp creds / OSS STS token — all ≤1h, no long-lived secrets, same posture as existing credentials. - -The **broker-not-proxy** principle (Rule #4 candidate) applies end-to-end: our backend mints credentials, the daemon uses them to talk to the vendor directly via MCP, we never run `read_document` / `write_document` on the user's behalf. +- **Rule #4** (broker, not proxy): our backend mints credentials; the daemon uses them to talk to the vendor directly via MCP. We never run `read_document` / `write_document` on the user's behalf. Applies end-to-end across all four candidate backends. 
--- @@ -190,6 +190,6 @@ The **broker-not-proxy** principle (Rule #4 candidate) applies end-to-end: our b - [oidc-federation](oidc-federation) — how our OIDC provider federates into AWS, GCP, Ali Cloud, any compliant cloud - [tag-based-access](tag-based-access) — how AWS (and equivalent-mechanism clouds) enforce per-user isolation on shared buckets via JWT-claim-derived session tags - [hosted-first](hosted-first) — the broader user-segmentation principle this decision follows -- `wiki/blockchain-tee-architecture.md` (repo) — the three architectural rules all candidates preserve +- [`wiki/blockchain-tee-architecture.md`](blockchain-tee-architecture) — the four architectural rules all candidates preserve - Issue [#11](https://github.com/litentry/agentKeys/issues/11) — biometric gate (applies to each backend's grant creation) diff --git a/wiki/oidc-federation.md b/wiki/oidc-federation.md index c12733e..d017dad 100644 --- a/wiki/oidc-federation.md +++ b/wiki/oidc-federation.md @@ -240,7 +240,7 @@ DKIM rotation works identically, independently, per-domain. 
## Alignment with `blockchain-tee-architecture.md` rules -Verified end-to-end against the three architectural rules in the repo's `wiki/blockchain-tee-architecture.md`: +Verified end-to-end against the four architectural rules in the repo's [`wiki/blockchain-tee-architecture.md`](blockchain-tee-architecture): | Rule | How this pattern preserves it | |---|---| diff --git a/wiki/session-token.md b/wiki/session-token.md index 283b565..d28a8cf 100644 --- a/wiki/session-token.md +++ b/wiki/session-token.md @@ -66,9 +66,10 @@ TEE returns the token string to the client The issuer signing key: -- Lives inside the TEE (sealed storage), derived from the sealed TEE master seed at path `issuer/jwt/v1` via SLIP-0010 HDKD — the same seed that roots the shielding key, per-user wallet keys, and per-domain DKIM keys (see [Blockchain TEE Architecture §1](blockchain-tee-architecture#tee-trusted-execution-environment-worker) and [`docs/spec/heima-gaps-vs-desired-architecture.md`](../docs/spec/heima-gaps-vs-desired-architecture.md) for the current-vs-desired gap) -- Default alg is RSA-2048 (SHA-256) for backward compatibility with existing JWT verifiers; ES256 (ECDSA P-256) is the preferred alg once Stage 7 ships, since AWS IAM OIDC accepts ES256 but not Ed25519 -- Public key published on chain via `register_enclave()` AND (Stage 7) via the OIDC discovery document at `/.well-known/openid-configuration` + JWKS endpoint +- Lives inside the TEE (sealed storage), derived from the sealed TEE master seed at path `issuer/jwt/v1` via SLIP-0010 HDKD — the same seed that roots the shielding key, per-user wallet keys, OIDC-issuer key, and per-domain DKIM keys (see [Blockchain TEE Architecture §1](blockchain-tee-architecture#tee-trusted-execution-environment-worker) and [`docs/spec/heima-gaps-vs-desired-architecture.md`](../docs/spec/heima-gaps-vs-desired-architecture.md) for the current-vs-desired gap) +- Alg is **ES256** (ECDSA P-256, SHA-256 digest). 
This is the TEE's internal trust anchor for the 30-day session bearer and is verified only by TEE workers — not exposed on any public JWKS endpoint. +- The session-JWT key is **separate** from the public OIDC-issuer key (`oidc/issuer/v1`, also ES256). Separation keeps the public-facing, rotatable OIDC trust anchor isolated from the internal session-JWT anchor, so an OIDC-issuer rotation (driven by AWS cache windows) does not invalidate every live session token. +- Public key published on chain via `register_enclave()` for on-chain verification by other Heima components. --- diff --git a/wiki/tag-based-access.md b/wiki/tag-based-access.md index 60f030a..4b30db7 100644 --- a/wiki/tag-based-access.md +++ b/wiki/tag-based-access.md @@ -12,7 +12,6 @@ schemaVersion: 1 # Tag-Based Access Control — PrincipalTags from JWT Claims for Per-User Isolation -# Tag-Based Access Control — PrincipalTags from JWT Claims for Per-User Isolation **Status:** pattern (2026-04-19) **Scope:** how AgentKeys enforces per-user isolation on shared cloud resources (one S3 bucket, one GCS shared drive, one OSS bucket) when many users' data coexists — **without** needing per-user IAM roles or per-user buckets. @@ -35,7 +34,7 @@ This is the mechanism that makes [hosted-first](hosted-first) secure at scale. 
W TEE Authority (mint step): { iss: "https://oidc.agentkeys.dev", - sub: "enclave::agent:0xABC", // child wallet + sub: "enclave:::agent:0xABC", // child wallet; three-segment enclave identity (build hash + signer hash) lets relying parties pin to a specific build, a specific signer, or both — see §"Why three segments in sub" aud: "sts.amazonaws.com", agentkeys_user_wallet: "0xABC", // <<<< tag-claim agentkeys_inbox: "xyz123@agentkeys-email.io", @@ -109,7 +108,7 @@ Our discovery doc lists the custom claims: "oidc.agentkeys.dev:aud": "sts.amazonaws.com" }, "StringLike": { - "oidc.agentkeys.dev:sub": "enclave::*" + "oidc.agentkeys.dev:sub": "enclave:::*" }, "StringNotEquals": { "aws:RequestTag/agentkeys_user_wallet": "" @@ -123,7 +122,24 @@ Key points: - `sts:TagSession` is required to grant the role permission to receive tags from the JWT claim. - `StringNotEquals aws:RequestTag/agentkeys_user_wallet ""` — reject JWTs that don't carry the isolation claim (belt-and-suspenders; our TEE always sets it, but this catches misconfigurations). -- `sub` pattern-match pins the JWT to a specific enclave build (`mrenclave` identifier). Attackers with a different enclave can't assume the role. +- `sub` pattern-match pins the JWT to a specific enclave build (`mrenclave`) *and* signer (`mrsigner`). Attackers with a different enclave or a different signer can't assume the role. See §"Why three segments in `sub`" below for the three pin-modes operators can choose. + +### 2a. Why three segments in `sub` (`mrenclave` + `mrsigner`) + +Intel SGX enclaves carry two identities: + +- `mrenclave` — hash of the enclave binary. Changes every build. +- `mrsigner` — hash of the public key that signed the enclave. Stable across builds from the same signing identity. 
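To make the two identities concrete, here is a minimal sketch of how a relying party that inspects `sub` programmatically might split the claim and pin on the build, the signer, or both. The colon-separated layout `enclave:<mrenclave>:<mrsigner>:agent:<wallet>`, the `EnclaveSub` and `Pin` types, and the function names are all assumptions made for illustration; a cloud trust policy would express the same idea with a `StringLike` pattern instead.

```rust
/// Enclave identity parsed out of a `sub` claim.
/// Assumed (hypothetical) layout: "enclave:<mrenclave>:<mrsigner>:agent:<wallet>".
#[derive(Debug, PartialEq)]
pub struct EnclaveSub<'a> {
    pub mrenclave: &'a str, // hash of the enclave binary
    pub mrsigner: &'a str,  // hash of the key that signed the enclave
    pub agent: &'a str,     // agent wallet
}

/// What a relying party chooses to pin on (illustrative names).
pub enum Pin<'a> {
    Build(&'a str),
    Signer(&'a str),
    Both { mrenclave: &'a str, mrsigner: &'a str },
}

/// Splits `sub` into its segments; returns None unless it has exactly the
/// assumed five-part shape with non-empty identity segments.
pub fn parse_sub(sub: &str) -> Option<EnclaveSub<'_>> {
    let mut it = sub.split(':');
    let tag = it.next()?;
    let mrenclave = it.next()?;
    let mrsigner = it.next()?;
    let kind = it.next()?;
    let agent = it.next()?;
    let well_formed = it.next().is_none()
        && tag == "enclave"
        && kind == "agent"
        && !mrenclave.is_empty()
        && !mrsigner.is_empty()
        && !agent.is_empty();
    if well_formed {
        Some(EnclaveSub { mrenclave, mrsigner, agent })
    } else {
        None
    }
}

/// Checks a parsed `sub` against the relying party's chosen pin.
pub fn satisfies(sub: &EnclaveSub<'_>, pin: &Pin<'_>) -> bool {
    match pin {
        Pin::Build(m) => sub.mrenclave == *m,
        Pin::Signer(s) => sub.mrsigner == *s,
        Pin::Both { mrenclave, mrsigner } => {
            sub.mrenclave == *mrenclave && sub.mrsigner == *mrsigner
        }
    }
}
```

A `sub` that carries only one identity segment fails to parse in this sketch, which is the programmatic counterpart of a trust policy that cannot express a signer-only pin.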
+ +Publishing both in `sub` lets each relying party pick its pin policy without any change on our side: + +| Pin policy | Trust-policy pattern | Effect | +|---|---|---| +| **Strict (exact build)** | `"enclave::*:*"` | Only enclave build v1 can assume the role. Every upgrade requires policy update. | +| **Loose (any build from our signer)** | `"enclave:*::*"` | Any signed build auto-accepts. Upgrades roll without policy changes — but a compromised build still signed by us would also pass. | +| **Explicit (both, belt-and-suspenders)** | `"enclave:::*"` | Exact build from exact signer. Strongest. | + +If `mrsigner` were omitted from `sub`, relying parties lose the *"any build from our signer"* option — every enclave upgrade becomes a fleet-wide trust-policy rewrite. Including it costs nothing (an extra 32-hex-char segment in one claim) and hands the operator the full policy spectrum. ### 3. Role's attribute-mapping for session tags From ed8fb7a4451cfe42c9467726b610a82b4d93f638 Mon Sep 17 00:00:00 2001 From: wildmeta-agent Date: Mon, 20 Apr 2026 10:43:22 +0800 Subject: [PATCH 5/5] docs(stage7b): URL-hijack defense + Heima pallet interfaces + security reorg MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Wraps up the design rounds since the last docs update. Four threads landing together: 1. Stage 7 revision + new Stage 7b in the development plan. Stage 7 (generalized OIDC provider) now explicitly documents that its cryptographic trust anchor is URL + TLS + JWKS signature, with AWS thumbprint pinning / CAA / DNSSEC / short-TTL JWT as the baseline hardening stack. `sub`-pattern pinning is presented as informational-by-default; MRENCLAVE/MRSIGNER pinning is opt-in with a documented rotation cost. New Stage 7b (one-sprint follow-on): pallet-oidc-pubkeys + off-chain watchdog + daemon-side dual-verify for AgentKeys-owned accounts. 
Collapses URL-hijack blast window from indefinite to 30-60s for foreign clouds; closes it entirely for our own infra. Full stage contract, tests, E2E checklist, and deferred items. 2. New required interfaces in heima-gaps. §8 pallet-oidc-pubkeys: on-chain authoritative OIDC-pubkey registry. Extrinsics register_oidc_key (TEE-origin) and revoke_oidc_key (governance), query active_oidc_keys(). Mock-server HTTP mirror at /mock/oidc-pubkeys/*. §9 pallet-enclave-successors: governance-gated authorized-MRSIGNER list used by the attested seed handoff during MRSIGNER rotation. Mock mirror at /mock/enclave-successors/*. Tracking section renumbered to §10. 3. Security reorg in blockchain-tee-architecture.md. New §7 "Security model: assumptions and attacker surface" consolidating: - 7.1 seven foundational assumptions we take as given - 7.2 what each of the four rules defends (and what compromise looks like under each) - 7.3 attacker-surface matrix by attack class (bearer theft, TEE compromise, chain attack, OIDC URL hijack, malicious enclave build, bearer replay, prefix-crossing, insider, registrar compromise) with net capability without mitigation + the mitigation we ship - 7.4 the TEE-extraction disaster-recovery case, explicit - 7.5 three routine rotation procedures (OIDC-issuer, session-JWT, MRSIGNER), each cheap under HDKD - 7.6 pointers to narrower surfaces (key-security, email-system, tag-based-access, oidc-federation) References renumbered to §8. Added §10 to key-security.md summarizing how its client-side model interacts with the new server-side threat model (they defend against different adversaries and are additive; neither replaces the other). 4. docs/spec/post-v0.1-future-work.md (new). 
Living backlog for items beyond v0.1: - OIDC hardening: TEE-hosted endpoint, chain-native relying parties, on-chain TLS cert fingerprints, per-tenant issuer keys - MRSIGNER rotation CLI + IaC auto-rotation - Daemon Priority C hardening (landlock, namespaces, repro builds) - Knowledge-base backend expansions (Dropbox/Box/OneDrive, IPFS) - Email follow-ups (token-authority-model spec, 2FA flow, BYO custom-domain runbook) - Enterprise integrations (SAML, SCIM, SSO-into-master-CLI) - Protocol research (K8s audience, x402 payments, attested audit feed) - Explicit graveyard of rejected ideas --- .../heima-gaps-vs-desired-architecture.md | 142 ++++++++++++- docs/spec/plans/development-stages.md | 122 +++++++++++- docs/spec/post-v0.1-future-work.md | 187 ++++++++++++++++++ wiki/blockchain-tee-architecture.md | 81 +++++++- wiki/key-security.md | 37 +++- 5 files changed, 561 insertions(+), 8 deletions(-) create mode 100644 docs/spec/post-v0.1-future-work.md diff --git a/docs/spec/heima-gaps-vs-desired-architecture.md b/docs/spec/heima-gaps-vs-desired-architecture.md index b305f9d..bccdbdc 100644 --- a/docs/spec/heima-gaps-vs-desired-architecture.md +++ b/docs/spec/heima-gaps-vs-desired-architecture.md @@ -195,7 +195,147 @@ Defer past Stage 7 — this is a hardening follow-up, not a Stage 6/7 blocker. T --- -## 8. Tracking +## 8. Gap: no `pallet-oidc-pubkeys` — OIDC trust is URL-only + +### Current + +Stage 7 OIDC federation's trust anchor is `https://oidc.agentkeys.dev` + TLS + JWKS signature. If the URL is hijacked (DNS compromise, CA misissuance, hosting takeover, deploy-pipeline compromise), an attacker can serve a rogue JWKS and mint arbitrary JWTs accepted by every downstream cloud. Heima has no on-chain authoritative registry of which OIDC-issuer pubkeys are currently valid. There is no way for a watchdog, a daemon, or a future chain-native relying party to check "is this JWKS still the legitimate one?" 
+ +### Desired (Stage 7b — URL-hijack defense) + +A new pallet `pallet-oidc-pubkeys` holds the authoritative list: + +```rust +// Storage +pub struct OidcKey { + pub kid: BoundedVec>, + pub pubkey: BoundedVec>, // raw ES256 point, uncompressed + pub attestation_quote: BoundedVec>, // DCAP quote at registration time + pub active_from: BlockNumber, + pub active_until: BlockNumber, // 0 = no expiry + pub revoked_reason: Option>>, +} + +#[pallet::storage] +pub type OidcKeys = StorageMap<_, Blake2_128Concat, KeyId, OidcKey>; + +// Extrinsics +fn register_oidc_key( + origin: OriginFor, + kid: Vec, + pubkey: Vec, + attestation_quote: Vec, + active_from: BlockNumber, + active_until: BlockNumber, +) -> DispatchResult; +// Authorized origin: TEE-attested submitter only (reuse existing TEE-submitter check). + +fn revoke_oidc_key( + origin: OriginFor, + kid: Vec, + reason: Vec, +) -> DispatchResult; +// Authorized origin: governance; intended for fast-track incident response. + +// Queries (runtime API) +fn active_oidc_keys(at: BlockNumber) -> Vec<(KeyId, Pubkey)>; +fn get_oidc_key(kid: KeyId) -> Option; + +// Events +OidcKeyRegistered { kid, active_from, active_until } +OidcKeyRevoked { kid, reason } +``` + +**Mock-server mirror** (for local dev and Stage 4/5 tests that must not require a real Heima node): + +``` +POST /mock/oidc-pubkeys/register { kid, pubkey, quote, active_from, active_until } +POST /mock/oidc-pubkeys/revoke { kid, reason } +GET /mock/oidc-pubkeys/active → [ { kid, pubkey } ] +GET /mock/oidc-pubkeys/{kid} → { kid, pubkey, quote, active_from, active_until, revoked_reason? } +``` + +The mock persists to SQLite (same pattern as the credential store). The daemon's dual-verify code path points at `/mock/oidc-pubkeys/active` under `AGENTKEYS_OIDC_REGISTRY_URL=http://127.0.0.1:8090/mock/oidc-pubkeys`. + +### Impact + +Without this pallet, `oidc.agentkeys.dev` is a single point of failure for the entire Stage 6/7 federation story. 
URL compromise is silent and total. Stage 7b's watchdog needs an authoritative "what should the JWKS say right now?" comparison source, and that source has to live somewhere trust can't be short-circuited by the same attack that compromised the URL. Chain is the only candidate. + +### Migration path + +- **Pallet landing:** net-new; no upstream grandfathering concerns. Heima-fork PR or upstream PR depending on Litentry's position on AgentKeys-specific pallets. +- **Deploy order:** pallet first, mock-server mirror second, watchdog third, daemon feature flag fourth. Each step is independently testable. +- **Rollback:** the feature is additive; disable the watchdog + flip the daemon feature flag off, and you're back at Stage 7's URL-only trust. + +Depends on §3 (OIDC provider exists) landing first. No dependency on §2 (HDKD) — the pubkey field is algorithm-agnostic. + +--- + +## 9. Gap: no `pallet-enclave-successors` — MRSIGNER rotation has no on-chain governance anchor + +### Current + +Heima has no on-chain list of authorized enclave MRSIGNERs. If we rotate the enclave-signing key (new MRSIGNER_B replaces MRSIGNER_A), the old enclave has no authoritative way to decide whether a peer claiming MRSIGNER_B is a legitimate successor during the attested-seed-handoff step. The choice devolves to hard-coded config or out-of-band operator coordination, both of which undermine Rule #1. + +See the MRSIGNER-rotation discussion in earlier design review: under HDKD (gap §2), rotation reduces to a single attested seed handoff; the only thing it needs is a trusted answer to *"is MRSIGNER_B authorized?"*. 
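That question is, in the end, a small lookup. The sketch below assumes an authorized list whose entries carry a 32-byte MRSIGNER and the block height from which each entry takes effect. The `AuthorizedSigner` type, the function name, and the "not valid before `effective_from`" semantics are illustrative assumptions, not the pallet's actual API.

```rust
/// Illustrative stand-in for one entry of an on-chain authorized-signer list.
pub struct AuthorizedSigner {
    pub mrsigner: [u8; 32],
    /// Block height; assumed semantics: the entry is not valid before this block.
    pub effective_from: u64,
}

/// Returns true if `candidate` is listed and already effective at `current_block`.
pub fn is_authorized_successor(
    list: &[AuthorizedSigner],
    candidate: &[u8; 32],
    current_block: u64,
) -> bool {
    list.iter()
        .any(|e| &e.mrsigner == candidate && e.effective_from <= current_block)
}
```

Under these assumptions, an old enclave would run this check against the list it read from chain before opening the handoff channel, and refuse when it returns `false`.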
+ +### Desired + +A new small pallet `pallet-enclave-successors`: + +```rust +pub struct AuthorizedMrSigner { + pub mrsigner: [u8; 32], + pub effective_from: BlockNumber, + pub rationale_uri: BoundedVec>, // link to governance proposal / audit report +} + +#[pallet::storage] +pub type AuthorizedMrSigners = + StorageValue<_, BoundedVec>, ValueQuery>; + +fn authorize_mrsigner( + origin: OriginFor, + mrsigner: [u8; 32], + effective_from: BlockNumber, + rationale_uri: Vec, +) -> DispatchResult; +// Authorized origin: governance (collective / referendum). + +fn deauthorize_mrsigner( + origin: OriginFor, + mrsigner: [u8; 32], +) -> DispatchResult; +// Authorized origin: governance. For removing a compromised signer. + +fn authorized_mrsigners() -> Vec<[u8; 32]>; +``` + +**Mock-server mirror:** + +``` +POST /mock/enclave-successors/authorize { mrsigner, effective_from, rationale_uri } +POST /mock/enclave-successors/deauthorize { mrsigner } +GET /mock/enclave-successors → [ mrsigner, ... ] +``` + +Used by the enclave startup code during MRSIGNER rotation: before the old enclave opens an attested TLS channel to a new enclave claiming MRSIGNER_B, it queries the chain's `authorized_mrsigners()` and confirms MRSIGNER_B is listed. + +### Impact + +Without the pallet, MRSIGNER rotation is either (a) impossible without a flag-day coordinated restart, or (b) gated on trust anchors that live outside the chain — both of which break Rule #1 and Rule #4. With the pallet, rotation is a routine governance extrinsic + attested seed handoff, and the derived-keys story (HDKD §2) makes everything downstream (JWKS, custodial wallets, DKIM DNS) continue working without changes. + +### Migration path + +- **Pallet landing:** net-new; small; governance-gated. No upstream compatibility concerns. +- **Usage deployment:** enclave startup code reads `authorized_mrsigners()` before accepting a successor's RA quote. This is a small change in the TEE worker — one pallet query + one comparison. 
+- **Companion doc:** "MRSIGNER rotation runbook" in Stage 7b's operator-facing documentation; covers governance proposal → pallet update → enclave coordination → grace window → old-enclave decommission. + +Depends on §2 (HDKD) landing first, because the rotation is only cheap under HDKD (without HDKD, rotation is a full re-issuance of every sealed key). + +--- + +## 10. Tracking - Each gap is owned as a separate issue in the `litentry/agentKeys` repo (TBD — file when this doc merges). - When a gap closes, mark the section **RESOLVED** with the merge commit(s) and the resolution path (A/B/C from §2). diff --git a/docs/spec/plans/development-stages.md b/docs/spec/plans/development-stages.md index 86942e6..4dae267 100644 --- a/docs/spec/plans/development-stages.md +++ b/docs/spec/plans/development-stages.md @@ -968,6 +968,25 @@ See [`wiki/oidc-federation.md`](../../../wiki/oidc-federation.md) for the full d 4. **JWT format is consistent across consumers** — same `sub`, same `aud` varies per consumer, same `agentkeys_*` claim set for tag-based isolation. 5. **Consumer-side per-user isolation** — each consumer's trust policy conditions on `PrincipalTag` / attribute-mapping from the JWT's `agentkeys_user_wallet` claim. +### Cryptographic trust anchor in Stage 7: URL + TLS + JWKS signature + +The trust that AWS / GCP / Ali Cloud place in our JWTs is rooted in: + +- **The issuer URL** — `https://oidc.agentkeys.dev` is registered once per consumer as an OIDC provider. The consumer fetches our discovery doc and JWKS from this URL. +- **The TLS certificate** on that URL — protects the JWKS fetch against on-path attackers. Consumer libraries typically also validate that the cert chains to a trusted root CA. +- **The JWKS signature** — each JWT is signed with `derive("oidc/issuer/v1")` (ES256); consumers verify the signature using the current JWK served at the URL. 
+ +**Hardening we ship with Stage 7 (all standard belt-and-suspenders, zero blockchain dependency):** + +- **AWS OIDC thumbprint pinning** — register the TLS cert's SHA-1 thumbprint on each consumer's AWS OIDC provider. Reduces the attack surface from "any CA" to "our specific cert." Documented in the AWS registration runbook and emitted by `agentkeys oidc register aws`. +- **CAA DNS records** on `agentkeys.dev` — only whitelisted CAs may issue for the domain. +- **DNSSEC** where the registrar supports it. +- **Short-lived JWTs (≤5 min `exp`)** — bounds a forged JWT's useful window to minutes even if the URL is compromised mid-flight. +- **Short `Cache-Control` on the JWKS URL** — our published cache directive is short; AWS's JWKS cache is several hours by default, which we accept in Stage 7 and tighten via Stage 7b. +- **Optional `sub`-pattern pinning** by relying parties is **informational-by-default** — the canonical trust-policy examples in [`wiki/tag-based-access.md`](../../../wiki/tag-based-access.md) pin on the issuer URL plus claim conditions (`agentkeys_user_wallet`, `agentkeys_operation`); `mrenclave`/`mrsigner` pinning is presented as an opt-in hardening with a documented rotation cost. + +**What this explicitly does *not* defend against in Stage 7 alone:** compromise of the issuer URL itself — DNS hijack, CA misissuance, hosting takeover, or deploy-pipeline compromise. An attacker who controls the URL can replace the JWKS and mint arbitrary JWTs. Stage 7b below is the defense-in-depth layer that collapses the blast window for this class of attack from indefinite to seconds. + ### Crates / Packages - Primarily **operator/documentation work** — the TEE signing path already exists from Stage 6. 
Stage 7 adds: @@ -1039,10 +1058,105 @@ agentkeys run test-agent -- \ ### Deferred past Stage 7 -- Enterprise-specific advanced integrations (SAML federation, SCIM provisioning) -- On-chain record of active OIDC-issuer pubkey fingerprint for external auditors -- Per-tenant OIDC issuer URLs (`oidc.agentkeys.dev/tenant//`) with isolated issuer keys per tenant -- Workload Identity Federation into consumer clouds like Cloudflare, Fly, or others AgentKeys users may prefer +- Enterprise-specific advanced integrations (SAML federation, SCIM provisioning) — tracked in [`docs/spec/post-v0.1-future-work.md`](../post-v0.1-future-work.md) §7 +- Per-tenant OIDC issuer URLs (`oidc.agentkeys.dev/tenant//`) with isolated issuer keys per tenant — §2.4 of future-work +- TEE-hosted OIDC endpoint (attestation-rooted TLS, not URL-rooted) — §2.1 of future-work +- Workload Identity Federation into consumer clouds like Cloudflare, Fly, etc. + +--- + +## Stage 7b: URL-hijack defense (chain-anchored JWKS + watchdog) + +**Status (2026-04-20):** one-sprint follow-on to Stage 7. Small, cheap, large security win. + +**Goal:** close the gap between "Stage 7 cryptographic trust anchor is URL + TLS + signature" and "we want chain-anchored trust." Specifically: make URL compromise **detectable and revocable in seconds** rather than silently catastrophic. Foreign clouds (AWS / GCP / Ali) still can't speak Substrate, so Stage 7b targets detection + response, not prevention on foreign clouds. Prevention on foreign clouds is [`post-v0.1-future-work.md`](../post-v0.1-future-work.md) §2.1 / §2.3. + +**Why this is Stage 7b and not Stage 8:** Stage 7's URL-only trust anchor is the single largest unmitigated class of risk left in the architecture. The fix (one pallet + one watchdog) is ~1 sprint. Deferring past v0.1 leaves a known, understood hole open for the entire v0.1 window. Shipping it inside the v0.1 milestone is cheap insurance. 
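The detection half of this goal comes down to comparing two key sets: what the authoritative source vouches for and what the URL currently serves. A minimal sketch of that comparison follows. The `KeySet` alias, the `Drift` enum, and the choice to flag only keys the URL serves (not keys merely missing from it) are assumptions for illustration; fetching from the chain and from the JWKS endpoint is left out.

```rust
use std::collections::BTreeMap;

/// kid -> raw public key bytes (illustrative representation).
pub type KeySet = BTreeMap<String, Vec<u8>>;

#[derive(Debug, PartialEq)]
pub enum Drift {
    /// The URL serves a kid the authoritative set does not know.
    UnknownKid(String),
    /// Same kid, but the key bytes differ.
    KeyMismatch(String),
}

/// Reports every key served at the URL that the authoritative set does not
/// vouch for. A key that is authoritative but absent from the URL is treated
/// as an availability issue in this sketch, not as drift.
pub fn detect_drift(authoritative: &KeySet, served: &KeySet) -> Vec<Drift> {
    let mut findings = Vec::new();
    for (kid, served_key) in served {
        match authoritative.get(kid) {
            None => findings.push(Drift::UnknownKid(kid.clone())),
            Some(expected) if expected != served_key => {
                findings.push(Drift::KeyMismatch(kid.clone()))
            }
            Some(_) => {}
        }
    }
    findings
}
```

Any non-empty result would be the trigger for paging and revocation; how often the comparison runs bounds how long a swapped JWKS can go unnoticed.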
+ +### Architecture summary + +Two new on-chain primitives plus one off-chain watchdog: + +1. **`pallet-oidc-pubkeys`** — on-chain authoritative registry of currently valid OIDC-issuer public keys. + - Extrinsic: `register_oidc_key(kid, pubkey, attestation_quote, active_from, active_until)` — callable only by the TEE via the existing TEE-submitter pattern. + - Extrinsic: `revoke_oidc_key(kid, reason)` — callable by governance (fast-track for incident response). + - Query: `active_oidc_keys() → Vec<(kid, pubkey)>`. +2. **`pallet-enclave-successors`** — on-chain list of authorized MRSIGNERs for MRSIGNER-rotation handoffs (see [`heima-gaps-vs-desired-architecture.md`](../heima-gaps-vs-desired-architecture.md) §9). + - Extrinsic: `authorize_mrsigner(mrsigner, effective_from)` — governance-gated. + - Query: `authorized_mrsigners() → Vec`. +3. **OIDC watchdog** — a small off-chain process that every ~30 s fetches both: + - Chain: `pallet-oidc-pubkeys::active_oidc_keys()`. + - URL: `https://oidc.agentkeys.dev/.well-known/jwks.json`. + + On mismatch → (a) page on-call, (b) auto-call `aws iam remove-client-id-from-open-id-connect-provider` (or equivalent per cloud) on AgentKeys-owned federation trusts to cut their ability to accept our JWTs immediately, (c) file an on-chain `revoke_oidc_key` extrinsic with reason = `jwks_url_drift_detected`. + +4. **Daemon-side dual verification for AgentKeys-owned relying parties.** Our own daemon, before exchanging a JWT at AWS STS against *our* accounts, queries `pallet-oidc-pubkeys` and rejects JWTs whose `kid` is not in `active_oidc_keys()`. Closes the URL-hijack hole entirely for our own infra. Customer BYO accounts still rely on the URL — they benefit from detection (watchdog) but not from dual verification unless they opt in. + +### Crates / Packages + +- `pallets/oidc-pubkeys/` (new, in the Heima fork) — small pallet; extrinsics + storage + events. 
+- `pallets/enclave-successors/` (new) — even smaller; stores the authorized-MRSIGNER list. +- `crates/agentkeys-oidc-watchdog/` (new) — standalone binary; Substrate RPC client + HTTPS fetcher + alerting + cloud-revocation adapter per cloud. +- `crates/agentkeys-daemon/` — extend OIDC-JWT-exchange code path with the on-chain kid check (behind feature flag `chain_verified_oidc`, on by default for AgentKeys-owned accounts, off by default for customer BYO accounts). +- **Mock-side mirrors:** `crates/agentkeys-mock-server/src/handlers/oidc_pubkeys.rs` and `enclave_successors.rs` replicate the two pallets' extrinsics + queries over HTTP so local dev + Stage 4/5 tests don't need a Heima node. + +### Deliverables + +- [ ] `pallet-oidc-pubkeys` in the Heima fork; extrinsics + storage + events; unit tests +- [ ] `pallet-enclave-successors`; same shape +- [ ] Mock-server HTTP endpoints mirroring both pallets, under `/mock/oidc-pubkeys/*` and `/mock/enclave-successors/*` +- [ ] `agentkeys-oidc-watchdog` binary: chain + URL fetch, mismatch detection, per-cloud revocation adapter (AWS first; GCP / Ali in follow-ups) +- [ ] Daemon dual-verification code path (`chain_verified_oidc` feature) +- [ ] Operator runbook: "OIDC URL drift incident response" (detection → revoke in AWS OIDC provider list → rotate cert → re-publish → re-register) +- [ ] Integration test: simulate URL drift by serving a rogue JWKS on a test endpoint; assert watchdog fires < 60 s; assert AWS revocation succeeds; assert daemon dual-verify rejects rogue JWTs + +### Tests + +| Test | What it validates | +|---|---| +| `pallet::oidc_pubkeys_register_by_tee_only` | Non-TEE submitter cannot call `register_oidc_key`; returns `NotAuthorizedSubmitter`. | +| `pallet::oidc_pubkeys_query_returns_active_only` | Keys past `active_until` are excluded from `active_oidc_keys()`. | +| `pallet::enclave_successors_governance_gated` | Non-governance call to `authorize_mrsigner` fails. 
| +| `watchdog::detects_jwks_drift_under_60s` | Rogue JWKS served on test URL; watchdog polls; mismatch detected; alert fired; AWS revocation invoked. | +| `daemon::dual_verify_rejects_unknown_kid` | JWT signed by an off-chain kid is rejected before reaching AWS STS; metric `oidc.dual_verify.rejected` increments. | +| `daemon::dual_verify_byo_opt_in` | Customer BYO mode (`chain_verified_oidc=false`) passes the JWT through to AWS without the chain check (preserves current Stage 7 behavior). | +| `mock::oidc_pubkeys_endpoint_parity` | Mock server `/mock/oidc-pubkeys/active` returns the same shape as the pallet query. | + +### Reviewer E2E Checklist + +```bash +# Spin up mock chain + mock OIDC URL + watchdog +cargo run --release -p agentkeys-mock-server & +cargo run --release -p agentkeys-oidc-watchdog --config harness/watchdog-test.toml & + +# Register an OIDC kid on the mock chain +curl -X POST http://127.0.0.1:8090/mock/oidc-pubkeys/register \ + -d '{"kid":"v1","pubkey":"...","active_from":0,"active_until":999999999}' + +# Serve a matching JWKS on the URL side — watchdog stays silent +# Flip the URL-side JWKS to a different key — watchdog should alert within 30 s +python harness/serve-rogue-jwks.py & +# Expected: within 30 s, watchdog logs "JWKS_URL_DRIFT_DETECTED" and fires the revocation adapter + +# Verify daemon dual-verify path +agentkeys run test-agent -- \ + aws s3 ls s3://agentkeys-mail/ +# On our own infra: succeeds with chain-verified kid. +# If rogue JWKS is active and daemon is in chain-verify mode: fails closed with DualVerifyRejected. +``` + +### Stage Contract + +- **Inputs:** Stage 7 complete (`oidc.agentkeys.dev` live, public, working). +- **Outputs:** URL-hijack attacks on the JWKS endpoint are detected within 30–60 s and auto-revoked on AgentKeys-owned federation trusts. Our own daemon refuses to exchange JWTs whose kid is not on-chain. `pallet-enclave-successors` is available for the Stage 9 MRSIGNER-rotation procedure. 
+- **Done when:** All 7 tests pass. Drift-simulation integration test succeeds end-to-end. Runbook reviewed by security. + +### Deferred past Stage 7b + +- TEE-hosted OIDC endpoint (prevents compromise instead of just detecting it) — [`post-v0.1-future-work.md`](../post-v0.1-future-work.md) §2.1 +- On-chain TLS cert fingerprints with dual-update requirement — §2.3 +- Per-tenant OIDC issuer keys — §2.4 +- GCP + Ali revocation adapters in the watchdog (AWS-first for v0.1; others follow) --- diff --git a/docs/spec/post-v0.1-future-work.md b/docs/spec/post-v0.1-future-work.md new file mode 100644 index 0000000..3c54c5b --- /dev/null +++ b/docs/spec/post-v0.1-future-work.md @@ -0,0 +1,187 @@ +# Post-v0.1 Future Work + +**Status:** living backlog — items deferred past v0.1. +**Purpose:** capture design directions, hardening work, and extensions that are valuable but not on the v0 or v0.1 critical path. Every item here should eventually be promoted to `docs/spec/plans/development-stages.md` with a concrete stage number, or dropped. +**Last updated:** 2026-04-20. + +## 1. How this doc relates to the stage plan + +- [`docs/spec/plans/development-stages.md`](./plans/development-stages.md) — the stages we are committed to shipping. +- [`docs/spec/heima-gaps-vs-desired-architecture.md`](./heima-gaps-vs-desired-architecture.md) — deltas between upstream `litentry/heima` and the desired architecture; blockers for current stages. +- **This doc** — ideas that do not block current stages. Items here come from design reviews where we identified a better-but-bigger option, deferred it, and shipped the cheap version. + +An item moves out of here when it (a) gets promoted to a numbered stage, or (b) is explicitly dropped as not worth pursuing. + +--- + +## 2. 
OIDC-federation hardening (beyond Stage 7b) + +Stage 7 ships `https://oidc.agentkeys.dev` with URL + TLS + JWKS-signature as the cryptographic trust anchor, hardened with AWS thumbprint pinning, CAA records, short-lived JWTs, and standard belt-and-suspenders. Stage 7b (see stage plan) adds `pallet-oidc-pubkeys` + a watchdog for fast detection-and-revocation of URL compromise. The items below go further. + +### 2.1 TEE-hosted OIDC endpoint (attestation-rooted, not URL-rooted) + +**Today (Stage 7 + 7b):** the JWKS is served by a thin HTTPS proxy. An attacker who compromises the proxy (DNS, CA, hosting, deploy pipeline) can swap the JWKS before the watchdog fires. The Stage 7b watchdog collapses the blast window from indefinite to ~60 seconds, but the endpoint itself is still not in the TEE. + +**Desired:** the OIDC discovery and JWKS endpoints are served from inside the enclave, TLS-terminated by a cert whose private key is derived from the master seed at `derive("oidc/tls/v1")`. Cert is issued via ACME by the enclave itself. DNS-01 challenge answered by a dedicated subdomain that the enclave also signs. Compromise of the hosting tier (VM, K8s node, CDN) becomes irrelevant because the attacker cannot terminate TLS. + +**Cost:** non-trivial. ACME client inside the enclave, DNS-01 plumbing, and the hosting shell becomes a dumb TCP forwarder. Probably 2–3 weeks. + +**When to promote:** if URL-compromise risk materializes (close call observed), or if an enterprise customer requires this property. + +### 2.2 Chain-native relying parties + +**Today:** AWS/GCP/Ali can only verify JWTs via the HTTPS JWKS endpoint — they don't speak Substrate. + +**Desired:** a Heima client library (WASM-compatible) that third-party services running inside or adjacent to the Heima network can use to verify JWTs directly against `pallet-oidc-pubkeys`. URL-hijack is irrelevant to these consumers because they never touch the URL. + +**Cost:** a lightweight verifier crate + docs. 1 week. 
+ +**When to promote:** first partner service that runs on or near Heima and can consume chain-anchored trust. + +### 2.3 On-chain TLS-cert fingerprints + dual-update requirement + +**Today:** AWS thumbprint list holds a hash of the JWKS TLS cert. Rotation = update the thumbprint list. + +**Desired:** a new extrinsic `register_oidc_tls_cert_fingerprint(fingerprint, active_from, active_until)`. Deploy pipeline enforces: no TLS cert rotation without a matching on-chain entry. Attacker who compromises only the hosting provider cannot silently replace the cert — they'd also need the chain governance key. + +**Cost:** pallet extension (trivial) + deploy-pipeline gate (medium; requires CI/CD integration). + +**When to promote:** together with §2.1 (makes the whole OIDC-trust story chain-anchored end-to-end). + +### 2.4 Per-tenant OIDC issuer URLs + +**Today:** one shared issuer at `oidc.agentkeys.dev` with one ES256 key. All tenants share the same issuer-key blast radius. + +**Desired:** each enterprise tenant gets its own issuer URL `oidc.agentkeys.dev/tenant//` backed by its own derived key at `derive("oidc/tenant//v1")`. Compromise of one tenant's issuer key does not affect other tenants' federation. + +**Cost:** multi-issuer routing in the proxy + per-tenant discovery docs. 1 week. + +**When to promote:** first enterprise customer that requires tenant-isolated issuer keys (likely a contractual ask). + +--- + +## 3. MRSIGNER rotation tooling + +Stage 7b covers the rotation *mechanism* (attested seed handoff via inter-enclave remote attestation, governance-authorized successor list via `pallet-enclave-successors`). The items below smooth the relying-party-side experience. + +### 3.1 `agentkeys oidc-rotate-trust` CLI + +A small CLI that takes the operator's own cloud credentials and patches the trust policy on an IAM role / GCP WIF provider / Ali RAM role: + +``` +agentkeys oidc-rotate-trust --cloud aws --role-arn ... 
--add-mrsigner +agentkeys oidc-rotate-trust --cloud aws --role-arn ... --remove-mrsigner +``` + +**Cost:** 1 week per cloud (4 clouds = 4 weeks ideal; 1 cloud = 1 week minimum viable). + +**When to promote:** first MRSIGNER rotation event, or when ≥3 customers are using `sub`-pattern MRSIGNER pinning. + +### 3.2 Automated rotation orchestration for our own infra + +Our own AWS/GCP/Ali accounts are managed by IaC. A GitHub Action can watch `pallet-enclave-successors` for new authorized MRSIGNERs and auto-open a PR that flips the trust-policy variable `MRSIGNER=[A] → [A,B] → [B]` on the timeline dictated by the grace window. + +**Cost:** 1 sprint including IaC changes. + +**When to promote:** first MRSIGNER rotation event (same as §3.1). + +--- + +## 4. Hardening follow-ups to the daemon credential lifecycle + +From [`wiki/key-security.md`](../../wiki/key-security.md) §9 "Daemon Priority C" — items explicitly tagged as v0.2+. + +### 4.1 Landlock / Pledge-style syscall containment for the daemon + +Unix-only; macOS has `sandbox_init` but the ergonomics are ugly. Restrict the daemon to exactly the syscalls it needs. Defense-in-depth against supply-chain attacks on transitive dependencies. + +### 4.2 OS-level isolation (namespaces, jails) + +Run the daemon in a user namespace or FreeBSD jail. Complements Stage 3 kernel hardening. + +### 4.3 Reproducible daemon binary builds + +Deterministic builds so that `mrenclave`-style equivalent applies to the daemon: `daemon_hash` from source tree is reproducible by auditors. Establishes "this running daemon matches this tag" without trusting the build pipeline. + +--- + +## 5. Knowledge-base backend expansions + +See [`wiki/knowledge-storage.md`](../../wiki/knowledge-storage.md) for the current four-candidate matrix (GitHub / AWS S3 / Google Drive / Ali Cloud OSS). + +### 5.1 Dropbox / Box / OneDrive as additional non-dev backends + +If a user segment emerges that prefers these over raw S3. 
+ +### 5.2 Local-first / IPFS / Arweave for crypto-native users + +A chain-native audience may want memory stored on decentralized storage. The credential-broker shape still works — we mint an ephemeral Filecoin / IPFS signing key from the master seed, daemon uses it client-side. + +### 5.3 Cross-backend migration tooling + +User switches from hosted S3 to BYO GitHub — we need an export/import utility that preserves grants and audit trail continuity. + +--- + +## 6. Email system (beyond Stage 6+7) + +From [`wiki/email-system.md`](../../wiki/email-system.md) §"Open items / follow-ups". + +### 6.1 `docs/spec/token-authority-model.md` — the generalized three-layer spec + +We currently describe the `TokenAuthority` / `TokenBroker` / `GrantStore` abstraction inline in email-system.md. Once three or more credential types (session tokens, email, knowledge base) share it, extract to a standalone spec. + +### 6.2 Email-2FA approval flow spec + +The [#11](https://github.com/litentry/agentKeys/issues/11) biometric gate needs a mobile-fallback-via-email section: message templates, magic-link vs 6-digit-code tradeoff, ≤10-minute TTL, replay protection via single-use nonce, CSRF on the magic-link endpoint. + +### 6.3 BYO custom-domain email operator runbook + +Stage 7 mentions this in deferred items; when a customer brings their own domain, we need a DNS configuration doc, MAIL FROM bounce-handling subdomain setup, DMARC alignment walkthrough. Distinct from the current Workspace DWD runbook at `docs/stage5-workspace-email-setup.md`. + +--- + +## 7. Enterprise integrations + +From Stage 7 deferred items. + +### 7.1 SAML federation + +Enterprises with legacy SAML stacks. Our TEE would need a SAML assertion-signing path; probably reuses the `oidc/issuer/v1` key with a SAML-signing adapter. + +### 7.2 SCIM provisioning + +When an enterprise onboards / offboards users, their IdP pushes updates via SCIM. Our backend would need a SCIM receiver that creates/revokes grants. 
+ +### 7.3 Enterprise SSO into our master CLI + +Today the master CLI authenticates via our own flow. Enterprises will want "my employees sign in to AgentKeys via Okta / AzureAD / Google Workspace." Requires the OIDC-consumer direction (we trust their IdP), not just the OIDC-producer direction (Stages 6/7). + +--- + +## 8. Protocol-level / research items + +Exploratory work with unclear ROI. Park here so we don't re-open the same conversations. + +### 8.1 Kubernetes-native audience for TEE JWTs + +K8s ServiceAccount projection accepts external OIDC. Our JWTs could directly authenticate pods. Worth testing in v0.2. + +### 8.2 On-chain payment rails on Base (x402) + +If we extend the ES256 OIDC path to sign HTTP-payment requests as well, the same federation pattern covers payments. Needs an x402 implementation audit. + +### 8.3 Attested audit-event feed for external verifiers + +A signed-by-TEE audit-event feed that external parties can subscribe to without Heima-node operation. Useful for regulators / compliance tools. Requires a transport (Kafka? HTTP stream?) and a bootstrapping trust anchor. + +--- + +## 9. Graveyard (items explicitly rejected) + +Items we discussed and decided not to pursue. Listed here so we don't re-litigate. + +- **AgentMail as a first-party email backend.** Their infra is AWS SES underneath; our SES impl gives us the things their SaaS does not (chain audit, per-child isolation via grants, no static cloud creds, broker-not-proxy). The three-layer abstraction still allows a customer to plug `AgentMailAuthority` if they want — we just don't ship it. +- **Static IAM access keys inside the TEE for AWS/GCP.** Superseded by OIDC federation; violates "no long-lived cloud credentials at rest." +- **Per-user IAM roles on AWS.** Doesn't scale past a few thousand users; superseded by PrincipalTag-via-JWT-claim (see [`wiki/tag-based-access.md`](../../wiki/tag-based-access.md)). 
+- **Reading the user's personal Gmail for OTPs.** Collapses agent-mail and identity-mail into one inbox; fragile against Google's policy changes; see [`wiki/email-system.md`](../../wiki/email-system.md) §"What this rules out." diff --git a/wiki/blockchain-tee-architecture.md b/wiki/blockchain-tee-architecture.md index 805ae3f..8281e2f 100644 --- a/wiki/blockchain-tee-architecture.md +++ b/wiki/blockchain-tee-architecture.md @@ -545,7 +545,86 @@ No exceptions. --- -## 7. References +## 7. Security model: assumptions and attacker surface + +This section consolidates the trust assumptions the four rules rely on and the attacker surfaces those assumptions expose. It is the authoritative security summary for the architecture; individual wiki pages (e.g. [Key Security](key-security), [OIDC Federation](oidc-federation), [Tag-Based Access](tag-based-access)) cover narrower surfaces in more detail. + +### 7.1 Assumptions we take as given + +These are the foundational trust assumptions. If any breaks, the architecture's guarantees do not hold. + +| # | Assumption | What breaks if it fails | +|---|---|---| +| A1 | The TEE's attestation primitive (Intel SGX DCAP today) is sound — `mrenclave` + `mrsigner` + the attestation report cryptographically bind to the running code. | Rule #2 collapses — an attacker could run arbitrary code while claiming to be our enclave. | +| A2 | The SGX master-seed sealing primitive (`SEAL_POLICY_MRSIGNER`) is sound. A sealed blob is readable only by enclaves sharing the same MRSIGNER. | HDKD collapses — the master seed leaks, all derived keys leak. | +| A3 | SLIP-0010 HDKD is cryptographically sound for the algorithm families we use (Ed25519, secp256k1, NIST P-256/ES256). | Derived-key isolation breaks between purposes (`dkim/*`, `wallet/*`, `oidc/*`). | +| A4 | The Heima parachain's finality and validator set are honest (BABE/GRANDPA assumptions). | Rule #1 collapses — chain state can be rewritten; grants and audit events lose meaning. 
| +| A5 | At least one TEE worker is running unmodified, attested code (liveness). | We can still verify old chain state, but we can't mint new credentials or sign new extrinsics until a worker recovers. | +| A6 | Standard internet PKI works for the Stage 7 OIDC URL: DNS resolves honestly, CAs don't misissue, the hosting tier isn't compromised. | URL-hijack window opens — see 7.3 below. Stage 7b (`pallet-oidc-pubkeys` + watchdog) collapses the blast window but does not eliminate the assumption. | +| A7 | The operator's deploy pipeline for static OIDC artifacts (discovery doc, JWKS) has integrity. | Same class as A6 — attacker replaces what the URL serves. Stage 7b mitigation applies. | + +Assumptions A1–A5 are the "Heima + TEE" trust core and are shared with every other service built on Heima. A6–A7 are specific to our OIDC federation path and are the ones Stage 7b is designed to harden. + +### 7.2 What the four rules actually defend + +Rule-by-rule, what compromise **looks like under each rule** and what the blast radius is when the rule holds: + +| Rule | Holds means… | If compromised… | +|---|---|---| +| #1 Chain stores everything persistent | No off-chain state is load-bearing. Every grant, credential, audit event is reconstructible from chain + TEE. | An attacker who compromises our infrastructure (hosting, deploy pipeline, databases) cannot forge grants or hide audit events — the chain is still there. | +| #2 TEE holds all private keys | No operational key (shielding, session-JWT, OIDC-issuer, per-user wallet, per-domain DKIM) exists outside the enclave. All derived from one sealed master seed via HDKD. | If the TEE is compromised, *all* operational keys are compromised. This is the "total compromise" case — see 7.4. If the TEE is not compromised, nothing short of extracting the master seed from SGX silicon gets you a key. | +| #3 Clients hold only a JWT | The master CLI and agent daemon never hold a private key. 
If a client is compromised, the attacker gets a 30-day bearer at worst, not signing authority. | Leaked bearer → attacker impersonates until expiration (≤30 d) or on-chain revocation (≤6 s). They cannot forge new bearers, cannot sign extrinsics, cannot forge OIDC JWTs. | +| #4 Credential broker, not operation proxy | Per-operation compute lives on the daemon. Our backend never holds operation-level data (email bodies, knowledge-base documents, trade payloads). | Breach of our operation path is bounded to metadata we already store — grants, audit events, addresses. Operation content stays on the user's daemon and the vendor's service. | + +### 7.3 Attacker surface by attack class + +Every attack vector we design against, what it enables, and which rule / Stage-7b layer blunts it. + +| Attack class | Requires attacker to… | Net capability without mitigation | Mitigation | +|---|---|---|---| +| **Bearer token theft** (malware on user's machine) | Read keychain / file storage of the master CLI or daemon | Impersonate user until token expires or is revoked | Short TTL (30 d), on-chain revocation (≤6 s), keychain ACL (Stage 3), memory hygiene (Stage 8) | +| **TEE compromise** (hardware or microcode attack) | Extract master seed from SGX | Full, permanent compromise of all users | Out of scope for v0.1 — assumption A1/A2. DCAP + enclave upgrade path + MRSIGNER rotation (§7.5) are the operational responses | +| **Chain attack** (51% validator collusion) | Finalize malicious blocks on Heima | Forge grants, hide audit events | Assumption A4 — shared with all Heima applications | +| **OIDC URL hijack** (DNS / CA / hosting / deploy compromise) | Replace `oidc.agentkeys.dev` with attacker-controlled JWKS | Mint arbitrary JWTs accepted by AWS / GCP / Ali; federate to any user's cloud prefix | Stage 7 baseline: AWS thumbprint pinning, CAA, DNSSEC, 5-min JWT TTL. 
Stage 7b: `pallet-oidc-pubkeys` on-chain authoritative registry + watchdog (30–60 s detection) + daemon-side dual verify for our own infra. | +| **Malicious enclave build signed by our MRSIGNER** | Compromise our enclave-signing key *and* push a build through our release pipeline | Mint JWTs with any `sub`/claims; all consumers pinning on MRSIGNER accept | Governance-gated `pallet-enclave-successors` (only authorized MRSIGNERs are accepted during seed handoff); release pipeline review; relying parties can opt into MRENCLAVE pinning (strict mode) for highest-security buckets | +| **Bearer replay across audiences** | Steal a JWT minted for one `aud` | Use it against a different cloud | `aud` binding at JWT level; consumer-side `aud` condition in trust policies; 5-min TTL | +| **Prefix-crossing on shared buckets** (user A tries to read user B's data) | Mint or obtain a JWT with wrong `agentkeys_user_wallet` value | Access another user's prefix on shared S3/OSS/GCS | PrincipalTag condition on bucket policy — enforced by the cloud, not us (see [Tag-Based Access](tag-based-access)) | +| **Insider attack** (AgentKeys operator) | Access our deploy / AWS / hosting creds | Depends — see below | Chain-audit means every minted JWT is permanently logged; insider actions are forensically attributable. AWS-account SCPs, least-privilege IAM, CloudTrail → chain audit for tamper-evidence. | +| **Registrar / DNS provider compromise** | Compromise our domain registrar | Silent URL takeover (subset of URL hijack class) | DNSSEC where supported; registrar-lock; monitoring via CT logs; Stage 7b watchdog detects drift | + +### 7.4 The "total compromise" case: TEE extraction + +The one failure mode this architecture cannot recover from *in place* is extraction of the master seed from a live enclave. That requires defeating SGX's sealing + attestation guarantees, which is an assumption-A1/A2 break. The operational response is: + +1. 
Detect: attestation-verification failures, out-of-band intelligence, unexplained signing-key usage patterns. +2. Contain: revoke all active OIDC keys via `pallet-oidc-pubkeys::revoke_oidc_key`; freeze the affected enclave's submitter origin; pause grant issuance. +3. Rotate: stand up a new enclave with a new master seed (fresh MRSIGNER if the signing key is also suspected); users must re-authenticate; on-chain custodial wallet addresses change (new derivations from new seed). +4. Recover: chain state (grants, audit, non-custodial chain state) survives intact. User-scoped credentials (API keys stored in the old TEE) are lost and must be re-provisioned. + +This is a known disaster-recovery mode, not a routine operation. It is documented here so the scope is explicit. + +### 7.5 Routine key-rotation procedures + +Three rotation paths, each routine under HDKD + the new pallets (7b): + +- **OIDC-issuer key rotation** (`oidc/issuer/v1` → `v2`): new derivation path; both keys in JWKS during the grace window; `pallet-oidc-pubkeys` records both `kid`s as active; consumer JWKS cache refreshes naturally. No external party action required. +- **Session-JWT key rotation** (`issuer/jwt/v1` → `v2`): same pattern, but the session-JWT key is internal (not on public JWKS). Clients re-authenticate gradually as old tokens expire; no coordinated flip. +- **MRSIGNER rotation** (new enclave-signing key): one attested seed handoff from the old enclave to the new one; `pallet-enclave-successors::authorize_mrsigner(new_mrsigner, ...)` extrinsic lands before the handoff; JWKS / custodial wallets / DKIM DNS are **unchanged** because the master seed survived. Relying parties who pinned on MRSIGNER do a one-time trust-policy update (automatable via the `agentkeys oidc-rotate-trust` CLI — see [`docs/spec/post-v0.1-future-work.md`](../docs/spec/post-v0.1-future-work.md) §3.1). 
+ +See [`docs/spec/heima-gaps-vs-desired-architecture.md`](../docs/spec/heima-gaps-vs-desired-architecture.md) §8 and §9 for the pallet specifications and the MRSIGNER-rotation runbook. + +### 7.6 What this section does *not* cover + +Narrower surfaces with their own dedicated pages: + +- Daemon-side credential lifecycle (memory hygiene, zeroization, keyring ACL) → [Key Security](key-security). +- Per-domain DKIM + outbound-mail provenance → [Email System](email-system) §Security. +- Per-user isolation on shared cloud buckets → [Tag-Based Access](tag-based-access) §Security properties. +- JWT format, claim semantics, and consumer-trust-policy patterns → [OIDC Federation](oidc-federation) + [Tag-Based Access](tag-based-access). + +--- + +## 8. References ### Spec documents diff --git a/wiki/key-security.md b/wiki/key-security.md index 303b617..d134175 100644 --- a/wiki/key-security.md +++ b/wiki/key-security.md @@ -454,7 +454,40 @@ See `docs/spec/plans/development-stages.md` Stage 8 section for the full deliver --- -## 10. What was broken in the manual-test doc +## 10. Server-side trust anchors and URL-hijack defense + +This doc focuses on **client-side** credential storage: keychain vs file, memory hygiene, the daemon's credential lifecycle. It does not cover the **server-side** trust anchors our architecture introduced for Stage 6/7 (OIDC federation, DKIM, per-user PrincipalTag). Those are documented authoritatively in [Blockchain TEE Architecture §7 — Security model: assumptions and attacker surface](blockchain-tee-architecture#7-security-model-assumptions-and-attacker-surface). + +### What's covered there that matters to a client-security reader + +- **Four architectural rules** and what each rule actually defends against (bearer theft, TEE compromise, chain attack, OIDC URL hijack, etc.). +- **Attacker-surface matrix by attack class** — columns for what the attacker needs to achieve, net capability without mitigation, and the mitigation we ship. 
+- **The "total compromise" disaster-recovery case** for TEE-extraction scenarios. +- **Routine key-rotation procedures** for the three rotation paths (OIDC-issuer, session-JWT, MRSIGNER), all kept cheap under HDKD (gap §2) + the two new pallets (gap §8, §9). + +### New threat class introduced by Stage 7 OIDC federation + +**OIDC URL hijack.** `https://oidc.agentkeys.dev` is a public HTTPS endpoint serving our JWKS. Stage 7's cryptographic trust anchor is URL + TLS + JWKS signature. Attackers who compromise DNS / CA / hosting / deploy pipeline can replace the JWKS and mint JWTs that downstream clouds (AWS / GCP / Ali) accept. + +- **Baseline hardening in Stage 7 (no blockchain):** AWS thumbprint pinning, CAA DNS records, DNSSEC where supported, 5-min JWT TTL, short `Cache-Control` on JWKS. These reduce the attack surface but don't close it. +- **Chain-anchored defense in Stage 7b:** `pallet-oidc-pubkeys` + off-chain watchdog + daemon-side dual-verify for AgentKeys-owned accounts. Detection + auto-revocation in 30–60 s. Full spec in [`docs/spec/heima-gaps-vs-desired-architecture.md`](../docs/spec/heima-gaps-vs-desired-architecture.md) §8. +- **TEE-hosted OIDC endpoint (future work):** deferred past v0.1; closes the hole on foreign clouds too. Tracked in [`docs/spec/post-v0.1-future-work.md`](../docs/spec/post-v0.1-future-work.md) §2.1. + +### How this doc's client-side model interacts with the server-side model + +Client-side (what this doc covers) and server-side (blockchain-tee-architecture §7) defenses are additive; neither replaces the other: + +- Client-side keychain / memory hygiene defends against bearer-token leakage on a user's machine. +- Server-side OIDC / PrincipalTag defends against a compromised client reaching into another user's data or privilege. +- **If both hold**, a user-A compromise is bounded to user-A's 30-day blast radius, and even then only against operations user-A was grant-authorized for.
+- **If client-side breaks** (bearer stolen), server-side still enforces per-user isolation at the cloud layer. +- **If server-side breaks** (TEE compromise), client-side keychain is irrelevant — the attacker has signing authority. + +The two models are designed against **different adversaries** — client-side against local malware and opportunistic attackers on the user's device; server-side against infrastructure attackers with PKI / cloud / deploy-pipeline reach. Shipping both is the whole story. + +--- + +## 11. What was broken in the manual-test doc Two bugs in `docs/manual-test-stage4.md` found during this investigation: @@ -468,7 +501,7 @@ Both should be fixed together. The right fix is to add `agentkeys whoami` (see h --- -## 11. References +## 12. References ### Spec documents