
v0.1: Pattern 4 audit submission — TEE-as-paymaster per-read sponsored audit #5

@hanwencheng


Summary

Design and implement the v0.1 audit submission pattern for AgentKeys on Heima. The naive "cold-first-read" approach (CLI submits audit extrinsic, waits for block inclusion before returning the credential) adds ~6 seconds to the first read of every session, which is unacceptable for interactive use and unworkable for unattended agents. This issue tracks Pattern 4: TEE-as-paymaster per-read sponsored audit — the chosen design that gets first-read latency down to ~50ms while preserving the tamper-evident public audit property that is AgentKeys's core differentiator against 1Password.

Full design notes: docs/spec/plans/development-stages.md — Stage 9 section
Background investigation: wiki/key-security.md

Problem

Per docs/spec/heima-cli-exploration.md:85, every credential read is a Heima extrinsic signed by the agent's ephemeral session key, emitting an on-chain audit event visible on the block explorer. This is the "tamper-evident public audit log" security property — it is what AgentKeys offers that 1Password structurally cannot.

The cost is latency, as docs/spec/heima-cli-exploration.md:116 acknowledges:

Latency: every read is at minimum a chain RTT (~6s block time on Heima) unless we add an off-chain fast-path. 1Password is sub-100ms via Connect.

Naive cold-first-read:

read_credential
  ↓
CLI signs extrinsic with session key
  ↓
CLI submits via wss RPC
  ↓
──── ~6s ────  wait for block inclusion
  ↓
TEE decrypts credential
  ↓
Extrinsic confirms, credential returned

About 6s is fine for a one-off interactive command. At ~6s per read, an agent fetching 30 credentials over a 2-hour task spends 3 minutes (30 × 6s) purely waiting. Cron jobs and CI runners have no one watching, but they still have wall-clock budgets to meet. First-read latency has to come down.
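The arithmetic, as a minimal Rust sketch. The helper name `total_blocking` is hypothetical, and it assumes every read pays the same fixed latency: 30 reads at ~6s is 180s of blocking, against ~1.5s at the ~50ms target.

```rust
use std::time::Duration;

/// Total time an agent spends blocked on reads, assuming every read pays the
/// same fixed latency (a simplification: real per-read latency varies).
fn total_blocking(reads: u32, per_read: Duration) -> Duration {
    per_read * reads
}
```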

Patterns considered

| Pattern | First-read latency | Audit on chain | Chain fees | Complexity |
|---|---|---|---|---|
| Cold-first-read (naive) | ~6s | Synchronous (strongest) | 1 extrinsic per read | Simplest |
| Pattern 1: TEE-batched async audit | ~50ms | Async, ~10s staleness (batched) | 1 extrinsic per 32 reads | Medium (batcher + top-up pool + TEE substrate account) |
| Pattern 2: Merkle-committed TEE log | ~50ms | Async, per-root commitment | Minimal (1 hash per batch) | Medium (Merkle log + inclusion proofs) |
| Pattern 3: CLI fire-and-forget | ~50ms | Best-effort, client-dependent | 1 extrinsic per read | Simple but insecure |
| Pattern 4: TEE-as-paymaster per-read (chosen) | ~50ms | Async, ~6s (next block) | 1 extrinsic per read, paymaster funded | Medium (paymaster + rate limiter) |

Pattern 4: chosen design

CLI signs read_credential extrinsic with session key
  ↓
Submits to TEE over wss RPC
  ↓
TEE verifies session signature + scope (~10ms)
  ↓
TEE decrypts credential (~5ms)
  ↓
TEE returns credential to CLI           ← user sees this (~50ms total)
  ↓
   ──── decoupled hereafter ────
  ↓
TEE builds audit extrinsic, signs as    ← uses the user's REAL wallet key
the user's wallet (TEE-held per         ← no separate TEE operational account
pallet-bitacross pattern)               ← on-chain event attributed to user wallet
  ↓
TEE submits via paymaster (Option A)
  ↓
──── ~6s ──── audit extrinsic confirms on-chain
  ↓
Audit event visible on block explorer
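A minimal sketch of the serve/audit split, using only the Rust standard library. `serve_read`, `spawn_audit_worker`, and `PendingAudit` are hypothetical names; `decrypt` and `submit` are stand-ins for TEE decryption and for building, signing, and submitting the audit extrinsic. The point is structural: the credential is returned as soon as the audit record is enqueued, and submission happens on a separate thread.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Hypothetical record handed from the serve path to the audit path.
#[derive(Debug, Clone, PartialEq)]
struct PendingAudit {
    session_id: String,
    service: String,
}

/// Serve path: decrypts, enqueues the audit record, and returns immediately.
/// Sending on an unbounded mpsc channel never blocks.
fn serve_read(
    session_id: &str,
    service: &str,
    decrypt: impl Fn(&str) -> String,
    audit_tx: &Sender<PendingAudit>,
) -> String {
    let credential = decrypt(service);
    // A send error means the audit worker has exited. A real worker would
    // have to treat that as an audit failure instead of ignoring it.
    let _ = audit_tx.send(PendingAudit {
        session_id: session_id.to_string(),
        service: service.to_string(),
    });
    credential
}

/// Audit path: runs on its own thread and hands each record to `submit`.
/// The loop ends once every Sender has been dropped.
fn spawn_audit_worker(
    rx: Receiver<PendingAudit>,
    submit: impl Fn(PendingAudit) + Send + 'static,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for audit in rx {
            submit(audit);
        }
    })
}
```

An unbounded channel keeps the serve path non-blocking; what should happen when submission fails is one of the deferred decisions below.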

The key architectural move is that signer and payer are decoupled. The audit extrinsic is signed by the user's wallet, which Heima already holds in the TEE under the pallet-bitacross pattern (docs/spec/1-step-analysis.md:88), so the on-chain event correctly attributes the read to the user. The fees, however, come from a paymaster: the user has no top-up pool to manage, and there is no new fee primitive to implement.

This is the meta-transaction pattern (EIP-2771 on Ethereum, custom signed extension on Substrate), applied specifically to audit submission.
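To make the signer/payer split concrete, here is a toy ledger model in Rust. It is not Substrate code: `SponsoredExtrinsic`, `apply`, and the account names are hypothetical, and signature verification is left out entirely. It shows only the invariant Pattern 4 relies on: the event is attributed to `signer`, while the fee is debited from `fee_payer`.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified sponsored extrinsic. `signer` is the account the
/// event is attributed to; `fee_payer` is the account whose balance is debited.
#[derive(Debug)]
struct SponsoredExtrinsic {
    signer: String,
    fee_payer: String,
    fee: u128,
}

#[derive(Debug, PartialEq)]
enum FeeError {
    UnknownAccount,
    InsufficientFunds,
}

/// Charges the fee to the payer only; the signer's balance is never touched.
/// Returns the account the on-chain event would be attributed to.
fn apply(
    balances: &mut HashMap<String, u128>,
    xt: &SponsoredExtrinsic,
) -> Result<String, FeeError> {
    let payer = balances
        .get_mut(&xt.fee_payer)
        .ok_or(FeeError::UnknownAccount)?;
    // On insufficient funds we return early, leaving the balance unchanged.
    *payer = payer
        .checked_sub(xt.fee)
        .ok_or(FeeError::InsufficientFunds)?;
    Ok(xt.signer.clone())
}
```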

Fee funding: Option A (AgentKeys operators subsidize)

Three options were considered for who pays the paymaster:

Option A: AgentKeys operators fund a treasury account (CHOSEN FOR v0.1)

  • AgentKeys deploys a Substrate account funded from operator treasury
  • Paymaster pays audit fees from this account
  • Cost grows linearly with usage (number of users × reads per user)
  • Sustainable via a per-user fee structure at deployment time
  • Requires no Heima runtime changes — works on any Substrate chain
  • Matches the hosted AgentKeys service model

Why chosen: ships today with no Heima-side work, matches the planned hosted service pricing model, and is the easiest to operate and audit.

Option B: Heima protocol subsidizes as "free calls" (FILED for future reconsideration)

  • Runtime adds a new primitive: TEE-originated audit extrinsics consume no fees
  • Cost borne by validators as part of base chain operation
  • Most elegant architecturally — zero per-read cost to anyone
  • Blocked on Heima runtime changes: requires a new pallet primitive for free TEE-originated calls
  • Revisit once Kai confirms whether this is in scope for the AgentKeys pallet integration (see docs/spec/heima-open-questions.md)

Option C: User wallet pays from its existing USDC balance (FILED for future reconsideration)

  • TEE signs audit extrinsic with user's wallet; fees debited from the wallet's USDC balance (same balance that holds x402 funds)
  • Self-scaling, fair, no new treasury
  • Rejected for v0.1 default because it mixes "wallet pays gas" with "wallet is user's identity" roles and creates confusing error UX when the balance runs low
  • Could be offered as an opt-in mode for self-hosted deployments where users prefer to pay their own audit fees directly; this is appropriate future work.
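The three options differ only in who is debited; the signer stays the user's wallet in every case. A hypothetical Rust enum capturing that (the names are illustrative, not from the codebase):

```rust
/// Hypothetical names for the three funding options discussed above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FeeFunding {
    /// Option A (v0.1 default): operator-funded paymaster treasury.
    OperatorTreasury,
    /// Option B (filed): protocol-level free TEE-originated calls.
    ProtocolFree,
    /// Option C (filed): the user's own wallet balance.
    UserWallet,
}

/// Which account, if any, is debited for an audit extrinsic that is always
/// signed by `user_wallet`.
fn fee_payer<'a>(funding: FeeFunding, user_wallet: &'a str, treasury: &'a str) -> Option<&'a str> {
    match funding {
        FeeFunding::OperatorTreasury => Some(treasury),
        FeeFunding::ProtocolFree => None, // no account pays
        FeeFunding::UserWallet => Some(user_wallet),
    }
}
```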

Abuse defense: TEE-side per-session read rate limit

Pattern 4's paymaster funding model is vulnerable to DoS: without rate limiting, an abusive session could burn through the treasury in seconds. The rate limit lives at the credential-read layer, not the audit-submission layer, so it defends everything downstream at once.

Full design in the rate-limit issue: #. Summary: default 100 reads/minute/session, token-bucket algorithm, configurable per-session at creation, enforced by the TEE in v0.1 (and the mock backend in v0).

The rate limit is a prerequisite for Pattern 4 to safely deploy. Do not merge Pattern 4 without the rate limit already in place.
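A minimal token-bucket sketch in Rust for the 100 reads/minute/session default. The burst capacity is an assumption (set equal to the per-minute rate here); the rate-limit issue owns the real parameters. `TokenBucket` is a hypothetical name, and `now` is passed in explicitly so the limiter can be exercised without sleeping.

```rust
use std::time::{Duration, Instant};

/// Minimal token bucket: `capacity` tokens, refilled continuously at
/// `capacity` tokens per `per` (e.g. 100 per 60 s).
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last: Instant,
}

impl TokenBucket {
    fn new(capacity: u32, per: Duration, now: Instant) -> Self {
        let capacity = f64::from(capacity);
        Self {
            capacity,
            tokens: capacity, // a fresh session starts with a full bucket
            refill_per_sec: capacity / per.as_secs_f64(),
            last: now,
        }
    }

    /// Refills for the elapsed time, then consumes one token if available.
    /// Returns false when the read should be rejected as rate-limited.
    fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last = self.last.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

With these parameters a session can burst 100 reads and then regains roughly one read every 0.6 s.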

Deliverables

Heima TEE worker (v0.1)

  • Paymaster-funded audit submission path. TEE builds the audit extrinsic as signed_by(user_wallet), but submission uses a paymaster account funded by AgentKeys operators. Requires coordination with Kai on the exact Substrate extension pattern (probably a custom SignedExtension that overrides the fee-paying logic).
  • Decoupled serve/audit code path. read_credential returns the plaintext to the caller immediately after scope verification; audit submission runs on a separate task that does not block the response.
  • Audit submission failure handling. Needs an explicit design. Candidate strategies: retry + backoff, a pending-audit queue in TEE memory, circuit-breaking reads from affected sessions, or local-log-and-flush-later. Pick one (or a combination) and implement it. (Deferred decision in Stage 9 notes.)
  • Audit event format. Single read_credential event per audit extrinsic, with fields: wallet, agent_id, service, timestamp, block_number, session_id, result (success/denied/rate_limited). Visible in block explorer and indexable by Subsquid.
  • Paymaster treasury account setup. A separate Substrate account funded by AgentKeys operators. Needs ops tooling to monitor balance, top up, and alert on drain.
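A sketch of the audit event payload as a Rust type. Field names come from the event-format bullet above; the concrete types (String ids, `u64` Unix seconds, `u32` block number) and the classification order in the hypothetical `ReadResult::classify` (rate limit checked before scope) are assumptions for illustration, not a confirmed on-chain encoding.

```rust
/// Outcome recorded in the audit event (success / denied / rate_limited).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReadResult {
    Success,
    Denied,
    RateLimited,
}

impl ReadResult {
    /// Hypothetical classification order: rate limit first, then scope.
    /// Only a read that passes both is a success.
    fn classify(within_rate_limit: bool, in_scope: bool) -> Self {
        match (within_rate_limit, in_scope) {
            (false, _) => ReadResult::RateLimited,
            (true, false) => ReadResult::Denied,
            (true, true) => ReadResult::Success,
        }
    }
}

/// One `read_credential` event per audit extrinsic; types are assumed.
#[derive(Debug, Clone, PartialEq)]
struct ReadCredentialEvent {
    wallet: String,
    agent_id: String,
    service: String,
    timestamp: u64,
    block_number: u32,
    session_id: String,
    result: ReadResult,
}
```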

AgentKeys-core (trait abstraction)

  • CredentialBackend trait stays unchanged — Pattern 4 is an implementation detail of HeimaBackend, not of the trait itself.
  • Add a BackendCapabilities struct that the backend can report to the CLI: { supports_sponsored_audit: bool, audit_on_chain: bool, expected_read_latency_ms: u32 }. CLI uses this to decide whether to show a "first read will be slower" hint (cold-first-read) or not (Pattern 4).
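A sketch of the struct with the three fields named above, plus one possible CLI rule for the hint. The rule (hint only when reads are audited on chain without sponsorship, i.e. cold-first-read), the message text, and the `slow_first_read_hint` helper are assumptions, not settled design.

```rust
/// Capabilities a backend reports to the CLI; field names from the issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BackendCapabilities {
    pub supports_sponsored_audit: bool,
    pub audit_on_chain: bool,
    pub expected_read_latency_ms: u32,
}

/// One possible rule (an assumption): warn only on the synchronous
/// cold-first-read path, where the read waits for block inclusion.
fn slow_first_read_hint(caps: &BackendCapabilities) -> Option<String> {
    if caps.audit_on_chain && !caps.supports_sponsored_audit {
        Some(format!(
            "first read will be slower (~{} ms) while the audit extrinsic is included",
            caps.expected_read_latency_ms
        ))
    } else {
        None
    }
}
```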

CLI

  • No code change required for the happy path — the CLI calls backend.read_credential(...) and gets the credential back. Pattern 4 is transparent from the CLI's perspective.
  • Add an optional --sync-audit flag for users who explicitly want cold-first-read semantics (strong audit guarantee at the cost of latency). Default off. Nice-to-have, not required.
  • Surface rate-limit errors clearly (covered in #).
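If the `--sync-audit` flag lands, the CLI only needs to map it to an audit mode. A dependency-free sketch (`AuditMode` and `audit_mode_from_args` are hypothetical; a real CLI would more likely use its existing argument parser):

```rust
/// Hypothetical audit mode selected by the optional `--sync-audit` flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AuditMode {
    /// Default (Pattern 4): return the credential, audit in the background.
    Async,
    /// Cold-first-read semantics: wait for the audit to be included first.
    Sync,
}

/// Scans the arguments for `--sync-audit`; absent means the default mode.
fn audit_mode_from_args<I, S>(args: I) -> AuditMode
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    if args.into_iter().any(|a| a.as_ref() == "--sync-audit") {
        AuditMode::Sync
    } else {
        AuditMode::Async
    }
}
```

In `main`, this would be called as `audit_mode_from_args(std::env::args().skip(1))`.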

Operational

  • Dashboards for the paymaster treasury: balance, burn rate, rejected-for-insufficient-funds count, per-user audit submission count.
  • Alerting: alert when the balance drops below N days of projected spend.
  • Rate-limit-adjacent: per-user audit-fee budget cap (filed under Stage 9 deferred decisions).
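The "N days of projected spend" alert reduces to a runway calculation. A sketch in Rust, assuming a flat daily spend projected from users × reads per user per day × fee per audit extrinsic; all inputs are placeholders in the chain's smallest fee unit, and the helper names are hypothetical.

```rust
/// Projected daily burn; saturating to avoid overflow on extreme inputs.
fn projected_daily_spend(users: u128, reads_per_user_per_day: u128, fee_per_audit: u128) -> u128 {
    users
        .saturating_mul(reads_per_user_per_day)
        .saturating_mul(fee_per_audit)
}

/// Whole days of runway left (integer division rounds down).
/// `None` when nothing is being spent.
fn runway_days(balance: u128, daily_spend: u128) -> Option<u128> {
    if daily_spend == 0 {
        None
    } else {
        Some(balance / daily_spend)
    }
}

/// Alert when runway falls below `min_days` (the "N days" threshold).
fn should_alert(balance: u128, daily_spend: u128, min_days: u128) -> bool {
    match runway_days(balance, daily_spend) {
        Some(days) => days < min_days,
        None => false, // no spend, so nothing drains the treasury
    }
}
```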

Documentation

  • Update wiki/key-security.md with Pattern 4 as the default v0.1 audit model and the rationale for choosing it over cold-first-read and Pattern 1.
  • Update the docs/spec/plans/development-stages.md Stage 9 section as Pattern 4 implementation proceeds — convert from design notes to concrete deliverables as each piece lands.
  • Add a latency budget table to wiki/key-security.md showing expected first-read and warm-read latencies under each pattern.

Deferred decisions (to resolve before implementation starts)

From the Stage 9 notes:

  1. Cross-pattern mixing. Offer Pattern 4 (default) with an opt-out --sync-audit flag for users who want hard synchronous audit guarantees? Leaning yes.
  2. Paymaster DoS protection beyond rate limiting. Add a per-user audit-fee budget cap on top of the read rate limit? Leaning yes for hosted AgentKeys, no for self-hosted.
  3. Audit submission failure handling strategy. Retry + backoff, pending queue, circuit-break, local log + flush later — which one (or combination) is the right default? Needs explicit design document before implementation.
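To make decision 3 concrete, here is a sketch of just one candidate (retry with exponential backoff over an in-memory pending queue) in Rust. It is not a chosen default. `drain_with_backoff` is hypothetical, `submit` stands in for extrinsic submission, and `sleep` is injected so the backoff schedule can be tested. Items that exhaust their attempts are returned rather than dropped, in line with the acceptance criterion that audit events are not silently lost.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Drains the pending queue, retrying each item up to `max_attempts` times
/// with exponential backoff (base, 2*base, 4*base, ...). Returns the items
/// that still failed so they can be surfaced to the TEE operator.
fn drain_with_backoff<T, F>(
    queue: &mut VecDeque<T>,
    mut submit: F,
    max_attempts: u32,
    base_delay: Duration,
    mut sleep: impl FnMut(Duration),
) -> Vec<T>
where
    F: FnMut(&T) -> Result<(), String>,
{
    let mut failed = Vec::new();
    while let Some(item) = queue.pop_front() {
        let mut attempt = 0;
        loop {
            match submit(&item) {
                Ok(()) => break,
                Err(_) if attempt + 1 < max_attempts => {
                    sleep(base_delay * 2u32.saturating_pow(attempt));
                    attempt += 1;
                }
                Err(_) => {
                    // Out of attempts: keep it for the operator, never drop it.
                    failed.push(item);
                    break;
                }
            }
        }
    }
    failed
}
```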

Acceptance criteria

  • First read_credential completes in < 100ms under normal conditions (measured end-to-end from CLI submit to credential return)
  • Audit event appears on block explorer within ~10s of the read (1 block + transport)
  • Paymaster treasury covers all audit submissions without user-visible fee prompts
  • Rate limit (issue #) correctly rejects excess reads with a structured error
  • On paymaster failure: audit events are not silently lost — either retried successfully or surfaced as a failure the TEE operator can investigate
  • All existing credential-read tests pass under the Pattern 4 code path

Effort estimate

TBD pending Kai's input on Heima-side feasibility. Ballpark:

  • Paymaster + signed-extension pattern: 3-5 days (if Heima supports it), much more if we have to build the primitive ourselves
  • Decoupled serve/audit in the TEE worker: 2-3 days
  • Failure handling design + implementation: 2-3 days
  • Ops tooling (dashboards, alerts): 1-2 days
  • Tests + E2E: 2-3 days

Total: ~10-16 days, off the critical path, slotted into v0.1 migration work.

Dependencies

  • Must land before Pattern 4: issue # (per-session read rate limit)
  • Blocked on: Kai review of the paymaster pattern against Heima's current TEE worker architecture
  • Blocks: v0.1 first-read latency SLA

References

  • docs/spec/plans/development-stages.md Stage 9 — design decisions
  • wiki/key-security.md — full investigation notes and pattern comparison
  • docs/spec/heima-cli-exploration.md:85, :116 — the audit-as-extrinsic design and the latency acknowledgement
  • docs/spec/1-step-analysis.md:88 — pallet-bitacross pattern (TEE-held wallet keys, enables Pattern 4's signer/payer decoupling)
  • docs/spec/heima-open-questions.md — open questions for Kai, including the paymaster feasibility question
  • Issue #3 (Stage 8 production hardening: daemon memory hygiene + CLI defensive features), where the rate limit belongs

Metadata

  • Assignees: none
  • Labels: enhancement (New feature or request), v0.1 (development plan for blockchain backend integration)