v0.1: TEE-side per-session read rate limit (abuse defense) #4

@hanwencheng

Description

Summary

Add a per-session credential-read rate limit enforced by the TEE (v0.1) or daemon (v0 mock), with a configurable cap and a clear rate-limit error path. This is a general abuse defense that is independently valuable, and is a prerequisite for the Pattern 4 audit submission design (issue #) because Pattern 4 relies on a paymaster-funded audit flow that is vulnerable to unbounded request volume without an upstream rate limiter.

Why this is needed

Right now, there is nothing stopping a buggy or abusive agent from calling agentkeys.get_credential in a tight loop, thousands of times per second. In v0 this drains the backend SQLite; in v0.1 this drains the paymaster that subsidizes audit extrinsics and creates audit log spam that makes real compromise patterns impossible to spot. Neither is acceptable.

Putting the rate limit at the credential-read layer (not at the audit-submission layer) defends everything downstream simultaneously: if you can't do 10,000 reads/second, you can't cause 10,000 audit submissions/second, you can't exfiltrate credentials 10,000 times/second, and you can't drain the paymaster 10,000 fees/second.

This is also useful regardless of which audit submission pattern ships. Even under the current cold-first-read plan, rate limiting is a general DoS defense and belongs in Stage 8.

Design

Policy

  • Default: 100 reads / minute / session.
  • Configurable per-session at creation. The session-creation extrinsic takes an optional read_rate_limit: Option<u32> field. If unset, default applies. If set, must be ≤ a hard cap (e.g., 10,000/min) to prevent abuse of the config itself.
  • Token bucket algorithm. Each session gets a bucket of capacity rate_limit, refilled linearly at rate_limit / 60 tokens per second. Each read_credential consumes one token. Bucket starts full.
  • Excess reads return a structured error that agents can handle: `{ "code": "rate_limit_exceeded", "retry_after_secs": <integer> }`. The retry_after_secs field tells the agent when the next token will be available.
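The token-bucket policy above can be sketched as follows. This is a minimal illustration, not the final API: the struct name `TokenBucket` comes from the deliverables list, but the field and method names here are assumptions.

```rust
use std::time::Instant;

/// Per-session token bucket. Starts full, refills linearly at
/// rate_limit / 60 tokens per second, one token per credential read.
pub struct TokenBucket {
    capacity: f64,       // rate_limit, in tokens
    tokens: f64,         // current balance
    refill_per_sec: f64, // rate_limit / 60
    last_refill: Instant,
}

impl TokenBucket {
    pub fn new(rate_limit: u32) -> Self {
        TokenBucket {
            capacity: rate_limit as f64,
            tokens: rate_limit as f64, // bucket starts full
            refill_per_sec: rate_limit as f64 / 60.0,
            last_refill: Instant::now(),
        }
    }

    /// Credit tokens accrued since the last call, capped at capacity.
    fn refill(&mut self) {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
    }

    /// Consume one token, or return the seconds until the next token
    /// is available (the retry_after_secs value in the error payload).
    pub fn try_consume(&mut self) -> Result<(), u64> {
        self.refill();
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            Ok(())
        } else {
            let deficit = 1.0 - self.tokens;
            Err((deficit / self.refill_per_sec).ceil() as u64)
        }
    }
}
```

Lazy refill (computing the credit on each call rather than running a timer) keeps the bucket a plain value type, which matters inside a TEE where background tasks are awkward.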

Where it lives

  • v0 (mock backend): rate limit lives in the mock backend's handlers::credential::read_credential path, stored in an in-memory HashMap<SessionToken, TokenBucket>. SQLite-backed persistence across server restarts is optional — in-memory is fine for the mock.
  • v0.1 (Heima TEE): rate limit lives in the TEE worker's credential-serving path, stored in TEE-internal state. Persistence across TEE restarts is desirable but not critical for the security property — a restart that resets buckets just gives each session one "free" burst, which is the same as session creation. The TEE already holds per-session state; adding a bucket counter is ~5 lines.
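For the v0 mock, the per-session bucket map might look like the sketch below. `SessionToken` is assumed to be a string-like type, and the `TokenBucket` here is reduced to a bare counter for brevity; the Mutex is needed because the server's handlers run concurrently.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

type SessionToken = String;

// Reduced placeholder for the full refilling bucket described under Policy.
struct TokenBucket {
    tokens: f64,
}

/// Hypothetical slice of the mock backend's SharedState: one bucket per
/// session, created lazily (and full) on a session's first read.
struct RateLimitState {
    buckets: Mutex<HashMap<SessionToken, TokenBucket>>,
}

impl RateLimitState {
    /// Returns true if the read is allowed; runs before any DB work.
    fn check_rate_limit(&self, session: &str, capacity: f64) -> bool {
        let mut buckets = self.buckets.lock().unwrap();
        let bucket = buckets
            .entry(session.to_string())
            .or_insert(TokenBucket { tokens: capacity });
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

Keying the map by session token (not globally) is what makes the limit per-session: exhausting one session's bucket leaves every other session untouched.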

Telemetry

Every rate-limit rejection should emit an audit event — rate_limit_exceeded with session_id, service, timestamp, attempted_rate. These events go to the regular audit log path (wherever that is under the active pattern — batched, per-read, paymaster-relayed, etc.) so operators can spot abusive agents via the normal usage-query flow (agentkeys usage).

Override for legitimate high-volume use

Some workloads legitimately need >100 reads/minute (e.g., an agent that makes many parallel API calls). The session-creation read_rate_limit field lets the master configure a higher cap for specific agents at pair time. This is a conscious grant by the master, not a default behavior, so abuse requires both a compromised session and a compromised session-creation flow, raising the bar.
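Resolving the optional field against the default and the hard cap is a few lines. This sketch assumes the function name and error type; rejecting over-cap values outright (rather than silently clamping) gives the caller a clear signal, and zero is rejected because it would make the session unusable.

```rust
const DEFAULT_READ_RATE_LIMIT: u32 = 100; // reads / minute / session
const HARD_CAP_READ_RATE_LIMIT: u32 = 10_000; // upper bound on the config itself

/// Resolve the effective per-session limit from the optional
/// session-creation field. Name and error type are illustrative.
fn resolve_read_rate_limit(requested: Option<u32>) -> Result<u32, String> {
    match requested {
        None => Ok(DEFAULT_READ_RATE_LIMIT),
        Some(0) => Err("read_rate_limit must be at least 1".to_string()),
        Some(n) if n > HARD_CAP_READ_RATE_LIMIT => Err(format!(
            "read_rate_limit {} exceeds hard cap {}",
            n, HARD_CAP_READ_RATE_LIMIT
        )),
        Some(n) => Ok(n),
    }
}
```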

Deliverables

Mock backend (v0)

  • TokenBucket struct in crates/agentkeys-mock-server/src/state.rs with refill + consume methods
  • Per-session bucket stored in SharedState alongside the SQLite handle
  • Rate-limit check at the top of handlers::credential::read_credential before any DB work
  • Structured rate_limit_exceeded error variant in AppError
  • Session-creation extrinsic accepts optional read_rate_limit: Option<u32> field (default 100)
  • Audit log write for rate-limit rejections (emit as a rate_limit_exceeded event, not a successful read)
  • Tests:
    • credential::rate_limit_default_100_per_minute — 100 reads in quick succession succeed, 101st fails
    • credential::rate_limit_refills_linearly — after waiting 6s, 10 more reads succeed (100/min ÷ 60 ≈ 1.67 tokens/s, so 6s ≈ 10 tokens)
    • credential::rate_limit_per_session_not_global — two sessions each get their own bucket, neither affects the other
    • credential::rate_limit_configurable_at_creation — creating a session with read_rate_limit: 500 allows 500 reads in a burst
    • credential::rate_limit_emits_audit_event — rejected read writes an audit row with action rate_limit_exceeded

CLI

  • agentkeys read and agentkeys run surface a clear error message when rate-limited: `"Error: RATE_LIMIT. Session 0x... has exceeded 100 reads/minute. Retry after <retry_after_secs> seconds."`
  • `agentkeys run` specifically should treat the rate-limit error as retryable (wait `retry_after_secs`, then retry) up to 3 attempts before giving up, since agents running long tasks can legitimately hit temporary bursts.
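The retry behavior for `agentkeys run` could look like the following. `ReadError` and the `read_once` closure are stand-ins for the real client types; the key points are that only the rate-limit variant is retryable, the wait comes from the server's retry_after_secs, and the loop gives up after a bounded number of attempts.

```rust
use std::thread;
use std::time::Duration;

/// Illustrative error type for a credential read.
enum ReadError {
    RateLimited { retry_after_secs: u64 },
    Other(String),
}

/// Retry a read on rate-limit errors, waiting as long as the server
/// says, up to `max_attempts` total attempts. Other errors fail fast.
fn read_with_retry<F>(mut read_once: F, max_attempts: u32) -> Result<String, String>
where
    F: FnMut() -> Result<String, ReadError>,
{
    for attempt in 1..=max_attempts {
        match read_once() {
            Ok(value) => return Ok(value),
            Err(ReadError::RateLimited { retry_after_secs }) if attempt < max_attempts => {
                // Back off exactly as long as the server indicated.
                thread::sleep(Duration::from_secs(retry_after_secs));
            }
            Err(ReadError::RateLimited { .. }) => {
                return Err("rate limit exceeded after retries".to_string());
            }
            Err(ReadError::Other(msg)) => return Err(msg),
        }
    }
    Err("no attempts made".to_string())
}
```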

Daemon (v0)

  • Daemon proxies rate-limit errors from the backend to the MCP client without modification.
  • Daemon-internal audit log records the rate-limit event even though the backend also records it (redundancy for detection).

Documentation

  • Update `docs/manual-test-stage4.md` with a rate-limit verification test
  • Update `wiki/key-security.md` with a brief note that rate limiting is part of the security story (abuse defense layer)

Acceptance criteria

  • All mock-backend tests pass
  • A burst of 101 reads from one session within a minute results in 100 successes and 1 rejection
  • Two sessions with default rate limit do not affect each other
  • `agentkeys run` tolerates a rate-limit error and retries
  • `agentkeys usage` surfaces rate-limit events as distinguishable rows in the audit output

Effort estimate

1-2 days. Small, self-contained, well-bounded.

Priority

Must-have for v0.1 because it gates Pattern 4 (sponsored audit submission). Should-have for v0 because it closes an obvious abuse vector in the mock backend. Recommended to slot into Stage 8 as part of production hardening.
