Stage 8: Production hardening — daemon memory hygiene + CLI defensive features

## Summary

During Stage 4 manual testing, an investigation into macOS Keychain prompt behavior surfaced a broader gap in the security plan: **Stage 3 covers kernel-level daemon hardening (memfd_secret, mlock, seccomp, capability drop) but does not cover process-internal memory hygiene** for credential bytes that flow through the daemon between backend fetch and agent delivery. Several CLI defensive features and storage-level options are also missing from any current stage.

This issue tracks the **Stage 8: Production Hardening** work added to `docs/spec/plans/development-stages.md`. It is post-MVP and off the critical path, but should land before broad deployment.

Full investigation notes: [`wiki/key-security.md`](../blob/main/wiki/key-security.md)
Full plan: [`docs/spec/plans/development-stages.md`](../blob/main/docs/spec/plans/development-stages.md) — Stage 8 section

## Background: the two layers of daemon hardening

| Layer | Defends against | Where in the plan |
|---|---|---|
| **Kernel hardening** (memfd_secret, mlock2, seccomp, prctl, cap drop) | External probes: ptrace, `/proc/pid/mem`, swap, core dumps, co-tenant scraping | **Already in Stage 3** ✅ |
| **Process-internal hygiene** (zeroize, fd-based delivery, idle eviction, lifecycle audit) | Internal bugs: credentials lingering in heap, copies in long-lived address spaces, intermediate buffers nobody zeroes | **Not in any stage today** ❌ — this issue |

Both layers are necessary. Stage 3 alone leaks plaintext into freed-but-unscrubbed heap pages between requests. Stage 8 alone leaks plaintext to ptrace if seccomp is bypassed.

## Real exposure window (read this first)

Where the credential actually spends its time:

```
backend         daemon                    agent
─────────────────────────────────────────────────────────────────
                  fetch ────►
       ◄──── plaintext (~50ms)
                  serialize MCP (~1ms)
                  send over socket ────►
                                    agent decodes
                                    agent uses credential for
                                    the entire task (minutes–hours)
                                    agent exits
─────────────────────────────────────────────────────────────────
       DAEMON WINDOW: ~50ms     AGENT WINDOW: minutes to hours
```

The credential's dominant residence is in **agent memory** after delivery, not in daemon memory before delivery. The daemon window is ~50ms; the agent window is 1000x to 100,000x longer, in the agent process's regular heap with no zeroize and no scrubbing. **Daemon-side hardening is necessary but not sufficient** — even with perfect daemon hygiene, the credential lives in the agent's address space for the entire duration of the task.

The Stage 8 priority ranking reflects this. Priority A items shrink the agent window (or are foundational). Priority B items shrink only the daemon window — still worth doing as defense in depth, but not the dominant mitigation. **An earlier draft of this plan inverted the rankings; they were corrected during the Stage 4 review.**

## Daemon deliverables — Priority A (shrink the dominant — agent — window)

- [ ] **`zeroize` / `SecretString` wrappers** on every type that holds credential plaintext or session tokens. Touched types: `Session.token`, `CredentialBackend::read_credential` return type, MCP `get_credential` response builder, daemon-internal credential cache. **Foundational** — every other item below assumes credentials flow through these types.
- [ ] **Daemon-mediated `cmd_run` for agentkeys-managed runtimes.** Move the `cmd_run` flow from CLI to daemon for paths we ship (`agentkeys run`, MCP `agentkeys.run` tool). Daemon holds credential in `memfd_secret`, forks child, sets env in child's address space, drops parent copy before `exec`. CLI never touches plaintext. **Shrinks the agent window** by keeping the credential out of long-lived parent address space. We control both ends — no upstream cooperation needed; achievable in v0.1.
- [ ] **`memfd_secret`-via-SCM_RIGHTS delivery for `agentkeys.get_credential`** (agentkeys-managed runtime path). When the requesting agent is running an agentkeys-managed runtime that knows how to read a credential from a passed fd, daemon writes credential into a `memfd_secret` and sends fd via SCM_RIGHTS instead of inlining the bytes. Agent reads once, closes fd, bytes never enter regular heap. Falls back to inline bytes for runtimes that don't advertise fd support. **Shrinks the agent window** for the dominant MCP delivery path.
- [ ] **Idle credential eviction.** Configurable TTL (default 60s) wipes cached credentials even while the agent is still running. Closes the case where an agent fetches a credential, idles for a long time, then resumes — instead of holding the credential the whole time, the daemon re-fetches.
- [ ] **Daemon-internal audit trail.** Log every fetch / deliver / drop / evict event with timestamp, agent_id, service. Surfaces compromise patterns the backend audit log alone cannot see. Foundational for detection regardless of which mitigations are in place.

## Daemon deliverables — Priority B (shrink the daemon window; defensive depth)

These items shrink only the ~50ms daemon window. Worth doing because compromise of the long-lived daemon process is a real threat — the daemon holds the master session, scope information, and is the privileged process inside the sandbox — and per-call drop removes the "retroactive enumeration" attack where a compromised daemon hands over every credential it has ever fetched. But the marginal security win is small relative to Priority A.

- [ ] **Drop credential from daemon memory immediately after MCP delivery.** No caching unless explicitly configured per-service. *Demoted from Priority A in the Stage 4 review:* this only defends against daemon compromise + retroactive enumeration, not against the dominant agent-side exposure window.
- [ ] **`setrlimit(RLIMIT_CORE, 0)`** at daemon startup. Belt-and-suspenders against `prctl(PR_SET_DUMPABLE, 0)` from Stage 3.
- [ ] **`pkey_alloc` + `pkey_mprotect`** per-credential page protection (Linux 4.9+, x86 only). Marks credential pages `PROT_NONE` except during active read.
- [ ] **Secure-scrubbing global allocator** (`mimalloc` secure mode or `scudo`). Zeros heap allocations on free, adds guard pages.
- [ ] **`ptrace_scope` runtime check** at startup — refuse to launch (or warn loudly) if `kernel.yama.ptrace_scope < 1`.
- [ ] **CI verification of binary hardening** — `checksec` on `cargo build --release` artifacts to confirm PIE, RELRO (full), stack canaries, NX bits.
- [ ] **Anti-debugger check** at startup via `TracerPid` in `/proc/self/status`. Refuse to start if a debugger is attached unless `--allow-debugger` is set.

## Daemon deliverables — Priority C (broader runtime cooperation, v0.2+)

- [ ] **Extend `memfd_secret`-via-SCM_RIGHTS delivery to non-agentkeys-managed agent runtimes.** Most upstream LLM frameworks expect a `String` env var, not an fd. Generalizing the Priority A protection to arbitrary runtimes requires upstream changes. Until those land, Priority A covers only runtimes we ship.
- [ ] **Daemon-mediated `cmd_run` for arbitrary parent processes.** Priority A covers paths we control. Generalizing the daemon-mediated fork-and-drop pattern to arbitrary parents is a v0.2+ item.

## CLI deliverables

- [ ] **`agentkeys whoami`** subcommand. Print non-sensitive session metadata (wallet, scope, expiry). Never print `session.token`. Replaces `ak-keychain-show | jq` in the manual test. ~15 LOC.
- [ ] **Idempotent `agentkeys init`.** If a valid session exists, print `\"Already initialized as <wallet>\"` and exit. `--force` overrides. Eliminates the find-then-update double-prompt path on macOS.
- [ ] **`zeroize` wrapping** for credential strings in `cmd_read` and `cmd_run`.
- [ ] **`prctl(PR_SET_DUMPABLE, 0)` + `setrlimit(RLIMIT_CORE, 0)`** on CLI startup (Linux only).
- [ ] **Wire CLI `read` to honor `AuthRequestType::HighValueRelease`.** Sensitive credentials require `agentkeys approve` before release.

## Optional storage hardening

- [ ] **Touch-ID-gate the master session on macOS** via `kSecAttrAccessControl = kSecAccessControlUserPresence`. Master session only — child sessions stay silent. macOS only.
- [ ] **DEK + encrypted file pattern.** Cross-platform. Keyring holds an immutable 32-byte data encryption key, session JSON encrypted at `~/.agentkeys/session.enc` (XChaCha20-Poly1305). Makes `security find-generic-password -w` return useless random bytes.

## Acceptance criteria

- All Priority A daemon items implemented and tested (these are the ones that matter most — they shrink the agent window)
- All CLI deliverables implemented and tested
- Unit tests pass (see Stage 8 test matrix in `development-stages.md`)
- Manual review confirms credential bytes do not survive in agent memory beyond the agent's actual use of them (when running an agentkeys-managed runtime)
- Reviewer E2E checklist completes
- Priority B may slip to a follow-up issue if needed
- Priority C is explicitly tracked separately for v0.2+

## Effort estimate

4-6 days. Off the critical path. Recommended sequencing: ship v0 from Stage 7, then immediately roll into Stage 8 before broad deployment.

## Why this is post-MVP

Stages 0-7 ship a working v0 system suitable for demo and early adopters. Stage 8 hardens it for broader deployment where the daemon may run on hosts with permissive ptrace scope, where credentials may be sensitive enough to warrant high-value release gating, and where memory hygiene matters for compliance / threat-model coverage. None of Stage 8 is required to demonstrate the product; all of it is required to claim production readiness.

## References

- `wiki/key-security.md` — full investigation notes, two-tier storage model, daemon credential lifecycle, "real exposure window" diagram, why both hardening layers are necessary
- `docs/spec/plans/development-stages.md` Stage 8 — full deliverable list, unit test matrix, reviewer E2E checklist
- `docs/spec/architecture.md:70` — original Stage 3 daemon hardening requirements
- `docs/spec/tech-brief.md:80` — TEE shielding key model (the layer above which daemon hardening operates)
- `docs/spec/credential-backend-interface.md` — `AuthRequestType::HighValueRelease` definition

## Changelog

- **2026-04-11** — Issue created with original Stage 8 priority ranking
- **2026-04-11** — Priority ranking corrected during Stage 4 review. "Drop after delivery" demoted from Priority A to Priority B (defends only the ~50ms daemon window, not the dominant agent window). Daemon-mediated `cmd_run` and `memfd_secret`-via-SCM_RIGHTS delivery promoted from Priority C to Priority A for agentkeys-managed runtimes (the paths we control). Real-exposure-window framing added to make the rationale explicit.

Layer	Defends against	Where in the plan
Kernel hardening (memfd_secret, mlock2, seccomp, prctl, cap drop)	External probes: ptrace, `/proc/pid/mem`, swap, core dumps, co-tenant scraping	Already in Stage 3 ✅
Process-internal hygiene (zeroize, fd-based delivery, idle eviction, lifecycle audit)	Internal bugs: credentials lingering in heap, copies in long-lived address spaces, intermediate buffers nobody zeroes	Not in any stage today ❌ — this issue

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stage 8: Production hardening — daemon memory hygiene + CLI defensive features #3

Summary

Background: the two layers of daemon hardening

Real exposure window (read this first)

Daemon deliverables — Priority A (shrink the dominant — agent — window)

Daemon deliverables — Priority B (shrink the daemon window; defensive depth)

Daemon deliverables — Priority C (broader runtime cooperation, v0.2+)

CLI deliverables

Optional storage hardening

Acceptance criteria

Effort estimate

Why this is post-MVP

References

Changelog

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Stage 8: Production hardening — daemon memory hygiene + CLI defensive features #3

Description

Summary

Background: the two layers of daemon hardening

Real exposure window (read this first)

Daemon deliverables — Priority A (shrink the dominant — agent — window)

Daemon deliverables — Priority B (shrink the daemon window; defensive depth)

Daemon deliverables — Priority C (broader runtime cooperation, v0.2+)

CLI deliverables

Optional storage hardening

Acceptance criteria

Effort estimate

Why this is post-MVP

References

Changelog

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions