
Stage 8: Production hardening — daemon memory hygiene + CLI defensive features #3

@hanwencheng


Summary

During Stage 4 manual testing, an investigation into macOS Keychain prompt behavior surfaced a broader gap in the security plan: Stage 3 covers kernel-level daemon hardening (memfd_secret, mlock, seccomp, capability drop) but does not cover process-internal memory hygiene for credential bytes that flow through the daemon between backend fetch and agent delivery. Several CLI defensive features and storage-level options are also missing from any current stage.

This issue tracks the Stage 8 (Production Hardening) work added to docs/spec/plans/development-stages.md. It is post-MVP and off the critical path, but it should land before broad deployment.

Full investigation notes: wiki/key-security.md
Full plan: docs/spec/plans/development-stages.md — Stage 8 section

Background: the two layers of daemon hardening

Layer | Defends against | Where in the plan
--- | --- | ---
Kernel hardening (memfd_secret, mlock2, seccomp, prctl, cap drop) | External probes: ptrace, /proc/pid/mem, swap, core dumps, co-tenant scraping | Already in Stage 3
Process-internal hygiene (zeroize, fd-based delivery, idle eviction, lifecycle audit) | Internal bugs: credentials lingering in heap, copies in long-lived address spaces, intermediate buffers nobody zeroes | Not in any stage today ❌ (this issue)

Both layers are necessary. With Stage 3 alone, plaintext lingers in freed-but-unscrubbed heap pages between requests. With Stage 8 alone, plaintext is exposed to ptrace if seccomp is bypassed.

Real exposure window (read this first)

Where the credential actually spends its time:

backend         daemon                    agent
─────────────────────────────────────────────────────────────────
       ◄──── fetch
       ────► plaintext (~50ms)
                  serialize MCP (~1ms)
                  send over socket ────►
                                    agent decodes
                                    agent uses credential for
                                    the entire task (minutes–hours)
                                    agent exits
─────────────────────────────────────────────────────────────────
       DAEMON WINDOW: ~50ms     AGENT WINDOW: minutes to hours

The credential's dominant residence is in agent memory after delivery, not in daemon memory before delivery. The daemon window is ~50ms. The agent window is roughly 1,000x to 100,000x longer: one minute is 1,200x the daemon window, and one hour is 72,000x. For that whole period the credential sits in the agent process's regular heap with no zeroize and no scrubbing. Daemon-side hardening is necessary but not sufficient. Even with perfect daemon hygiene, the credential lives in the agent's address space for the entire duration of the task.

The Stage 8 priority ranking reflects this. Priority A items shrink the agent window (or are foundational). Priority B items shrink only the daemon window — still worth doing as defense in depth, but not the dominant mitigation. An earlier draft of this plan inverted the rankings; they were corrected during the Stage 4 review.

Daemon deliverables — Priority A (shrink the dominant — agent — window)

  • zeroize / SecretString wrappers on every type that holds credential plaintext or session tokens. Touched types: Session.token, CredentialBackend::read_credential return type, MCP get_credential response builder, daemon-internal credential cache. Foundational — every other item below assumes credentials flow through these types.
  • Daemon-mediated cmd_run for agentkeys-managed runtimes. Move the cmd_run flow from CLI to daemon for paths we ship (agentkeys run, MCP agentkeys.run tool). Daemon holds credential in memfd_secret, forks child, sets env in child's address space, drops parent copy before exec. CLI never touches plaintext. Shrinks the agent window by keeping the credential out of long-lived parent address space. We control both ends — no upstream cooperation needed; achievable in v0.1.
  • memfd_secret-via-SCM_RIGHTS delivery for agentkeys.get_credential (agentkeys-managed runtime path). When the requesting agent is running an agentkeys-managed runtime that knows how to read a credential from a passed fd, daemon writes credential into a memfd_secret and sends fd via SCM_RIGHTS instead of inlining the bytes. Agent reads once, closes fd, bytes never enter regular heap. Falls back to inline bytes for runtimes that don't advertise fd support. Shrinks the agent window for the dominant MCP delivery path.
  • Idle credential eviction. Configurable TTL (default 60s) wipes cached credentials even while the agent is still running. Closes the case where an agent fetches a credential, idles for a long time, then resumes — instead of holding the credential the whole time, the daemon re-fetches.
  • Daemon-internal audit trail. Log every fetch / deliver / drop / evict event with timestamp, agent_id, service. Surfaces compromise patterns the backend audit log alone cannot see. Foundational for detection regardless of which mitigations are in place.
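The sketch below shows how three Priority A items could share one daemon-side type. It uses only the Rust standard library and combines:

- a zero-on-drop wrapper standing in for zeroize / SecretString;
- an idle-eviction cache with the 60s default TTL;
- an audit trail of fetch, deliver and evict events.

SecretBytes, CredentialCache, AuditKind and the (agent_id, service) cache key are hypothetical names, not existing agentkeys types. Production code would use the zeroize crate instead of this hand-rolled volatile wipe.

```rust
use std::collections::HashMap;
use std::sync::atomic::{compiler_fence, Ordering};
use std::time::{Duration, Instant, SystemTime};

/// Default idle TTL named in the issue.
pub const DEFAULT_IDLE_TTL: Duration = Duration::from_secs(60);

/// Hypothetical std-only stand-in for zeroize/SecretString.
pub struct SecretBytes(Vec<u8>);

impl SecretBytes {
    pub fn new(bytes: Vec<u8>) -> Self { SecretBytes(bytes) }
    pub fn expose(&self) -> &[u8] { &self.0 }
}

impl Drop for SecretBytes {
    fn drop(&mut self) {
        // Volatile writes over the whole allocation (spare capacity included),
        // so the wipe is not removed as a dead store before the free.
        let ptr = self.0.as_mut_ptr();
        for i in 0..self.0.capacity() {
            unsafe { std::ptr::write_volatile(ptr.add(i), 0u8) };
        }
        compiler_fence(Ordering::SeqCst);
    }
}

impl std::fmt::Debug for SecretBytes {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "SecretBytes([REDACTED; {}])", self.0.len())
    }
}

// `Drop` would be logged by a per-call drop path (Priority B), not exercised here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuditKind { Fetch, Deliver, Drop, Evict }

#[derive(Debug, Clone)]
pub struct AuditEvent {
    pub at: SystemTime,
    pub kind: AuditKind,
    pub agent_id: String,
    pub service: String,
}

struct Entry { secret: SecretBytes, last_used: Instant }

pub struct CredentialCache {
    ttl: Duration,
    entries: HashMap<(String, String), Entry>, // keyed by (agent_id, service)
    pub audit: Vec<AuditEvent>,
}

impl CredentialCache {
    pub fn new(ttl: Duration) -> Self {
        CredentialCache { ttl, entries: HashMap::new(), audit: Vec::new() }
    }

    fn log(&mut self, kind: AuditKind, agent_id: &str, service: &str) {
        let (agent_id, service) = (agent_id.to_string(), service.to_string());
        self.audit.push(AuditEvent { at: SystemTime::now(), kind, agent_id, service });
    }

    /// Store a freshly fetched credential; a replaced entry is wiped when dropped.
    pub fn insert(&mut self, agent_id: &str, service: &str, secret: SecretBytes, now: Instant) {
        let key = (agent_id.to_string(), service.to_string());
        self.entries.insert(key, Entry { secret, last_used: now });
        self.log(AuditKind::Fetch, agent_id, service);
    }

    /// Lend the plaintext to `f` instead of returning a copy; refreshes the idle timer.
    pub fn with_secret<R>(
        &mut self, agent_id: &str, service: &str, now: Instant, f: impl FnOnce(&[u8]) -> R,
    ) -> Option<R> {
        let entry = self.entries.get_mut(&(agent_id.to_string(), service.to_string()))?;
        entry.last_used = now;
        let result = f(entry.secret.expose());
        self.log(AuditKind::Deliver, agent_id, service);
        Some(result)
    }

    /// Wipe every entry idle for at least `ttl`; returns how many were evicted.
    pub fn evict_idle(&mut self, now: Instant) -> usize {
        let ttl = self.ttl;
        let expired: Vec<(String, String)> = self.entries.iter()
            .filter(|(_, e)| now.saturating_duration_since(e.last_used) >= ttl)
            .map(|(k, _)| k.clone())
            .collect();
        for key in &expired {
            self.entries.remove(key); // SecretBytes::drop wipes the bytes here
            self.log(AuditKind::Evict, &key.0, &key.1);
        }
        expired.len()
    }
}
```

Design notes:

- `with_secret` lends the plaintext to a closure instead of returning a copy, so every byte stays inside the wiping wrapper.
- `evict_idle` removes each expired entry from the map, so the wipe runs in the same loop iteration that records its Evict event.
- Re-fetching after eviction is left to the caller.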

Daemon deliverables — Priority B (shrink the daemon window; defensive depth)

These items shrink only the ~50ms daemon window. They are still worth doing because compromise of the long-lived daemon process is a real threat. The daemon holds the master session and scope information, and it is the privileged process inside the sandbox. Per-call drop also removes the "retroactive enumeration" attack, in which a compromised daemon hands over every credential it has ever fetched. Still, the marginal security win is small relative to Priority A.

  • Drop credential from daemon memory immediately after MCP delivery. No caching unless explicitly configured per-service. Demoted from Priority A in the Stage 4 review: this only defends against daemon compromise + retroactive enumeration, not against the dominant agent-side exposure window.
  • setrlimit(RLIMIT_CORE, 0) at daemon startup. Belt-and-suspenders backup for the prctl(PR_SET_DUMPABLE, 0) call from Stage 3.
  • pkey_alloc + pkey_mprotect per-credential page protection (Linux 4.9+, x86 only). Marks credential pages PROT_NONE except during active read.
  • Secure-scrubbing global allocator (mimalloc secure mode or scudo). Zeros heap allocations on free, adds guard pages.
  • ptrace_scope runtime check at startup — refuse to launch (or warn loudly) if kernel.yama.ptrace_scope < 1.
  • CI verification of binary hardening: run checksec on cargo build --release artifacts to confirm PIE, full RELRO, stack canaries, and NX bits.
  • Anti-debugger check at startup via TracerPid in /proc/self/status. Refuse to start if a debugger is attached unless --allow-debugger is set.
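The two /proc-based startup checks reduce to small parsers that are easy to unit-test. This sketch assumes Linux with the Yama LSM. The function names, the error strings, and the choice to refuse (rather than warn) when /proc/sys/kernel/yama/ptrace_scope cannot be read are all assumptions. --allow-debugger is the override named in the item above.

```rust
use std::fs;

/// Parse `TracerPid:` from /proc/self/status; Some(pid) only when a tracer is attached.
pub fn tracer_pid(status: &str) -> Option<u32> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("TracerPid:"))
        .and_then(|value| value.trim().parse::<u32>().ok())
        .filter(|&pid| pid != 0)
}

/// Accept kernel.yama.ptrace_scope >= 1 (0 lets any same-uid process attach).
pub fn check_ptrace_scope(contents: &str) -> Result<u8, String> {
    let value = contents.trim();
    let scope: u8 = value
        .parse()
        .map_err(|_| format!("unparseable kernel.yama.ptrace_scope value {value:?}"))?;
    if scope < 1 {
        return Err(format!("kernel.yama.ptrace_scope is {scope}; expected >= 1"));
    }
    Ok(scope)
}

/// Hypothetical startup hook; `allow_debugger` mirrors the --allow-debugger flag.
pub fn startup_checks(allow_debugger: bool) -> Result<(), String> {
    let status = fs::read_to_string("/proc/self/status")
        .map_err(|e| format!("cannot read /proc/self/status: {e}"))?;
    if let Some(pid) = tracer_pid(&status) {
        if !allow_debugger {
            return Err(format!("debugger attached (TracerPid {pid}); use --allow-debugger to override"));
        }
    }
    let scope = fs::read_to_string("/proc/sys/kernel/yama/ptrace_scope")
        .map_err(|e| format!("cannot read ptrace_scope (is Yama enabled?): {e}"))?;
    check_ptrace_scope(&scope).map(|_| ())
}
```

The daemon would call startup_checks once, before it opens its socket, and exit non-zero on Err. A debugger could still attach after the check passes. For that reason these checks complement PR_SET_DUMPABLE and ptrace_scope rather than replacing them.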

Daemon deliverables — Priority C (broader runtime cooperation, v0.2+)

  • Extend memfd_secret-via-SCM_RIGHTS delivery to non-agentkeys-managed agent runtimes. Most upstream LLM frameworks expect a String env var, not an fd. Generalizing the Priority A protection to arbitrary runtimes requires upstream changes. Until those land, Priority A covers only runtimes we ship.
  • Daemon-mediated cmd_run for arbitrary parent processes. Priority A covers paths we control. Generalizing the daemon-mediated fork-and-drop pattern to arbitrary parents is a v0.2+ item.

CLI deliverables

  • agentkeys whoami subcommand. Print non-sensitive session metadata (wallet, scope, expiry). Never print session.token. Replaces ak-keychain-show | jq in the manual test. ~15 LOC.
  • Idempotent agentkeys init. If a valid session exists, print "Already initialized as <wallet>" and exit; --force overrides. Eliminates the find-then-update double-prompt path on macOS.
  • zeroize wrapping for credential strings in cmd_read and cmd_run.
  • prctl(PR_SET_DUMPABLE, 0) + setrlimit(RLIMIT_CORE, 0) on CLI startup (Linux only).
  • Wire CLI read to honor AuthRequestType::HighValueRelease. Sensitive credentials require agentkeys approve before release.
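whoami and idempotent init are both pure decisions over the stored session, so they can be sketched without a keyring. The sketch assumes a session struct with wallet, scope, expiry and token fields. Apart from token, the field names and the Unix-seconds expiry are hypothetical. The property that matters is that token never reaches any printable form, including Debug.

```rust
use std::fmt;

/// Hypothetical session shape; only `token` is named in the issue text.
pub struct Session {
    pub wallet: String,
    pub scope: Vec<String>,
    pub expires_at_unix: u64,
    pub token: String,
}

impl fmt::Debug for Session {
    // Hand-written so `{:?}` can never leak the token.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Session")
            .field("wallet", &self.wallet)
            .field("scope", &self.scope)
            .field("expires_at_unix", &self.expires_at_unix)
            .field("token", &"[REDACTED]")
            .finish()
    }
}

impl Session {
    pub fn is_valid(&self, now_unix: u64) -> bool {
        now_unix < self.expires_at_unix
    }
}

/// Output of `agentkeys whoami`: non-sensitive metadata only.
pub fn whoami(s: &Session) -> String {
    format!("wallet: {}\nscope: {}\nexpires_at: {}", s.wallet, s.scope.join(","), s.expires_at_unix)
}

#[derive(Debug, PartialEq, Eq)]
pub enum InitAction {
    /// Print this message and exit without touching the keyring again.
    AlreadyInitialized(String),
    /// Run the normal init flow, including its keyring write.
    CreateSession,
}

/// Idempotent `agentkeys init`: a valid session short-circuits unless `--force` is set.
pub fn plan_init(existing: Option<&Session>, force: bool, now_unix: u64) -> InitAction {
    match existing {
        Some(s) if !force && s.is_valid(now_unix) => {
            InitAction::AlreadyInitialized(format!("Already initialized as {}", s.wallet))
        }
        _ => InitAction::CreateSession,
    }
}
```

plan_init does no I/O. The caller reads the keyring once and writes only on CreateSession, which is how the second prompt would be avoided on an already-initialized machine.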

Optional storage hardening

  • Touch-ID-gate the master session on macOS via kSecAttrAccessControl = kSecAccessControlUserPresence. Master session only — child sessions stay silent. macOS only.
  • DEK + encrypted file pattern. Cross-platform. The keyring holds an immutable 32-byte data encryption key (DEK), and the session JSON is stored encrypted under it (XChaCha20-Poly1305) at ~/.agentkeys/session.enc. As a result, security find-generic-password -w returns useless random bytes instead of the session.

Acceptance criteria

  • All Priority A daemon items implemented and tested (these are the ones that matter most — they shrink the agent window)
  • All CLI deliverables implemented and tested
  • Unit tests pass (see Stage 8 test matrix in development-stages.md)
  • Manual review confirms credential bytes do not survive in agent memory beyond the agent's actual use of them (when running an agentkeys-managed runtime)
  • Reviewer E2E checklist completes
  • Priority B may slip to a follow-up issue if needed
  • Priority C is explicitly tracked separately for v0.2+

Effort estimate

4-6 days. Off the critical path. Recommended sequencing: ship v0 from Stage 7, then immediately roll into Stage 8 before broad deployment.

Why this is post-MVP

Stages 0-7 ship a working v0 system suitable for demo and early adopters. Stage 8 hardens it for broader deployment where the daemon may run on hosts with permissive ptrace scope, where credentials may be sensitive enough to warrant high-value release gating, and where memory hygiene matters for compliance / threat-model coverage. None of Stage 8 is required to demonstrate the product; all of it is required to claim production readiness.

References

  • wiki/key-security.md — full investigation notes, two-tier storage model, daemon credential lifecycle, "real exposure window" diagram, why both hardening layers are necessary
  • docs/spec/plans/development-stages.md Stage 8 — full deliverable list, unit test matrix, reviewer E2E checklist
  • docs/spec/architecture.md:70 — original Stage 3 daemon hardening requirements
  • docs/spec/tech-brief.md:80 — TEE shielding key model (the layer above which daemon hardening operates)
  • docs/spec/credential-backend-interface.md: AuthRequestType::HighValueRelease definition

Changelog

  • 2026-04-11 — Issue created with original Stage 8 priority ranking
  • 2026-04-11 — Priority ranking corrected during Stage 4 review. "Drop after delivery" demoted from Priority A to Priority B (defends only the ~50ms daemon window, not the dominant agent window). Daemon-mediated cmd_run and memfd_secret-via-SCM_RIGHTS delivery promoted from Priority C to Priority A for agentkeys-managed runtimes (the paths we control). Real-exposure-window framing added to make the rationale explicit.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
