Skip to content

fix(cli): #37 — real macOS LAContext biometric gate via objc2 FFI#38

Open
hanwencheng wants to merge 1 commit intomainfrom
fix/issue-37
Open

fix(cli): #37 — real macOS LAContext biometric gate via objc2 FFI#38
hanwencheng wants to merge 1 commit intomainfrom
fix/issue-37

Conversation

@hanwencheng
Copy link
Copy Markdown
Member

Summary

Wires real Touch ID / Face ID on macOS via LAContext.evaluatePolicy through the objc2 + block2 FFI stack. Supersedes PR #27, which landed a stub that logged a prompt and returned Ok(()) on macOS.

Closes #37.

Design — trait seam

biometric/
├── mod.rs         BiometricBackend trait + require_biometric() entry point
├── error.rs       typed BiometricError enum (all LAError codes mapped)
├── logic.rs       pure functions: parse_la_error, redact_prompt_reason
├── lacontext.rs   macOS backend — real LAContext.evaluatePolicy FFI
└── stdin.rs       non-macOS backend — stdin y/N fallback
  • AGENTKEYS_BIOMETRIC=on activates the gate; default is bypass (preserves existing CLI behavior for scripts / CI).
  • AGENTKEYS_ALLOW_NO_BIOMETRIC=1 (non-macOS only) skips the stdin TTY guard.
  • cmd_approve, cmd_revoke, cmd_teardown call require_biometric with reason strings scrubbed via redact_prompt_reason — long opaque tokens get replaced with <redacted> so revoke <session-token> can't leak the token to terminal scrollback (the critical P2 codex flagged on PR fix(cli): #11 biometric gate for high-security master CLI actions (macOS) #27).

unsafe inventory

~4 blocks, each 2-5 lines, all annotated with // SAFETY: comments:

  1. LAContext::new() — all objc2 message-sends are unsafe
  2. Synchronous canEvaluatePolicy_error capability check
  3. *mut NSError dereference inside the completion block (null-checked defensively even though Apple's contract guarantees non-null on failure)
  4. evaluatePolicy_localizedReason_reply async method send

Deadlock / leak protections:

  • 60-second recv_timeout on the channel — CLI can never hang forever
  • No Retained<LAContext> captured inside the completion block (avoids retain cycle)
  • Single-shot: each authenticate call builds its own LAContext

Tests — 4 layers

L1 — Pure logic (22 tests, runs on every platform)

  • parse_la_error — one test per documented LAError code (-1 through -10)
  • redact_prompt_reason — 0x-address preservation, long-token stripping, idempotence
  • biometric_is_enabled — env-var parsing (on / 1 / true / unset / off)

L2 — FFI boundary (2 tests, #[cfg(target_os = "macos")])

  • la_context_constructs_and_drops — catches dylib load / linker issues
  • can_evaluate_policy_is_synchronous_on_ci_runner — smoke-test for FFI wiring

L3 — Behavioral via MockBackend

L4 — Manual QA (docs/manual-test-issue-37.md)

7 scenarios requiring real macOS Touch ID hardware — cannot be automated:

  1. Touch ID succeeds → action proceeds
  2. User cancels → action aborts with clean message
  3. Biometry lockout → passcode fallback
  4. Device without biometry sensor
  5. AGENTKEYS_BIOMETRIC=off escape hatch
  6. Token-leak regression: revoke <raw-token> must NOT echo the token to stderr / scrollback
  7. Non-macOS stdin fallback

Dependencies

Gated with [target.'cfg(target_os = "macos")'.dependencies] — zero cost on Linux / Windows builds:

objc2 = "0.6"
objc2-foundation = "0.3"
objc2-local-authentication = "0.3"
block2 = "0.6"

Test plan

  • cargo test -p agentkeys-cli biometric -- --test-threads=1 → 22 passed (L1 + L3 + L2 on macOS)
  • cargo test -p agentkeys-cli -- --test-threads=1 → 25 existing CLI integration tests still green with gate default-off
  • cargo check -p agentkeys-cli → clean compile on macOS with full FFI stack linked
  • Manual QA scenarios 1-7 from docs/manual-test-issue-37.md (requires reviewer with Touch ID hardware)

Issue

Closes #37. Supersedes #27 — PR #27 should be closed when this merges (the trait-seam design replaces its biometric.rs entirely).

Supersedes PR #27's stub. Wires actual Touch ID / Face ID enforcement on
macOS through `LAContext.evaluatePolicy`, with a trait-seam design so
non-macOS platforms, tests, and the env-opt-out path are all first-class.

## Design

```
biometric/
├── mod.rs         trait BiometricBackend + require_biometric() entry point
├── error.rs       typed BiometricError enum (all LAError codes mapped)
├── logic.rs       pure functions: parse_la_error, redact_prompt_reason
├── lacontext.rs   macOS backend — real LAContext.evaluatePolicy FFI
└── stdin.rs       non-macOS backend — stdin y/N fallback
```

- `AGENTKEYS_BIOMETRIC=on` activates the gate; default is bypass (preserves
  existing CLI behavior for scripts / CI).
- `AGENTKEYS_ALLOW_NO_BIOMETRIC=1` (non-macOS) skips the stdin TTY guard.
- `cmd_approve`, `cmd_revoke`, `cmd_teardown` call `require_biometric`
  with reason strings that are scrubbed via `redact_prompt_reason` — long
  opaque tokens get replaced with `<redacted>` so `revoke <session-token>`
  can't leak the token to terminal scrollback (codex PR #27 P2).

## unsafe inventory (~4 blocks, all with SAFETY comments)

1. `LAContext::new()` — all objc2 message-sends are unsafe
2. `canEvaluatePolicy_error` synchronous call
3. `*mut NSError` dereference inside the completion block (null-checked
   defensively even though Apple's contract guarantees non-null on failure)
4. `evaluatePolicy_localizedReason_reply` async method send

Deadlock / leak protections:
- 60-second `recv_timeout` on the channel (CLI can never hang forever)
- No `Retained<LAContext>` captured inside the completion block
  (avoids retain cycle)
- Single-shot: each `authenticate` call builds its own LAContext

## Tests — 4 layers

**L1 — Pure logic (22 tests, runs everywhere):**
- `parse_la_error` one test per documented NSError code
- `redact_prompt_reason` preservation of 0x addresses, stripping of long
  tokens, idempotence
- `biometric_is_enabled` env-var parsing

**L2 — FFI boundary (2 tests, macOS only):**
- `la_context_constructs_and_drops` — catches dylib load / linker issues
- `can_evaluate_policy_is_synchronous_on_ci_runner` — smoke-test that
  the FFI wiring doesn't crash when called in a test context

**L3 — Behavioral via MockBackend:**
- `mock_backend_receives_redacted_reason` — critical regression for the
  P2 token-leak finding; asserts raw tokens never reach the backend
- `mock_backend_returns_scripted_error` — scripted-response validation

**L4 — Manual QA (`docs/manual-test-issue-37.md`):**
7 scenarios including Touch ID success/cancel, lockout → passcode fallback,
device without biometry, env opt-out, token-leak regression, non-macOS
stdin fallback.

## Dependencies

macOS only (gated `[target.'cfg(target_os = \"macos\")'.dependencies]`):
- `objc2 = \"0.6\"`
- `objc2-foundation = \"0.3\"`
- `objc2-local-authentication = \"0.3\"`
- `block2 = \"0.6\"`

Zero cost on Linux/Windows — objc2 + block2 don't enter the build graph.

Test: cargo test -p agentkeys-cli -- --test-threads=1
biometric: 22 passed; cli: 25 passed

Closes #37. Supersedes PR #27 (which should be closed on merge).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hanwencheng
Copy link
Copy Markdown
Member Author

this also close #11

hanwencheng added a commit that referenced this pull request Apr 15, 2026
Augments the auto-generated Claude Code + Claude Code Review workflows
with context from 15+ PR review cycles in this repo so the Action
produces findings consistent with recent codex iterations instead of
generic Rust advice.

## claude-code-review.yml

- Scope `on.pull_request.paths` to `crates/**`, `docs/**`, `wiki/**`,
  workflows, Cargo.toml, CLAUDE.md, and harness/. Skips cheap Cargo.lock
  churn.
- `fetch-depth: 0` so Claude can inspect `git log` / `git blame` during
  review (useful for "this finding predates the PR" arguments).
- `dtolnay/rust-toolchain@stable` + `Swatinem/rust-cache@v2` so every
  `cargo check` / `cargo test -p <crate>` in-session runs fast.
- Custom prompt injects:
  - crate names (agentkeys-types, agentkeys-core, etc)
  - pointer to CLAUDE.md for architecture + mock-server design principles
  - pointer to the new .github/REVIEW_GUIDELINES.md for agentkeys-specific
    review patterns
  - `--test-threads=1` requirement (tests mutate shared HOME/keyring)
  - the 8-pattern checklist (audit-log DENIED rows, URL-encoding via
    reqwest .query(), session-token redaction, case-insensitive wallet
    comparison, 30-day TTL, synchronous keychain ops, path-traversal
    guards, cross-wallet credential safety)
- `claude_args --allowed-tools` whitelist for cargo/git/gh so the
  Action can actually run the cargo commands the prompt tells it to.

## claude.yml (@claude mentions)

- Same Rust toolchain + cache setup so `@claude run tests` /
  `@claude check clippy` requests don't pay cold-compile cost.
- `fetch-depth: 0` for git-history tools.
- Same `claude_args --allowed-tools` whitelist plus `gh pr comment:*` /
  `gh pr edit:*` so @claude can update PR bodies and comment back with
  findings.

## .github/REVIEW_GUIDELINES.md (new)

Single source of truth for agentkeys review patterns, extracted from
PRs #18-#38 (fix/issue-10 through fix/issue-37). Documents:

- Test constraints (`--test-threads=1`, per-crate targeting)
- 10 canonical bug patterns that codex has flagged repeatedly
- Architectural invariants (master→agent single-hop; no users yet)
- Scope-control guidance (no speculative refactors, no backwards-compat
  shims pre-launch)
- Policy for codex-vs-claude disagreements

Each pattern has a PR/issue reference so reviewers (and future Claude
runs) can trace why the rule exists.
hanwencheng added a commit that referenced this pull request Apr 15, 2026
* "Claude PR Assistant workflow"

* "Claude Code Review workflow"

* ci: wire agentkeys-specific context into Claude Code workflows

Augments the auto-generated Claude Code + Claude Code Review workflows
with context from 15+ PR review cycles in this repo so the Action
produces findings consistent with recent codex iterations instead of
generic Rust advice.

## claude-code-review.yml

- Scope `on.pull_request.paths` to `crates/**`, `docs/**`, `wiki/**`,
  workflows, Cargo.toml, CLAUDE.md, and harness/. Skips cheap Cargo.lock
  churn.
- `fetch-depth: 0` so Claude can inspect `git log` / `git blame` during
  review (useful for "this finding predates the PR" arguments).
- `dtolnay/rust-toolchain@stable` + `Swatinem/rust-cache@v2` so every
  `cargo check` / `cargo test -p <crate>` in-session runs fast.
- Custom prompt injects:
  - crate names (agentkeys-types, agentkeys-core, etc)
  - pointer to CLAUDE.md for architecture + mock-server design principles
  - pointer to the new .github/REVIEW_GUIDELINES.md for agentkeys-specific
    review patterns
  - `--test-threads=1` requirement (tests mutate shared HOME/keyring)
  - the 8-pattern checklist (audit-log DENIED rows, URL-encoding via
    reqwest .query(), session-token redaction, case-insensitive wallet
    comparison, 30-day TTL, synchronous keychain ops, path-traversal
    guards, cross-wallet credential safety)
- `claude_args --allowed-tools` whitelist for cargo/git/gh so the
  Action can actually run the cargo commands the prompt tells it to.

## claude.yml (@claude mentions)

- Same Rust toolchain + cache setup so `@claude run tests` /
  `@claude check clippy` requests don't pay cold-compile cost.
- `fetch-depth: 0` for git-history tools.
- Same `claude_args --allowed-tools` whitelist plus `gh pr comment:*` /
  `gh pr edit:*` so @claude can update PR bodies and comment back with
  findings.

## .github/REVIEW_GUIDELINES.md (new)

Single source of truth for agentkeys review patterns, extracted from
PRs #18-#38 (fix/issue-10 through fix/issue-37). Documents:

- Test constraints (`--test-threads=1`, per-crate targeting)
- 10 canonical bug patterns that codex has flagged repeatedly
- Architectural invariants (master→agent single-hop; no users yet)
- Scope-control guidance (no speculative refactors, no backwards-compat
  shims pre-launch)
- Policy for codex-vs-claude disagreements

Each pattern has a PR/issue reference so reviewers (and future Claude
runs) can trace why the rule exists.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Biometric gate: wire real macOS LAContext.evaluatePolicy (PR #27 follow-up)

1 participant